Elastic complex event processing exploiting prediction
Supporting real-time, cost-effective execution of Complex Event Processing (CEP) applications in the cloud has been an important goal for many researchers in recent years. Distributed Stream Processing Systems (DSPS) have been widely adopted by major computing companies as a powerful approach to large-scale CEP. However, determining the appropriate degree of parallelism for the DSPS' components can be particularly challenging: the volume of data streams is becoming increasingly large, rule sets are growing increasingly complex, and the system must handle such large stream volumes in real time while coping with changes in burstiness levels and data characteristics. In this paper we describe our solution for building elastic CEP systems on top of our distributed CEP system, which combines two commonly used frameworks, Storm and Esper, to provide both ease of use and scalability. Our approach makes the following contributions: (i) we provide a mechanism for predicting the load and latency of the Esper engines in upcoming time windows, and (ii) we propose a novel algorithm that automatically adjusts the number of engines to use in the upcoming windows, taking into account the cost and the performance gains of possible changes. A detailed experimental evaluation with a real traffic monitoring application that analyzes bus traces from the city of Dublin demonstrates the benefits of our approach: it outperforms the current state-of-the-art technique by four orders of magnitude in the number of tuples it can process.
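The cost-aware adjustment described in contribution (ii) can be illustrated with a minimal sketch. All names, the toy latency model, and the scoring heuristic below are our own assumptions for illustration, not the paper's actual algorithm or API:

```python
# Hypothetical sketch of a window-based elasticity decision: given a load
# prediction for the next window, pick the engine count that trades the
# predicted latency gain against the cost of reconfiguring.

def predicted_latency(load, engines):
    """Toy latency model: latency scales with per-engine load (assumption)."""
    return load / engines

def choose_engine_count(predicted_load, current_engines, max_engines,
                        latency_target, switch_cost=0.1):
    """Pick the engine count for the upcoming window, penalizing both
    predicted latency above the target and the cost of adding/removing
    engines (switch_cost per engine changed)."""
    best, best_score = current_engines, float("inf")
    for n in range(1, max_engines + 1):
        latency = predicted_latency(predicted_load, n)
        score = (max(latency - latency_target, 0.0)
                 + switch_cost * abs(n - current_engines))
        if score < best_score:
            best, best_score = n, score
    return best

# Example: a heavy predicted load pushes the system to scale out.
print(choose_engine_count(predicted_load=100.0, current_engines=2,
                          max_engines=8, latency_target=15.0))  # → 7
```

The small `switch_cost` term keeps the system from reconfiguring when the predicted latency improvement would not justify the change, mirroring the cost/benefit trade-off stated in the abstract.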