Elastic Processing of Analytical Query Workloads on IaaS Clouds
Many modern applications require the evaluation of analytical queries on large amounts of data. Such queries entail joins and heavy aggregations that often include user-defined functions (UDFs). The most efficient way to process this type of query is with tree execution plans. In this work, we develop an engine for analytical query processing and a suite of specialized techniques that collectively take advantage of the tree form of such plans. The engine executes these tree plans in an elastic IaaS cloud infrastructure and adapts dynamically, allocating and releasing resources based on the query workload monitored over a sliding time window. The engine offers its services for a fee according to service-level agreements (SLAs) associated with the incoming queries; its management of cloud resources aims to maximize profit, that is, the SLA-based income minus the cost of the cloud resources used. We have fully implemented our algorithms in the Exareme dataflow processing system. We present an extensive evaluation demonstrating that our approach is efficient (exhibiting fast response times), elastic (successfully adjusting the cloud resources it uses as the engine continually adapts to query workload changes), and profitable (closely approximating the maximum difference between SLA-based income and cloud-based expenses).
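To make the idea of sliding-window elasticity and profit concrete, the following is a minimal illustrative sketch in Python. It is not the algorithm implemented in Exareme: the class `SlidingWindowScaler`, the function `profit`, and all parameters (`window_seconds`, `queries_per_vm`, `vm_price_per_quantum`, `quanta`) are hypothetical names, and the sizing rule (one VM per fixed number of recent queries) is a deliberately simple assumption chosen only for illustration.

```python
from collections import deque


class SlidingWindowScaler:
    """Toy elasticity controller (hypothetical; not the paper's algorithm).

    It counts the queries that arrived within the last `window_seconds`
    and sizes the VM pool as ceil(count / queries_per_vm).
    """

    def __init__(self, window_seconds, queries_per_vm, min_vms=1):
        self.window_seconds = window_seconds
        self.queries_per_vm = queries_per_vm
        self.min_vms = min_vms
        self.arrivals = deque()  # arrival timestamps, oldest first

    def record_query(self, timestamp):
        """Register a query arrival (timestamps must be non-decreasing)."""
        self.arrivals.append(timestamp)

    def target_vms(self, now):
        """Return the number of VMs suggested for the current window."""
        cutoff = now - self.window_seconds
        # Evict arrivals that have slid out of the time window.
        while self.arrivals and self.arrivals[0] <= cutoff:
            self.arrivals.popleft()
        # Integer ceiling division without importing math.
        needed = -(-len(self.arrivals) // self.queries_per_vm)
        return max(self.min_vms, needed)


def profit(sla_income, vm_count, vm_price_per_quantum, quanta):
    """Profit as SLA-based income minus the cost of leased VMs.

    Assumes every VM is billed per time quantum at a flat price.
    """
    return sla_income - vm_count * vm_price_per_quantum * quanta


if __name__ == "__main__":
    scaler = SlidingWindowScaler(window_seconds=60, queries_per_vm=2)
    for t in (0, 10, 20, 30, 70):
        scaler.record_query(t)
    vms = scaler.target_vms(now=75)
    print("suggested VMs:", vms)
    print("profit:", profit(sla_income=10.0, vm_count=vms,
                            vm_price_per_quantum=0.5, quanta=3))
```

In this sketch the pool shrinks back to `min_vms` once the window empties, which mirrors the release of resources when the workload drops; a real controller would also have to weigh SLA penalties and billing granularity before releasing a VM.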