Skip to main content
back to news

Dissertation support PhD candidate Herald Kllapi

The public dissertation support of PhD candidate Herald Kllapi will take place in classroom Α56, tomorrow Wednesday, 25 February, at 8:45am.


Dissertation Title: Elastic dataflow processing on the cloud

Summary: Clouds have become an attractive platform for the large-scale processing of modern applications on Big Data, especially due to the concept of elasticity, which characterizes them: resources can be leased on demand and used for as much time as needed, offering the ability to create virtual infrastructures that change dynamically over time. Such applications often require processing of complex queries that are expressed in a high-level language and are typically transformed into data processing flows, or simply dataflows. A logical question that arises is whether elasticity affects dataflow execution and in which way. It seems reasonable that the execution is faster when more resources are used, however the monetary cost is higher. This gives rise to the concept eco-elasticity, an additional kind of elasticity that comes from economics, and captures the trade-offs between the response time of the system and the amount of money we pay for it as influenced by the use of different amounts of resources.

In this thesis, we approach the elasticity of clouds in a unified way that combines both the traditional notion and eco-elasticity. This unified elasticity concept is essential for the development of auto-tuned systems in cloud environments. First, we demonstrate that eco-elasticity exists in several common tasks that appear in practice and that can be discovered using a simple, yet highly scalable and efficient algorithm. Next, we present two cases of auto-tuned algorithms that use the unified model of elasticity in order to adapt to the query

workload: 1) processing analytical queries in the form of tree execution plans in order to maximize profit and 2) automated index management taking into account compute and storage resources. Finally, we describe EXAREME, a system for elastic data processing on the cloud that has been used and extended in this work. The system offers declarative languages that are based on SQL with user-defined functions (UDFs) extended with parallelism primitives. EXAREME exploits both elasticities of clouds by dynamically allocating and de-allocating compute resources in order to adapt to the query workload.