A Relational Approach to Complex Dataflows
Clouds have become an attractive platform for highly scalable processing of Big Data, largely because of their elasticity. Several languages and systems for cloud-based data processing have been proposed in the past, the most popular of which are based on MapReduce. In this paper, we present Exareme, a system for elastic large-scale data processing on the cloud that follows a more general paradigm. Exareme is an open source project¹. The system offers a declarative language based on SQL with user-defined functions (UDFs), extended with parallelism primitives and an inverted syntax that makes data pipelines easy to express. Exareme is designed to take advantage of clouds by dynamically allocating and deallocating compute resources, offering trade-offs between execution time and monetary cost.