TRIREME: Sailing through Flows of Big Data
In the era of Big Data, datasets are growing much faster than the processing power of a single machine. In the long run, distributed computing seems to be the only viable way to store and process such vast amounts of data. Clouds have become a very attractive platform for large-scale data processing because of their elasticity, i.e., additional resources can be leased for larger datasets or more complex processing. Because these resources are leased, the monetary cost of using them is an important concern in a cloud environment.
In this paper, we introduce Trireme, a system for large scale elastic data processing on the cloud. The system offers a declarative language based on SQL that is extended with user-defined functions (UDFs) and an inverted syntax to easily and declaratively express complex computation. Users can extend the functionality of the system by writing new UDFs using a clear and simple interface. Trireme is designed to take advantage of the elasticity of clouds by offering tradeoffs between the running time and monetary cost of using the resources. We present the system design along with its main components, the language abstractions, and the optimization techniques that we use. Finally, we present the results of several large-scale experiments that show the effectiveness of the system.