Time is Money but How Much? Optimal Tradeoffs in Dataflow Processing on the Cloud
Processing complex dataflow tasks (graphs) on the cloud is a critical requirement for many applications. Scheduling dataflow graphs on cloud resources involves many tradeoffs between monetary cost and processing time, which are influenced by the placement of the graph operators and the use of different resources. This gives rise to eco-elasticity, an additional kind of elasticity that comes from economics and is orthogonal to the traditional definition of elasticity in clouds (i.e., dynamically changing the size of the virtual cluster based on demand). Discovering these tradeoffs and offering the ability to select the “best” time-money combination in each case is essential in clouds. In this work, we focus on the eco-elasticity of dataflow graphs with respect to finding tradeoffs between execution time and monetary cost. The relevant critical questions are the following: a) Do such tradeoffs exist? b) Can they be discovered at an overhead low enough to make the search worthwhile? We demonstrate that eco-elasticity exists in several common tasks when using two different dataflow abstractions, one corresponding to MapReduce and one more general. Between the two, we show that the general dataflow graph abstraction allows much more eco-elasticity to be extracted than the MapReduce abstraction does, while remaining computationally tractable. Finally, we demonstrate that eco-elasticity can be discovered in practice using highly scalable and efficient algorithms.
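To make the notion of a time-money tradeoff concrete, the following minimal sketch keeps only those candidate schedules that no other schedule beats on both execution time and monetary cost. It is an illustration under stated assumptions, not the algorithm of this work: the function name `pareto_front` and the (time, money) values are hypothetical and invented for the example.

```python
from typing import List, Tuple

Plan = Tuple[float, float]  # (execution time, monetary cost); units are illustrative


def pareto_front(plans: List[Plan]) -> List[Plan]:
    """Return the plans that are not dominated in both time and money.

    A plan is dominated if another plan is no slower and no more
    expensive, and strictly better in at least one of the two.
    """
    front: List[Plan] = []
    best_money = float("inf")
    # Sort by time, then money: a plan survives only if it is strictly
    # cheaper than every plan that is at least as fast.
    for time, money in sorted(set(plans)):
        if money < best_money:
            front.append((time, money))
            best_money = money
    return front


# Hypothetical schedules of one dataflow graph (values are made up).
candidate_plans = [(10, 8.0), (12, 5.0), (15, 5.5), (20, 3.0), (20, 4.0), (30, 3.0)]
tradeoffs = pareto_front(candidate_plans)
print(tradeoffs)  # [(10, 8.0), (12, 5.0), (20, 3.0)]
```

Each surviving pair is one time-money combination a user could reasonably pick; the more such pairs a scheduler can expose, the more eco-elasticity it offers in this simplified picture.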