Schedule Optimization for Data Processing Flows on the Cloud
Speaker: Herald Kllapi
University: Univ. of Athens
Abstract: Scheduling data processing workflows (dataflows) on the cloud is a complex and challenging task. It is essentially an optimization problem, very similar to query optimization, but it differs characteristically from traditional problems in two respects. First, its space of alternative schedules is very rich, owing to the various optimization opportunities that cloud computing offers. Second, its optimization criterion is at least two-dimensional, with the monetary cost of using the cloud being at least as important as the query completion time. In this paper, we study the scheduling of dataflows that involve arbitrary data processing operators in the context of three different problems: 1) minimize completion time given a fixed budget, 2) minimize monetary cost given a time limit, and 3) find trade-offs between completion time and monetary cost without any a priori constraints. We formulate the problems and present an approximate optimization framework that addresses them by exploiting resource elasticity in the cloud. To investigate the effectiveness of our approach, we incorporate the framework into a prototype system for dataflow evaluation and instantiate it with several greedy, probabilistic, and exhaustive search algorithms. Finally, we present the results of experiments conducted with the prototype elastic optimizer on numerous scientific and synthetic dataflows, and we identify the advantages and disadvantages of the various search algorithms. The overall results are promising and indicate the effectiveness of our approach.
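The three problem variants differ only in which dimension is optimized and which is constrained. The sketch below makes that difference concrete under a loud simplifying assumption: each candidate schedule has already been evaluated into one estimated completion time and one monetary cost. The `Schedule` class, the function names, and the sample numbers are hypothetical illustrations, not the talk's framework. The sketch also does not show how candidates are generated or searched, which is the part the greedy, probabilistic, and exhaustive algorithms address.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Schedule:
    # Hypothetical summary of one candidate schedule: estimated
    # completion time (seconds) and monetary cost (e.g. dollars).
    name: str
    time: float
    cost: float


def fastest_within_budget(cands, budget):
    """Problem 1: minimize completion time subject to cost <= budget."""
    feasible = [s for s in cands if s.cost <= budget]
    # Ties on time are broken by the cheaper schedule; None if infeasible.
    return min(feasible, key=lambda s: (s.time, s.cost), default=None)


def cheapest_within_deadline(cands, deadline):
    """Problem 2: minimize monetary cost subject to time <= deadline."""
    feasible = [s for s in cands if s.time <= deadline]
    # Ties on cost are broken by the faster schedule; None if infeasible.
    return min(feasible, key=lambda s: (s.cost, s.time), default=None)


def dominates(a, b):
    """True if a is no worse than b in both dimensions and better in one."""
    return (a.time <= b.time and a.cost <= b.cost
            and (a.time < b.time or a.cost < b.cost))


def pareto_front(cands):
    """Problem 3: keep every schedule no other schedule dominates."""
    front = [s for s in cands if not any(dominates(o, s) for o in cands)]
    return sorted(front, key=lambda s: s.time)


if __name__ == "__main__":
    # Made-up candidates, e.g. the same dataflow run on more or fewer VMs.
    cands = [
        Schedule("A", 100, 2.0),
        Schedule("B", 60, 3.0),
        Schedule("C", 40, 5.0),
        Schedule("D", 70, 4.0),  # dominated by B: slower and costlier
        Schedule("E", 30, 8.0),
    ]
    print(fastest_within_budget(cands, budget=4.0).name)
    print(cheapest_within_deadline(cands, deadline=50).name)
    print([s.name for s in pareto_front(cands)])
```

With these sample values, a budget of 4.0 selects B, a deadline of 50 selects C, and the trade-off set is E, C, B, A (D is dropped because B is both faster and cheaper). A constrained problem returns a single point of the trade-off set, whereas the unconstrained problem returns the whole set and leaves the final choice to the user.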