Schedule Optimization for Data Processing Flows on the Cloud
Speaker: Herald Kllapi

Date: 27/05/2011
University: Univ. of Athens
Room: A56
Time: 4:00pm
Slides:
Abstract: Scheduling data processing workflows (dataflows) on the cloud is a very complex and challenging task. It is essentially an optimization problem, very similar to query optimization, that is characteristically different from traditional problems in two aspects. First, its space of alternative schedules is very rich, due to various optimization opportunities that cloud computing offers. Second, its optimization criterion is at least two-dimensional, with the monetary cost of using the cloud being at least as important as the query completion time. In this paper, we study the scheduling of dataflows that involve arbitrary data processing operators in the context of three different problems: 1) minimize completion time given a fixed budget, 2) minimize monetary cost given a time limit, and 3) find trade-offs between completion time and monetary cost without any a priori constraints. We formulate the problems and present an approximate optimization framework to address them that makes use of resource elasticity in the cloud. To investigate the effectiveness of our approach, we incorporate the devised framework into a prototype system for dataflow evaluation and instantiate it with several greedy, probabilistic, and exhaustive search algorithms. Finally, we present the results of several experiments that we have conducted with the prototype elastic optimizer on numerous scientific and synthetic dataflows and we identify the advantages and disadvantages of the various search algorithms. The overall results are quite promising and indicate the effectiveness of our approach.
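
The three problems named in the abstract can be illustrated with a minimal sketch. It assumes the candidate schedules have already been enumerated, each with an estimated completion time and monetary cost. The `Schedule` type, the function names, and the numbers are hypothetical and are not taken from the speaker's framework; the sketch simply filters a small list exhaustively and does not model operators, resource elasticity, or the search algorithms mentioned above.

```python
from typing import List, NamedTuple, Optional


class Schedule(NamedTuple):
    """Hypothetical candidate schedule with estimated time and money."""
    name: str
    time: float   # estimated completion time (arbitrary units)
    money: float  # estimated monetary cost (arbitrary units)


def fastest_within_budget(schedules: List[Schedule], budget: float) -> Optional[Schedule]:
    """Problem 1: minimize completion time given a fixed budget."""
    feasible = [s for s in schedules if s.money <= budget]
    return min(feasible, key=lambda s: s.time, default=None)


def cheapest_within_deadline(schedules: List[Schedule], deadline: float) -> Optional[Schedule]:
    """Problem 2: minimize monetary cost given a time limit."""
    feasible = [s for s in schedules if s.time <= deadline]
    return min(feasible, key=lambda s: s.money, default=None)


def pareto_front(schedules: List[Schedule]) -> List[Schedule]:
    """Problem 3: keep only schedules not dominated in both time and money."""
    front = []
    for s in schedules:
        dominated = any(
            o.time <= s.time and o.money <= s.money
            and (o.time < s.time or o.money < s.money)
            for o in schedules
        )
        if not dominated:
            front.append(s)
    return sorted(front, key=lambda s: s.time)


# Made-up candidates for illustration only.
candidates = [
    Schedule("A", 10.0, 8.0),
    Schedule("B", 20.0, 4.0),
    Schedule("C", 25.0, 5.0),
    Schedule("D", 40.0, 2.0),
]

best_time = fastest_within_budget(candidates, budget=5.0)
best_cost = cheapest_within_deadline(candidates, deadline=30.0)
trade_offs = pareto_front(candidates)
```

With these made-up candidates, schedule C is slower and more expensive than B, so it drops out of the trade-off set, while A, B, and D remain as distinct time/money compromises.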

MaDgIK 2009-2018