Milking the Streams
Recent years have seen an ever-growing need for continuous observation and monitoring applications. This need is driven by technological advances that have made streaming applications possible, and by analysts in various domains recognizing the value that such applications can provide. In this presentation, we discuss several studies that we have conducted in this area. First, we describe a general framework for enabling complex applications over data streams. The framework is based on efficiently computing an approximation of multi-dimensional distributions of streaming data. We demonstrate its use on two diverse problems: deviation detection, and the detection and tracking of homogeneous regions. Then, we describe efficient techniques for identifying heavy hitters (i.e., the elements that appear most often in the stream). Our techniques can effectively detect heavy hitters in ad hoc windows of interest in the stream, and can also identify conditional heavy hitters (i.e., parent-child pairs of stream elements in which the child is among the most frequent given its parent). Finally, we briefly discuss future research directions in this general area.
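To make the notion of a heavy hitter concrete, the following is a minimal sketch of the classic Misra-Gries summary, a standard textbook technique for finding frequent elements in a single pass with bounded memory. It is offered only as an illustration and is not necessarily the technique presented in the talk; the function name `misra_gries` and the parameter `k` are assumptions made for this example.

```python
def misra_gries(stream, k):
    """Return candidate heavy hitters using at most k - 1 counters.

    Any element occurring more than len(stream) / k times is guaranteed
    to be among the returned keys. The stored counts are lower bounds on
    the true frequencies, and some returned keys may be false positives.
    """
    counters = {}
    for item in stream:
        if item in counters:
            # Item is already tracked: count this occurrence.
            counters[item] += 1
        elif len(counters) < k - 1:
            # A free counter is available: start tracking the item.
            counters[item] = 1
        else:
            # No free counter: decrement every counter and drop
            # those that reach zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters


if __name__ == "__main__":
    # "a" occurs 4 times out of 7, i.e. more than 7 / 2 times.
    print(misra_gries(list("abacada"), k=2))
```

Because the summary can contain false positives, a second pass over the data (when one is possible) is the usual way to confirm the exact frequencies of the candidates.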