Near Neighbor Join

Herald Kllapi

Date: 17/01/2014
University: University of Athens
Room : A56
Time: 15:30 (coffee 15:00)

An increasing number of Web applications such as friends recommendation 
depend on the ability to join objects at scale. The traditional approach 
taken is nearest neighbor join (also called similarity join), whose goal 
is to find, based on a given join function, the closest set of objects 
or all the objects within a distance threshold to each object in the 
input. The scalability of techniques utilizing this approach often 
depends on the characteristics of the objects and the join function. 
However, many real-world join functions are intricately engineered and 
constantly evolving, which makes the design of white-box methods that 
rely on understanding the join function impractical. Finding a technique 
that can join extremely large number of objects with complex join 
functions has always been a tough challenge.

In this paper, we propose a practical alternative approach called near 
neighbor join that, although does not find the closest neighbors, finds 
close neighbors, and can do so at extremely large scale when the join 
functions are complex. In particular, we design and implement a 
super-scalable system we name SAJ that is capable of best-effort joining 
of billions of objects for complex functions. Extensive experimental 
analysis over real-world large datasets shows that SAJ is scalable and 
generates good results.


MaDgIK 2009-2018