Embedding-based subsequence matching with gaps–range–tolerances: a Query-By-Humming application
Alexios Kotsifakos
Isak Karlsson
Panagiotis Papapetrou
Vassilis Athitsos
Dimitrios Gunopulos
Published In: VLDB J. 24(4): 519-536
Journal Article

Author e-mails: Panagiotis Papapetrou (panagiotis@dsv.su.se), Alexios Kotsifakos (alexios.kotsifakos@mavs.uta.edu), Isak Karlsson (isak-kar@dsv.su.se), Vassilis Athitsos (athitsos@uta.edu), Dimitrios Gunopulos (dg@di.uoa.gr)
Affiliations:
1 Department of Computer Science and Engineering, University of Texas at Arlington, Arlington, TX, USA
2 Department of Computer and Systems Sciences, Stockholm University, Stockholm, Sweden
3 Department of Informatics and Telecommunications, National and Kapodistrian University of Athens, Athens, Greece

We present a subsequence matching framework that allows for gaps in both query and target sequences, employs a variable matching tolerance efficiently tuned for each query and target sequence, and constrains the maximum matching range. Using this framework, we propose a dynamic programming method, called SMBGT, that, given a short query sequence Q and a large database, identifies in quadratic time the subsequence of the database that best matches Q. SMBGT is highly applicable to music retrieval. However, in Query-By-Humming applications, runtime is critical. Hence, we propose a novel embedding-based approach, called ISMBGT, for speeding up search under SMBGT. Using a set of reference sequences, ISMBGT maps both Q and each position of each database sequence into vectors. The database vectors closest to the query vector are identified, and SMBGT is then applied between Q and the subsequences that correspond to those database vectors. The key novelties of ISMBGT are that it does not require training, it is query sensitive, and it exploits the flexibility of SMBGT. We present an extensive experimental evaluation using synthetic and hummed queries on a large music database. Our findings show that ISMBGT can achieve speedups of up to an order of magnitude over brute-force search and of more than an order of magnitude over cDTW (constrained Dynamic Time Warping), while maintaining retrieval accuracy very close to that of brute-force search.
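The filter-and-refine idea described in the abstract can be illustrated with a small Python sketch. It is not the authors' implementation. `match_score` is a hypothetical, simplified stand-in for SMBGT: a longest-chain recurrence with a fixed tolerance `eps` and gap bounds `alpha` (query) and `beta` (target). It uses no per-sequence tolerance tuning and runs in naive O(|Q|·|X|·alpha·beta) time rather than quadratic time. The names `build_index`, `search`, `window` and `p` are illustrative, as are three design choices: the fixed-length window ending at each database position, which stands in for the maximum matching range; the coordinate definition (one score per reference sequence); and the L1 distance between vectors. Query sensitivity is omitted.

```python
from typing import List, Sequence, Tuple

# Hypothetical sketch of an ISMBGT-style filter-and-refine search; all names
# and design choices here are illustrative assumptions, not the paper's code.
# One indexed entry: (database sequence id, end position j, embedding vector).
Entry = Tuple[int, int, List[int]]


def match_score(q: Sequence[float], x: Sequence[float],
                eps: float = 1.0, alpha: int = 1, beta: int = 2) -> int:
    """Hypothetical simplified stand-in for SMBGT (not the paper's recurrence).

    Returns the length of the longest chain of pairs (i, j), strictly increasing
    in both i and j, with |q[i] - x[j]| <= eps, where at most `alpha` query
    elements and at most `beta` target elements are skipped between consecutive
    matched pairs. Naive O(len(q) * len(x) * alpha * beta) dynamic programming.
    """
    n, m = len(q), len(x)
    best = [[0] * m for _ in range(n)]  # best[i][j]: longest chain ending at (i, j)
    result = 0
    for i in range(n):
        for j in range(m):
            if abs(q[i] - x[j]) > eps:
                continue  # q[i] and x[j] cannot be matched
            chain = 1
            # Extend any chain ending at a predecessor within the gap bounds.
            for pi in range(max(0, i - alpha - 1), i):
                for pj in range(max(0, j - beta - 1), j):
                    if best[pi][pj]:
                        chain = max(chain, best[pi][pj] + 1)
            best[i][j] = chain
            result = max(result, chain)
    return result


def window_ending_at(x: Sequence[float], j: int, window: int) -> Sequence[float]:
    # Assumption: the candidate subsequence for position j is the fixed-length
    # window of x that ends at j (a stand-in for the maximum matching range).
    return x[max(0, j - window + 1): j + 1]


def build_index(database: Sequence[Sequence[float]],
                refs: Sequence[Sequence[float]], window: int) -> List[Entry]:
    # Offline step: map every position of every database sequence to a vector
    # whose k-th coordinate is the score of reference k against that window.
    index: List[Entry] = []
    for sid, x in enumerate(database):
        for j in range(len(x)):
            segment = window_ending_at(x, j, window)
            index.append((sid, j, [match_score(r, segment) for r in refs]))
    return index


def search(q: Sequence[float], database: Sequence[Sequence[float]],
           index: List[Entry], refs: Sequence[Sequence[float]],
           window: int, p: int) -> Tuple[int, int, int]:
    # Embed the query with the same reference sequences.
    fq = [match_score(r, q) for r in refs]
    # Filter: keep the p entries whose vectors are closest to fq (L1 distance).
    ranked = sorted(index, key=lambda e: sum(abs(a - b) for a, b in zip(e[2], fq)))
    # Refine: run the full matcher only on the surviving candidate windows.
    best = (-1, -1, -1)  # (score, sequence id, end position)
    for sid, j, _ in ranked[:p]:
        score = match_score(q, window_ending_at(database[sid], j, window))
        if score > best[0]:
            best = (score, sid, j)
    return best


db = [[0, 0, 0, 0, 0, 0], [10, 20, 30, 40, 50, 60]]
refs = [[10, 20], [40, 50]]
index = build_index(db, refs, window=4)
print(search([20, 30, 40], db, index, refs, window=4, p=3))  # (3, 1, 3)
```

In this toy run, only 3 of the 12 indexed positions are passed to the full matcher, yet the best window (sequence 1, ending at position 3) is still found. The database is embedded once, offline. Each query then costs one match per reference sequence, a scan over short vectors, and `p` full matches. Setting `p` to the number of indexed positions turns the sketch back into brute-force search.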

Related files: gunopulos.pdf (PDF, 870.21 KB)

MaDgIK 2009-2016