A Faceted Crawler for the Twitter Service
Authors: 
George Valkanas
Antonia Saravanou
Dimitrios Gunopulos
Date published: 
2014
Published In: 
Web Information Systems Engineering (WISE 2014)
Type: 
Conference Article
Abstract: 

Researchers now have at their disposal valuable data from social networking applications, of which Twitter and Facebook are the most prominent examples. To retrieve this content, the Twitter service provides two distinct Application Programming Interfaces (APIs), a probe-based one and a streaming one, each of which imposes different limitations on the data collection process. In this paper, we present a general architecture that facilitates faceted crawling of the service and simplifies retrieval. We give implementation details of our system and provide a simple way to express the crawling process, i.e., the crawl flow. We experimentally evaluate the system on a variety of faceted crawls, demonstrating its efficacy for the online medium.
