A Faceted Crawler for the Twitter Service

Jan 2014
Conference

Researchers now have access to valuable data from social networking applications, of which Twitter and Facebook are the most prominent examples. To retrieve this content, the Twitter service provides two distinct Application Programming Interfaces (APIs), a probe-based one and a streaming one, each of which imposes different limitations on the data collection process. In this paper, we present a general architecture that facilitates faceted crawling of the service and simplifies data retrieval. We give implementation details of our system and provide a simple way to express the crawling process, i.e., the crawl flow. We evaluate the system experimentally on a variety of faceted crawls, demonstrating its efficacy for this online medium.
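The idea of a crawl flow can be pictured loosely as a chain of facets, where each facet expands the items produced by the previous one (e.g., a keyword search yields tweets, and tweets yield their authors), while the probe-based API caps how many calls can be issued. The following is only a minimal sketch under our own assumptions, not the architecture described in the paper: `Stage`, `run_flow`, and the `max_calls` budget are hypothetical names, and the fetchers are in-memory stand-ins rather than calls to the Twitter APIs.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class Stage:
    """A hypothetical facet of a crawl flow: expands one input item into new items."""
    name: str
    expand: Callable[[str], Iterable[str]]
    max_calls: int  # hypothetical call budget, mimicking a probe-based API rate limit


def run_flow(seeds, stages):
    """Run the stages in order; each stage consumes the previous stage's output.

    Items that would exceed a stage's call budget are not fetched; they are
    recorded as deferred so that a later crawl window could pick them up.
    """
    frontier = list(seeds)
    results, deferred = {}, {}
    for stage in stages:
        produced, seen, pending = [], set(), []
        calls = 0
        for item in frontier:
            if calls >= stage.max_calls:
                pending.append(item)  # budget exhausted: defer instead of calling
                continue
            calls += 1
            for out in stage.expand(item):
                if out not in seen:  # de-duplicate within the stage
                    seen.add(out)
                    produced.append(out)
        results[stage.name] = produced
        deferred[stage.name] = pending
        frontier = produced
    return results, deferred


# In-memory stand-ins for API responses (illustrative data only).
fake_search = {"wise2014": ["t1", "t2", "t3"]}
fake_author = {"t1": ["alice"], "t2": ["bob"], "t3": ["alice"]}

stages = [
    Stage("search", lambda q: fake_search.get(q, []), max_calls=10),
    Stage("authors", lambda t: fake_author.get(t, []), max_calls=2),
]
results, deferred = run_flow(["wise2014"], stages)
print(results)   # {'search': ['t1', 't2', 't3'], 'authors': ['alice', 'bob']}
print(deferred)  # {'search': [], 'authors': ['t3']}
```

Deferring rather than dropping over-budget items reflects one possible way of coping with the limits of a probe-based API; a streaming source would instead push items into the first stage continuously.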

Citation

George Valkanas, Antonia Saravanou, Dimitrios Gunopulos, "A Faceted Crawler for the Twitter Service", Web Information Systems Engineering (WISE 2014), 2014
