Representation and Querying of Linked Geospatial Data
Linked data is a new research area that studies how to make data available on the Web and interconnect it with other data, with the aim of increasing its value for everybody. The resulting "Web of data" has recently started to be populated with geospatial data. Ordnance Survey, Great Britain's national mapping agency, was the first national mapping agency to make various kinds of geospatial data from Great Britain available as open linked data. With the recent emphasis on open government data in many countries, some of it already encoded as linked data, the development of useful Web applications utilizing geospatial data is just a few SPARQL queries away.
The recent availability of geospatial information as linked open data has generated new interest in the representation and querying of geospatial data expressed in RDF. However, RDF can only represent thematic data and needs to be extended if we want to model geospatial information. In this thesis, we propose, formalize, implement and evaluate an appropriate extension of RDF and SPARQL 1.1 for geospatial information.
The first contribution of this thesis is the development of the data model stRDF and the query language stSPARQL. stRDF is a data model that extends RDF with the ability to represent geospatial data that change over time. stSPARQL extends SPARQL 1.1 for querying stRDF data. The thesis develops two versions of stRDF and stSPARQL. In the first version, we follow the main ideas of constraint databases and represent spatial and temporal objects as quantifier-free formulas in a first-order logic of linear constraints. In the second version, we opt for a practical solution that uses standards defined by the Open Geospatial Consortium (OGC) to represent and query geospatial information, since such standards are widely used in applications.
Independently of our work and at around the same time, the Open Geospatial Consortium proposed GeoSPARQL, a standard for a SPARQL-based query language for geospatial data expressed in RDF. GeoSPARQL and stSPARQL are very close syntactically and semantically. Both languages share the same basic assumption of using existing OGC standards (Well-Known Text and Geography Markup Language) for the representation and querying of geometries as typed literals in RDF. In addition, GeoSPARQL defines some basic classes and properties that encode well-known GIS and OGC concepts (e.g., feature). GeoSPARQL also goes beyond stSPARQL by providing a vocabulary and rewrite rules that allow the assertion of topological relations. This opens up the possibility of useful forms of topological reasoning not covered by stSPARQL. On the other hand, unlike stSPARQL, GeoSPARQL does not offer spatial aggregate functions, nor does it have any notion of time.
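As an illustration of what "geometries as typed literals" means in practice, the following minimal Python sketch serializes a Well-Known Text point as a typed literal and attaches it to a geometry resource. It uses the GeoSPARQL vocabulary (the geo:asWKT property and the geo:wktLiteral datatype); the example.org IRI, the coordinates and the helper function names are hypothetical and chosen only for illustration, and the sketch is not taken from Strabon or from the thesis itself.

```python
# Namespace of the OGC GeoSPARQL vocabulary.
GEO = "http://www.opengis.net/ont/geosparql#"


def wkt_point_literal(lon: float, lat: float) -> str:
    """Serialize a WKT point as an RDF typed literal (N-Triples syntax)."""
    return f'"POINT({lon} {lat})"^^<{GEO}wktLiteral>'


def geometry_triple(geometry_iri: str, lon: float, lat: float) -> str:
    """Build one N-Triples statement linking a geometry resource to its WKT."""
    return f"<{geometry_iri}> <{GEO}asWKT> {wkt_point_literal(lon, lat)} ."


# Hypothetical geometry resource; coordinates are given as longitude, latitude.
triple = geometry_triple("http://example.org/geom/athens", 23.7275, 37.9838)
print(triple)
```

Because the geometry is an ordinary literal, it can be stored alongside thematic triples and passed to spatial functions in a query filter, which is the design both languages build on.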
The second contribution of the thesis is the development of Strabon, a new RDF store that is a fully implemented, open-source storage and query evaluation system for stRDF/stSPARQL and the corresponding subset of GeoSPARQL. The thesis presents the architecture of Strabon, studies its performance experimentally, and shows that it scales to very large data volumes and, in most cases, performs better than all other geospatial RDF stores it has been compared with.
The third contribution of the thesis is the development of the benchmark Geographica, which is the first benchmark for evaluating geospatial RDF stores that takes into account recent advances in the state of the art in this area. Geographica uses both real-world and synthetic data to systematically test the offered functionality and the performance of some prominent geospatial RDF stores.
Strabon has been used in real-world applications such as the real-time wildfire monitoring service developed in collaboration with the National Observatory of Athens, which was operational and was used by decision makers and emergency services in Greece during the summers of 2012 and 2013. The service first extracts geometries capturing the extent of fires from satellite images received in real time. These geometries are then encoded in stRDF, allowing their combination with relevant linked geospatial data. A series of stSPARQL query and update statements is then evaluated in Strabon in order to produce maps capturing the monitored fires together with contextual information useful to decision makers dealing with the emergency situation.