Scalable Answering of Questions Expressed in Natural Language over Large Geographic Knowledge Bases
The overarching aim of GeoQA is to create a question answering engine that can answer complex questions in the geospatial domain. The project is motivated by the fact that popular search engines, such as Google, still struggle to give immediate answers to complex geographic questions like “Which Greek villages have rivers running through them?”. The main objective of GeoQA is therefore twofold: (i) to show how to extend existing knowledge graphs with geographic knowledge found in important geospatial datasets available on the Web today, and (ii) to develop techniques and systems for answering complex non-factoid questions over such geo-knowledge graphs effectively (with high precision and recall) and efficiently (with very short response times).
In this project, we generate the knowledge graph YAGO2geo by extending the well-known knowledge graph YAGO2 with geographic knowledge from the datasets Global Administrative Areas and OpenStreetMap and from national geospatial datasets of selected countries and institutions. Because some of these data sources (e.g., OpenStreetMap) are modified frequently, we develop scalable techniques and software for keeping YAGO2geo up to date. Additionally, we generate a gold standard corpus of geographic questions in natural language together with their answers; the corpus will contain more than 1000 questions. Finally, we develop a prototype question answering engine (GeoQA2), based on natural language processing and knowledge graph embedding techniques, which will allow for intuitive visualization of the answers.