Adaptive Geoparsing Method for Toponym Recognition and Resolution in Unstructured Text

The automatic extraction of geospatial information is an important aspect of data mining. Computer systems capable of discovering geographic information from natural language involve a complex process called geoparsing, which includes two important tasks: geographic entity recognition and toponym resolution. The first task could be approached through a machine learning approach, in which case a model is trained to recognize a sequence of characters (words) corresponding to geographic entities. The second task consists of assigning such entities to their most likely coordinates. Frequently, the latter process involves solving referential ambiguities. In this paper, we propose an extensible geoparsing approach including geographic entity recognition based on a neural network model and disambiguation based on what we have called dynamic context disambiguation. Once place names are recognized in an input text, they are solved using a grammar, in which a set of rules specifies how ambiguities could be solved, in a similar way to that which a person would utilize, considering the context. As a result, we have an assignment of the most likely geographic properties of the recognized places. We propose an assessment measure based on a ranking of closeness relative to the predicted and actual locations of a place name. Regarding this measure, our method outperforms OpenStreetMap Nominatim. We include other assessment measures to assess the recognition ability of place names and the prediction of what we called geographic levels (administrative jurisdiction of places).

Descarga el archivo aquí