Geographical aggregation of microblog posts for LDA topic modeling

In this paper we propose an aggregation strategy for geolocated Twitter posts based on a hierarchical definition of the regular activity patterns within a specific region. The aggregation yields a series of documents that are used to train an LDA topic model. The resulting model is compared against the models produced by two other aggregation strategies proposed in the literature: aggregation by user and aggregation by hashtag. For the comparison, we use quality metrics that are widely used in the literature. The results show that geographical aggregation performs similarly to hashtag aggregation in terms of Jensen-Shannon divergence and outperforms the other aggregation schemes in its ability to reproduce the original cluster labels. Potential applications include the discovery of unusual events and the geolocation of messages from their text.
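To make the comparison concrete, the following is a minimal sketch of the three pooling schemes and of the Jensen-Shannon divergence, not the implementation used in the paper. The abstract does not specify the hierarchical region definition, so a flat latitude/longitude grid (`by_grid_cell`, with a hypothetical `cell_deg` parameter) stands in for it. The post fields (`user`, `text`, `lat`, `lon`) and the toy data are also assumptions made for illustration.

```python
import math
from collections import defaultdict


def aggregate(posts, key_fn):
    """Pool post texts into one document per key.

    key_fn returns a list of keys, so a post can join several
    documents (e.g. one per hashtag) or none at all.
    """
    docs = defaultdict(list)
    for post in posts:
        for key in key_fn(post):
            docs[key].append(post["text"])
    return {key: " ".join(texts) for key, texts in docs.items()}


def by_user(post):
    return [post["user"]]


def by_hashtag(post):
    # Posts without hashtags contribute to no document.
    return [w.lower() for w in post["text"].split() if w.startswith("#")]


def by_grid_cell(post, cell_deg=0.01):
    # Assumption: a flat grid replaces the paper's hierarchical regions.
    return [(math.floor(post["lat"] / cell_deg),
             math.floor(post["lon"] / cell_deg))]


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two discrete distributions."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


# Toy data, invented for illustration only.
posts = [
    {"user": "a", "text": "rain in #city", "lat": 4.601, "lon": -74.072},
    {"user": "b", "text": "traffic #City", "lat": 4.603, "lon": -74.075},
    {"user": "a", "text": "coffee time", "lat": 6.25, "lon": -75.56},
]

user_docs = aggregate(posts, by_user)
hashtag_docs = aggregate(posts, by_hashtag)
geo_docs = aggregate(posts, by_grid_cell)
```

Each resulting dictionary maps an aggregation key to one pooled document, which could then be tokenized and passed to any LDA implementation. With base-2 logarithms the divergence lies between 0 (identical distributions) and 1 (distributions with disjoint support), which makes it convenient for comparing topic-word distributions across models.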
