A Latent Variable Model for Geographic Lexical Variation

From Cohen Courses
Revision as of 23:42, 26 September 2012 by Rajarshd (talk | contribs)

Citation

A Latent Variable Model for Geographic Lexical Variation. Jacob Eisenstein, Brendan O'Connor, Noah A. Smith, and Eric P. Xing. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP 2010), Cambridge, MA, October 2010.

Online version

PDF of the paper

Summary

This paper analyzes how word usage in vernacular text varies with geography. Specifically, it models lexical variation by both topic and geography, partitions regions into coherent linguistic communities, and predicts the author's location from raw text with some accuracy.

Data

This work is based on the Twitter dataset, which can be found here. Only geotagged messages are used. Users are selected according to the following criteria: they must be active on Twitter (having written at least 20 messages over the period), and they must follow fewer than 1,000 people and have fewer than 1,000 followers (so that celebrities and other influential accounts are excluded).
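The user selection criteria can be sketched as a simple filter. This is a minimal illustration, not the authors' preprocessing code: the `User` record, its field names, and the sample values are hypothetical, and the sketch assumes that message counts have already been restricted to geotagged messages from the collection period.

```python
from dataclasses import dataclass

# Thresholds taken from the selection criteria described above.
MIN_MESSAGES = 20     # at least 20 messages over the period
MAX_FOLLOWING = 1000  # must follow fewer than 1000 people
MAX_FOLLOWERS = 1000  # must have fewer than 1000 followers


@dataclass
class User:
    # Hypothetical record layout; the real dataset's schema may differ.
    user_id: str
    message_count: int  # assumed: geotagged messages in the period
    following: int
    followers: int


def keep_user(user: User) -> bool:
    """Return True if the user satisfies all three selection criteria."""
    return (
        user.message_count >= MIN_MESSAGES
        and user.following < MAX_FOLLOWING
        and user.followers < MAX_FOLLOWERS
    )


# Made-up sample users, for illustration only.
users = [
    User("a", message_count=35, following=120, followers=80),
    User("b", message_count=5, following=50, followers=40),       # too few messages
    User("c", message_count=200, following=10, followers=50000),  # too many followers
    User("d", message_count=20, following=1000, followers=10),    # follows too many
]

kept = [u.user_id for u in users if keep_user(u)]
print(kept)
```

Note that the follower and following limits are strict ("fewer than"), while the message threshold is inclusive ("at least"), which is why user `d` is excluded at exactly 1000 followed accounts.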