Mapping entity names in a document to places on a map


This is a sample project posted by William - although anyone that wants to work on it for real is welcome to!

Place names are often ambiguous - e.g., London can mean London, Ontario or London, England - and a document will frequently contain many place names. You can view the task of associating the correct place with a place name as a sort of word sense disambiguation (WSD) problem, where an atlas fills the role of a database of word senses.

The goal of this project is to use structured prediction to predict the set of place "senses" that corresponds to the set of place names in a document. This task is thus similar to all-words WSD.
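To make the "predict the whole set jointly" idea concrete, here is a minimal sketch, not the proposed method itself. It assumes a tiny hand-built gazetteer (the `GAZETTEER` dictionary, with approximate coordinates) and a made-up scoring rule: choose the combination of senses whose places are closest together, measured by total pairwise great-circle distance. The function names `haversine_km` and `disambiguate` are illustrative, and the exhaustive search only works for a handful of names; a real structured-prediction model would learn its scoring function and use a smarter search.

```python
import itertools
import math

EARTH_RADIUS_KM = 6371.0

def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) pairs in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))

# Hypothetical toy gazetteer: place name -> {sense label: approximate (lat, lon)}.
GAZETTEER = {
    "London": {"London, England": (51.51, -0.13),
               "London, Ontario": (42.98, -81.25)},
    "Windsor": {"Windsor, England": (51.48, -0.60),
                "Windsor, Ontario": (42.31, -83.04)},
    "Toronto": {"Toronto, Ontario": (43.65, -79.38)},
}

def disambiguate(names, gazetteer=GAZETTEER):
    """Pick one sense per name so that the chosen places are mutually close.

    Assumed heuristic: minimize the sum of pairwise distances over all
    combinations of candidate senses (exhaustive, so exponential in len(names)).
    """
    options = [list(gazetteer[name].items()) for name in names]
    best, best_cost = None, math.inf
    for assignment in itertools.product(*options):
        cost = sum(haversine_km(p[1], q[1])
                   for p, q in itertools.combinations(assignment, 2))
        if cost < best_cost:
            best, best_cost = assignment, cost
    return {name: label for name, (label, _) in zip(names, best)}

print(disambiguate(["London", "Windsor"]))
print(disambiguate(["London", "Windsor", "Toronto"]))
```

With only "London" and "Windsor" in the document, the English senses win because they are closer to each other; adding the unambiguous "Toronto" pulls both names toward their Ontario senses. This is the kind of interaction between decisions that treating each name independently would miss.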

I have data appropriate for evaluating solutions to this problem. GeoNames.org has a database of 6M+ place names with associated lat/long coordinates. Some pages in Wikipedia are tagged with lat/long coordinates, which disambiguates them relative to GeoNames. I've also collected a set of 600k Wikipedia pages with multiple links to pages with lat/long coordinates. Taken together, these could be used to perform supervised learning, or to evaluate unsupervised learning techniques.
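As a rough illustration of how a gazetteer like this could be turned into a candidate-sense lookup, the sketch below builds a name-to-candidates index from tab-separated rows. The column positions (id in column 0, name in column 1, latitude and longitude in columns 4 and 5) are an assumption based on my understanding of the GeoNames dump layout and should be checked against the GeoNames readme. The sample rows, their ids, and the helper name `build_name_index` are placeholders, not real data.

```python
import csv
import io
from collections import defaultdict

def build_name_index(lines):
    """Map each place name to a list of (id, lat, lon) candidate senses.

    Assumes tab-separated rows with id in column 0, name in column 1,
    latitude in column 4 and longitude in column 5 (assumed layout).
    """
    index = defaultdict(list)
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) < 6:
            continue  # skip malformed or truncated rows
        place_id, name = row[0], row[1]
        lat, lon = float(row[4]), float(row[5])
        index[name].append((place_id, lat, lon))
    return index

# Placeholder rows with made-up ids and approximate coordinates.
SAMPLE = (
    "1\tLondon\tLondon\t\t51.51\t-0.13\n"
    "2\tLondon\tLondon\t\t42.98\t-81.25\n"
    "3\tToronto\tToronto\t\t43.65\t-79.38\n"
)

index = build_name_index(io.StringIO(SAMPLE))
print(len(index["London"]), "candidate senses for London")
```

Names with more than one entry in the index are the ambiguous ones; a Wikipedia page tagged with coordinates could then be matched to the nearest candidate to obtain a gold label.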

Proposed by: William Cohen