Difference between revisions of "Mapping entity names in a document to places on a map"

From Cohen Courses
 

Latest revision as of 10:32, 8 September 2011

This is a sample project posted by William - although anyone who wants to work on it for real is welcome to!

Place names are often ambiguous - e.g., London can mean London, Ontario or London, England - and a document will frequently contain many place names. You can view the task of associating the correct place with a place name as a sort of word sense disambiguation (WSD) problem, where an atlas fills the role of a database of word senses.

The goal of this project is to use structured prediction to predict the set of place "senses" that corresponds to the set of place names in a document. This task is thus similar to all-words WSD.
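
To make the "joint" aspect concrete, here is a minimal sketch of choosing senses for all place names in a document at once. It is not the structured prediction method the project calls for; it swaps in a simple geographic-coherence heuristic (pick the combination of candidates with the smallest total pairwise distance) and searches it by brute force. The gazetteer, its approximate coordinates, and the function names are all hypothetical.

```python
import itertools
import math

# Toy gazetteer (hypothetical): place name -> candidate senses as
# (label, latitude, longitude). Coordinates are approximate.
GAZETTEER = {
    "London": [("London, England", 51.51, -0.13),
               ("London, Ontario", 42.98, -81.25)],
    "Toronto": [("Toronto, Ontario", 43.65, -79.38)],
    "Windsor": [("Windsor, England", 51.48, -0.61),
                ("Windsor, Ontario", 42.31, -83.04)],
}


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/long points."""
    radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))


def disambiguate(names, gazetteer=GAZETTEER):
    """Pick one sense per name so that the chosen places lie close together.

    Brute force over all combinations of candidates; fine for a handful of
    names, exponential in general.
    """
    candidate_lists = [gazetteer[name] for name in names]
    best, best_cost = None, float("inf")
    for assignment in itertools.product(*candidate_lists):
        cost = sum(haversine_km(a[1], a[2], b[1], b[2])
                   for a, b in itertools.combinations(assignment, 2))
        if cost < best_cost:
            best, best_cost = assignment, cost
    return [label for label, _, _ in best]
```

With this toy data, the same name resolves differently depending on its neighbours: "London" next to "Toronto" and "Windsor" comes out as London, Ontario, while "London" next to "Windsor" alone comes out as London, England. A learned model would replace the hand-written distance cost with scores estimated from data.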

I have data appropriate for evaluating solutions to this problem. GeoNames.org has a database of 6M+ place names with associated lat/long coordinates. Some pages in Wikipedia are tagged with lat/long coordinates, which disambiguates them relative to GeoNames. I've also collected a set of 600k Wikipedia pages with multiple links to pages with lat/long coordinates. Taken together these could be used to perform supervised learning, or to evaluate unsupervised learning techniques.
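
As a rough illustration of how a gazetteer like this could supply candidate senses, the sketch below builds a name-to-candidates index from tab-separated rows. It assumes a column layout of id, name, ASCII name, comma-separated alternate names, latitude, longitude, which is how I understand the GeoNames dump files to be laid out; check the GeoNames readme before relying on it. The sample rows, ids, and function name are made up for the example.

```python
import csv
import io
from collections import defaultdict

# Made-up sample rows in the assumed layout:
# id, name, asciiname, alternatenames, latitude, longitude
SAMPLE_ROWS = (
    "1\tLondon\tLondon\tLondres,Londra\t51.51\t-0.13\n"
    "2\tLondon\tLondon\t\t42.98\t-81.25\n"
    "3\tToronto\tToronto\t\t43.65\t-79.38\n"
)


def build_candidate_index(tsv_file):
    """Map each lowercased surface form to a list of (id, lat, lon) candidates."""
    index = defaultdict(list)
    reader = csv.reader(tsv_file, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        place_id, name, ascii_name, alt_names = row[0], row[1], row[2], row[3]
        lat, lon = float(row[4]), float(row[5])
        # Collect every distinct spelling under which this place may appear.
        surface_forms = {name, ascii_name}
        surface_forms.update(a for a in alt_names.split(",") if a)
        for form in surface_forms:
            index[form.lower()].append((place_id, lat, lon))
    return index


index = build_candidate_index(io.StringIO(SAMPLE_ROWS))
```

Here `index["london"]` holds two candidates, which is exactly the ambiguity a disambiguation method has to resolve, while `index["toronto"]` holds one.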

Proposed by: William Cohen