Mapping entity names in a document to places on a map
Latest revision as of 10:32, 8 September 2011
This is a sample project posted by William - although anyone who wants to work on it for real is welcome to!
Place names are often ambiguous - e.g., London can mean London, Ontario or London, England - and a document will frequently contain many place names. You can view the task of associating the correct place with a place name as a sort of word sense disambiguation (WSD) problem, in which an atlas fills the role of a database of word senses.
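The goal of this project is to use structured prediction to predict the set of place "senses" that corresponds to the set of place names in a document. This task is thus similar to all-words WSD (see the Senseval-3 English all-words task: http://www.cse.unt.edu/~rada/senseval/senseval3/tasks.html#EnglishAW).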
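To make the "joint" part of the task concrete, here is a minimal sketch, not the method proposed for the project. It assumes a simple coherence heuristic: among all combinations of candidate places, pick the one whose places lie closest together. The gazetteer, the function names, and the coordinates (approximate, hand-entered values, not taken from GeoNames) are all illustrative assumptions.

```python
import itertools
import math

# Toy gazetteer: place name -> candidate senses as (label, lat, lon).
# Coordinates are approximate, hand-entered values for illustration only.
GAZETTEER = {
    "London": [("London, England", 51.51, -0.13),
               ("London, Ontario", 42.98, -81.25)],
    "Windsor": [("Windsor, England", 51.48, -0.60),
                ("Windsor, Ontario", 42.31, -83.04)],
    "Toronto": [("Toronto, Ontario", 43.65, -79.38)],
}


def haversine_km(a, b):
    """Great-circle distance in km between two (label, lat, lon) tuples."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[1], a[2], b[1], b[2]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def disambiguate(names):
    """Jointly pick one sense per name, minimizing total pairwise distance.

    Exhaustive search over all combinations: exponential in the number of
    names, so this is only workable for short toy inputs.
    """
    candidate_lists = [GAZETTEER[name] for name in names]
    best = min(
        itertools.product(*candidate_lists),
        key=lambda assignment: sum(
            haversine_km(p, q)
            for p, q in itertools.combinations(assignment, 2)
        ),
    )
    return {name: place[0] for name, place in zip(names, best)}
```

Note that the prediction for "London" depends on the other names in the document: with "Toronto" present the Ontario senses win, while "London" and "Windsor" alone resolve to England. A structured-prediction model would replace the hand-picked distance score with a learned one.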
I have data appropriate for evaluating solutions to this problem. GeoNames.org has a database of 6M+ place names with associated lat/long coordinates. Some pages in Wikipedia are tagged with lat/long coordinates, which disambiguates them relative to GeoNames. I've also collected a set of 600k Wikipedia pages that contain multiple links to pages with lat/long coordinates. Taken together, these could be used to perform supervised learning or to evaluate unsupervised learning techniques.
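As a rough illustration of how the coordinates could tie the two resources together, the sketch below matches a coordinate-tagged Wikipedia page to the nearest same-name gazetteer candidate and then scores predictions against the resulting gold labels. The candidate ids, the 25 km tolerance, and the function names are assumptions made for this example; they are not part of the GeoNames or Wikipedia data formats.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two lat/long points (degrees)."""
    lat1, lon1, lat2, lon2 = map(math.radians, (lat1, lon1, lat2, lon2))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def match_to_gazetteer(candidates, lat, lon, max_km=25.0):
    """Return the id of the nearest candidate within max_km, else None.

    candidates: iterable of (candidate_id, lat, lon) sharing one place name.
    max_km is an assumed tolerance for coordinate mismatch between sources.
    """
    best_id, best_dist = None, max_km
    for cand_id, c_lat, c_lon in candidates:
        dist = haversine_km(lat, lon, c_lat, c_lon)
        if dist <= best_dist:
            best_id, best_dist = cand_id, dist
    return best_id


def accuracy(predicted, gold):
    """Fraction of gold mentions whose predicted id equals the gold id."""
    if not gold:
        return 0.0
    correct = sum(predicted.get(mention) == sense
                  for mention, sense in gold.items())
    return correct / len(gold)
```

For example, a page tagged near (42.99, -81.24) would match a hypothetical "london-on" candidate rather than "london-uk", and that match can then serve as the gold label when scoring a system's output for links to that page.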
Proposed by: William Cohen