Chambers and Jurafsky, ACL 2010

From Cohen Courses
Revision as of 01:05, 3 November 2011 by Amr1 (talk | contribs)

Template-Based Information Extraction without the Templates is a paper by Chambers and Jurafsky which can be found online.

Citation

Chambers and Jurafsky. Template-Based Information Extraction without the Templates. In ACL 2010.

Summary

This paper proposes a method for finding and filling out templates in an unsupervised manner. Templates are extremely important for information extraction, where they sit a step above semantic role labelling. In a general sense, a template indicates what is happening and who is involved. In most cases templates are either given (and only need to be filled) or engineered by hand for a specific task; this paper tries to both find the templates and fill them without supervision.

Method

While the actual method the paper follows can appear to be a mishmash of steps, the underlying ideas are quite clear. The first main idea is that events belonging to one template tend to be mentioned near each other. If the verbs "kidnap" and "taken" are found close together in the corpus, that is a piece of evidence that "kidnap" and "taken" belong to the same template. If there is a lot of such evidence for a set of verbs, those verbs are grouped into one template.
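The grouping idea can be illustrated with a small sketch. This is a deliberate simplification and not the procedure from the paper: it assumes each document has already been reduced to an ordered list of event verbs, counts how often two verbs fall within a fixed window of each other, and groups verbs whose count reaches a threshold. The function names, the `window` and `min_count` parameters, and the toy documents are all hypothetical.

```python
from collections import Counter


def cooccurrence_counts(documents, window=2):
    """Count how often two distinct verbs occur within `window` positions.

    Each document is a list of event verbs in the order they appear.
    """
    counts = Counter()
    for verbs in documents:
        for i, v in enumerate(verbs):
            # Look only forward so each nearby pair is counted once.
            for w in verbs[i + 1 : i + 1 + window]:
                if v != w:
                    counts[frozenset((v, w))] += 1
    return counts


def group_verbs(counts, min_count=2):
    """Group verbs linked by at least `min_count` co-occurrences.

    Groups are the connected components of the thresholded graph
    (an assumption made for this sketch, not the paper's clustering).
    """
    neighbours = {}
    for pair, n in counts.items():
        if n >= min_count:
            a, b = tuple(pair)
            neighbours.setdefault(a, set()).add(b)
            neighbours.setdefault(b, set()).add(a)

    groups, seen = [], set()
    for start in sorted(neighbours):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            v = stack.pop()
            if v in component:
                continue
            component.add(v)
            stack.extend(neighbours[v] - component)
        seen |= component
        groups.append(sorted(component))
    return groups


# Toy data: two "kidnapping" documents and two "bombing" documents.
docs = [
    ["kidnap", "taken", "release"],
    ["kidnap", "taken", "demand"],
    ["explode", "destroy", "injure"],
    ["explode", "destroy"],
]
groups = group_verbs(cooccurrence_counts(docs))
print(groups)
```

With this toy input, only the pairs seen together in two documents survive the threshold, so "kidnap" and "taken" end up in one group and "explode" and "destroy" in another. A raw count threshold is the crudest possible evidence measure; a real system would likely normalise for verb frequency.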

The other main idea is that, while the verb relations must come from the given corpus, that corpus does not contain enough examples to find reliable patterns about the verbs. To address this, they expanded the corpus at hand by looking at the New York Times and the Reuters section of the Gigaword corpus. They only considered text related to the template under consideration.
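The expansion step can be pictured as a retrieval filter over a larger collection. The sketch below is only an illustration under stated assumptions: it keeps a document when it mentions at least `min_hits` distinct verbs from a template's verb group, using plain lowercase word matching with no lemmatization. The function name, the threshold, and the example sentences are hypothetical and are not taken from the paper.

```python
import re


def retrieve_related(documents, cluster_verbs, min_hits=2):
    """Keep documents that mention at least `min_hits` distinct cluster verbs."""
    cluster = set(cluster_verbs)
    selected = []
    for doc in documents:
        # Crude tokenization: lowercase alphabetic words only.
        tokens = set(re.findall(r"[a-z]+", doc.lower()))
        if len(tokens & cluster) >= min_hits:
            selected.append(doc)
    return selected


# Hypothetical larger collection standing in for the external news text.
large_corpus = [
    "Rebels kidnap two workers and demand ransom",
    "Stocks rose as markets rallied",
    "Police say the men taken hostage were later released",
]
cluster = ["kidnap", "taken", "demand", "release"]
selected = retrieve_related(large_corpus, cluster)
print(selected)
```

Only the first sentence is kept here: the third mentions "taken" but "released" does not match "release" without lemmatization, which shows why exact word matching is too brittle for real use.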

Related Work

Other work on finding templates in an unsupervised manner includes Patwardhan and Riloff, EMNLP 2007 and Patwardhan and Riloff, EMNLP 2009.