Gerber and Chai, ACL 2010

Citation

Matthew Gerber and Joyce Y. Chai. Beyond NomBank: A Study of Implicit Arguments for Nominal Predicates. ACL 2010.

Online Version

http://aclweb.org/anthology/P/P10/P10-1160.pdf

Summary

This paper addresses the problem of Semantic Role Labeling. Verbs take arguments: in "The lion ate the monkey," the verb "ate" has "the lion" as its "agent" argument and "the monkey" as its "patient" argument. (Verbal) semantic role labeling is the process of determining which parts of a sentence correspond to the arguments of verbs. You can do this for nouns derived from verbs as well, such as "shipping costs": "shipping" has an entity that ships and an entity that was shipped, even though it is not even the head noun of the noun phrase it's in. If you really want to understand text, you need to be able to determine these arguments for any kind of predicate you encounter.
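
To make the representation concrete, here is a minimal sketch in Python of the two predicate-argument structures described above; the dictionaries and role names are just an illustration, not the paper's or NomBank's actual format.

verbal = {
    "predicate": "ate",          # from "The lion ate the monkey"
    "arg0": "The lion",          # agent: the one doing the eating
    "arg1": "the monkey",        # patient: the thing being eaten
}

nominal = {
    "predicate": "shipping",     # deverbal noun in "shipping costs"
    "arg0": None,                # the entity that ships: implicit
    "arg1": None,                # the entity shipped: implicit
}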

There are labeled-data resources for these tasks (PropBank for verbal predicates and NomBank for nominal predicates). However, as this paper points out, they are sometimes limited in the kind of information they give you. In the example containing "shipping costs," NomBank just labels "shipping" as a predicate, without giving its arguments. Part of the reason is that the arguments are distant in the text, sometimes occurring in sentences very far earlier in the document. That is what the authors call "implicit arguments": essentially, arguments not labeled in NomBank, most often because they do not appear in the same sentence as the predicate.

So, the authors labeled all of the arguments for a small set of very frequent nominal predicates in the Penn Treebank (the same corpus that NomBank is built on), then learned classifiers for those arguments. Though semantic role labels form a structured output for each predicate, the authors perform independent classifications for each argument position, ignoring the dependencies between them. Their results were decent, and considering that this is a very hard problem that no one had really tried to solve before, decent results are actually really good.
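
Here is a minimal sketch of that independent-classification setup, assuming one binary classifier per argument position and a pool of candidate constituents drawn from the surrounding discourse; all names here (classifiers, extract_features) are hypothetical stand-ins, not the paper's code.

def label_implicit_args(predicate, arg_positions, candidates,
                        classifiers, extract_features, threshold=0.5):
    """Fill each missing argument position independently,
    ignoring dependencies between the positions."""
    filled = {}
    for pos in arg_positions:  # e.g. "iarg0", "iarg1"
        # Score every candidate constituent for this position alone.
        scored = [(classifiers[pos](extract_features(predicate, pos, c)), c)
                  for c in candidates]
        best_score, best = max(scored, key=lambda sc: sc[0],
                               default=(0.0, None))
        # Leave the position unfilled when no candidate is confident enough.
        filled[pos] = best if best_score >= threshold else None
    return filled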

A few details

In addition to attempting to solve a task that has not received much attention, this paper is interesting for its careful analysis of its features and its long, interesting discussion section.

Their features relied on coreference resolution: when deciding what should fill each implicit argument slot of a predicate, they had an entire coreference chain to extract features from, instead of individual mentions alone. Their features included information from VerbNet and WordNet, PMI and other statistics computed over the document collection, and a few shallow syntactic features. One interesting feature, a mix of semantics and syntax, is the head of the predicate's right sibling in the parse tree. For instance, "price" was one of the predicates they annotated. This feature distinguishes "price index," which rarely takes an arg0 (a seller), from "price drop," which often does have a seller expressed. Essentially, this feature allows the model to learn sense differences between these predicates that are expressed in the heads of their noun complements.
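
A rough sketch of how that right-sibling-head feature might be computed, assuming the predicate node comes from an NLTK-style ParentedTree; the head_word helper is a hypothetical head-finding function, since the paper does not spell out this implementation detail.

from nltk.tree import ParentedTree  # predicate_node is assumed to be one

def right_sibling_head(predicate_node, head_word):
    """Return the head word of the predicate's right sibling, e.g.
    'index' in "price index" versus 'drop' in "price drop"."""
    sibling = predicate_node.right_sibling()
    if sibling is None:
        return None  # no right sibling, so the feature is absent
    return head_word(sibling)  # head_word: hypothetical head finder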

Another interesting analysis they performed was investigating how many sentences before the predicate the implicit arguments were found. About 55% of the arguments were in the same sentence as the predicate, and coverage rises to 90% within the three sentences prior to the predicate, but it does not reach 100% until 46 sentences prior. To simplify their model and feature computation, they used only the current and previous two sentences when training and testing their model. They provide a comparison with an oracle that selects from the allowed sentences, showing the recall levels possible under this simplifying assumption (83% on average).
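
As a small worked example of that oracle computation, here is a sketch that measures the recall ceiling a fixed sentence window imposes; the distance list at the bottom is made up for illustration.

def oracle_recall(gold_distances, window=2):
    """Fraction of gold implicit arguments reachable when candidates are
    drawn only from the predicate's sentence and the `window` sentences
    before it (distance 0 = same sentence as the predicate)."""
    reachable = sum(1 for d in gold_distances if d <= window)
    return reachable / len(gold_distances)

# Hypothetical distances (sentences back) for five gold arguments:
print(oracle_recall([0, 0, 1, 3, 46], window=2))  # -> 0.6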