Morante et al., 2010


Citation

Roser Morante, Vincent Van Asch, and Walter Daelemans. Memory-Based Resolution of In-Sentence Scopes of Hedge Cues. In Proceedings of the 2010 Conference on Computational Natural Language Learning. Online Link

Summary

The 2010 CoNLL Shared Task (results summarized in Farkas et al., 2010) had two components: a binary classification task, labeling whole sentences as certain or uncertain, and a more structured problem, detecting the scope of hedge cues within a given uncertain sentence. The latter is more directly relevant to the Structured Prediction course. This paper was the most successful entrant to that second, structured task.

Motivation

A key insight of this paper is to treat Task 1 (uncertainty classification) as a prerequisite for Task 2 (hedge scope detection). Classification is therefore done not at the sentence level but at the word level; this allows the first task to be solved with a simple heuristic over the certain- or uncertain-tagged words, and provides a rich source of annotated input for the second task.

Task 1: Uncertain Sentence Classification

The first half of the CoNLL 2010 Shared Task was a binary classification problem, not a structured prediction problem, so we do not go into great detail here. The task is: given a sentence, determine whether that sentence is uncertain. Classification in this paper was done by predicting the uncertainty of each word; then, in postprocessing, a sentence was marked as uncertain if 5% or more of its words were individually classified as uncertain.
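
A minimal sketch of this postprocessing heuristic, assuming the per-word classifier described below has already produced boolean predictions:

  def sentence_is_uncertain(word_predictions, threshold=0.05):
      # Mark a sentence uncertain if at least 5% of its words were
      # individually classified as uncertain (the paper's heuristic).
      if not word_predictions:
          return False
      n_uncertain = sum(1 for is_uncertain in word_predictions if is_uncertain)
      return n_uncertain / len(word_predictions) >= threshold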

The learning step used a standard SVM with a polynomial kernel. The feature representation for each word contained its surface form, lemmatized stem, part of speech, and dependency information; the same features for the four words in its immediate context (two ahead and two behind); and two features based on a vocabulary list of cues.
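
A rough sketch of how such a classifier might be assembled with scikit-learn follows; the feature extraction is a simplified stand-in for the paper's full representation (it omits the lemma, part-of-speech, dependency, and cue-list features), and the kernel degree is an illustrative assumption.

  from sklearn.feature_extraction import DictVectorizer
  from sklearn.pipeline import make_pipeline
  from sklearn.svm import SVC

  def word_features(sentence, i):
      # Simplified per-word features: the token itself plus a two-word
      # context window on each side (padded at sentence boundaries).
      feats = {"w0": sentence[i]}
      for offset in (-2, -1, 1, 2):
          j = i + offset
          feats[f"w{offset:+d}"] = sentence[j] if 0 <= j < len(sentence) else "<pad>"
      return feats

  # Polynomial-kernel SVM over sparse one-hot features, as in the paper
  # (degree 2 is a placeholder choice, not taken from the paper).
  model = make_pipeline(DictVectorizer(), SVC(kernel="poly", degree=2))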

This system achieves an F-score of 57 (precision 81, recall 44) on detecting Wikipedia weasel sentences, and an F-score of 82 (precision 81, recall 82) on detecting hedged sentences in biomedical scientific literature.

Task 2: In-Sentence Hedge Cue Scope Detection

This task is much more complex. Given a sentence, two annotations must be performed: first, identifying the cues, the specific words that signal uncertainty in a text; and second, the scope, the substring within the sentence that is being hedged. The data set for this task is the second corpus from Task 1: biomedical scientific literature. Hedging in this genre is usually well-structured and explicit, compared to the more subtle use of weasel words in Wikipedia articles (as evidenced by the dramatically better Task 1 performance on the biomedical corpus).

This paper treats scope detection as a pipeline with multiple components.

  • First, individual words are predicted as cues using the system from Task 1.
  • Next, the system passes over the sentence again word by word, classifying each token as FIRST, LAST, or neither; these labels mark whether a word begins or ends a scope.
  • Finally, using a rule-based algorithm, convert from a series of predicted firsts and lasts into a coherent span of text to mark as the scope.

The middle step is learned using a 7-nearest-neighbor algorithm, with votes from each neighbor weighted by its distance from the test example. Features are weighted by gain ratio, and similarity is computed by feature overlap. This is an application of the IB1 algorithm for instance-based learning, one of the simplest ways of performing this task.
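
A minimal sketch of this distance-weighted, overlap-based classifier is below; the gain-ratio weights are assumed to have been computed from the training data beforehand, and inverse-distance voting is one common weighting scheme rather than necessarily the paper's exact choice.

  import numpy as np

  def weighted_overlap_distance(x, y, weights):
      # IB1-style distance: sum the gain-ratio weights of the features
      # on which the two instances disagree.
      return sum(w for xi, yi, w in zip(x, y, weights) if xi != yi)

  def knn_classify(query, examples, labels, weights, k=7):
      # Classify by a distance-weighted vote among the k nearest neighbors.
      dists = [weighted_overlap_distance(query, ex, weights) for ex in examples]
      votes = {}
      for idx in np.argsort(dists)[:k]:
          # Inverse-distance voting; the small constant avoids division by zero.
          votes[labels[idx]] = votes.get(labels[idx], 0.0) + 1.0 / (dists[idx] + 1e-6)
      return max(votes, key=votes.get)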

A much larger feature space was used for this task than for the classification task. Features included:

  • numerous part-of-speech, dependency-graph, and chunk-based features for both the cue token and the token being classified;
  • designed features inspired by domain knowledge, such as marking whether a word is before, inside, or after the predicted cue;
  • features about the dependency graph of the sentence in a larger context, such as active/passive clause construction, features marking whether a word is heuristically "eligible" to be a FIRST or LAST word, and whether the cue and the token being classified share a common ancestor.

This is not an exhaustive list, as the authors appear to have taken a kitchen-sink approach to data representation.
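
To make the representation concrete, a feature vector for one (cue, token) pair might look like the dictionary below; every key name here is invented for illustration and is not the paper's actual feature inventory.

  # Hypothetical feature vector for one (cue, token) pair.
  features = {
      "token_pos": "NN",
      "cue_lemma": "suggest",
      "token_location": "after_cue",   # before / inside / after the cue
      "clause_voice": "passive",
      "eligible_first": False,
      "eligible_last": True,
      "shares_ancestor_with_cue": True,
  }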

The rules they then use to "clean" the output of the system are as follows (a code sketch follows the list):

  • All words between FIRST- and LAST-tagged words are marked as part of a scope.
  • If no LAST token is identified, mark as LAST the first word to the right of a FIRST-tagged word that is heuristically "eligible" to be tagged as such.
  • If multiple tokens are marked as LAST, choose the leftmost one after the FIRST-tagged word.
  • If no FIRST token is identified, mark all words between the predicted cue and the LAST-tagged word.
  • If no FIRST token is identified and multiple LAST tokens are, choose the leftmost LAST token after the predicted cue.
  • If more than one FIRST token is identified, the scope is started at the cue word instead.
  • If no tokens are marked as FIRST or LAST, the scope spans from the predicted cue to the first word that is heuristically "eligible" to be tagged as LAST.
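
These rules collapse into a compact procedure. A minimal sketch, assuming simplified inputs: tags is a list of "FIRST"/"LAST"/None labels, cue_index is the predicted cue position, and eligible_last is a hypothetical stand-in for the paper's eligibility heuristic (the end-of-sentence fallback is also an assumption, for the case where no eligible word exists).

  def resolve_scope(tags, cue_index, eligible_last):
      firsts = [i for i, t in enumerate(tags) if t == "FIRST"]
      lasts = [i for i, t in enumerate(tags) if t == "LAST"]

      # Exactly one FIRST starts the scope; zero or multiple FIRSTs
      # fall back to starting the scope at the cue itself.
      start = firsts[0] if len(firsts) == 1 else cue_index

      # Take the leftmost LAST to the right of the scope start; if none
      # was predicted, use the first heuristically "eligible" word.
      candidates = [i for i in lasts if i > start]
      if candidates:
          end = candidates[0]
      else:
          end = next((i for i in range(start + 1, len(tags))
                      if eligible_last(i)), len(tags) - 1)

      # All words between the chosen FIRST and LAST form the scope.
      return start, end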

This approach was the most successful of any submission to the shared task, with an F-score of 57 (precision 59, recall 57). The authors also give their performance based only on the first two steps in the pipeline, which results in drastically worse performance (F1 46, precision 49, recall 44), suggesting that chaining a statistical, feature-based model with hard rules as a postprocessing step is an effective strategy.