Difference between revisions of "A Clustering Approach for the Nearly Unsupervised Recognition of Nonliteral Language, EACL-2006"

== Citation ==

Birke, J. and A. Sarkar. 2006. A clustering approach for the nearly unsupervised recognition of nonliteral language. In Proceedings of EACL-06, pages 329–336.

== Online Version ==

pdf link to the paper

== Method Summary ==

* TroFi (TropeFinder) System

# '''Task:''' Classifying literal and nonliteral usages of verbs
# '''Method:''' Use '''nearly unsupervised word-sense disambiguation''' and '''clustering techniques'''

* Processing Steps

# '''KE Algorithm''': a similarity-based word-sense disambiguation algorithm (see the first sketch after this list)
#* Similarities are calculated between:
#*# Sentences containing the word we wish to disambiguate (the '''target word''')
#*# Collections of seed sentences ('''feedback sets''')
#* Uses sentential context instead of selectional-constraint violations or paths in semantic hierarchies
#* Augmented with multiple seed-set learners, a voting schema, and additional features such as '''SuperTags''' and extra-sentential context
# '''Clean the Feedback Sets'''
#* Purpose: to remove false attraction between the feedback sets
#* Four principles of '''scrubbing''':
#*# Human annotations (in DoKMIE) are reliable
#*# Phrasal and expression verbs are often indicative of nonliteral uses
#*# Content words appearing in both feedback sets should be avoided
#*# '''Learning & voting:''' use four learners (A, B, C, D) to vote on the best scrubbing action (see the second sketch below)
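The KE algorithm itself iteratively updates word-to-word and sentence-to-sentence similarities until convergence; the first sketch below (Python) only illustrates the final attribution step in a simplified, one-shot form: score a target sentence against each feedback set and label it with the closer one. The bag-of-words representation, the cosine measure, and the toy sentences are illustrative assumptions, not the paper's exact features.

<pre>
from collections import Counter
import math

def bag_of_words(sentence):
    # Lowercased token counts; the real system adds stemming, stopword
    # handling, SuperTags, and extra-sentential context.
    return Counter(sentence.lower().split())

def cosine(a, b):
    # Cosine similarity between two word-count vectors.
    dot = sum(a[w] * b[w] for w in set(a) & set(b))
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def attribute(target_sentence, literal_set, nonliteral_set):
    # Label a target sentence with its highest-similarity feedback set.
    tgt = bag_of_words(target_sentence)
    lit = max((cosine(tgt, bag_of_words(s)) for s in literal_set), default=0.0)
    non = max((cosine(tgt, bag_of_words(s)) for s in nonliteral_set), default=0.0)
    return "literal" if lit >= non else "nonliteral"

# Hypothetical feedback sets for the verb "grasp":
literal = ["she grasped the rope with both hands"]
nonliteral = ["he finally grasped the concept of recursion"]
print(attribute("the student grasped the idea of recursion quickly",
                literal, nonliteral))  # -> nonliteral
</pre>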
  
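The second sketch models the voting in the scrubbing step: the paper's four learners differ in how they clean the feedback sets, and here each is reduced to a per-sentence decision with a simple majority winning. The decision labels and the tie-breaking fallback to "keep" are hypothetical choices for illustration, not taken from the paper.

<pre>
from collections import Counter

def vote(decisions):
    # Majority vote over the four learners' per-sentence scrubbing
    # decisions; ties fall back to "keep" (a hypothetical rule).
    label, count = Counter(decisions).most_common(1)[0]
    return label if count > len(decisions) / 2 else "keep"

# Hypothetical decisions from learners A, B, C, D for one sentence:
print(vote(["remove", "remove", "keep", "remove"]))  # -> remove
print(vote(["remove", "move", "keep", "keep"]))      # -> keep (no majority)
</pre>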
== Result ==
# TroFi achieves an '''F1-score of 0.538''', outperforming the baseline by '''24.4%''' (on hand-annotated data)
# The authors also build the '''TroFi Example Base''', a freely available, metaphor-annotated resource
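As a quick sanity check on these numbers (assuming the 24.4% is a relative F1 improvement, which the phrasing suggests but does not state outright):

<pre>
# If the 24.4% gain over the baseline is relative, the implied
# baseline F1 would be roughly:
trofi_f1 = 0.538
print(round(trofi_f1 / 1.244, 3))  # -> 0.432
</pre>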
  
== Discussion and Thoughts ==

# This work explores an approach to metaphor identification that is relatively rarely discussed. Compared with selectional-restriction modeling or lexicon-based methods, it requires less human involvement and adopts well-developed techniques borrowed from word-sense disambiguation.
# [[Models_of_metaphor_in_NLP]] criticizes this work for not defining its task clearly. Part of the reason is that the authors simplify the task slightly so that it fits the mold of word-sense disambiguation. On the whole, though, it is still a very inspiring piece of work.
  
== Study Plan ==

Papers you may want to read:

# The '''core algorithm''' is based on this paper: Yael Karov and Shimon Edelman. 1998. Similarity-based word sense disambiguation. Comput. Linguist. 24, 1 (Mar. 1998), 41-59.
# One important feature, the '''SuperTag''', is based on this paper: Srinivas Bangalore and Aravind K. Joshi. 1999. Supertagging: an approach to almost parsing. Comput. Linguist. 25, 2 (Jun. 1999), 237-265.
 
 
 
 
 
