Metaphor Detection in Different Topics

From Cohen Courses

Revision as of 23:29, 8 October 2012

Team Members

Project Abstract

There is rising interest in metaphor detection. The best-known approach detects violations of "selectional preference", mostly of verbs. The idea behind selectional preference is that verbs have semantic preferences for their arguments. For instance, the verb "flex" strongly prefers "muscle" and "bone" as its object. If, in some text, the object of "flex" does not belong to the semantic class of "muscle" and "bone", the usage is likely to be a metaphor.
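The violation test can be sketched as follows. This is a minimal illustration under our own assumptions, not a committed design: `NOUN_CLASS` is a hypothetical toy lexicon standing in for a real semantic-class resource (e.g. WordNet), the training pairs are made up, and `threshold` is an arbitrary cutoff. A verb's preference is estimated as the relative frequency of each argument class among its observed objects, and an object whose class is rarely seen with the verb is flagged.

```python
from collections import Counter, defaultdict

# Hypothetical toy lexicon mapping nouns to coarse semantic classes;
# a real system would use a resource such as WordNet.
NOUN_CLASS = {
    "muscle": "body_part",
    "bone": "body_part",
    "arm": "body_part",
    "power": "abstraction",
    "influence": "abstraction",
}


def learn_preferences(pairs):
    """Count how often each semantic class appears as the object of each verb."""
    prefs = defaultdict(Counter)
    for verb, noun in pairs:
        cls = NOUN_CLASS.get(noun)
        if cls is not None:
            prefs[verb][cls] += 1
    return prefs


def is_sp_violation(prefs, verb, noun, threshold=0.1):
    """Flag (verb, noun) when the noun's class is rare among the verb's objects.

    Unknown verbs or nouns are not flagged, since there is no evidence either way.
    The threshold is an arbitrary illustrative cutoff.
    """
    cls = NOUN_CLASS.get(noun)
    counts = prefs.get(verb)
    if cls is None or not counts:
        return False
    return counts[cls] / sum(counts.values()) < threshold


# Hypothetical literal (verb, object) pairs standing in for parsed training text.
training = [("flex", "muscle")] * 3 + [("flex", "bone")] * 2 + [("flex", "arm")]
prefs = learn_preferences(training)

for noun in ["muscle", "power"]:
    label = "possible metaphor" if is_sp_violation(prefs, "flex", noun) else "literal"
    print(f"flex {noun}: {label}")
```

Hard class labels keep the sketch short; a probabilistic class assignment would be a natural refinement.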

The main idea of our project is to observe the selectional preferences of the same verbs across different topics. For instance, in the topic of sport, the subjects of "flex" are mostly humans, but in the topic of finance or politics, they are mostly organizations or countries, e.g., "China to flex its financial muscles at US meeting." We are interested in this difference and aim to observe how it could affect metaphor detection.
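One way to make "difference across topics" concrete, assumed here rather than settled, is to estimate a verb's subject-class distribution separately per topic and compare the distributions with Jensen-Shannon divergence. The `(topic, verb, subject_class)` triples below are hypothetical placeholders for parser output.

```python
import math
from collections import Counter

# Hypothetical (topic, verb, subject_class) triples; illustrative only.
TRIPLES = [
    ("sport", "flex", "person"),
    ("sport", "flex", "person"),
    ("sport", "flex", "person"),
    ("finance", "flex", "organization"),
    ("finance", "flex", "country"),
    ("finance", "flex", "person"),
]


def class_distribution(triples, topic, verb):
    """Relative frequency of subject classes of `verb` within one topic."""
    counts = Counter(cls for t, v, cls in triples if t == topic and v == verb)
    total = sum(counts.values())
    return {cls: n / total for cls, n in counts.items()} if total else {}


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits (0 = identical, 1 = disjoint support)."""
    keys = set(p) | set(q)
    m = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in keys}

    def kl_to_m(a):
        # KL(a || m); terms with a[k] == 0 contribute nothing.
        return sum(a[k] * math.log2(a[k] / m[k]) for k in a if a[k] > 0)

    return 0.5 * kl_to_m(p) + 0.5 * kl_to_m(q)


sport = class_distribution(TRIPLES, "sport", "flex")
finance = class_distribution(TRIPLES, "finance", "flex")
print(sport)
print(finance)
print(f"JS divergence: {js_divergence(sport, finance):.3f}")
```

Jensen-Shannon is used instead of plain KL divergence because it stays finite when one topic never sees a class the other does.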

Data

We aim to find more "vivid" metaphors, so we plan to use blog corpora rather than newspaper corpora. The two candidate corpora are as follows.

  • The Blog Authorship Corpus
    The Blog Authorship Corpus consists of the collected posts of 19,320 bloggers gathered from blogger.com in August 2004. The corpus incorporates a total of 681,288 posts and over 140 million words, or approximately 35 posts and 7,250 words per person.
  • Political Blog Corpora (http://www.ark.cs.cmu.edu/blog-data/)

Given its size, we might start with The Blog Authorship Corpus.
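A first pass over The Blog Authorship Corpus could look like the sketch below. It assumes a layout of one file per blogger, named `<id>.<gender>.<age>.<industry>.<sign>.xml`, with each post wrapped in `<post>...</post>`; this should be checked against the actual download. It recomputes the corpus statistics quoted above, counting words as whitespace-separated tokens, so totals may differ from the published figures. The `blogs` directory name is hypothetical. The industry field is kept only as one possible coarse topic label.

```python
import re
from collections import Counter
from pathlib import Path

# Assumed layout: one file per blogger named <id>.<gender>.<age>.<industry>.<sign>.xml,
# with each post wrapped in <post>...</post>. A regex is used rather than an XML
# parser because we do not assume the files are well-formed XML.
POST_RE = re.compile(r"<post>(.*?)</post>", re.DOTALL)


def load_blogger(path):
    """Return (industry label, list of post texts) for one blogger file."""
    fields = path.name.split(".")
    industry = fields[3] if len(fields) == 6 else "unknown"
    # latin-1 maps every byte to a character, so decoding never fails.
    text = path.read_text(encoding="latin-1")
    return industry, [p.strip() for p in POST_RE.findall(text)]


def corpus_stats(root):
    """Count bloggers, posts, whitespace-separated words, and posts per industry."""
    bloggers = posts = words = 0
    posts_per_industry = Counter()
    for path in sorted(Path(root).glob("*.xml")):
        industry, texts = load_blogger(path)
        bloggers += 1
        posts += len(texts)
        words += sum(len(t.split()) for t in texts)
        posts_per_industry[industry] += len(texts)
    return bloggers, posts, words, posts_per_industry


if __name__ == "__main__":
    root = Path("blogs")  # hypothetical location of the unpacked corpus
    if root.is_dir():
        n_bloggers, n_posts, n_words, by_industry = corpus_stats(root)
        if n_bloggers:
            print(f"{n_bloggers} bloggers, {n_posts} posts, {n_words} words")
            print(f"~{n_posts / n_bloggers:.0f} posts and ~{n_words / n_bloggers:.0f} words per blogger")
            print(by_industry.most_common(5))
```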

Techniques

  • Violation Detection of Selectional Preference


  • Topic Modeling
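The plan does not yet name a topic model. Assuming LDA, a compact collapsed Gibbs sampler is sketched below so the pipeline can be prototyped without dependencies. Each blog post gets a topic distribution `theta`, and its most probable topic could then be used to bucket verb-argument pairs before violation detection. The hyperparameters (`alpha`, `beta`, `iters`) and the toy documents are illustrative; at corpus scale a library implementation such as gensim or MALLET would be the practical choice.

```python
import random


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA; returns per-document topic distributions."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    topics = range(num_topics)
    doc_topic = [[0] * num_topics for _ in docs]  # n_{d,k}
    topic_word = [dict() for _ in topics]         # n_{k,w}
    topic_total = [0] * num_topics                # n_k

    def update(d, w, k, delta):
        doc_topic[d][k] += delta
        topic_word[k][w] = topic_word[k].get(w, 0) + delta
        topic_total[k] += delta

    # Random initial topic assignment for every token.
    z = []
    for d, doc in enumerate(docs):
        z.append([rng.randrange(num_topics) for _ in doc])
        for w, k in zip(doc, z[d]):
            update(d, w, k, +1)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                update(d, w, z[d][i], -1)  # remove the token's current assignment
                # p(z_i = k | all other assignments), up to a constant
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k].get(w, 0) + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in topics
                ]
                z[d][i] = rng.choices(topics, weights=weights)[0]
                update(d, w, z[d][i], +1)

    # Posterior mean of each document's topic proportions.
    return [
        [(doc_topic[d][k] + alpha) / (len(doc) + num_topics * alpha) for k in topics]
        for d, doc in enumerate(docs)
    ]


docs = [
    "team match goal player coach score".split(),
    "player coach season team goal win".split(),
    "market stock bank investor trade price".split(),
    "bank price market investor rate stock".split(),
]
theta = lda_gibbs(docs, num_topics=2)
dominant = [max(range(2), key=row.__getitem__) for row in theta]
print(dominant)
```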

Related Work