Metaphor Detection in Different Topics
Comments
Looks like a nice idea for a project - it will be interesting to see how this turns out.
One worry I have is evaluation - is there any plan for a quantitative evaluation? One idea might be to consider some prediction task (I don't know - comment volume? comment sentiment? there is some existing data on that) and see if the metaphor usage is a useful predictor. --Wcohen 15:35, 10 October 2012 (UTC)
Team Members
Project Abstract
There is growing interest in metaphor detection. The best-known approach detects violations of "selectional preference" (mostly of verbs). The idea of "selectional preference" is that verbs have semantic preferences for their arguments. For instance, the verb "flex" strongly prefers "muscle" and "bone" as its object. If, in some text, the object of "flex" does not belong to the semantic class of "muscle" and "bone", the usage is very likely a metaphor.
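The idea can be illustrated with a minimal sketch. It assumes we already have (verb, object class) pairs extracted from parsed text. The counts, the class names, and the threshold below are invented for illustration and are not taken from any corpus.

```python
from collections import Counter, defaultdict

# Toy (verb, object_class) observations; invented for illustration only.
observations = [
    ("flex", "body_part"), ("flex", "body_part"), ("flex", "body_part"),
    ("flex", "body_part"), ("flex", "abstract"),
    ("eat", "food"), ("eat", "food"), ("eat", "abstract"),
]

def build_preferences(pairs):
    """Count how often each verb takes an object of each semantic class."""
    counts = defaultdict(Counter)
    for verb, cls in pairs:
        counts[verb][cls] += 1
    return counts

def preference(counts, verb, cls):
    """Relative frequency of class `cls` among the objects seen with `verb`."""
    seen = counts.get(verb, Counter())
    total = sum(seen.values())
    return seen[cls] / total if total else 0.0

def is_candidate_metaphor(counts, verb, cls, threshold=0.25):
    """Flag a verb-object pair whose class is rare for that verb.

    Verbs with no observations are not flagged, since there is no evidence.
    The threshold is an arbitrary assumption.
    """
    if verb not in counts:
        return False
    return preference(counts, verb, cls) < threshold

prefs = build_preferences(observations)
print(is_candidate_metaphor(prefs, "flex", "abstract"))   # rare class for "flex"
print(is_candidate_metaphor(prefs, "flex", "body_part"))  # preferred class
```

Relative frequency is only one possible preference score; it is used here because it keeps the example short.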
The main idea of our project is to observe how the selectional preferences of the same verbs vary across topics. For instance, in the topic of sports, the subjects of "flex" are mostly humans, but in the topics of finance or politics, the subjects of "flex" are mostly organizations or countries, e.g., "China to flex its financial muscles at US meeting." We are interested in this difference and aim to observe how it affects metaphor detection techniques.
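One way to quantify such a difference is to compare, for a single verb, the distribution of subject classes in one topic against another. The sketch below uses the Jensen-Shannon divergence for this; the choice of measure, the topic labels, and the counts are our own assumptions for illustration.

```python
import math
from collections import Counter

# Toy subject-class counts for the verb "flex" in two topics (invented numbers).
flex_subjects = {
    "sport": Counter({"human": 9, "organization": 1}),
    "finance": Counter({"human": 2, "organization": 8}),
}

def distribution(counter):
    """Normalize raw counts into a probability distribution."""
    total = sum(counter.values())
    if total == 0:
        return {}
    return {cls: n / total for cls, n in counter.items()}

def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two distributions."""
    keys = set(p) | set(q)
    m = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in keys}

    def kl(a, b):
        return sum(a[k] * math.log2(a[k] / b[k]) for k in a if a[k] > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

sport = distribution(flex_subjects["sport"])
finance = distribution(flex_subjects["finance"])
print(js_divergence(sport, finance))  # larger values mean more topic-dependent preferences
```

With base-2 logarithms the divergence lies between 0 and 1, so verbs can be ranked by how much their preferences shift between topics.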
Data
We aim to find some more "vivid" metaphors, so we plan to use blog corpora rather than newspaper corpora. The two possible options are as follows.
- The Blog Authorship Corpus
The Blog Authorship Corpus consists of the collected posts of 19,320 bloggers gathered from blogger.com in August 2004. The corpus incorporates a total of 681,288 posts and over 140 million words - or approximately 35 posts and 7250 words per person.
Given the size of the corpus, we might start with the Blog Authorship Corpus.
Techniques
- Violation Detection of Selectional Preference
Several resources can be used to detect selectional preference violations. One of them is VerbNet, which records constraints on the arguments of verbs. By matching each verb in the text together with its arguments against these constraints, we can detect arguments that violate them.
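The matching step might look roughly like the sketch below. The restriction table and the noun lexicon are hand-written and hypothetical; they do not reproduce actual VerbNet entries or its data format, and a real system would read the restrictions from VerbNet itself.

```python
# Hypothetical restriction table: verb -> role -> allowed semantic classes.
# Not actual VerbNet content.
RESTRICTIONS = {
    "flex": {"object": {"body_part"}},
    "devour": {"subject": {"animate"}, "object": {"comestible"}},
}

# Hypothetical noun lexicon: noun -> semantic classes it belongs to.
NOUN_CLASSES = {
    "muscle": {"body_part", "concrete"},
    "athlete": {"animate", "human"},
    "reader": {"animate", "human"},
    "novel": {"concrete"},
    "sandwich": {"comestible", "concrete"},
}

def find_violations(verb, arguments):
    """Return (role, noun) pairs whose noun matches none of the allowed classes.

    `arguments` maps a role name to the head noun filling it. Roles or nouns
    that are missing from the tables are skipped, since nothing can be checked.
    """
    violations = []
    for role, noun in arguments.items():
        allowed = RESTRICTIONS.get(verb, {}).get(role)
        classes = NOUN_CLASSES.get(noun)
        if allowed is None or classes is None:
            continue
        if not (allowed & classes):
            violations.append((role, noun))
    return violations

print(find_violations("devour", {"subject": "athlete", "object": "sandwich"}))
print(find_violations("devour", {"subject": "reader", "object": "novel"}))
```

Skipping unknown verbs and nouns keeps the check conservative: a pair is only reported when both tables have something to say about it.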
- Topic Modeling
We want to use LDA to model the topics of blog posts. With topic modeling, we want to observe how selectional preferences change across topics.
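To make the topic modeling step concrete, here is a small collapsed Gibbs sampler for LDA in plain Python. It is only a sketch for toy inputs: the hyperparameter values, the iteration count, and the example documents are assumptions, and for the real corpus we would expect to rely on an existing LDA implementation instead.

```python
import random
from collections import defaultdict

def lda_gibbs(docs, num_topics, iters=100, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA; returns per-document topic proportions.

    `docs` is a list of token lists. alpha and beta are symmetric Dirichlet
    priors (values here are arbitrary assumptions).
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [defaultdict(int) for _ in range(num_topics)]
    topic_total = [0] * num_topics
    assign = []

    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            k = rng.randrange(num_topics)
            zs.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assign.append(zs)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = assign[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Resample its topic from the collapsed conditional.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assign[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1

    theta = []
    for d, doc in enumerate(docs):
        denom = len(doc) + num_topics * alpha
        theta.append([(doc_topic[d][t] + alpha) / denom for t in range(num_topics)])
    return theta

# Toy documents, invented for illustration.
docs = [
    "athlete flex muscle gym training".split(),
    "team athlete training match".split(),
    "bank flex financial market".split(),
    "market bank government policy".split(),
]
theta = lda_gibbs(docs, num_topics=2)
dominant = [max(range(2), key=lambda t: row[t]) for row in theta]
print(dominant)  # dominant topic index per document
```

Each post can then be assigned its dominant topic, and verb-argument counts can be collected separately per topic.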
- Word Clustering
In some of the metaphor detection literature, e.g., (Shutova et al., 2010), semantic clusters of nouns and verbs are built first to cope with data sparsity, and the selectional preferences of "verb clusters" (rather than "verbs") toward "noun clusters" (rather than "nouns") are then analyzed. This approach seems reasonable for our setting, so we might adopt it as well.
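The effect of clustering on sparsity can be seen in a small sketch. The cluster assignments below are hand-written and hypothetical; they are not produced by the clustering method of the cited paper, and the observed pairs are invented.

```python
from collections import Counter

# Hypothetical, hand-written cluster assignments (not learned from data).
VERB_CLUSTER = {"flex": "V_move_body", "stretch": "V_move_body", "bend": "V_move_body"}
NOUN_CLUSTER = {"muscle": "N_body", "arm": "N_body", "leg": "N_body", "power": "N_abstract"}

# Toy observed (verb, object) pairs; invented for illustration.
pairs = [("flex", "muscle"), ("stretch", "arm"), ("bend", "leg"), ("stretch", "leg")]

def cluster_counts(observed):
    """Aggregate word-level pairs into (verb cluster, noun cluster) counts."""
    counts = Counter()
    for verb, noun in observed:
        counts[(VERB_CLUSTER[verb], NOUN_CLUSTER[noun])] += 1
    return counts

def cluster_support(counts, verb, noun):
    """Number of observations supporting the pair at the cluster level."""
    return counts[(VERB_CLUSTER[verb], NOUN_CLUSTER[noun])]

word_counts = Counter(pairs)
counts = cluster_counts(pairs)
print(word_counts[("flex", "arm")], cluster_support(counts, "flex", "arm"))
print(cluster_support(counts, "flex", "power"))
```

The pair ("flex", "arm") never occurs at the word level, yet it receives support from the other members of its clusters, while ("flex", "power") remains unsupported and stays a candidate for a preference violation.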
Related Work
- Birte Loenneker-Rodman and Srini Narayanan (2012). Computational Models of Figurative Language. Cambridge Encyclopedia of Psycholinguistics. Spivey, M., Joannisse, M., McRae, K. (eds.), Cambridge University Press, Cambridge. http://www1.icsi.berkeley.edu/~snarayan/CompFig.pdf
- Ekaterina Shutova, Lin Sun, and Anna Korhonen (2010). Metaphor identification using verb and noun clustering. COLING 2010. http://dl.acm.org/citation.cfm?id=1873894
- Ekaterina Shutova. (2010). Models of metaphor in NLP. ACL 2010. http://dl.acm.org/citation.cfm?id=1858752