Difference between revisions of "Sporleder&Li,EACL09"

Latest revision as of 09:43, 4 October 2012

Citation

title = {Unsupervised recognition of literal and non-literal use of idiomatic expressions},
author = {Sporleder, Caroline and Li, Linlin},
booktitle = {Proceedings of the 12th Conference of the European Chapter of the Association for Computational Linguistics},
series = {EACL '09},
year = {2009},
location = {Athens, Greece},
pages = {754--762},

Abstract from the paper

We propose an unsupervised method for distinguishing literal and non-literal usages of idiomatic expressions. Our method determines how well a literal interpretation is linked to the overall cohesive structure of the discourse. If strong links can be found, the expression is classified as literal, otherwise as idiomatic. We show that this method can help to tell apart literal and non-literal usages, even for idioms which occur in canonical form.

Online version

pdf link to the paper

Summary of approach

The main goal of this article is to distinguish between literal and non-literal usages of idiomatic expressions. For example, given the expressions ‘break the ice’ and ‘spill the beans’, the algorithm should annotate both ‘Somehow I always end up spilling the beans all over the floor and looking foolish when the clerk comes to sweep them up.’ and ‘Dad had to break the ice on the chicken troughs so that they could get water’ as literal, because in each sentence the expression describes a physical action rather than carrying its figurative meaning.

This method is based on the insight that figurative language exhibits fewer semantic cohesive ties with the context than literal language does; in this respect, idioms behave similarly to spelling errors. The approach is therefore similar to Hirst and St-Onge’s (1998) method for detecting malapropisms. The main idea is that if an expression is used literally rather than idiomatically, its component words will be semantically related to several words in the surrounding discourse. For example, when the expression ‘play with fire’ is used literally, words such as ‘smoke’, ‘burn’, ‘fire department’, and ‘alarm’ tend to be used nearby; when it is used idiomatically, they are not.

The authors implement two classifiers, both based on the semantic relatedness of an expression’s component words to nearby words in the text. The first computes the lexical chains for the input text and classifies an expression as literal or non-literal depending on whether its component words participate in any of the chains: if one or more of the expression’s components are sufficiently related to enough nearby words, forming a ‘lexical chain’, the usage is classified as literal; otherwise it is classified as idiomatic. The second builds a cohesion graph and determines how this graph changes when the expression is inserted or left out.
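The cohesion-graph idea can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the use of the average pairwise relatedness as the graph's "connectivity", and the decision rule (a drop in connectivity when the expression's words are added signals non-literal use) are all assumptions made for illustration. The `relatedness` argument is a hypothetical callable that returns higher values for more closely related word pairs.

```python
from itertools import combinations


def average_connectivity(words, relatedness):
    """Mean relatedness over all word pairs (assumed connectivity measure)."""
    pairs = list(combinations(words, 2))
    if not pairs:
        return 0.0
    return sum(relatedness(a, b) for a, b in pairs) / len(pairs)


def classify_usage(context_words, idiom_words, relatedness):
    """Compare the graph with and without the expression's component words.

    Assumed rule: if adding the expression's words does not lower the
    average connectivity, the usage is treated as literal.
    """
    without_idiom = average_connectivity(list(context_words), relatedness)
    with_idiom = average_connectivity(
        list(context_words) + list(idiom_words), relatedness
    )
    return "literal" if with_idiom >= without_idiom else "non-literal"
```

In this sketch any relatedness function can be plugged in, so the classifier itself stays independent of how relatedness is measured.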

As a measure of semantic relatedness the Normalized Google Distance is used, which computes relatedness on the basis of the page counts returned by a search engine.
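For reference, the standard definition of the Normalized Google Distance can be written as a short function. This is a sketch of the general formula, not of the paper's exact setup: the parameter names are hypothetical, and the page counts and the total index size `n` would have to come from a search engine, which this summary does not specify. Note that NGD is a distance, so smaller values mean stronger relatedness.

```python
import math


def normalized_google_distance(fx, fy, fxy, n):
    """NGD from page counts.

    fx, fy: page counts for each term alone
    fxy:    page count for both terms together
    n:      assumed total number of indexed pages
    """
    if fx <= 0 or fy <= 0 or fxy <= 0:
        # Terms that never occur (or never co-occur) are maximally distant.
        return math.inf
    log_fx, log_fy, log_fxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(log_fx, log_fy) - log_fxy) / (math.log(n) - min(log_fx, log_fy))
```

Two terms that always occur together get a distance of 0, and the distance grows as their joint page count shrinks relative to their individual counts.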

Experiments and results

The model was evaluated on an idiom set consisting of 3964 idiom occurrences (17 idiom types), which were manually labeled as ‘literal’ or ‘figurative’.

Two classifiers based on lexical chains were compared with a supervised method that trains a classifier for each expression based on the surrounding context. The results showed that the supervised classifier performed much better (90% F-score on literal uses) than the lexical chain classifiers (60% F-score).

Related Papers

Linlin Li and Caroline Sporleder. "Linguistic Cues for Distinguishing Literal and Non-Literal Usage", Proceedings of the 23rd International Conference on Computational Linguistics (COLING 2010), August 23-27, 2010, Beijing, China. pdf

@@ Line 20: / Line 20: @@
 == Summary of approach ==
-* The main goal of this [[Category::Paper|article]] is to distinguish between literal and non-literal  [[AddressesProblem::usages of idiomatic expressions]]. For example, given the expressions ''‘break the ice’'' and ''‘spill the beans’'', the algorithm should annotate the sentence ''‘Somehow I always end up spilling the beans all over the ﬂoor and looking foolish when the clerk comes to sweep them up.’'' as literal, and ''‘Dad had to break the ice on the chicken troughs so that they could get water’'' as idiomatic
+* The main goal of this [[Category::Paper|article]] is to distinguish between literal and non-literal usages of idiomatic expressions. For example, given the expressions ''‘break the ice’'' and ''‘spill the beans’'', the algorithm should annotate the sentence ''‘Somehow I always end up spilling the beans all over the ﬂoor and looking foolish when the clerk comes to sweep them up.’'' as literal, and ''‘Dad had to break the ice on the chicken troughs so that they could get water’'' as idiomatic
 * This method is based on the insight that ﬁgurative language exhibits less semantic cohesive ties with the context than literal language and in that idioms behave similarly to spelling errors. The approach, therefore, is similar to [http://www.cs.swarthmore.edu/~richardw/cs65-f08/litreview/meggie-malcolm.pdf Hirst and St-Onge’s (1998)] method for detecting [http://en.wikipedia.org/wiki/Malapropism malapropisms]. The main idea is that if an expression is used literally, but not idiomatically, its component words will be related semantically to several words in the surrounding discourse. For example, when the expression ‘play with ﬁre’ is used literally, words such as ‘smoke, ‘burn’, ‘ﬁre department’, and ‘alarm’ tend to also be used nearby; when it is used idiomatically, they aren’t.
@@ Line 27: / Line 27: @@
 * As a measure of semantic relatedness the [http://en.wikipedia.org/wiki/Normalized_Google_distance Normalized Google Distance] is used, which computes relatedness on the basis of the page counts returned by a search engine.
 == Experiments and results ==

