Difference between revisions of "Tsur et al ICWSM 10"
Revision as of 17:28, 30 September 2012
This is a paper that appeared at the International AAAI Conference on Weblogs and Social Media 2010.
Citation
title={ICWSM--A great catchy name: Semi-supervised recognition of sarcastic sentences in online product reviews}, author={Tsur, O. and Davidov, D. and Rappoport, A.}, booktitle={Proceedings of the fourth international AAAI conference on weblogs and social media}, pages={162--169}, year={2010}
Online version
Summary
In this work, the authors introduce a novel semi-supervised approach for identifying sarcasm in the comments of online reviews. They first build a small, hand-labeled training set that contains some very obviously sarcastic comments and some clearly non-sarcastic ones. Each review is assigned a sarcasm level on a scale from 1 to 5. Using this training set, they extract two different types of features:
- Pattern Based: To identify patterns, the authors classify every term as either a High Frequency Word (HFW) or a Context Word (CW) by thresholding its corpus frequency (HFWs are more frequent than CWs). Each pattern may contain 2-6 HFWs and 1-6 CWs. Because this initially yields a very large number of patterns, they then filter out the ones that are not particularly useful, eliminating patterns that 1) appear only in reviews of a single product, or 2) appear in the training set both in reviews that are clearly sarcastic (rated 5) and in reviews that are clearly non-sarcastic (rated 1).
- Syntactic
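The pattern-based step can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the count threshold of 3, and the `(pattern, product, label)` input layout are all assumptions made for illustration. The second filter is read here as dropping patterns seen in both clearly sarcastic (5) and clearly non-sarcastic (1) training reviews, which is an interpretation of the summary above.

```python
from collections import Counter

# Hypothetical threshold: a word seen at least this often counts as an HFW.
HFW_THRESHOLD = 3


def tag_words(tokens, threshold=HFW_THRESHOLD):
    """Tag each distinct word as 'HFW' or 'CW' by thresholding its corpus count."""
    counts = Counter(tokens)
    return {w: ("HFW" if c >= threshold else "CW") for w, c in counts.items()}


def is_valid_pattern(pattern, tags):
    """Accept a pattern only if it has 2-6 HFWs and 1-6 CWs.

    Words missing from `tags` are counted as CWs (an assumption).
    """
    n_hfw = sum(1 for w in pattern if tags.get(w) == "HFW")
    n_cw = len(pattern) - n_hfw
    return 2 <= n_hfw <= 6 and 1 <= n_cw <= 6


def filter_patterns(occurrences):
    """Drop uninformative patterns.

    `occurrences` is an iterable of (pattern, product, label) tuples, one per
    training review in which the pattern was seen; labels range from 1 to 5.
    """
    products, labels = {}, {}
    for pattern, product, label in occurrences:
        products.setdefault(pattern, set()).add(product)
        labels.setdefault(pattern, set()).add(label)

    kept = set()
    for pattern in products:
        if len(products[pattern]) < 2:
            continue  # filter 1: seen on a single product only
        if {1, 5} <= labels[pattern]:
            continue  # filter 2: seen with both label 1 and label 5
        kept.add(pattern)
    return kept
```

In this sketch, a pattern such as `("the", "kindle", "of")` passes `is_valid_pattern` when "the" and "of" are tagged HFW and "kindle" is tagged CW, and it survives `filter_patterns` only if it occurs in reviews of at least two products and is not shared by both extremes of the sarcasm scale.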
After feature extraction, to decide how sarcastic a new comment drawn from the test dataset is, they use a k-NN-inspired classifier, which works as follows:
Evaluation
Datasets:
Metrics:
Baselines:
Results: