DmitryDavidov et al. CoNLL
Revision as of 18:31, 30 September 2012
Citation
Semi-supervised recognition of sarcastic sentences in twitter and amazon,
Dmitry Davidov, Oren Tsur and Ari Rappoport, CoNLL 2010
Online version
Semi-supervised recognition of sarcastic sentences in twitter and amazon
Summary
This paper addresses the sentiment analysis problem at the sentence level for multiple languages. The authors propose to leverage parallel corpora to learn a MaxEnt-based EM model that considers both languages simultaneously, under the assumption that sentiment labels for parallel sentences should be similar.
They experimented on two datasets: a Twitter dataset and an Amazon dataset.
Evaluation
The paper proposes several feature extraction methods and a data enrichment method; the evaluation mainly compares the performance of those methods. Moreover, the authors use two settings to test robustness: traditional in-domain cross-validation and a cross-domain test. Promising results are reported in both settings.
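The two robustness settings can be sketched as follows. This is a minimal illustration of the evaluation protocols only; the classifier factory, features, and datasets are hypothetical stand-ins, not the paper's actual pipeline:

```python
import numpy as np

def accuracy(model_fit, X_train, y_train, X_test, y_test):
    """Train a model on one split and score it on another."""
    model = model_fit(X_train, y_train)          # model_fit returns a callable classifier
    return float(np.mean(model(X_test) == y_test))

def in_domain_cv(model_fit, X, y, k=5, seed=0):
    """Traditional k-fold cross-validation within a single domain."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))                # shuffle example indices
    folds = np.array_split(idx, k)               # k roughly equal folds
    scores = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        scores.append(accuracy(model_fit, X[train], y[train], X[test], y[test]))
    return float(np.mean(scores))

def cross_domain(model_fit, X_src, y_src, X_tgt, y_tgt):
    """Cross-domain test: train on one domain, evaluate on the other."""
    return accuracy(model_fit, X_src, y_src, X_tgt, y_tgt)
```

The cross-domain setting is the harder one: the classifier never sees the target domain's distribution during training, which is what makes it a robustness check.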
Discussion
This paper addresses the problem of bilingual sentiment classification. It leverages a parallel corpus, or a pseudo-parallel corpus generated by automatic translation software such as Google Translate, to build a MaxEnt model that maximizes the joint probability p(y1, y2 | x1, x2; w1, w2), under the assumption that the same idea expressed in different languages should have similar polarity.
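A minimal sketch of such a joint distribution, assuming a parameterization of the form p(y1, y2 | x1, x2) ∝ exp(w1·f1(x1, y1) + w2·f2(x2, y2) + σ·1[y1 = y2]): the agreement bonus σ is an illustrative stand-in for the paper's coupling between the two languages, and the feature functions are hypothetical.

```python
import numpy as np

LABELS = [0, 1]  # 0 = negative, 1 = positive

def joint_probs(w1, w2, f1, f2, sigma=1.0):
    """Joint distribution p(y1, y2 | x1, x2) over all label pairs.

    f1[y], f2[y]: per-language feature vectors for each candidate label.
    sigma: strength of the agreement bonus when y1 == y2 (a simplification
    standing in for the model's soft coupling of the two languages).
    """
    scores = np.array([[w1 @ f1[y1] + w2 @ f2[y2] + sigma * (y1 == y2)
                        for y2 in LABELS] for y1 in LABELS])
    exp = np.exp(scores - scores.max())      # numerically stable softmax
    return exp / exp.sum()                   # rows index y1, columns y2
```

With σ > 0, probability mass shifts toward the diagonal (y1 = y2), which encodes the assumption that parallel sentences share a polarity without forcing exact agreement.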
The strong points of the paper include:
1. It maximizes the joint probability, so the model can consider both languages simultaneously and is not biased toward one language. Moreover, it takes translation quality into consideration, so it is not severely hurt by poor translations and can leverage pseudo-parallel corpora.
2. It uses the EM algorithm to leverage additional unlabeled parallel data, which is much easier to obtain.
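The EM idea in point 2 can be sketched as a generic self-training loop over unlabeled data. This is a simplified stand-in for the paper's EM procedure, not its exact updates; `fit` and `predict_proba` are hypothetical hooks for any probabilistic classifier:

```python
import numpy as np

def em_self_training(fit, predict_proba, X_lab, y_lab, X_unlab, iters=5):
    """EM-style loop: alternate between inferring labels for unlabeled
    data (E-step) and refitting the model on all data (M-step).

    fit(X, y, sample_weight) -> model parameters
    predict_proba(params, X) -> (n, n_classes) posterior matrix
    """
    params = fit(X_lab, y_lab, np.ones(len(y_lab)))
    for _ in range(iters):
        # E-step: posterior label distributions for unlabeled examples
        post = predict_proba(params, X_unlab)
        hard = post.argmax(axis=1)                       # most likely labels
        # M-step: refit on labeled data plus soft-weighted unlabeled data
        X_all = np.concatenate([X_lab, X_unlab])
        y_all = np.concatenate([y_lab, hard])
        w_all = np.concatenate([np.ones(len(y_lab)), post.max(axis=1)])
        params = fit(X_all, y_all, w_all)
    return params
```

Weighting each unlabeled example by its posterior confidence keeps low-confidence guesses from dominating the refit, which is the usual reason a loop like this helps when labeled data is scarce.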
The weak points of the paper include:
1. The baseline algorithms are too weak. The paper mostly compares against algorithms that take no special account of this bilingual setting, so it is not surprising that the proposed algorithm outperforms the baselines.
2. There is a limitation caused by translation. Current translation systems can barely give meaningful translations of whole documents, and document-level parallel corpora are also rare. This makes it hard for the algorithm to go beyond the sentence level.
Related papers
- Paper:Icwsm - a great catchy name: Semi-supervised recognition of sarcastic sentences in product reviews:[1]
- Paper:Efficient unsupervised discovery of word categories using symmetric patterns and high frequency words:[2]
- Paper:Automatic satire detection: Are you having a laugh?:[3]
Study plan
As a typical incremental work, this paper builds on:
- Paper:Icwsm - a great catchy name: Semi-supervised recognition of sarcastic sentences in product reviews:[4]
- Paper:Efficient unsupervised discovery of word categories using symmetric patterns and high frequency words:[5]
And the classification algorithm used:
- Article: k-Nearest Neighbor
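The k-Nearest Neighbor classifier referenced above can be sketched in a few lines. Euclidean distance and majority voting are the textbook choices assumed here; the paper's exact distance over its feature vectors may differ:

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k=5):
    """Classify x by majority vote among its k nearest training points."""
    dists = np.linalg.norm(X_train - x, axis=1)  # Euclidean distance to each point
    nearest = np.argsort(dists)[:k]              # indices of the k closest
    return Counter(y_train[nearest]).most_common(1)[0][0]
```

kNN needs no training phase beyond storing the labeled examples, which makes it a natural fit for semi-supervised pipelines where the labeled seed set keeps growing.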