Cross-Lingual Mixture Model for Sentiment Classification, Xinfan Meng, Furu Wei, Xiaohua Liu, Ming Zhou, Ge Xu, Houfeng Wang, ACL 2012
Citation
Cross-Lingual Mixture Model for Sentiment Classification, Xinfan Meng, Furu Wei, Xiaohua Liu, Ming Zhou, Ge Xu, Houfeng Wang, ACL 2012
Online version
An online PDF version is available here [1].
Summary
This paper proposes a cross-lingual mixture model (CLMM) to tackle the problem of cross-lingual sentiment classification. The motivation for this work is the lack of labeled data in the target language, so labeled data in the source language is brought in to help.

Given labeled source-language data D_s, a parallel corpus U, and optionally labeled target-language data D_t, they maximize the log-likelihood function of the parallel corpus.
That is, a word in the parallel corpus can be generated in two ways: 1) a Chinese word is generated directly according to the polarity of the sentence, or 2) an English word with the same polarity and meaning is generated first and then translated (projected) into a Chinese word.
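As a rough sketch of this mixture (our own notation, not the paper's exact equation), the probability of generating a Chinese word w_t under sentiment class c looks like

  P(w_t | c) ≈ λ · P_t(w_t | c) + (1 − λ) · Σ_{w_s} P_s(w_s | c) · P(w_t | w_s)

where P(w_t | w_s) is the word projection probability, P_t and P_s are the target- and source-side generation probabilities, and λ weights the direct-generation path against the projection path; a symmetric expression covers English words generated from Chinese ones.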
At the same time, they maximize the log-likelihood of the labeled source data (and, optionally, of the labeled target data).
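Schematically (again our notation rather than the paper's exact formulation), the overall objective being maximized is the sum of these terms:

  L(θ) = L(D_s; θ) + L(U; θ) [+ L(D_t; θ), when labeled target data is available]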
The word projection (translation) probabilities are obtained from the Berkeley aligner. The word generation probabilities given each sentiment class are estimated with EM.
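Below is a minimal Python sketch of the EM idea for estimating word-given-class probabilities when the sentiment class of the parallel sentences is unobserved. It is a plain semi-supervised naive-Bayes EM: it omits the cross-lingual projection mixture that CLMM adds, and all function and variable names are ours, not the paper's.

```python
import math
from collections import defaultdict

def em_estimate(labeled_docs, unlabeled_docs, classes, n_iter=10, alpha=1.0):
    """Estimate P(c) and P(w|c) with EM.

    labeled_docs:   list of (tokens, class_label) pairs, e.g. from the source language
    unlabeled_docs: list of token lists whose sentiment class is hidden
    classes:        list of sentiment classes, e.g. ["pos", "neg"]
    """
    vocab = {w for doc, _ in labeled_docs for w in doc}
    vocab |= {w for doc in unlabeled_docs for w in doc}

    def mstep(word_counts, class_counts):
        # Re-estimate parameters from (possibly fractional) counts, with add-alpha smoothing.
        z = sum(class_counts[c] for c in classes) + alpha * len(classes)
        p_c = {c: (class_counts[c] + alpha) / z for c in classes}
        p_w_c = {}
        for c in classes:
            total = sum(word_counts[c].values()) + alpha * len(vocab)
            p_w_c[c] = {w: (word_counts[c][w] + alpha) / total for w in vocab}
        return p_c, p_w_c

    def labeled_counts():
        word_counts = {c: defaultdict(float) for c in classes}
        class_counts = defaultdict(float)
        for doc, c in labeled_docs:
            class_counts[c] += 1
            for w in doc:
                word_counts[c][w] += 1
        return word_counts, class_counts

    # Initialise from the labeled data only.
    p_c, p_w_c = mstep(*labeled_counts())

    for _ in range(n_iter):
        word_counts, class_counts = labeled_counts()   # labeled counts are kept fixed
        # E-step: posterior over the hidden class of each unlabeled document.
        for doc in unlabeled_docs:
            log_post = {c: math.log(p_c[c]) + sum(math.log(p_w_c[c][w]) for w in doc)
                        for c in classes}
            m = max(log_post.values())
            post = {c: math.exp(s - m) for c, s in log_post.items()}
            norm = sum(post.values())
            for c in classes:
                gamma = post[c] / norm                 # expected (fractional) count
                class_counts[c] += gamma
                for w in doc:
                    word_counts[c][w] += gamma
        # M-step: update P(c) and P(w|c) from the expected counts.
        p_c, p_w_c = mstep(word_counts, class_counts)

    return p_c, p_w_c
```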
Finally, the estimated word generation probabilities are used in a naive Bayes classifier.
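Continuing the sketch above, the learned probabilities could then be plugged into a standard naive Bayes decision rule for a tokenized target-language document (names again ours):

```python
import math

def classify(doc, p_c, p_w_c, unk=1e-9):
    """Pick the class maximizing log P(c) + sum_w log P(w|c) for a tokenized document."""
    best_c, best_score = None, float("-inf")
    for c in p_c:
        score = math.log(p_c[c]) + sum(math.log(p_w_c[c].get(w, unk)) for w in doc)
        if score > best_score:
            best_c, best_score = c, score
    return best_c

# e.g. classify(["不", "错"], p_c, p_w_c) with p_c, p_w_c from em_estimate(...)
```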
Evaluation
The authors evaluate CLMM's performance on the MPQA and NTCIR datasets in two main settings:
1) The labeled data in the target language (Chinese) is kept unavailable.

CLMM greatly improves accuracy (71%) compared with MT-SVM (52%-62%) and MT-Cotrain (59%-65%).
2) The labeled target-language (Chinese) data is used.

CLMM still beats the baseline SVM (trained on the labeled Chinese data) and is competitive with state-of-the-art methods such as Joint-Train (Lu et al., 2011) and MT-Cotrain (Wan, 2009), while needing less training time.
Related papers
Xiaojun Wan. 2009. Co-training for cross-lingual sentiment classification.
Bin Lu, Chenhao Tan, Claire Cardie, and Benjamin K. Tsou. 2011. Joint bilingual sentiment classification with unlabeled parallel corpora.