Cross-Lingual Mixture Model for Sentiment Classification, Xinfan Meng, Furu Wei, Xiaohua Liu, Ming Zhou, Ge Xu, Houfeng Wang, ACL 2012

From Cohen Courses
Revision as of 23:28, 30 September 2012

Citation

Cross-Lingual Mixture Model for Sentiment Classification, Xinfan Meng, Furu Wei, Xiaohua Liu, Ming Zhou, Ge Xu, Houfeng Wang, ACL 2012

Online version

An online PDF version is available at http://www.aclweb.org/anthology-new/P/P12/P12-1060.pdf

Summary

Evaluation

The authors evaluate CLMM's performance on the MPQA (http://malt.ml.cmu.edu/mw/index.php/Dataset:MPQA) and NTCIR (http://malt.ml.cmu.edu/mw/index.php/NTCIR-6_Opinion) datasets in two main settings:

1) Labeled data in the target language (Chinese) is held out and unavailable for training.


2) Labeled data in the target language (Chinese) is available and used in training.

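CLMM itself learns word generation probabilities from unlabeled parallel corpora with EM; the sketch below is not the paper's model, only a toy illustration of the two evaluation settings: a Naive Bayes sentiment classifier trained on source-language data, optionally interpolated with a second classifier trained on labeled target-language data. All data and interpolation weights here are made up.

```python
from collections import Counter
import math

def train_nb(docs, labels):
    """Multinomial Naive Bayes with add-one smoothing (toy data, not CLMM)."""
    classes = set(labels)
    prior = {c: labels.count(c) / len(labels) for c in classes}
    counts = {c: Counter() for c in classes}
    vocab = set()
    for doc, y in zip(docs, labels):
        counts[y].update(doc)
        vocab.update(doc)
    total = {c: sum(counts[c].values()) for c in classes}
    def log_prob(doc, c):
        lp = math.log(prior[c])
        for w in doc:
            lp += math.log((counts[c][w] + 1) / (total[c] + len(vocab)))
        return lp
    return classes, log_prob

def predict_mixture(doc, models, weights):
    """Linearly interpolate normalized posteriors from several classifiers."""
    mixed = Counter()
    for (classes, log_prob), w in zip(models, weights):
        scores = {c: math.exp(log_prob(doc, c)) for c in classes}
        z = sum(scores.values())
        for c in classes:
            mixed[c] += w * scores[c] / z
    return mixed.most_common(1)[0][0]

# Setting 1: only the source-language model (no target-language labels).
src = train_nb([["good", "great"], ["bad", "awful"]], ["pos", "neg"])
print(predict_mixture(["good"], [src], [1.0]))

# Setting 2: add a model trained on (pretend) labeled target-language data.
tgt = train_nb([["hao"], ["huai"]], ["pos", "neg"])
print(predict_mixture(["hao", "good"], [src, tgt], [0.5, 0.5]))
```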

Discussion

The authors provide an analysis (entropy estimates along with upper-bound numbers observed in experiments) and suggest that future work could exploit the contextual information provided by the stimulus more effectively to further improve the response completion task.

Related papers

Ritter et al. 2010, Data-Driven Response Generation in Social Media

Regina Barzilay and Mirella Lapata. 2005, Modeling local coherence: An entity-based approach

Study plan

Language Model: [2]
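As a study aid for the language-model entry, here is a minimal bigram model that falls back to unigram counts for unseen bigrams (a "stupid backoff"-style scheme; the corpus and the backoff weight 0.4 are illustrative assumptions, not from the paper).

```python
from collections import Counter

def train_backoff_lm(sentences, alpha=0.4):
    """Bigram LM with simple backoff to unigram relative frequency."""
    unigrams, bigrams = Counter(), Counter()
    for sent in sentences:
        toks = ["<s>"] + sent  # sentence-start symbol
        unigrams.update(toks)
        bigrams.update(zip(toks, toks[1:]))
    total = sum(unigrams.values())
    def score(prev, word):
        if bigrams[(prev, word)] > 0:
            return bigrams[(prev, word)] / unigrams[prev]   # bigram estimate
        return alpha * unigrams[word] / total               # back off to unigram
    return score

score = train_backoff_lm([["the", "cat"], ["the", "dog"]])
print(score("the", "cat"))   # seen bigram: 1/2
print(score("cat", "dog"))   # unseen bigram: discounted unigram estimate
```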

Machine Translation, IBM Model-1 [3]
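For the IBM Model 1 entry, the following is a compact EM estimator of lexical translation probabilities t(f|e) on a made-up three-sentence parallel corpus (NULL alignment is omitted for brevity).

```python
from collections import defaultdict

def ibm_model1(pairs, iters=10):
    """EM for IBM Model 1 lexical translation probabilities t(f|e)."""
    f_vocab = {f for _, fs in pairs for f in fs}
    t = defaultdict(lambda: 1.0 / len(f_vocab))  # uniform initialization
    for _ in range(iters):
        count = defaultdict(float)
        total = defaultdict(float)
        for es, fs in pairs:            # E-step: expected alignment counts
            for f in fs:
                z = sum(t[(f, e)] for e in es)
                for e in es:
                    c = t[(f, e)] / z
                    count[(f, e)] += c
                    total[e] += c
        for (f, e), c in count.items():  # M-step: renormalize
            t[(f, e)] = c / total[e]
    return t

# Tiny hypothetical parallel corpus: English -> French-like tokens.
pairs = [(["the", "house"], ["la", "maison"]),
         (["the", "book"], ["le", "livre"]),
         (["a", "book"], ["un", "livre"])]
t = ibm_model1(pairs)
print(round(t[("livre", "book")], 2))  # "book" aligns strongly with "livre"
```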

LDA [4]
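For the LDA entry, a bare-bones collapsed Gibbs sampler on a toy corpus; the documents, hyperparameters, and topic count are all illustrative assumptions.

```python
import random
from collections import defaultdict

def lda_gibbs(docs, K=2, iters=200, alpha=0.1, beta=0.1, seed=0):
    """Collapsed Gibbs sampling for LDA; returns per-document topic counts."""
    rng = random.Random(seed)
    V = len({w for d in docs for w in d})
    z = [[rng.randrange(K) for _ in d] for d in docs]  # topic of each token
    ndk = [[0] * K for _ in docs]                      # doc-topic counts
    nkw = [defaultdict(int) for _ in range(K)]         # topic-word counts
    nk = [0] * K
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            k = z[d][i]
            ndk[d][k] += 1; nkw[k][w] += 1; nk[k] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]  # remove the token's current assignment
                ndk[d][k] -= 1; nkw[k][w] -= 1; nk[k] -= 1
                # sample from the full conditional p(z = j | rest)
                weights = [(ndk[d][j] + alpha) * (nkw[j][w] + beta) / (nk[j] + V * beta)
                           for j in range(K)]
                k = rng.choices(range(K), weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1; nkw[k][w] += 1; nk[k] += 1
    return ndk

docs = [["ball", "goal", "goal"], ["vote", "law", "vote"], ["ball", "goal"]]
print(lda_gibbs(docs))  # per-document topic counts
```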


Data Set

MPQA: http://malt.ml.cmu.edu/mw/index.php/Dataset:MPQA

NTCIR: http://malt.ml.cmu.edu/mw/index.php/NTCIR-6_Opinion