Compare Ku Akcora

From Cohen Courses
Revision as of 19:41, 26 October 2012 by Zhouyu

Two Papers

http://malt.ml.cmu.edu/mw/index.php/Akcora_et_al,_SOMA_2010

http://malt.ml.cmu.edu/mw/index.php/L._Ku,_Y._Liang,_and_H._Chen._Opinion_extraction,_summarization_and_tracking_in_news_and_blog_corpora._In_Proceedings_of_AAAI-2006#Opinion_Tracking

Problem

Both papers address the problem of detecting opinions in text data, and both cover tracking opinion over a period of time. The first paper focuses on identifying the breakpoints where opinion changes. The second paper is more concerned with summarizing opinions by polarity, that is, whether they are positive or negative.

Algorithm

The algorithms used in the two papers are entirely different, since their goals differ. The first paper uses two types of measures to detect changes in opinion on a topic: a vector space model and a set space model. The second paper uses a weighting scheme to determine the polarity of Chinese words. This is considerably more complicated than in languages such as English, because Chinese words are composed of characters that can carry different meanings on their own.

It is worth mentioning, however, that both papers use the popular TF-IDF scheme to detect keywords. In the first paper, TF-IDF is modified to accumulate over time. In the second paper, TF-IDF is the basic component for identifying whether a sentence contains keywords related to the topic. Such a sentence can then be evaluated to decide whether it should be added to the accumulated sentiment of the document.
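The two ideas above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not either paper's implementation. It assumes the vector space model compares term-count vectors of consecutive time windows with cosine similarity, the set space model compares the sets of distinct terms with Jaccard similarity, and a breakpoint is any window whose similarity to the previous one falls below a threshold. The second half is a simplified character-based polarity score for Chinese words: a character's score compares its normalized frequency in positive versus negative seed words, and a word's score is the mean of its characters' scores. The seed lists, the threshold, the exact normalization and all function names (`find_breakpoints`, `character_scores`, `word_polarity`, etc.) are hypothetical. The time-accumulated TF-IDF weighting is not reproduced here.

```python
import math
from collections import Counter

# --- Part 1 (hypothetical sketch): opinion breakpoints between time windows ---
# Each window is a list of tokens, e.g. opinion words from tweets in one period.

def cosine_similarity(a: Counter, b: Counter) -> float:
    """Vector space model (assumed): cosine between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def jaccard_similarity(a: set, b: set) -> float:
    """Set space model (assumed): overlap of the two sets of distinct terms."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def vector_space(prev, cur):
    return cosine_similarity(Counter(prev), Counter(cur))

def set_space(prev, cur):
    return jaccard_similarity(set(prev), set(cur))

def find_breakpoints(windows, similarity, threshold):
    """Indices i where window i is less similar to window i-1 than `threshold`.

    `threshold` is a hypothetical tuning parameter, not a value from the paper.
    """
    return [i for i in range(1, len(windows))
            if similarity(windows[i - 1], windows[i]) < threshold]

# --- Part 2 (hypothetical sketch): character-based polarity for Chinese words ---

def character_scores(pos_words, neg_words):
    """Score each character in [-1, 1] by comparing its normalized frequency
    in positive seed words (p) and negative seed words (n): (p - n) / (p + n)."""
    fp = Counter(ch for w in pos_words for ch in w)
    fn = Counter(ch for w in neg_words for ch in w)
    total_p = sum(fp.values()) or 1
    total_n = sum(fn.values()) or 1
    scores = {}
    for ch in fp.keys() | fn.keys():
        p = fp[ch] / total_p
        n = fn[ch] / total_n
        scores[ch] = (p - n) / (p + n)  # p + n > 0: ch occurs in some seed word
    return scores

def word_polarity(word, scores):
    """Mean score of the word's known characters; 0.0 if none are known."""
    known = [scores[ch] for ch in word if ch in scores]
    return sum(known) / len(known) if known else 0.0
```

Calling `find_breakpoints` once with `vector_space` and once with `set_space` on the same windows shows how the two models can disagree: the vector space model reacts to changes in word frequencies, while the set space model only reacts when words appear or disappear.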

Data set

The first paper uses tweets collected about Tiger Woods's car accident on November 27, 2009. The opinion tracking part of the second paper uses the NTCIR corpus on the 2000 Taiwan presidential election. Both data sets show very clear differences in sentiment.


Comments

Both papers try to extract opinions from text data. The second one restricts opinions to binary polarity (positive and negative), while the first one allows multiple dimensions of opinion. My main concern is that it is hard to evaluate the summaries or opinion breakpoints obtained in either paper, since these concepts are relatively subjective.