Difference between revisions of "Multilingual Sentiment Analysis in Microblogs"

Revision as of 16:28, 8 October 2012

Team members

Wang Ling

Project Summary

Most work on microblogs (e.g. Twitter) has focused on processing English-language messages. However, [1] reports that only approximately 40% of Twitter messages are posted in English. Ignoring the remaining messages may bias the results of an analysis of a given topic. For instance, an analysis of customer satisfaction with a product based only on English messages might overlook issues such as support for non-native customers.

In this project, we analyse user sentiment during the 2012 Olympic Games period from two sources: Twitter and Sina Weibo. The goal is to analyse, for a multitude of topics, whether the aggregate sentiment over the Olympic Games period on Twitter correlates with that on Weibo. Where there is a strong divergence between the aggregate sentiments over a period, we will investigate the reasons behind that divergence.
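One possible way to check whether the two aggregate sentiment series correlate is sketched below. This is only an illustration under assumptions not stated in the proposal: each message is assumed to already carry a numeric sentiment score, the aggregate is taken to be the daily mean, and Pearson correlation is used as the measure. The function names and the toy scores are hypothetical.

```python
from math import sqrt
from statistics import mean


def aligned_daily_means(scores_a, scores_b):
    """Return per-day mean scores for the days present in both sources.

    Each argument maps a day label (e.g. "2012-07-28") to a list of
    per-message sentiment scores for that day (assumed input format).
    """
    days = sorted(set(scores_a) & set(scores_b))
    means_a = [mean(scores_a[d]) for d in days]
    means_b = [mean(scores_b[d]) for d in days]
    return means_a, means_b


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have equal length of at least 2")
    mean_x, mean_y = mean(xs), mean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Toy, made-up scores for one topic (not real data).
twitter = {"2012-07-28": [0.2, 0.4], "2012-07-29": [0.6], "2012-07-30": [-0.1, 0.1]}
weibo = {"2012-07-28": [0.1], "2012-07-29": [0.5, 0.7], "2012-07-30": [-0.2]}

tw_series, wb_series = aligned_daily_means(twitter, weibo)
correlation = pearson(tw_series, wb_series)
```

A coefficient near 1 would suggest the two platforms move together on that topic, while a low or negative value would flag the topic for the closer divergence analysis described above.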

Dataset

A Twitter dataset of 1M sentences per day is available internally to CMU students.

To obtain the Weibo corpora, we will use the search API provided by Weibo to crawl the messages in the specified period.

Sentiment

@@ Line 3: / Line 3: @@
 == Project Summary ==
 Most work on microblogs (e.g. Twitter) has focused on processing English-language messages. However, [http://www.mediabistro.com/alltwitter/twitter-language-share_b16109] reports that only approximately 40% of Twitter messages are posted in English. Ignoring the remaining messages may bias the results of an analysis of a given topic. For instance, an analysis of customer satisfaction with a product based only on English messages might overlook issues such as support for non-native customers.
+In this project, we analyse user sentiment during the 2012 Olympic Games period from two sources: Twitter and Sina Weibo. The goal is to analyse, for a multitude of topics, whether the aggregate sentiment over the Olympic Games period on Twitter correlates with that on Weibo. Where there is a strong divergence between the aggregate sentiments over a period, we will investigate the reasons behind that divergence.
+== Dataset ==
+A Twitter dataset of 1M sentences per day is available internally to CMU students.
+To obtain the Weibo corpora, we will use the search API provided by Weibo to crawl the messages in the specified period.
+Sentiment