Difference between revisions of "Multilingual Sentiment Analysis in Microblogs"

From Cohen Courses

Revision as of 15:28, 8 October 2012

Team members

Project Summary

Most work on microblogs (e.g. Twitter) has focused on processing English-language messages. However, according to [http://www.mediabistro.com/alltwitter/twitter-language-share_b16109], only approximately 40% of Twitter messages are posted in English. Ignoring the remaining messages may bias the results of an analysis of a given topic. For instance, an analysis of customer satisfaction with a product that is based only on English messages might miss issues such as support for non-native customers.

In this project, we analyse user sentiment during the 2012 Olympic Games from two sources: Twitter and Sina Weibo. The goal is to determine, for a multitude of topics, whether the aggregate sentiment on Twitter over the Olympic Games period correlates with that on Weibo. Where the aggregate sentiments diverge strongly over a period, we will investigate the reasons for that divergence.
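
The comparison described above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the project's actual method: it assumes each message has already been assigned a numeric sentiment score and a day, it uses the Pearson coefficient as the correlation measure, and it treats a day as divergent when the two daily means differ by more than a fixed threshold. The function names, the threshold value and the (day, score) input format are all hypothetical.

```python
from collections import defaultdict
from math import sqrt


def daily_mean_sentiment(messages):
    """Average sentiment per day; `messages` is an iterable of (day, score) pairs."""
    totals = defaultdict(lambda: [0.0, 0])  # day -> [sum of scores, message count]
    for day, score in messages:
        totals[day][0] += score
        totals[day][1] += 1
    return {day: total / count for day, (total, count) in totals.items()}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


def compare_sources(twitter, weibo, threshold=0.5):
    """Correlate daily mean sentiment of two sources over their shared days.

    Returns (correlation, divergent_days). The threshold is an assumed,
    arbitrary cut-off for flagging a day as divergent.
    """
    t = daily_mean_sentiment(twitter)
    w = daily_mean_sentiment(weibo)
    days = sorted(set(t) & set(w))  # only days present in both sources
    r = pearson([t[d] for d in days], [w[d] for d in days])
    divergent = [d for d in days if abs(t[d] - w[d]) > threshold]
    return r, divergent


if __name__ == "__main__":
    # Made-up scores in [-1, 1], for illustration only.
    twitter = [("2012-07-27", 1.0), ("2012-07-27", 0.0),
               ("2012-07-28", -1.0), ("2012-07-29", 0.5)]
    weibo = [("2012-07-27", 0.4), ("2012-07-28", -0.8), ("2012-07-29", -0.5)]
    r, divergent = compare_sources(twitter, weibo)
    print(f"correlation = {r:.2f}, divergent days = {divergent}")
```

Running the same comparison once per topic would give a per-topic correlation, and the flagged days indicate where to start looking for the causes of a divergence.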

Dataset

A Twitter dataset of 1M sentences per day is available internally to CMU students.

To obtain the Weibo corpus, we will use the search API provided by Weibo to crawl the messages posted in the specified period.

Sentiment