Towards fine grained extended targets in sentiment analysis

From Cohen Courses
== Comments ==

We've discussed this briefly. I like the basic idea, which I'm interpreting as taking existing sentiment targets, applying phrase-detection methods to extend them, and then looking for cases where the extended targets have different sentiment than the basic targets.

It would be fun to see this on Twitter, but hard to evaluate. You might look at testing the method quantitatively on a sentiment dataset that's been extensively labeled, like the [http://verbs.colorado.edu/jdpacorpus/ JD Powers dataset]. --[[User:Wcohen|Wcohen]] 15:46, 10 October 2012 (UTC)
 
== Team members ==

* Lingpeng Kong
 
== Project Title ==

Towards fine grained extended targets in sentiment analysis

== Introduction ==

A key motivation for sentiment analysis in social media is that many companies and individuals want to know what other people think about them. Target-dependent sentiment analysis tools have therefore attracted much attention recently, and websites like [http://www.tweetfeel.com/ Tweetfeel] and [http://www.sentiment140.com/ TwitterSentiment] have been set up. The task for these tools is simple: given a company name or a person's name you are interested in, they retrieve tweets containing that name and classify each of them as positive, negative, or neutral.
  
For example:
* @hilton_peggy it took me 2 hours to figure out [[Microsoft]] excel for my graph. I feel your pain.
* @_Jasmaniandevil i like watching [[Obama]] make a fool of himself ...
* Not going to [[CMU]] homecoming :(
  
You can see clearly from these examples that simply using a keyword-hit method for this task is not a good approach. The extended targets (Microsoft excel, Obama make a fool of himself, Not going to CMU homecoming) are the real targets of these sentences, and they can change the sentiment attached to the query name we input: Microsoft excel can be regarded as carrying sentiment toward Microsoft (with the same polarity), Obama make a fool of himself as carrying sentiment toward Obama (with the opposite polarity), and CMU homecoming may transfer no sentiment to CMU at all. Therefore, finding ways to use fine grained extended targets in this task is extremely important.
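To make the contrast concrete, here is a minimal toy sketch (not the actual system we will build) of how a keyword-hit classifier and an extended-target-aware classifier can disagree on the same tweet. The sentiment lexicon and the polarity-transfer rules below are invented for illustration:

```python
# Toy sentiment lexicon and hand-written polarity-transfer rules --
# both are illustrative assumptions, not real resources.
POSITIVE_CUES = {"like", "love", "great"}
NEGATIVE_CUES = {"pain", ":("}

# How an extended target passes sentiment back to the original target:
#   "same"     -> sentence polarity transfers unchanged
#   "opposite" -> polarity flips toward the original target
#   "none"     -> the extended target absorbs the sentiment entirely
TRANSFER_RULES = {
    "obama make a fool of himself": "opposite",
    "not going to cmu homecoming": "none",
}

def sentence_polarity(text):
    """Naive keyword-hit polarity of the whole sentence."""
    tokens = set(text.lower().split())
    score = len(tokens & POSITIVE_CUES) - len(tokens & NEGATIVE_CUES)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

def target_polarity(text, extended_target):
    """Polarity toward the original target, adjusted via the extended target."""
    base = sentence_polarity(text)
    rule = TRANSFER_RULES.get(extended_target.lower(), "same")
    if rule == "none":
        return "neutral"
    if rule == "opposite":
        return {"positive": "negative", "negative": "positive"}.get(base, base)
    return base

tweet = "i like watching Obama make a fool of himself ..."
print(sentence_polarity(tweet))   # keyword hit says: positive
print(target_polarity(tweet, "Obama make a fool of himself"))  # toward Obama: negative
```

A keyword-hit tool would report this tweet as positive about Obama; routing the decision through the extended target reverses it, and the CMU example is suppressed to neutral instead of being counted as negative toward CMU.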
  
 
== Task ==
  
We will try to find patterns between extended targets and the input words (namely, the original targets): how a target can be extended, how these extensions affect the original target in sentiment analysis tasks, and how to use these patterns to improve the current state-of-the-art target-dependent sentiment classifier.
  
 
== Data ==
 
  
We plan to use the data presented in Long Jiang et al., Target-dependent Twitter Sentiment Classification, ACL 2011. The Twitter dataset and websites like Tweetfeel and TwitterSentiment are also useful resources.
 
 
 
== Draft Plan/Method ==
* Find all the syntactic connections, to see in which constructions context words can affect the original target.
* Discuss the patterns we can find in extended targets, for example whether the extension is a part-of relation, or the original target is only a modifier, etc.
* Try to find groups of words which have a negative effect on the original target (things like "make a fool of oneself"), since these are a source of errors.
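As a rough sketch of the first step, the snippet below hand-writes dependency edges for one of the example tweets (in practice they would come from a dependency parser) and grows a candidate extended target by walking from the original target to its syntactic head and down through the head's dependents. The edge list and relation labels are illustrative assumptions:

```python
# Hand-written dependency triples (head_index, relation, dependent_index)
# for: "i like watching Obama make a fool of himself"
#       0  1    2        3     4    5 6    7  8
# These edges are illustrative, not real parser output.
TOKENS = "i like watching Obama make a fool of himself".split()
EDGES = [
    (1, "nsubj", 0),   # like <- i
    (1, "xcomp", 2),   # like -> watching
    (2, "ccomp", 4),   # watching -> make
    (4, "nsubj", 3),   # make <- Obama   (the original target)
    (4, "dobj",  6),   # make -> fool
    (6, "det",   5),   # fool -> a
    (4, "prep",  7),   # make -> of
    (7, "pobj",  8),   # of -> himself
]

def extended_target(target_idx, edges, tokens):
    """Grow the target into an extended target: take the word the target
    attaches to (its head) and descend into that head's dependents."""
    span = {target_idx}
    frontier = [h for h, _, d in edges if d == target_idx]  # the target's head
    while frontier:
        node = frontier.pop()
        if node in span:
            continue
        span.add(node)
        # the rest of the predicate phrase headed by this node
        frontier.extend(d for h, _, d in edges if h == node)
    return " ".join(tokens[i] for i in sorted(span))

print(extended_target(3, EDGES, TOKENS))  # -> Obama make a fool of himself
```

Which relation types to follow, and how far to descend, is exactly the pattern question raised in the plan above; restricting the walk to, say, object and complement relations would be one way to keep the extended-target spans tight.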
  
 
== Related Work ==
 
Long Jiang et al., Target-dependent Twitter Sentiment Classification, ACL 2011.

Luciano Barbosa and Junlan Feng, Robust Sentiment Detection on Twitter from Biased and Noisy Data, COLING 2010.

Latest revision as of 11:46, 10 October 2012
