Difference between revisions of "Towards fine grained extended targets in sentiment analysis"

Revision as of 00:05, 9 October 2012

Team members

Lingpeng Kong

Project Title

Towards fine grained extended targets in sentiment analysis

Introduction

A key motivation for sentiment analysis in social media is that many companies and individuals want to know what other people think about them. Target-dependent sentiment analysis tools have therefore attracted much attention recently, and websites such as Tweetfeel and TwitterSentiment have been set up. The task these tools perform is simple: given a company name, a person's name, or any other name you are interested in, they return tweets containing that name and classify each one as positive, negative, or neutral.

For example:

@hilton_peggy it took me 2 hours to figure out Microsoft excel for my graph. I feel your pain.
@_Jasmaniandevil i like watching Obama make a fool of himself ...
Not going to CMU homecoming :(

These examples show that a simple keyword-hit method does not work well for this task. The real targets of the sentiment are the extended targets (Microsoft excel, Obama make a fool of himself, Not going to CMU homecoming), and how the sentiment toward an extended target relates to the query name differs from case to case. Sentiment toward Microsoft excel can be regarded as sentiment toward Microsoft with the same polarity; sentiment toward Obama make a fool of himself can be regarded as sentiment toward Obama with the opposite polarity; and sentiment toward CMU homecoming may not transfer to CMU at all. Using fine-grained extended targets is therefore extremely important for this task.
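The contrast can be sketched in a few lines of Python. This is only an illustration of the idea, not a proposed system: `keyword_hit`, `TRANSFER`, and `transfer_polarity` are hypothetical names, and the transfer table is hand-written from the three examples above rather than learned.

```python
import re

TWEETS = [
    "@hilton_peggy it took me 2 hours to figure out Microsoft excel for my graph. I feel your pain.",
    "@_Jasmaniandevil i like watching Obama make a fool of himself ...",
    "Not going to CMU homecoming :(",
]

def keyword_hit(query, tweets):
    """Return every tweet that contains the query as a whole word (case-insensitive)."""
    pattern = re.compile(r"\b" + re.escape(query) + r"\b", re.IGNORECASE)
    return [t for t in tweets if pattern.search(t)]

# Hypothetical, hand-written transfer table based on the three examples:
# +1 = same polarity, -1 = opposite polarity, 0 = no sentiment transferred.
TRANSFER = {
    ("Microsoft", "Microsoft excel"): +1,
    ("Obama", "Obama make a fool of himself"): -1,
    ("CMU", "CMU homecoming"): 0,
}

def transfer_polarity(query, extended_target, target_polarity):
    """Map the polarity expressed toward an extended target onto the query name.

    target_polarity is +1 (positive) or -1 (negative); unknown pairs transfer nothing.
    """
    return TRANSFER.get((query, extended_target), 0) * target_polarity
```

A keyword hit alone retrieves the Obama tweet and sees the word "like", while the transfer step turns the positive sentiment toward the extended target into a negative one toward the query: `transfer_polarity("Obama", "Obama make a fool of himself", +1)` evaluates to `-1`.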

Task

Given labeled reviews from some product types, which are regarded as source domains, and unlabeled reviews from another product type, which is regarded as the target domain, we want to classify the target-domain reviews as positive or negative.

Data

We will use the benchmark dataset of Amazon reviews collected by Blitzer et al. (2007). This dataset contains more than 340,000 reviews from 22 different product types, which can be regarded as different domains. We do not label the data manually; instead, we use the star ratings as labels, as proposed in the original work.
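A minimal sketch of deriving labels from star ratings follows. It assumes the usual thresholding for this dataset (more than 3 stars is positive, fewer than 3 is negative, exactly 3 is discarded as ambiguous); the function names and the `(text, stars)` input format are illustrative assumptions, not part of the dataset's own tooling.

```python
def star_to_label(stars):
    """Turn a star rating into a sentiment label, or None if it is ambiguous.

    Assumed thresholds: > 3 stars is positive, < 3 stars is negative.
    """
    if stars > 3:
        return "positive"
    if stars < 3:
        return "negative"
    return None

def label_reviews(reviews):
    """Label (text, stars) pairs and drop the ambiguous 3-star reviews."""
    labeled = []
    for text, stars in reviews:
        label = star_to_label(stars)
        if label is not None:
            labeled.append((text, label))
    return labeled
```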

Techniques

First, the most naïve approach to this task is to merge all examples from the multiple source product types and apply a single-source domain adaptation algorithm such as [1] or [3] to classify the target-domain reviews.
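The merging step itself is simple and might look like the sketch below. The input layout (a dictionary from domain name to a list of `(text, label)` pairs) and the name `merge_sources` are assumptions made for illustration; the adaptation algorithm of [1] or [3] would then be run on the pooled set as if it came from one source domain.

```python
def merge_sources(sources):
    """Pool labeled reviews from several source domains into one training set.

    sources maps a domain name to a list of (text, label) pairs.
    Each pooled example keeps its domain name so later steps can still
    distinguish where it came from.
    """
    merged = []
    for domain, reviews in sources.items():
        for text, label in reviews:
            merged.append((text, label, domain))
    return merged
```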

Since we assume that unlabeled target-domain data is available, a second technique is to bootstrap: automatically add unlabeled target-domain data to the training set, as proposed in [5].
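The loop below is a generic self-training sketch of that idea, not the specific algorithm of [5]. It stands in a deliberately tiny word-count classifier for a real model, and `train`, `predict`, `bootstrap`, `rounds`, and `min_margin` are all hypothetical names chosen for this example.

```python
from collections import Counter

def train(examples):
    """Count how often each word occurs in positive and in negative texts."""
    counts = {"positive": Counter(), "negative": Counter()}
    for text, label in examples:
        counts[label].update(text.lower().split())
    return counts

def predict(model, text):
    """Return (label, margin); margin is the gap between the two word-vote totals."""
    words = text.lower().split()
    pos = sum(model["positive"][w] for w in words)
    neg = sum(model["negative"][w] for w in words)
    label = "positive" if pos >= neg else "negative"
    return label, abs(pos - neg)

def bootstrap(labeled, unlabeled, rounds=3, min_margin=1):
    """Self-training: repeatedly add confidently predicted target reviews.

    labeled is a list of (text, label) source-domain pairs;
    unlabeled is a list of target-domain texts.
    """
    labeled = list(labeled)
    pool = list(unlabeled)
    for _ in range(rounds):
        model = train(labeled)
        confident, rest = [], []
        for text in pool:
            label, margin = predict(model, text)
            if margin >= min_margin:
                confident.append((text, label))
            else:
                rest.append(text)
        if not confident:
            break  # nothing new to add, so further rounds would not change the model
        labeled.extend(confident)
        pool = rest
    return train(labeled), labeled
```

Starting from labeled book reviews, a call such as `bootstrap(source_reviews, blender_reviews)` would pull in only those blender reviews whose words the current model already recognizes, and leave the rest unlabeled.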

Related Work

[1] John Blitzer, Mark Dredze, Fernando Pereira. Biographies, Bollywood, Boom-Boxes and Blenders: Domain Adaptation for Sentiment Classification. Proc. 45th Ann. Meeting of the Assoc. Computational Linguistics, pp. 432-439, 2007.

[2] Hal Daumé III. Frustratingly Easy Domain Adaptation. Proc. 45th Ann. Meeting of the Assoc. Computational Linguistics, pp. 256-263, June 2007.

[3] John Blitzer, Ryan McDonald, Fernando Pereira. Domain Adaptation with Structural Correspondence Learning. Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing, July 22-23, 2006, Sydney, Australia.

[4] Jing Jiang, Chengxiang Zhai. Instance Weighting for Domain Adaptation in NLP. Proc. 45th Ann. Meeting of the Assoc. Computational Linguistics, pp. 264-271, June 2007.

[5] Dan Wu, Wee Sun Lee, Nan Ye, Hai Leong Chieu. Domain Adaptive Bootstrapping for Named Entity Recognition. Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 3, August 06-07, 2009, Singapore.

