Miller et al ICWSM 2011

From Cohen Courses
Revision as of 07:12, 27 September 2012

Citation

author    = {Mahalia Miller and
              Conal Sathi and
              Daniel Wiesenthal and
              Jure Leskovec and
              Christopher Potts},
 title     = {Sentiment Flow Through Hyperlink Networks},
 booktitle = {ICWSM},
 year      = {2011},
 ee        = {http://www.aaai.org/ocs/index.php/ICWSM/ICWSM11/paper/view/2883},
 crossref  = {DBLP:conf/icwsm/2011},
 bibsource = {DBLP, http://dblp.uni-trier.de}

Online Version

http://cs.stanford.edu/people/jure/pubs/sentiflow-icwsm11.pdf

Main Idea

This paper combines sentiment analysis of text with graph analysis in order to study how sentiment flows through a network of blog posts connected by hyperlinks.

Dataset

The data was obtained from the MemeTracker Project for the month of August 2010. The dataset consists of roughly 1 million blog posts per day. Each post consists of a URL, a time stamp, the full text of the post and the list of URLs of the posts it cites. The data has been pruned to remove singleton posts (posts which do not link to any other posts). Self-links and links to posts outside the dataset have been removed in order to focus on the flow of sentiment within the network. The dataset used has approximately 8 million blog posts and 15 million hyperlink edges.
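The pruning step described above can be sketched in Python as follows. This is a minimal illustration, not the authors' code: the function name `prune_links` and the input layout (a dict from post URL to the URLs it cites) are hypothetical, a self-link is assumed to mean a post citing its own URL, and a singleton is assumed to be a post that neither cites nor is cited after cleaning, since cascade initiators by definition cite nothing.

```python
from typing import Dict, List, Set


def prune_links(links: Dict[str, List[str]]) -> Dict[str, Set[str]]:
    """Drop self-links and out-of-dataset links, then drop isolated posts.

    links maps each post URL to the list of URLs it cites (assumed layout).
    """
    known = set(links)
    # Keep only citations that point to a different post inside the dataset.
    cleaned = {
        url: {target for target in targets if target != url and target in known}
        for url, targets in links.items()
    }
    # Assumption: a post survives if it cites, or is cited by, another post.
    cited = {target for targets in cleaned.values() for target in targets}
    return {
        url: targets
        for url, targets in cleaned.items()
        if targets or url in cited
    }
```

For example, a post whose only links point to itself or to URLs outside the collection ends up with no edges and is dropped.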

Methodology

- Sentiment Extraction

Each document is treated as a bag of words. Harvard Inquirer (http://www.wjh.harvard.edu/~inquirer/) and SentiWordNet (http://sentiwordnet.isti.cnr.it/) are used to obtain sentiment scores for the individual words in a post. The sentiment attributes resulting from this analysis are the positivity, negativity and objectivity of a post. The paper also proposes extracting sentiment from emoticons.

The authors define the average sentiment of a user as the baseline and then compute the deviation of each individual post from that baseline as the polarity of the post. Each domain is treated as an author, and the baseline for a domain is obtained by averaging the sentiment of its individual posts.
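A rough Python sketch of lexicon-based scoring followed by the per-domain baseline is given below. It rests on several assumptions that are not from the paper: the tiny `LEXICON` stands in for Harvard Inquirer or SentiWordNet with made-up scores, a post's score is taken to be the mean word score over whitespace-separated tokens, objectivity is omitted for brevity, and the function names are hypothetical.

```python
from collections import defaultdict
from statistics import mean
from urllib.parse import urlparse

# Toy lexicon with made-up scores: word -> (positivity, negativity).
LEXICON = {"good": (1.0, 0.0), "great": (1.0, 0.0), "bad": (0.0, 1.0)}


def post_sentiment(text):
    """Return (positivity, negativity) averaged over the post's tokens."""
    tokens = text.lower().split()
    if not tokens:
        return (0.0, 0.0)
    pos = sum(LEXICON.get(token, (0.0, 0.0))[0] for token in tokens)
    neg = sum(LEXICON.get(token, (0.0, 0.0))[1] for token in tokens)
    return (pos / len(tokens), neg / len(tokens))


def deviations_from_baseline(posts):
    """Map each post URL to its deviation from its domain's mean sentiment.

    posts maps post URL -> full text (assumed layout).
    """
    scores = {url: post_sentiment(text) for url, text in posts.items()}
    # Each domain is treated as one author.
    by_domain = defaultdict(list)
    for url in scores:
        by_domain[urlparse(url).netloc].append(url)
    deviations = {}
    for urls in by_domain.values():
        base_pos = mean(scores[u][0] for u in urls)
        base_neg = mean(scores[u][1] for u in urls)
        for u in urls:
            deviations[u] = (scores[u][0] - base_pos, scores[u][1] - base_neg)
    return deviations
```

Under this scheme a habitually positive blog needs an unusually positive post to register a positive deviation, which is the point of measuring against a baseline rather than using raw scores.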

- Identification of Cascades and their Topology

The data is modeled as a graph. Each node represents a blog post and carries its sentiment score as an attribute. A directed edge from u to v indicates that post u contains a hyperlink citing v. Nodes with out-degree zero represent posts that start the flow of sentiment and are referred to as cascade initiators. The topology of a cascade is obtained by running breadth-first search (BFS) from the cascade initiators.
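The cascade construction can be illustrated with the short Python sketch below. It is written under stated assumptions rather than taken from the paper: since initiators have no outgoing edges, the search is assumed to follow citation edges in reverse (from a cited post to the posts citing it), all initiators are searched at once so each post gets its distance to the nearest initiator, and the name `cascade_depths` and the dict-based graph layout are hypothetical.

```python
from collections import defaultdict, deque


def cascade_depths(cites):
    """Return post -> BFS depth, with cascade initiators at depth 0.

    cites maps each post to the posts it cites (edge u -> v means u cites v).
    """
    # Reverse index: for each post, the posts that cite it.
    cited_by = defaultdict(set)
    for post, targets in cites.items():
        for target in targets:
            cited_by[target].add(post)

    nodes = set(cites) | set(cited_by)
    # Initiators are the nodes with out-degree zero.
    initiators = [node for node in nodes if not cites.get(node)]

    depth = {node: 0 for node in initiators}
    queue = deque(initiators)
    while queue:
        node = queue.popleft()
        # Walk edges backwards: from a cited post to the posts citing it.
        for citing in cited_by.get(node, ()):
            if citing not in depth:
                depth[citing] = depth[node] + 1
                queue.append(citing)
    return depth
```

The maximum depth reached from an initiator gives a simple way to separate shallow cascades from deep ones.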


Findings/Analysis

- Post Level Analysis

Given an edge from u to v, u is referred to as the parent of v, and v is referred to as the child of u.
The analysis shows that the subjectivity of a child post can be attributed to the subjectivity of its parent: the use of subjective language in the parent post leads to a higher sentiment score in the child post.

- Cascade Level Analysis

Sentiment in a cascade exhibits four phases:
  • At the cascade initiator, language is close to the baseline.
  • Positivity and negativity heat up quickly.
  • Sentiment then cools off fairly quickly.
  • Sentiment returns to the mild baseline.

The trends in sentiment usage for shallow and deep cascades have also been compared:
  • Shallow cascades start off with only slight sentiment and then die out quickly.
  • Deep cascades show more extreme use of subjective language.

A similar trend is also obtained for the emoticon-based approach.
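One plausible way to look for these phases is to average a sentiment measure over all posts at the same cascade depth and inspect the resulting profile. The Python sketch below assumes that approach; it is not confirmed to be the authors' procedure, and the function name and both input dicts (post -> depth, post -> sentiment deviation) are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def mean_sentiment_by_depth(depth, sentiment):
    """Average a per-post sentiment value over posts at each cascade depth.

    depth maps post -> depth in its cascade; sentiment maps post -> a single
    sentiment value, e.g. deviation from the domain baseline.
    """
    grouped = defaultdict(list)
    for post, d in depth.items():
        if post in sentiment:  # skip posts without a sentiment score
            grouped[d].append(sentiment[post])
    # Sort by depth so the profile reads from initiator outwards.
    return {d: mean(values) for d, values in sorted(grouped.items())}
```

A profile that sits near zero at depth 0, rises over the next few depths and then falls back would match the four phases listed above.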


Conclusion

Study Plan

- Harvard Inquirer (http://www.wjh.harvard.edu/~inquirer/)
- C++ SNAP library