Hassan & Radev ACL 2010

Citation

Ahmed Hassan and Dragomir R. Radev. 2010. Identifying text polarity using random walks. In ACL 2010.

Online Version

online version

Summary

This paper presents a method for identifying the polarity of words, addressing the problem of Polarity Classification of words.

The method is based on the observation that a random walk starting at a given word is more likely to hit another word with the same semantic orientation before hitting a word with a different semantic orientation. It applies a random walk model to a large word relatedness graph and produces a polarity estimate for any given word.

This method can be used in a semi-supervised setting where a training set of labeled words is used, and in an unsupervised setting where only a handful of seeds is used to define the two polarity classes.

Background

  • Network construction

The WordNet dataset is used to construct the network of words. All words in WordNet are collected, and a link is added between any two words that occur in the same synset. The result is a graph G = (W, E), where W is the set of word / part-of-speech pairs for all the words in WordNet and E is the set of edges connecting each pair of synonymous words.
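
As a rough illustration of this construction, here is a short Python sketch; it assumes NLTK's WordNet interface and the networkx library, neither of which the paper mentions, so treat it as a sketch rather than the authors' implementation.

<pre>
# Sketch: build the word relatedness graph from WordNet synsets.
# Assumes NLTK (with the 'wordnet' corpus downloaded) and networkx;
# node and function names are illustrative, not from the paper.
import networkx as nx
from nltk.corpus import wordnet as wn

def build_word_graph():
    graph = nx.Graph()
    for synset in wn.all_synsets():
        # Each node is a (word, part-of-speech) pair.
        members = [(lemma.name(), synset.pos()) for lemma in synset.lemmas()]
        graph.add_nodes_from(members)
        # Link every pair of words that co-occur in this synset.
        for a in range(len(members)):
            for b in range(a + 1, len(members)):
                graph.add_edge(members[a], members[b])
    return graph
</pre>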

  • Random walk model

Starting from a word i with unknown polarity, the walker moves to a neighboring node j with probability p_ij after the first step. The walk continues until the surfer hits a word with a known polarity.
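
Assuming, as is standard for random walks on weighted graphs, that p_ij is the weight of edge (i, j) normalized by the total weight of edges leaving i, the transition probabilities could be computed as in the sketch below; for the unweighted synonym graph this reduces to a uniform choice among neighbors. The function name and weight handling are illustrative assumptions.

<pre>
# Sketch: row-normalized transition probabilities p_ij for the random walk.
# 'graph' is a networkx-style adjacency structure; edges without an explicit
# 'weight' attribute are treated as weight 1.0.
def transition_probabilities(graph):
    P = {}
    for i in graph:
        weights = {j: data.get("weight", 1.0) for j, data in graph[i].items()}
        total = sum(weights.values())
        P[i] = {j: w / total for j, w in weights.items()} if total else {}
    return P
</pre>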

  • First-passage time

It is very similar to the definition of hitting time. The mean first-passage (hitting) time h(i|k) is defined as the average number of steps a random walker, starting in state i ≠ k, will take to enter state k for the first time. Considering a subset of vertices S of the graph, h(i|S) is the average number of steps a random walker, starting in state i ∉ S, will take to enter a state k ∈ S for the first time.

Then it is proven that:

<math>
h(i|S) =
\begin{cases}
0, & i \in S \\
\sum_{j \in V} p_{ij} \, h(j|S) + 1, & \text{otherwise}
\end{cases}
</math>
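
This recursion can be turned directly into a computation; the sketch below is a plain fixed-point iteration over that equation (not the paper's own solver), using the dictionary of transition probabilities from the previous sketch.

<pre>
# Sketch: iteratively solve h(i|S) = sum_j p_ij * h(j|S) + 1 for i not in S,
# with h(i|S) = 0 for i in S.  P maps each node to {neighbor: probability};
# S is the set of labeled (absorbing) nodes.  Nodes that cannot reach S have
# infinite hitting time, so their values simply keep growing with iterations.
def hitting_times(P, S, iterations=1000):
    h = {i: 0.0 for i in P}
    for _ in range(iterations):
        h = {i: 0.0 if i in S
             else 1.0 + sum(p * h[j] for j, p in P[i].items())
             for i in P}
    return h
</pre>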

Algorithm description

  1. Construct a word relatedness graph
  2. Apply a random walk on the graph
  3. Compute the word's hitting time for both the positive and negative sets of vertices
  4. If the hitting time for the positive set is greater than for the negative set, then the word is classified as negative; otherwise, it is classified as positive. The ratio between the two hitting times can be used as an indication of how positive/negative the given word is.

Since computing the hitting time exactly is time consuming, especially when the graph is large, a Monte Carlo based estimation algorithm is proposed (see figure: Word polarity using random walks.png).
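
The referenced figure is not reproduced here, so the following is only a generic Monte Carlo sketch of steps 2-4: sample truncated random walks from the query word, average the number of steps needed to reach each seed set, and compare the two averages. The sample count and the cap on walk length are assumptions, not the paper's exact procedure.

<pre>
import random

# Sketch: estimate the hitting time from 'start' to any node in 'target_set'
# by averaging over sampled walks, each truncated at max_steps.
def estimate_hitting_time(graph, start, target_set, samples=1000, max_steps=50):
    total = 0.0
    for _ in range(samples):
        node, steps = start, 0
        while node not in target_set and steps < max_steps:
            neighbors = list(graph[node])
            if not neighbors:          # isolated word: stop the walk early
                break
            node = random.choice(neighbors)  # uniform step to a neighbor
            steps += 1
        total += steps
    return total / samples

# Sketch of the decision rule: a larger hitting time to the positive seeds
# means the word is classified as negative; the ratio h_pos / h_neg can serve
# as a graded polarity score.
def classify_polarity(graph, word, positive_seeds, negative_seeds):
    h_pos = estimate_hitting_time(graph, word, set(positive_seeds))
    h_neg = estimate_hitting_time(graph, word, set(negative_seeds))
    return "negative" if h_pos > h_neg else "positive"
</pre>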

Experiment result

Compared to other methods, this method is quite successful in both the semi-supervised and unsupervised settings.

  • When WordNet synonyms and hypernyms are used to construct the network and the test set is restricted to adjectives, this method outperforms the spin-model, bootstrapping, and shortest-path methods.

Hassan & Radev ACL 2010 E1.png

  • It is also compared to the SO-PMI method in a setting with only 14 seed words. Although SO-PMI with a very large dataset performs slightly better, this method is faster and does not require such a large corpus.

Hassan & Radev ACL 2010 E2.png

Related papers

  • Hiroya Takamura, Takashi Inui, and Manabu Okumura. 2005. Extracting semantic orientations of words using spin model. In ACL ’05.
  • Peter Turney and Michael Littman. 2003. Measuring praise and criticism: Inference of semantic orientation from association. ACM Transactions on Information Systems, 21:315–346.
  • Jaap Kamps, Maarten Marx, Robert J. Mokken, and Maarten De Rijke. 2004. Using WordNet to measure semantic orientations of adjectives. In LREC 2004, pages 1115–1118.
  • Minqing Hu and Bing Liu. 2004. Mining and summarizing customer reviews. In KDD ’04.
  • Peter D. Turney. 2002. Thumbs up or thumbs down?: semantic orientation applied to unsupervised classification of reviews. In ACL ’02.

Study plan

This paper is quite clear and self-explanatory, and thus requires very little background to understand it.

  • Article: Random walk (http://en.wikipedia.org/wiki/Random_walk)
  • Article: Monte Carlo method (http://en.wikipedia.org/wiki/Monte_Carlo_method)