Difference between revisions of "Mark my words!"

From Cohen Courses
 
(24 intermediate revisions by the same user not shown)
Line 1: Line 1:
This is a [[Category::Paper]] discussed in Social Media Analysis 10-802 in Spring 2010.
+
== Citation ==
  
== Citation ==
+
Cristian Danescu-Niculescu-Mizil, Michael Gamon, Susan T. Dumais: Mark my words!: linguistic style accommodation in social media. WWW 2011: 745-754
  
Neighborhood Formation and Anomaly Detection in Bipartite Graphs,
 
Jimeng Sun, Huiming Qu, Deepayan Chakrabarti, Christos Faloutsos, ICDM 2005
 
  
 
== Online version ==
 
== Online version ==
  
[http://www.cs.cmu.edu/~deepay/mywww/papers/icdm05.pdf Neighborhood Formation and Anomaly Detection in Bipartite Graphs]
+
http://research.microsoft.com/en-us/um/people/sdumais/fr862-danescu-niculescu-mizil.pdf
 +
 
 +
==Summary ==
 +
Psycholinguistic studies have suggested that participants in a conversation accommodate to each other along dimensions such as speaking style, utterance length, gesture, and speaking rate. In this paper the authors test the hypothesis that linguistic accommodation can also be observed in social media such as Twitter. They investigate accommodation along LIWC dimensions[http://www.liwc.net/liwcdescription.php]. Examples of these dimensions include the use of articles, negation words (not/no), prepositions, quantifiers, 1st person singular pronouns, 1st person plural pronouns, and 2nd person pronouns in conversation. The authors propose a novel probabilistic framework to test their hypothesis.
 +
 
 +
 
 +
==Framework==
 +
 
 +
When individuals talk about the same topic, they have to use similar words to describe it, so it is important to separate topic-driven similarity from the overall accommodation measure. Because the LIWC dimensions used are non-topical, this topical effect is removed automatically.
 +
 
 +
Their framework is based on mainly two components, stylistic cohesion and stylistic accommodation.
 +
 
 +
 
 +
'''Stylistic Cohesion:'''
 +
Stylistic cohesion measures whether tweets belonging to the same conversation exhibit a given LIWC style more often than unrelated tweets do. If they do, tweets in the same conversation agree on that style more than chance would predict. Formally, for a style <math> C </math> it is defined as:
 +
 
 +
 +
<math>Coh(C) =P(T^C \wedge  R^C | T \leftrightarrow  R)-P(T^C \wedge  R^C)</math>
 +
 
 +
 
 +
where <math> T \leftrightarrow  R </math> is the condition that tweets <math>T</math> and <math>R</math> belong to the same conversation. <math> P(T^C \wedge  R^C | T \leftrightarrow  R) </math> is the probability that both tweets exhibit style C given that they are part of the same conversation, whereas <math>P(T^C \wedge  R^C) </math> is the probability that two randomly picked tweets both exhibit style C.
 +
 
 +
 
 +
'''Stylistic accommodation:'''
  
== Summary ==
+
When measuring stylistic accommodation, it is assumed that a Twitter user can accommodate to a conversational partner in a style C only if the partner exhibited style C earlier in the same conversation. The formal definition of stylistic accommodation is as follows:
Psycholinguistic studies have suggested that participants in a conversation accommodate to each other along dimensions such as style, utterance length, gesture, and speaking rate. In this paper the authors investigate accommodation on Twitter. They propose a novel probabilistic framework to compute measures such as stylistic cohesion, stylistic accommodation, and stylistic influence and symmetry.
 
  
In this paper, the authors investigate accommodation in style. They use non-topical LIWC[link] dimensions. Examples of these dimensions include the use of articles, negation words (not/no), prepositions, quantifiers, 1st person singular pronouns, 1st person plural pronouns, and 2nd person pronouns in conversation.
 
  
== Evaluation ==
+
<math>Acc(a;b)^C =P(T_b^C|T_a^C,T_b\rightarrow  T_a)-P(T_b^C|T_b\rightarrow  T_a)</math>
  
They evaluate their methods by asking the following four questions:
 
  - Does NF find out meaningful neighborhoods?
 
  - How close is approximate NF to exact NF?
 
  - Can AD detect injected anomalies?
 
  - How much time do these methods take to run on graphs of varying sizes?
 
  
== Discussion ==
+
Here, <math>P(T_b^C|T_a^C,T_b\rightarrow  T_a)</math> is the probability that a tweet of user b exhibits style <math>C</math> after the same style was observed in user <math>a</math>, whereas <math> P(T_b^C|T_b\rightarrow  T_a) </math> is the probability that a tweet of user b exhibits style C irrespective of whether user a used it. Note that <math>Acc(a;b)</math> is directional, measuring accommodation from a to b. The authors also define accommodation from b to a, and they compare the two scores <math>Acc(a;b) </math> and <math>Acc(b;a) </math> to determine whether accommodation is symmetric.
This paper poses two important social problems related to bipartite social graphs and explains how those problems can be solved efficiently using random walks.
 
  
They also claim that the neighborhoods over nodes can represent personalized clusters depending on different perspectives.
+
==Results: ==
  
During the presentation, an audience member asked whether the anomaly detection in this paper is similar to the edge betweenness defined in Kleinberg's text, as discussed in [[Class Meeting for 10-802 01/26/2010]]. I think they are similar. The textbook proposes detecting edges with high betweenness and using them to partition the graph. This paper first creates neighbourhood partitions based on random walk probabilities, which as a by-product gives us nodes and edges with high betweenness.
+
The authors observe that <math>P(T^C \wedge  R^C | T \leftrightarrow  R)</math> is greater than <math>P(T^C \wedge  R^C)</math> for the LIWC styles considered, which confirms that stylistic cohesion is present on Twitter. They also observe that <math>P(T_b^C|T_a^C,T_b\rightarrow  T_a)</math> is greater than <math>P(T_b^C|T_b\rightarrow  T_a)</math>, which confirms that linguistic accommodation along LIWC styles is present on Twitter.
  
== Related papers ==
+
==Related Paper: ==
There has been a lot of work on anomaly detection in graphs.
 
* The paper by [[RelatedPaper::Moonesinghe and Tan ICTAI06]] finds the clusters of outlier objects by doing random walk on the weighted graph.
 
* The paper by [[RelatedPaper::Aggarwal SIGMOD 2001]] proposes techniques for projecting high dimensional data on lower dimensions to detect outliers.
 
  
== Study plan ==
+
* Rivka Levitan, Agustín Gravano, Julia Hirschberg: Entrainment in Speech Preceding Backchannels. ACL (Short Papers) 2011: 113-117
* Article:Bipartite graph:[http://en.wikipedia.org/wiki/Bipartite_graph]
+
* Rivka Levitan, Julia Hirschberg: Measuring Acoustic-Prosodic Entrainment with Respect to Multiple Levels and Dimensions. INTERSPEECH 2011: 3081-3084
* Article:Anomaly detection:[http://en.wikipedia.org/wiki/Anomaly_detection]
 
* Paper:Topic sensitive pagerank:[http://dl.acm.org/citation.cfm?id=511513]
 
**Paper:The PageRank Citation Ranking: Bringing Order to the Web:[http://ilpubs.stanford.edu:8090/422/]
 
* Paper:Multilevel k-way Partitioning Scheme for Irregular Graphs:[http://glaros.dtc.umn.edu/gkhome/node/81]
 

Latest revision as of 02:07, 2 October 2012

Citation

Cristian Danescu-Niculescu-Mizil, Michael Gamon, Susan T. Dumais: Mark my words!: linguistic style accommodation in social media. WWW 2011: 745-754


Online version

http://research.microsoft.com/en-us/um/people/sdumais/fr862-danescu-niculescu-mizil.pdf

Summary

Psycholinguistic studies have suggested that participants in a conversation accommodate to each other along dimensions such as speaking style, utterance length, gesture, and speaking rate. In this paper the authors test the hypothesis that linguistic accommodation can also be observed in social media such as Twitter. They investigate accommodation along LIWC dimensions[1]. Examples of these dimensions include the use of articles, negation words (not/no), prepositions, quantifiers, 1st person singular pronouns, 1st person plural pronouns, and 2nd person pronouns in conversation. The authors propose a novel probabilistic framework to test their hypothesis.


Framework

When individuals talk about the same topic, they have to use similar words to describe it, so it is important to separate topic-driven similarity from the overall accommodation measure. Because the LIWC dimensions used are non-topical, this topical effect is removed automatically.

Their framework is based on mainly two components, stylistic cohesion and stylistic accommodation.


Stylistic Cohesion: This measures whether tweets belonging to the same conversation exhibit a given LIWC style more often than unrelated tweets do. If they do, tweets in the same conversation agree on that style more than chance would predict. Formally, for a style <math>C</math> it is defined as:

<math>Coh(C) = P(T^C \wedge R^C | T \leftrightarrow R) - P(T^C \wedge R^C)</math>

where <math>T \leftrightarrow R</math> is the condition that tweets <math>T</math> and <math>R</math> belong to the same conversation. <math>P(T^C \wedge R^C | T \leftrightarrow R)</math> is the probability that both tweets exhibit style C given that they are part of the same conversation, whereas <math>P(T^C \wedge R^C)</math> is the probability that two randomly picked tweets both exhibit style C.
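
As a rough illustration of how a cohesion score of this kind could be estimated, the sketch below takes the difference between the rate at which both tweets of a conversational pair exhibit style C and a chance baseline. It assumes each pair has already been reduced to two booleans saying whether each tweet exhibits C (for example through a LIWC word-list lookup, not shown). The function name `cohesion` and the input format are hypothetical, and the baseline for randomly picked tweets is approximated by the product of the two marginal frequencies, which may differ from the estimation procedure used in the paper.

```python
from typing import Iterable, Tuple


def cohesion(pairs: Iterable[Tuple[bool, bool]]) -> float:
    """Estimate Coh(C) from conversational tweet pairs.

    Each element is (t_has_c, r_has_c): whether the two tweets of one
    conversational pair exhibit style C. Hypothetical input format.
    """
    pairs = list(pairs)
    n = len(pairs)
    if n == 0:
        raise ValueError("need at least one tweet pair")

    # P(T^C and R^C | T <-> R): both tweets of a real pair exhibit C.
    both = sum(1 for t, r in pairs if t and r) / n

    # Assumed baseline for P(T^C and R^C): pair the tweets at random,
    # i.e. multiply the marginal frequencies of C on each side.
    p_t = sum(1 for t, _ in pairs if t) / n
    p_r = sum(1 for _, r in pairs if r) / n

    return both - p_t * p_r


# Pairs that always agree on the style give a positive score.
print(cohesion([(True, True), (True, True), (False, False), (False, False)]))  # 0.25
# Pairs that look independent give a score of zero.
print(cohesion([(True, True), (True, False), (False, True), (False, False)]))  # 0.0
```

A positive value means that tweets in the same conversation share the style more often than the chance baseline predicts.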


Stylistic accommodation:

When measuring stylistic accommodation, it is assumed that a Twitter user can accommodate to a conversational partner in a style C only if the partner exhibited style C earlier in the same conversation. The formal definition of stylistic accommodation is as follows:

<math>Acc(a;b)^C = P(T_b^C|T_a^C,T_b\rightarrow T_a) - P(T_b^C|T_b\rightarrow T_a)</math>

Here, <math>P(T_b^C|T_a^C,T_b\rightarrow T_a)</math> is the probability that a tweet of user b exhibits style <math>C</math> after the same style was observed in user <math>a</math>, whereas <math>P(T_b^C|T_b\rightarrow T_a)</math> is the probability that a tweet of user b exhibits style C irrespective of whether user a used it. Note that <math>Acc(a;b)</math> is directional, measuring accommodation from a to b. The authors also define accommodation from b to a, and they compare the two scores <math>Acc(a;b)</math> and <math>Acc(b;a)</math> to determine whether accommodation is symmetric.
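
The accommodation score can be sketched in the same spirit: the rate at which user b exhibits style C after user a did, minus the rate at which b exhibits C overall. The code assumes a list of conversational turns between two users, each reduced to a pair of booleans (whether a's tweet exhibits C, whether b's following tweet exhibits C). The function name `accommodation` and this input format are hypothetical, and the code is a simple frequency estimate, not necessarily the procedure used in the paper.

```python
from typing import Iterable, Tuple


def accommodation(turns: Iterable[Tuple[bool, bool]]) -> float:
    """Estimate Acc(a;b) for one style C.

    Each element is (a_has_c, b_has_c) for a turn in which a tweet of
    user b follows a tweet of user a. Hypothetical input format.
    """
    turns = list(turns)
    if not turns:
        raise ValueError("need at least one turn")

    # Turns in which a exhibited C; b can only accommodate on these.
    after_a = [b_has_c for a_has_c, b_has_c in turns if a_has_c]
    if not after_a:
        raise ValueError("user a never exhibits the style")

    # P(T_b^C | T_a^C, T_b -> T_a): b uses C after a used it.
    p_conditional = sum(1 for b_has_c in after_a if b_has_c) / len(after_a)

    # P(T_b^C | T_b -> T_a): b uses C regardless of what a did.
    p_baseline = sum(1 for _, b_has_c in turns if b_has_c) / len(turns)

    return p_conditional - p_baseline


# b uses the style exactly when a did, so the score is positive.
print(accommodation([(True, True), (True, True), (False, False), (False, False)]))  # 0.5
```

To examine symmetry in this sketch, one would call the function a second time on the turns in which user a follows user b, with the roles of the two booleans swapped, and compare the two scores.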

Results:

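The authors observe that <math>P(T^C \wedge R^C | T \leftrightarrow R)</math> is greater than <math>P(T^C \wedge R^C)</math> for the LIWC styles considered, which confirms that stylistic cohesion is present on Twitter. They also observe that <math>P(T_b^C|T_a^C,T_b\rightarrow T_a)</math> is greater than <math>P(T_b^C|T_b\rightarrow T_a)</math>, which confirms that linguistic accommodation along LIWC styles is present on Twitter.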

Related Paper:

  • Rivka Levitan, Agustín Gravano, Julia Hirschberg: Entrainment in Speech Preceding Backchannels. ACL (Short Papers) 2011: 113-117
  • Rivka Levitan, Julia Hirschberg: Measuring Acoustic-Prosodic Entrainment with Respect to Multiple Levels and Dimensions. INTERSPEECH 2011: 3081-3084