Mark my words!

From Cohen Courses

Latest revision as of 01:07, 2 October 2012

Citation

Cristian Danescu-Niculescu-Mizil, Michael Gamon, Susan T. Dumais: Mark my words!: linguistic style accommodation in social media. WWW 2011: 745-754


Online version

http://research.microsoft.com/en-us/um/people/sdumais/fr862-danescu-niculescu-mizil.pdf

Summary

Psychological studies have suggested that participants in a conversation accommodate to each other along dimensions such as speaking style, utterance length, gesture, and speaking rate. In this paper, the authors test the hypothesis that linguistic style accommodation also occurs in social media such as Twitter. They investigate accommodation along LIWC dimensions (http://www.liwc.net/liwcdescription.php); examples include the use of articles, negations (not/no), prepositions, quantifiers, first-person singular pronouns, first-person plural pronouns, and second-person pronouns. The authors propose a novel probabilistic framework to test their hypothesis.
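The framework treats each tweet as either exhibiting or not exhibiting a given style C. The following minimal Python sketch shows such binary style markers, assuming (as a simplification) that a tweet exhibits C when it contains at least one word from C's word list. The word lists in STYLE_WORDS and the helper names are hypothetical toy stand-ins; the real LIWC dictionaries are far larger and distributed separately.

```python
import re

# Hypothetical toy word lists standing in for LIWC categories.
# The real LIWC dictionaries are much larger and licensed separately.
STYLE_WORDS = {
    "article": {"a", "an", "the"},
    "negation": {"no", "not", "never"},
    "first_person_singular": {"i", "me", "my", "mine"},
    "first_person_plural": {"we", "us", "our", "ours"},
    "second_person": {"you", "your", "yours"},
}

TOKEN_RE = re.compile(r"[a-z']+")


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens."""
    return TOKEN_RE.findall(text.lower())


def exhibits(text: str, category: str) -> bool:
    """True if the tweet contains at least one word from the category (assumed rule)."""
    words = STYLE_WORDS[category]
    return any(tok in words for tok in tokenize(text))


def style_profile(text: str) -> dict[str, bool]:
    """Binary style markers T^C for every toy category."""
    return {cat: exhibits(text, cat) for cat in STYLE_WORDS}


if __name__ == "__main__":
    print(style_profile("I did not see the game, did you?"))
```

Reducing each tweet to one boolean per category is what lets the cohesion and accommodation quantities below be estimated as simple relative frequencies.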


Framework

When individuals talk about the same topic, they tend to use similar words to describe it, so topic-driven similarity must be separated from the overall accommodation measure. Since the authors measure accommodation along LIWC dimensions rather than on the topical words themselves, this effect is removed automatically.

Their framework is based on two main components: stylistic cohesion and stylistic accommodation.


Stylistic cohesion: This measures whether tweets belonging to the same conversation exhibit a given LIWC style more often than unrelated tweets do. If they do, tweets in the same conversation agree more on that style. Formally, for a style C it is defined as:

$$Coh(C) = P(T^C \wedge R^C \mid T \leftrightarrow R) - P(T^C \wedge R^C)$$

where $T \leftrightarrow R$ is the condition that tweets T and R are from the same conversation, $P(T^C \wedge R^C \mid T \leftrightarrow R)$ is the probability that two tweets from the same conversation both exhibit style C, and $P(T^C \wedge R^C)$ is the probability that two randomly picked tweets both exhibit style C.
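To make the definition concrete, here is a minimal Python sketch that estimates Coh(C) = P(T^C ∧ R^C | T ↔ R) − P(T^C ∧ R^C) from binary style flags. The function name cohesion and its input format are hypothetical. The baseline operationalizes "two randomly picked tweets" by pairing each conversation's first tweet with the replies of every other conversation; this is an assumption, not necessarily the authors' exact sampling procedure.

```python
from itertools import product


def cohesion(pairs: list[tuple[bool, bool]]) -> float:
    """Estimate Coh(C) = P(T^C and R^C | T<->R) - P(T^C and R^C).

    pairs[i] = (first tweet of conversation i exhibits C,
                reply in conversation i exhibits C).
    """
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two conversations for the baseline")
    # Same conversation: fraction of pairs in which both tweets show style C.
    same = sum(t and r for t, r in pairs) / n
    # Baseline (assumption): pair tweet i with the reply j of another conversation.
    cross = [pairs[i][0] and pairs[j][1]
             for i, j in product(range(n), repeat=2) if i != j]
    baseline = sum(cross) / len(cross)
    return same - baseline


if __name__ == "__main__":
    toy = [(True, True), (True, True), (False, False), (False, False)]
    print(f"Coh(C) = {cohesion(toy):.3f}")
```

In the toy data, the same-conversation rate is 2/4 = 0.5 and the cross-conversation baseline is 2/12, so Coh(C) is about 0.333; a positive value indicates that style C co-occurs within conversations more than chance would predict.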


Stylistic accommodation:

While measuring stylistic accommodation, it is assumed that a user can accommodate to a partner's style C only if the partner exhibited style C earlier in the same conversation. Stylistic accommodation is formally defined as:

$$Acc^C(a;b) = P(T_b^C \mid T_a^C, T_b \rightarrow T_a) - P(T_b^C \mid T_b \rightarrow T_a)$$

Here, $T_b \rightarrow T_a$ denotes that user b's tweet $T_b$ is a reply to user a's tweet $T_a$. $P(T_b^C \mid T_a^C, T_b \rightarrow T_a)$ is the probability that b's reply exhibits style C given that a's tweet exhibited style C, whereas $P(T_b^C \mid T_b \rightarrow T_a)$ is the probability that b's reply exhibits style C irrespective of whether a used style C. Note that $Acc^C(a;b)$ is directional, from a to b: it measures how strongly b's replies follow a's use of style C. The authors also define the reverse quantity $Acc^C(b;a)$ and compare the two scores to determine whether accommodation is symmetric.
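A matching Python sketch for Acc^C(a;b) = P(T_b^C | T_a^C, T_b → T_a) − P(T_b^C | T_b → T_a), again over binary style flags. Each exchange is a hypothetical (original tweet shows C, reply shows C) pair: passing b's replies to a yields Acc^C(a;b), and passing a's replies to b yields the reverse score used for the symmetry comparison. The function name and the toy data are invented for illustration.

```python
def accommodation(exchanges: list[tuple[bool, bool]]) -> float:
    """Estimate Acc^C = P(reply shows C | original shows C) - P(reply shows C).

    exchanges[i] = (original tweet exhibits C, reply to it exhibits C).
    For Acc^C(a;b), the originals are a's tweets and the replies are b's.
    """
    if not exchanges:
        raise ValueError("need at least one exchange")
    # Replies to original tweets that exhibited style C.
    after_c = [reply for original, reply in exchanges if original]
    if not after_c:
        raise ValueError("no original tweet exhibits style C")
    conditional = sum(after_c) / len(after_c)
    # Base rate of style C in the replies, regardless of the original tweet.
    marginal = sum(reply for _, reply in exchanges) / len(exchanges)
    return conditional - marginal


if __name__ == "__main__":
    # Toy data: b's replies to a, and a's replies to b.
    b_replies_to_a = [(True, True), (True, True), (False, False), (False, True)]
    a_replies_to_b = [(True, True), (True, False), (False, True), (False, False)]
    acc_ab = accommodation(b_replies_to_a)  # Acc^C(a;b)
    acc_ba = accommodation(a_replies_to_b)  # Acc^C(b;a)
    print(f"Acc(a;b) = {acc_ab:.2f}, Acc(b;a) = {acc_ba:.2f}")
```

Here b echoes a's style (1.0 versus a base rate of 0.75, giving 0.25) while a does not echo b's (0.5 versus 0.5, giving 0.00), which illustrates how the two directional scores can differ. A real analysis would compare them statistically over many user pairs rather than on a single toy pair.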

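Results

The authors observe that $P(T^C \wedge R^C \mid T \leftrightarrow R)$ is larger than $P(T^C \wedge R^C)$ for the LIWC styles considered, which indicates that stylistic cohesion is present on Twitter. They also observe that $P(T_b^C \mid T_a^C, T_b \rightarrow T_a)$ is larger than $P(T_b^C \mid T_b \rightarrow T_a)$, which indicates that linguistic accommodation along LIWC styles is present on Twitter as well.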

Related Papers

  • Rivka Levitan, Agustín Gravano, Julia Hirschberg: Entrainment in Speech Preceding Backchannels. ACL (Short Papers) 2011: 113-117
  • Rivka Levitan, Julia Hirschberg: Measuring Acoustic-Prosodic Entrainment with Respect to Multiple Levels and Dimensions. INTERSPEECH 2011: 3081-3084