Modeling Relational Events via Latent Classes

This is a [[Category::Paper]] discussed in Social Media Analysis 10-802 in Spring 2011.

== Citation ==

Christopher DuBois, Padhraic Smyth. Modeling Relational Events via Latent Classes. KDD 2010.

== Online version ==

download here

== Summary ==

Many social network activities can be described as a series of dyadic events. An event in this paper is defined as a triple (sender, receiver, event_type). The authors assume that such events are generated by latent classes, and they propose a graphical model to [[AddressesProblem::identify the latent classes of dyadic events]], with inference implemented via [[UsesMethod::Graphical Models with Gibbs sampling and Expectation-Maximization methods]].

== Methodology ==

It is assumed that relational events are generated by the following process:

* Draw the class distribution <math>\bar{\theta}</math> ~ Dirichlet(<math>\alpha</math>)

* Draw, for all c in {1...C}, the class-specific distributions over senders, receivers and event types:

<math>\bar{\mu_c}</math> ~ Dirichlet(<math>\beta</math>)

<math>\bar{\phi_c}</math> ~ Dirichlet(<math>\gamma</math>)

<math>\bar{\psi_c}</math> ~ Dirichlet(<math>\delta</math>)

* For each event:

(a) Draw <math>c</math> ~ Multinomial(<math>\bar{\theta}</math>), the event’s class

(b) Draw <math>s|c</math> ~ Multinomial(<math>\bar{\mu_c}</math>), the event’s sender

(c) Draw <math>r|c</math> ~ Multinomial(<math>\bar{\phi_c}</math>), the event’s receiver

(d) Draw <math>a|c</math> ~ Multinomial(<math>\bar{\psi_c}</math>), the event’s type
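To make this generative story concrete, here is a minimal simulation sketch in Python. The notation mirrors the list above, but the symmetric hyperparameter values, the vocabulary sizes, and the function name are illustrative assumptions rather than the paper's settings.

<pre>
import numpy as np

def simulate_events(n_events, C=3, n_senders=10, n_receivers=10, n_types=5,
                    alpha=1.0, beta=1.0, gamma=1.0, delta=1.0, seed=0):
    """Sample synthetic (sender, receiver, type) events from the latent-class model.
    Hyperparameters mirror the Dirichlet priors listed above (assumed symmetric)."""
    rng = np.random.default_rng(seed)
    theta = rng.dirichlet(alpha * np.ones(C))                  # class distribution
    mu = rng.dirichlet(beta * np.ones(n_senders), size=C)      # per-class sender distributions
    phi = rng.dirichlet(gamma * np.ones(n_receivers), size=C)  # per-class receiver distributions
    psi = rng.dirichlet(delta * np.ones(n_types), size=C)      # per-class event-type distributions
    events = []
    for _ in range(n_events):
        c = rng.choice(C, p=theta)              # (a) the event's class
        s = rng.choice(n_senders, p=mu[c])      # (b) the event's sender
        r = rng.choice(n_receivers, p=phi[c])   # (c) the event's receiver
        a = rng.choice(n_types, p=psi[c])       # (d) the event's type
        events.append((s, r, a))
    return events, (theta, mu, phi, psi)

events, (theta, mu, phi, psi) = simulate_events(1000)
</pre>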

[[File:Kdd2010_gm.png]]

It is straightforward to write out the likelihood of the data:

[[File:Kdd2010_joint.png]]
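For reference, a sketch of the complete-data likelihood implied by the generative process above, written in the notation of the Methodology list (the class-distribution symbol <math>\bar{\theta}</math> and the per-class sender-distribution symbol <math>\bar{\mu_c}</math> are this page's notation rather than the paper's):

<math>p(\mathbf{s}, \mathbf{r}, \mathbf{a}, \mathbf{c} \mid \bar{\theta}, \bar{\mu}, \bar{\phi}, \bar{\psi}) = \prod_{n=1}^{N} \bar{\theta}_{c_n}\, \bar{\mu}_{c_n s_n}\, \bar{\phi}_{c_n r_n}\, \bar{\psi}_{c_n a_n}</math>

where <math>N</math> is the number of observed events.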

Two inference methods, Gibbs sampling and Expectation-Maximization (EM), are implemented in the paper.
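As a rough illustration of the collapsed Gibbs sampling side, the sketch below resamples each event's latent class from its conditional distribution given the current count statistics, with the multinomial parameters integrated out. The symmetric priors, the count-array layout, and the function name are assumptions of this sketch, not the paper's implementation.

<pre>
import numpy as np

def gibbs_sweep(events, z, n_c, n_cs, n_cr, n_ca, priors, rng):
    """One collapsed-Gibbs sweep over the event class assignments z (numpy arrays assumed).
    n_c: per-class event counts (length C); n_cs, n_cr, n_ca: per-class counts of
    senders, receivers and event types (shapes C x S, C x R, C x A)."""
    alpha, beta, gamma, delta = priors
    C = len(n_c)
    S, R, A = n_cs.shape[1], n_cr.shape[1], n_ca.shape[1]
    for n, (s, r, a) in enumerate(events):
        c = z[n]
        # remove this event from the counts
        n_c[c] -= 1; n_cs[c, s] -= 1; n_cr[c, r] -= 1; n_ca[c, a] -= 1
        # conditional p(z_n = c | everything else), up to a normalizing constant
        p = (n_c + alpha) \
            * (n_cs[:, s] + beta) / (n_c + S * beta) \
            * (n_cr[:, r] + gamma) / (n_c + R * gamma) \
            * (n_ca[:, a] + delta) / (n_c + A * delta)
        c = rng.choice(C, p=p / p.sum())
        # add the event back under its newly sampled class
        z[n] = c
        n_c[c] += 1; n_cs[c, s] += 1; n_cr[c, r] += 1; n_ca[c, a] += 1
    return z
</pre>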

=== Predicting ===

Predictions can be made from the inferred parameters:

[[File:Kdd2010_predict.png]]
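One simple way to use the inferred parameters is to score a candidate event by marginalizing over the latent class, as in the snippet below (again using the notation from the simulation sketch above; this illustrates the idea rather than the paper's exact evaluation procedure):

<pre>
import numpy as np

def event_probability(s, r, a, theta, mu, phi, psi):
    """p(s, r, a) = sum_c p(c) * p(s|c) * p(r|c) * p(a|c), using point estimates of the parameters."""
    return float(np.sum(theta * mu[:, s] * phi[:, r] * psi[:, a]))

# e.g. score the event (sender 2, receiver 5, type 1) under the simulated parameters above:
# event_probability(2, 5, 1, theta, mu, phi, psi)
</pre>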

== Data ==

A data set of [[UsesDataset::international events involving entities from 450 countries over the 2000-2005 time period]]. This data has been used by political scientists to explore international relations and policy. The authors used an automated system for coding 3,575,897 events from Reuters news reports. Each of these events takes the form: [entity A] [action] [entity B]. Actions in this data set consist of 247 possible types, such as judicial action, military action, and so forth.
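For concreteness, coded records of this form can be mapped to the (sender, receiver, event_type) integer triples the model works over. The sketch below uses made-up entity names and a made-up action vocabulary purely for illustration:

<pre>
# Hypothetical coded records of the form [entity A] [action] [entity B]
raw = [("USA", "consult", "China"),
       ("France", "military action", "Germany"),
       ("USA", "judicial action", "France")]

# Build index vocabularies for entities and action types
entities = {name: i for i, name in enumerate(sorted({e for s, _, r in raw for e in (s, r)}))}
actions = {name: i for i, name in enumerate(sorted({a for _, a, _ in raw}))}

# Events as (sender, receiver, type) integer triples
events = [(entities[s], entities[r], actions[a]) for s, a, r in raw]
</pre>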

A quick example of what the output might look like:

[[File:Kdd2010_ex.png]]

== Experimental Result ==

* Uniform baseline: simply predict that all events are equally likely
* Multinomial baseline: predict each event by its observed frequency in the training data (both baselines are sketched below)
* Prediction from the proposed graphical model
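A minimal sketch of the two baselines, assuming events are held as (sender, receiver, type) integer triples as in the Data section sketch; the function names are illustrative:

<pre>
from collections import Counter

def uniform_baseline(n_senders, n_receivers, n_types):
    """Uniform baseline: every possible (sender, receiver, type) event is equally likely."""
    p = 1.0 / (n_senders * n_receivers * n_types)
    return lambda event: p

def multinomial_baseline(train_events):
    """Multinomial baseline: an event's probability is its observed relative frequency
    in the training data (events never seen in training get probability zero)."""
    counts = Counter(train_events)
    total = sum(counts.values())
    return lambda event: counts.get(event, 0) / total

# Example usage with the sizes described in the Data section (450 entities, 247 action types):
# p_uniform = uniform_baseline(450, 450, 247)
# p_mult = multinomial_baseline(events)
</pre>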

MPMM is the model the authors designed, and MCGS is an algorithm used in the collapsed Gibbs sampling process (details omitted here); C is the number of classes. In short, the graphical model performs much better than both baselines.

[[File:Kdd2010_result.png]]