The Author-Recipient-Topic Model for Topic and Role Discovery in Social Networks: Experiments with Enron and Academic Email

From Cohen Courses
Latest revision as of 23:01, 5 November 2012

Citation

McCallum, A., Corrada-Emmanuel, A., and Wang, X. The Author-Recipient-Topic Model for Topic and Role Discovery in Social Networks: Experiments with Enron and Academic Email, 2004. Technical Report UM-CS-2004-096.

Online version

The Author-Recipient-Topic Model for Topic and Role Discovery in Social Networks: Experiments with Enron and Academic Email

Summary

Consider the problem of modeling a company's email network. Say Michael is a boss, Pam is his assistant, and both of them email similar people. If we consider only the structure of this email network, Michael and Pam would be assigned similar roles. Their roles as boss and assistant only become clear when we consider the language content of the emails they send.

This paper builds on this idea, bringing language content (topics) into traditional social network analysis, where only network structure was considered. The authors extend the Author-Topic model, in which each author has a distribution over topics (each topic being a distribution over words). In the Author-Recipient-Topic model presented in this paper, there is instead a distribution over topics for each author-recipient pair.
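
The per-word generative step implied by a per-pair topic distribution can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `generate_email`, the toy parameters, and the extra recipient "jim" are hypothetical, and the sketch assumes that each word's recipient is picked uniformly from the email's recipients before a topic is drawn.

```python
import random

def generate_email(author, recipients, theta, phi, n_words, rng):
    """Sample the words of one email under an ART-style generative process.

    theta[(author, recipient)] is a list of topic probabilities.
    phi[topic] is a dict mapping word -> probability.
    Returns a list of (recipient, topic, word) triples, one per word.
    """
    words = []
    for _ in range(n_words):
        # Assumption: one of the email's recipients is picked uniformly.
        x = rng.choice(recipients)
        # Draw a topic from the distribution of this author-recipient pair.
        topic_probs = theta[(author, x)]
        z = rng.choices(range(len(topic_probs)), weights=topic_probs)[0]
        # Draw a word from the chosen topic's distribution over words.
        vocab = list(phi[z])
        w = rng.choices(vocab, weights=[phi[z][v] for v in vocab])[0]
        words.append((x, z, w))
    return words

# Toy parameters (hypothetical values, for illustration only).
theta = {
    ("michael", "pam"): [0.9, 0.1],
    ("michael", "jim"): [0.2, 0.8],
}
phi = [
    {"schedule": 0.7, "meeting": 0.3},  # topic 0
    {"sales": 0.6, "client": 0.4},      # topic 1
]
sample = generate_email("michael", ["pam", "jim"], theta, phi, 5, random.Random(0))
```

In an Author-Topic-style model the lookup would use only the author, so the same topic mixture would apply regardless of who receives the message.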

We can marginalize the author or recipient in order to see the topics a person would be likely to send or receive. This person-conditioned topic distribution can be used to calculate similarity between people.
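
As a rough sketch of this step, the code below mixes a sender's per-pair topic distributions into a single person-conditioned distribution and then compares two people. Two choices here are assumptions rather than details taken from this summary: each pair is weighted by a message count (`pair_counts`, a hypothetical input), and similarity is measured with the Jensen-Shannon divergence (lower means more similar). All names and numbers are illustrative.

```python
import math

def sender_topics(person, theta, pair_counts):
    """Topic distribution of `person` as a sender, marginalizing over recipients.

    Assumption: each author-recipient distribution is weighted by the number
    of messages the person sent to that recipient.
    """
    pairs = [(r, n) for (a, r), n in pair_counts.items() if a == person]
    total = sum(n for _, n in pairs)
    n_topics = len(theta[(person, pairs[0][0])])
    mixed = [0.0] * n_topics
    for r, n in pairs:
        for t, p in enumerate(theta[(person, r)]):
            mixed[t] += (n / total) * p
    return mixed

def js_divergence(p, q):
    """Jensen-Shannon divergence between two discrete distributions."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(u, v):
        # Terms with zero probability in u contribute nothing.
        return sum(a * math.log(a / b) for a, b in zip(u, v) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Toy values (hypothetical, for illustration only).
theta = {
    ("michael", "pam"): [0.9, 0.1],
    ("michael", "jim"): [0.2, 0.8],
    ("pam", "michael"): [0.5, 0.5],
}
pair_counts = {("michael", "pam"): 3, ("michael", "jim"): 1, ("pam", "michael"): 4}

michael = sender_topics("michael", theta, pair_counts)
pam = sender_topics("pam", theta, pair_counts)
distance = js_divergence(michael, pam)
```

The same marginalization can be done over authors instead of recipients to obtain the topics a person is likely to receive.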

A comparison of the Author-Topic (AT) model and the Author-Recipient-Topic (ART) model is shown below. Note that in the AT model, there is a separate topic distribution θ for each author a, whereas in the ART model, there is a θ for each pair of author a and recipient r.

Figure: art_model_comparison.png (comparison of the AT and ART models)


Results

The authors conduct a qualitative analysis of the Author-Recipient-Topic model on the Enron email corpus and the McCallum email corpus, comparing it against the Author-Topic (AT) model and a stochastic block model (SNA). Overall, the authors argue that the ART model is more appropriate than either the AT model or the SNA model.

For example, the table below shows the most similar pairs of people calculated for the McCallum email corpus. The pairs predicted by the ART model look reasonable, while those predicted by the SNA model look less plausible.

Figure: art_model.png (table of the most similar pairs in the McCallum email corpus)

Discussion

This model is limited in that it must be re-estimated over the entire network whenever new nodes or edges appear.

Related papers

Rosen-Zvi et al., The Author-Topic Model for Authors and Documents: proposes the Author-Topic model, which this paper extends.

Study plan

Much of this paper is self-explanatory for a reader who is familiar with topic models in general.