== Citation ==
 
Rosen-Zvi et al, The Author-Topic Model for Authors and Documents
 
== Online version ==
[http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.133.5669&rep=rep1&type=pdf UAI'04]
== Summary ==
This [[Category::paper]] presents a probabilistic graphical model of document generation that takes into account the authors who created the document collection. Potential applications include finding authors with similar recurring research interests, quantifying an author's research interests as a distribution over topics, and discovering the topics present in a corpus. The basic ideas are:
* Use a corpus of articles that have author metadata associated with them.
* Build a [[UsesMethod:: Topic Model]] that models the document generation process by giving each author a separate mixture over topics; the mixture weight of each topic depends on the author.
* Use [[UsesMethod:: Gibbs sampling]] to estimate the desired posterior probabilities by drawing samples once the Markov chain has converged to the posterior distribution.
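A minimal sketch of the kind of collapsed Gibbs update this suggests (the sizes, hyperparameters, and count matrices below are hypothetical toy values, not taken from the paper): for one word, an (author, topic) pair is drawn with probability proportional to the product of the smoothed word-topic and topic-author counts. A full sampler would also decrement and re-increment these counts around each draw.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy sizes (hypothetical, for illustration only): T topics, V vocabulary
# words, A authors, and count matrices as they might look mid-sampling.
T, V, A = 4, 10, 5
alpha, beta = 0.5, 0.1

# n_wt[w, t]: number of times word w is currently assigned to topic t.
# n_ta[t, a]: number of times topic t is currently assigned to author a.
n_wt = rng.integers(0, 5, size=(V, T)).astype(float)
n_ta = rng.integers(0, 5, size=(T, A)).astype(float)

def sample_author_topic(w, coauthors):
    """One collapsed Gibbs step: jointly draw an (author, topic) pair for
    word w in proportion to the smoothed word-topic and topic-author counts."""
    pairs, probs = [], []
    for a in coauthors:
        for t in range(T):
            p_word = (n_wt[w, t] + beta) / (n_wt[:, t].sum() + V * beta)
            p_topic = (n_ta[t, a] + alpha) / (n_ta[:, a].sum() + T * alpha)
            pairs.append((a, t))
            probs.append(p_word * p_topic)
    probs = np.asarray(probs)
    idx = rng.choice(len(pairs), p=probs / probs.sum())
    return pairs[idx]

a, t = sample_author_topic(w=3, coauthors=[0, 2])
```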
== Brief description of the method ==
The paper describes a probabilistic process in which each word is assigned a topic sampled from an author-conditional topic distribution. This models an author's interests as a probability distribution over the topics present in the corpus: each author has their own multinomial distribution over topics, and each topic is in turn represented as a multinomial distribution over the vocabulary. For each word in a document, an author is first sampled uniformly from the document's co-authors, then a topic is sampled from that author's topic distribution, and finally the word is sampled from the word distribution indexed by that topic.
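The generative process just described can be sketched as follows (the dimensions, the symmetric Dirichlet hyperparameters, and the function name `generate_document` are illustrative assumptions, not taken from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions (hypothetical): A authors, T topics, V vocabulary words.
A, T, V = 3, 4, 10
alpha, beta = 0.5, 0.5  # symmetric Dirichlet hyperparameters (arbitrary values)

# theta[a]: author a's multinomial over topics; phi[t]: topic t's multinomial over words.
theta = rng.dirichlet(np.full(T, alpha), size=A)
phi = rng.dirichlet(np.full(V, beta), size=T)

def generate_document(coauthors, n_words):
    """Generate one document as a list of (author, topic, word) triples."""
    doc = []
    for _ in range(n_words):
        a = rng.choice(coauthors)      # author sampled uniformly from the co-authors
        z = rng.choice(T, p=theta[a])  # topic sampled from that author's topic mixture
        w = rng.choice(V, p=phi[z])    # word sampled from the topic's word distribution
        doc.append((a, z, w))
    return doc

doc = generate_document(coauthors=[0, 2], n_words=20)
```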
== Experimental Result ==
The authors used external tasks to verify the usefulness of the joint model. The first evaluation task was classification of web pages from the [[UsesDataset:: Web KB dataset]] and abstracts from the [[UsesDataset:: Cora network]]. Classification was done with a nearest-neighbor method in which proximity was computed using [[UsesMethod:: Cosine Similarity]]. The joint model shows higher accuracy than either model in isolation; however, no statistical significance testing was carried out. The second evaluation task was to predict a quantity called reference flow, which can be used to predict links between a source and a target document. In comparison to a placebo link detector, the joint model performs significantly better.
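The nearest-neighbor step can be illustrated as below (the vectors and class labels are made up for the example; in the evaluation the features would be the models' per-document topic distributions):

```python
import numpy as np

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def nearest_neighbor_label(query, train_vectors, train_labels):
    """Classify a document by the label of its most cosine-similar neighbor."""
    sims = [cosine_similarity(query, v) for v in train_vectors]
    return train_labels[int(np.argmax(sims))]

# Hypothetical per-document topic-distribution vectors and class labels.
train = [np.array([0.9, 0.1, 0.0]), np.array([0.1, 0.1, 0.8])]
labels = ["course", "project"]
predicted = nearest_neighbor_label(np.array([0.8, 0.2, 0.0]), train, labels)  # "course"
```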
== Related papers ==
An interesting related paper is [[RelatedPaper::Cohn, D. ICML 2000]] which proposes a latent variable model for citation.

Revision as of 22:54, 31 March 2011
