Difference between revisions of "Cohn et al, Advances in Neural Information Processing Systems 2001"

Revision as of 20:54, 28 March 2011

Citation

Cohn et al. The Missing Link - A Probabilistic Model of Document Content and Hypertext Connectivity. Advances in Neural Information Processing Systems.

Online version

NIPS

Summary

This paper presents an approach to jointly modeling document content and link generation. Potential applications include Community Detection. The basic ideas are:

Use a corpus of articles that link to one another, such as web pages with hyperlinks or scientific articles with citations.

Build a Topic Model that jointly models the documents and the citations between them. Both the words and the citations in a document depend on the document's topic proportions.

Use Expectation Maximization to compute the desired posterior probabilities.
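
The joint model in the steps above can be written down as a single objective. The following is a sketch only: the symbols (N_{ij} for the count of word i in document j, A_{lj} for the count of links from document j to target l, z_k for a topic, and a weight \alpha between 0 and 1 that balances content against links) are notation assumed here, not taken from this summary.

```latex
\begin{align}
P(w_i \mid d_j) &= \sum_k P(w_i \mid z_k)\, P(z_k \mid d_j), \\
P(c_l \mid d_j) &= \sum_k P(c_l \mid z_k)\, P(z_k \mid d_j), \\
\mathcal{L} &= \sum_j \Big[ \alpha \sum_i \frac{N_{ij}}{\sum_{i'} N_{i'j}} \log P(w_i \mid d_j)
  + (1-\alpha) \sum_l \frac{A_{lj}}{\sum_{l'} A_{l'j}} \log P(c_l \mid d_j) \Big].
\end{align}
```

Under this notation, both mixtures share the same document-specific proportions P(z_k | d_j), which is what ties words and links together. Setting \alpha to 1 or 0 would reduce the objective to a content-only or link-only model.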

Brief description of the method

The paper combines document generation and link generation by building on existing probabilistic versions of LSA and the HITS algorithm (PLSA and PHITS). More specifically, both the terms in a document and the links it contains are generated from a document-specific mixing proportion over factors. For practical purposes these factors can be regarded as topics, that is, multinomials over the entire vocabulary, as in Latent Dirichlet Allocation. The method writes down the joint likelihood of the corpus and then uses Expectation Maximization to estimate the topic-conditional distributions and the per-document mixing proportions.
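
To make the EM procedure concrete, here is a minimal pure-Python sketch of how such a joint model could be fitted. It is not the authors' implementation: the function name `joint_em`, the weight `alpha` between content and links, the per-document count normalization, and the random initialization are all assumptions made for illustration.

```python
import random


def joint_em(word_counts, link_counts, n_topics, alpha=0.5, n_iter=50, seed=0):
    """Fit shared topic proportions to word counts and link counts with EM.

    word_counts[j][i]: count of word i in document j.
    link_counts[j][l]: count of links from document j to target l.
    alpha (assumed): weight of content versus links in the objective.
    Returns (P(w|z), P(c|z), P(z|d)) as nested lists.
    """
    rng = random.Random(seed)
    n_docs = len(word_counts)
    n_words = len(word_counts[0])
    n_links = len(link_counts[0])

    def normalize(values):
        total = sum(values)
        if total <= 0:
            # Fall back to a uniform distribution when there is no mass.
            return [1.0 / len(values)] * len(values)
        return [v / total for v in values]

    def random_dist(n):
        return normalize([rng.random() + 1e-3 for _ in range(n)])

    p_w_z = [random_dist(n_words) for _ in range(n_topics)]  # P(word | topic)
    p_c_z = [random_dist(n_links) for _ in range(n_topics)]  # P(link | topic)
    p_z_d = [random_dist(n_topics) for _ in range(n_docs)]   # P(topic | doc)

    for _ in range(n_iter):
        new_w = [[0.0] * n_words for _ in range(n_topics)]
        new_c = [[0.0] * n_links for _ in range(n_topics)]
        new_z = [[0.0] * n_topics for _ in range(n_docs)]
        for j in range(n_docs):
            views = (
                (word_counts[j], p_w_z, new_w, alpha),
                (link_counts[j], p_c_z, new_c, 1.0 - alpha),
            )
            for counts, p_x_z, acc, weight in views:
                total = sum(counts)
                if total == 0:
                    continue
                for i, n in enumerate(counts):
                    if n == 0:
                        continue
                    # E-step: posterior over topics for this (item, doc) pair.
                    post = normalize(
                        [p_x_z[k][i] * p_z_d[j][k] for k in range(n_topics)]
                    )
                    w = weight * n / total
                    # Accumulate expected counts for the M-step.
                    for k in range(n_topics):
                        acc[k][i] += w * post[k]
                        new_z[j][k] += w * post[k]
        # M-step: renormalize the accumulated expected counts.
        p_w_z = [normalize(row) for row in new_w]
        p_c_z = [normalize(row) for row in new_c]
        p_z_d = [normalize(row) for row in new_z]
    return p_w_z, p_c_z, p_z_d
```

A call such as `joint_em(word_counts, link_counts, n_topics=2)` returns three tables of probabilities. The rows of the last one are the per-document mixing proportions, which are shared by the word and link mixtures.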

Experimental Result

The authors used external tasks to verify the usefulness of the joint model. The first evaluation task was classification of web pages from the Web KB dataset and abstracts from Cora. Classification was done with a nearest-neighbor method in which proximity was computed using Cosine Similarity. The joint model shows higher accuracy than either model in isolation; however, no statistical significance testing was carried out. The second evaluation task was to predict a quantity called reference flow, which can be used to predict a link between a source and a target document. Compared with a placebo link detector, the joint model performs significantly better.
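
The nearest-neighbor classification step can be sketched as follows. This is an illustration under assumptions: it supposes each document is represented by a numeric vector (for example its topic mixing proportions), and the function names and the toy labels are hypothetical rather than taken from the paper.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def nearest_neighbor_label(query, train_vectors, train_labels):
    """Return the label of the training vector most similar to the query."""
    best = max(
        range(len(train_vectors)),
        key=lambda i: cosine_similarity(query, train_vectors[i]),
    )
    return train_labels[best]


# Toy usage with made-up two-topic representations and labels.
train_vectors = [[0.9, 0.1], [0.2, 0.8]]
train_labels = ["course", "faculty"]
predicted = nearest_neighbor_label([0.7, 0.3], train_vectors, train_labels)
```

In this toy example the query is closer in angle to the first training vector, so `predicted` is `"course"`.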

Related papers

An interesting related paper is Cohn, D. ICML 2000 which proposes a latent variable model for citation.

@@ Line 22: / Line 22: @@
 == Experimental Result ==
 The author used external tasks to verify the usability of the joint model. The first evaluation task was that of classification of web-pages [[UsesDataset:: Web KB dataset]] and abstracts from [[UsesDataset:: Cora]]. The classification was done using a nearest neighbor method where the proximity was computed using [UsesMethod:: Cosine Similarity]. The joint model shows higher accuracy than either of the model in isolation however, no statistical significance testing was carried out. The second evaluation task was to predict a quantity called reference flow which could be used to predict link between a source and target document. In comparison to a placebo link detector the joint model performs significantly better.
-This experiment was carried out on the [[UsesDataset:: Technorati Dataset]].
 == Related papers ==
-An interesting related paper is [[RelatedPaper::Cohn, D. ICML 2000]] which proposes a latent variable model for citation
+An interesting related paper is [[RelatedPaper::Cohn, D. ICML 2000]] which proposes a latent variable model for citation.
