Chang, Annals of Applied Statistics (AoAS) 2010
J. Chang and D. Blei. 2010. Hierarchical relational models for document networks. To appear in Annals of Applied Statistics.
This paper presents a topic model that jointly models document text and the links (relations) between documents. The authors extend Latent Dirichlet Allocation (LDA) so that links can be formed between documents. They call their model the relational topic model (RTM).
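To make the idea of forming links from topics concrete, here is a minimal sketch of a logistic link function of the kind used in relational topic models: the probability of a link between two documents grows with the weighted overlap of their topic proportions. The summary above does not spell out the functional form, so the function name `link_probability`, the weights `eta`, the intercept `nu`, and the toy topic proportions are all illustrative assumptions, not values or code from the paper.

```python
import math

def link_probability(z_bar_d, z_bar_e, eta, nu):
    """Logistic link probability from the topic proportions of two documents.

    score = nu + sum_k eta[k] * z_bar_d[k] * z_bar_e[k]
    (an assumed form: weighted elementwise product plus an intercept)
    """
    score = nu + sum(w * a * b for w, a, b in zip(eta, z_bar_d, z_bar_e))
    return 1.0 / (1.0 + math.exp(-score))

# Made-up topic proportions over 3 topics.
doc_a = [0.8, 0.1, 0.1]
doc_b = [0.7, 0.2, 0.1]   # similar topics to doc_a
doc_c = [0.1, 0.1, 0.8]   # different topics from doc_a

eta = [5.0, 5.0, 5.0]     # hypothetical per-topic weights
nu = -2.0                 # hypothetical intercept

p_ab = link_probability(doc_a, doc_b, eta, nu)
p_ac = link_probability(doc_a, doc_c, eta, nu)
print(p_ab > p_ac)  # documents with similar topics get the higher link probability
```

Because the score depends on the product of the two documents' topic proportions, the function is symmetric in its two arguments, which suits undirected links.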
They ran their model on a subset of the Cora network, the WebKB dataset, PNAS citation data, and a dataset of local news with geographical links. They observed that their model outperforms LDA (with logistic regression) and pairwise link-LDA at predicting held-out words and held-out links.
There has been a lot of previous work on modeling links and text, including:
- The paper by Nallapati KDD2008 models directed links by allowing links from citing documents to cited documents in a bipartite graph.
- The paper by Gruber UAI2008 models directed links by assuming that a link from a word w to a document d depends on the frequency of the topic of w in d and on the in-degree of d.
- The paper by Sinkkonen arxiv2008 models the entire network as a bag of links using an interaction component model, which can scale to very large graphs (670,000 nodes and 1.89 million links).