Sparse Additive Generative Models of Text

From Cohen Courses
Revision as of 08:01, 4 October 2012 by Kwmurray

This paper is available online [1].

Summary

Sparse Additive Generative Models of Text (SAGE) offer an alternative to traditional generative models of text. The key insight of the paper is that a latent class or topic can be modeled as a deviation in log-frequency from a constant background distribution. This formulation enforces sparsity, which the authors argue prevents overfitting. In addition, several generative facets can be combined by adding their deviations in log space, which avoids the need for per-token switching variables.
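As a worked equation, the additive idea can be written as follows. The notation is assumed here for illustration, not copied from this summary: m is the vector of background log-frequencies over the vocabulary, and η^(1), η^(2) are sparse deviation vectors contributed by two facets (for example, a topic and a perspective).

$$P(w \mid \eta^{(1)}, \eta^{(2)}) = \frac{\exp\left(m_w + \eta^{(1)}_w + \eta^{(2)}_w\right)}{\sum_{i} \exp\left(m_i + \eta^{(1)}_i + \eta^{(2)}_i\right)}$$

Wherever both deviations are zero, a word keeps its background log-weight, so only the nonzero entries of each η carry information. Because both facets contribute to the same log-linear score, no switching variable is needed to decide which facet generated a given token.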

Datasets

Methodology

The central idea of the method is to treat each latent class or topic as a deviation, in log space, from a shared background distribution. The authors motivate this by four problems they identify in Dirichlet-multinomial generative models: overfitting, overparameterization, inference cost, and lack of sparsity.
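The following is a minimal sketch of how a word distribution is formed from a background plus sparse, additive deviations. It assumes the background is the corpus log relative frequency and that deviations are given as dictionaries holding only their nonzero entries; the function names, the toy corpus, and the deviation values are hypothetical. It does not learn the deviations (in SAGE they are estimated under a sparsity-inducing prior); it only shows how fixed deviations combine.

```python
import math
from collections import Counter


def background_log_freq(tokens):
    """Return m_w = log relative frequency of each word in the corpus."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {w: math.log(c / total) for w, c in counts.items()}


def word_distribution(m, *deviations):
    """P(w) proportional to exp(m_w + sum of deviations for w).

    Each deviation is a dict of nonzero entries only (hypothetical
    representation), so a facet that leaves a word alone stores nothing
    for it. Words absent from the background m are ignored.
    """
    scores = {w: m_w + sum(d.get(w, 0.0) for d in deviations)
              for w, m_w in m.items()}
    # Subtract the max score before exponentiating for numerical stability.
    top = max(scores.values())
    unnorm = {w: math.exp(s - top) for w, s in scores.items()}
    z = sum(unnorm.values())
    return {w: u / z for w, u in unnorm.items()}


# Toy usage: two hypothetical facets added in log space.
corpus = "the game was a win the team played the game well".split()
m = background_log_freq(corpus)
sports = {"game": 1.5, "team": 1.0}   # hypothetical topic deviation
positive = {"win": 0.8, "well": 0.5}  # hypothetical sentiment facet
p = word_distribution(m, sports, positive)
```

Words that no facet touches ("the", "was", "a", "played") keep their background proportions relative to one another; only the handful of words with nonzero deviations move. With no deviations at all, `word_distribution(m)` simply returns the background distribution.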

[Figure: Sage.png]

Experimental Results