This paper is available online [1].

== Summary ==

Sparse Additive Generative Models of Text, or SAGE, is an interesting alternative to traditional generative models for text. The key insight of the paper is that latent classes or topics can be modeled as deviations in log-frequency from a constant background distribution. The model has the advantage of enforcing sparsity, which the authors argue prevents overfitting. Additionally, generative facets can be combined through addition in log space, avoiding the need for switching variables.
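
Concretely, the word distribution for a class or topic <math>k</math> takes roughly the following form, where <math>m</math> is the background log-frequency vector and <math>\eta_k</math> is a sparse class-specific deviation (notation used here for illustration):

<math>
p(w \mid k) \;=\; \frac{\exp(m_w + \eta_{k,w})}{\sum_{w'} \exp\!\left(m_{w'} + \eta_{k,w'}\right)}
</math>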

== Datasets ==

== Methodology ==

The key insight of this paper is that a generative model can be thought of as a deviation from a background distribution in log-space. This has a few nice benefits. The authors propose their method to address a few main problems that they see in Dirichlet-multinomial generative models: [[AddressesProblem::Overfitting]], [[AddressesProblem::Overparametrization]], [[AddressesProblem::Inference Cost]], and [[AddressesProblem::Lack of Sparsity]]. Overfitting is always a concern in machine learning, and SAGE attempts to deal with it by imposing sparsity: the authors place a zero-mean Laplace prior on the deviation parameters. They claim that this has the same effect as an L1 regularizer, but do not actually explain why the two are equivalent. This was the biggest issue with the paper: the prior does appear to enforce sparsity, but the equivalence to an L1 regularizer is never spelled out.
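
For what it is worth, the standard argument behind that claim (my own sketch, not spelled out in the paper) is that MAP estimation under a zero-mean Laplace prior with scale <math>b</math> adds the negative log-density of the prior to the objective:

<math>
-\log p(\eta_{k,w}) \;=\; -\log\!\left[\tfrac{1}{2b}\exp\!\left(-\tfrac{|\eta_{k,w}|}{b}\right)\right] \;=\; \tfrac{1}{b}\,|\eta_{k,w}| + \text{const},
</math>

so maximizing the posterior is the same as maximizing the likelihood minus an L1 penalty <math>\tfrac{1}{b}\lVert\eta_k\rVert_1</math>. Presumably this is what the authors have in mind, but the paper does not walk through it.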

The paper also has an interesting take on inference costs. In particular, the authors note that incorporating multiple generative facets into a model usually requires an additional latent variable per token that acts as a switching variable, choosing which facet generated that token. Because SAGE combines facets through addition in log space, it does not need switching variables.
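
As a rough illustration of what that means in practice (my own toy code, not from the paper), combining a background with, say, a class effect and a topic effect is just a sum of log-scale vectors followed by a softmax, with no per-token latent indicator:

<pre>
import numpy as np

def sage_word_dist(m, *etas):
    """Combine background log-frequencies m with any number of additive
    deviations (class, topic, ...) and normalize with a softmax."""
    logits = m + sum(etas)          # facets combine by addition in log space
    logits = logits - logits.max()  # shift for numerical stability
    p = np.exp(logits)
    return p / p.sum()

# toy five-word vocabulary
m         = np.log(np.array([0.40, 0.30, 0.15, 0.10, 0.05]))  # background
eta_class = np.array([0.0, 0.0, 1.2, 0.0, 0.0])               # sparse class deviation
eta_topic = np.array([0.0, 0.9, 0.0, 0.0, 0.0])               # sparse topic deviation

print(sage_word_dist(m, eta_class, eta_topic))
</pre>

In a mixture-style model, each token would instead carry a latent indicator selecting which facet generated it, and inference would have to marginalize over those indicators.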

By modeling deviations from a background distribution, the authors also hope to deal with overparametrization: high-frequency tokens are accounted for by the background distribution and do not need to be learned per class. They give the example of the words "the" and "of", whose probabilities a traditional classifier has to relearn for every class.
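
A small hypothetical example of this point (again my own sketch): if the background <math>m</math> is fixed to the corpus log-frequencies, then a class whose deviation is zero on "the" and "of" gets their probabilities from the background for free, and no class-specific parameter has to be learned for them:

<pre>
import numpy as np
from collections import Counter

corpus = "the cat sat of the mat the dog of the".split()
vocab  = sorted(set(corpus))
counts = Counter(corpus)

# background = empirical log-frequencies over the whole corpus
freqs = np.array([counts[w] for w in vocab], dtype=float)
m = np.log(freqs / freqs.sum())

# a class deviation that boosts "cat" but leaves "the" and "of" at zero
eta = np.array([1.5 if w == "cat" else 0.0 for w in vocab])

p = np.exp(m + eta)
p = p / p.sum()
for w, pw in zip(vocab, p):
    print(f"{w}: {pw:.3f}")
</pre>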

[[File:sage.png]]

== Experimental Results ==