Madsen et al Modeling Word Burstiness
Citation
author = {Madsen, Rasmus E. and Kauchak, David and Elkan, Charles}, title = {Modeling word burstiness using the Dirichlet distribution}, booktitle = {Proceedings of the 22nd international conference on Machine learning}, series = {ICML '05}, year = {2005}, isbn = {1-59593-180-5}, location = {Bonn, Germany}, pages = {545--552}, numpages = {8}, url = {http://doi.acm.org/10.1145/1102351.1102420}, doi = {10.1145/1102351.1102420}, acmid = {1102420}, publisher = {ACM}, address = {New York, NY, USA},
Online Version
Summary
This paper addresses document modeling and, in particular, word burstiness, which the standard multinomial model does not capture.
Multinomial model
This model allows you to control the length of the document. The generation of a document can be viewed as a sequence of steps:
1. Choose the number of words n in the document by drawing from a distribution over document lengths.
2. Take a die with W sides, where W is the size of the vocabulary and the probability of landing on side w is θ_w (the θ_w add up to 1). Roll the die to decide the word at a position in the document.
3. Repeat step 2 n times to get the document.
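The three steps above can be sketched in a few lines of Python. This is only an illustration: the uniform length distribution, the function name `generate_document`, and the toy probability vector `theta` are assumptions made for the example, not details from the paper.

```python
import random
from collections import Counter

def generate_document(theta, min_len=50, max_len=150, rng=None):
    """Generate one bag-of-words document under the multinomial model."""
    rng = rng or random.Random()
    # Step 1: draw the document length n.
    # A uniform range is a placeholder; any length distribution would do.
    n = rng.randint(min_len, max_len)
    # Steps 2 and 3: roll the W-sided die n times, where side w
    # comes up with probability theta[w].
    words = rng.choices(range(len(theta)), weights=theta, k=n)
    return Counter(words)

theta = [0.5, 0.3, 0.15, 0.05]  # toy vocabulary of W = 4 words
counts = generate_document(theta, rng=random.Random(0))
print(sum(counts.values()), dict(counts))
```

Because every roll uses the same fixed θ, seeing a word once does not make it any more likely to appear again, which is why this model cannot express burstiness.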
Burstiness
Burstiness captures the notion that you should be less surprised by the second occurrence of a word in a document than by the first. In other words, even if the probability of a word occurring is low, it should increase with the number of times the word has already occurred in the document.
Dirichlet Modeling
This framework gives the model an additional degree of freedom, which allows it to capture burstiness. The vector α becomes the parameter of the model, and the multinomial probabilities θ are drawn from a Dirichlet(α) distribution. The extra degree of freedom comes from the fact that the α_w do not need to add up to 1, unlike the θ_w in the multinomial. It is important to note that this does not model the individual burstiness of each word but only the overall burstiness of all words in the vocabulary.
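A minimal sketch of the two-stage generative process may help: first draw a document-specific θ from Dirichlet(α), then draw the words from a multinomial with that θ. The function names, the vocabulary size, and the two α settings are illustrative assumptions; the Dirichlet draw uses the standard construction of normalizing independent Gamma variates.

```python
import random
from collections import Counter

def sample_dirichlet(alpha, rng):
    # Normalizing independent Gamma(alpha_w, 1) draws gives a Dirichlet(alpha) draw.
    g = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(g)
    return [x / total for x in g]

def generate_dcm_document(alpha, n, rng):
    """Draw theta ~ Dirichlet(alpha), then n words ~ Multinomial(theta)."""
    theta = sample_dirichlet(alpha, rng)
    words = rng.choices(range(len(alpha)), weights=theta, k=n)
    return Counter(words)

rng = random.Random(0)
W, n = 1000, 100  # hypothetical vocabulary size and document length
# Small alpha values: theta is concentrated on a few words, so words repeat a lot.
bursty = generate_dcm_document([0.01] * W, n, rng)
# Large alpha values: theta is close to uniform, so behaviour is near-multinomial.
flat = generate_dcm_document([10.0] * W, n, rng)
print("distinct words:", len(bursty), "vs", len(flat))
```

Both settings have the same mean word probabilities (uniform here); only the overall scale of α differs, which is the single extra degree of freedom described above.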
The α_w are trained using the standard method of maximizing the log-likelihood of the data. An iterative gradient ascent method is used to estimate the parameters.
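For reference, the objective being maximized can be sketched as follows. It uses the standard closed form of the Dirichlet compound multinomial likelihood, obtained by integrating θ out; the function name `dcm_log_likelihood` and the toy documents are assumptions for illustration, and the optimization loop itself is not shown.

```python
import math

def dcm_log_likelihood(x, alpha):
    """Log-probability of count vector x under a Dirichlet compound multinomial.

    p(x | alpha) = n! / prod_w x_w!
                   * Gamma(s) / Gamma(s + n)
                   * prod_w Gamma(x_w + alpha_w) / Gamma(alpha_w)
    where n = sum_w x_w and s = sum_w alpha_w.
    """
    n = sum(x)
    s = sum(alpha)
    ll = math.lgamma(n + 1) - sum(math.lgamma(xw + 1) for xw in x)
    ll += math.lgamma(s) - math.lgamma(s + n)
    ll += sum(math.lgamma(xw + aw) - math.lgamma(aw) for xw, aw in zip(x, alpha))
    return ll

bursty_doc = [4, 0]  # one word repeated four times
even_doc = [2, 2]    # two words, two occurrences each
small_alpha = [0.1, 0.1]
large_alpha = [10.0, 10.0]
for name, alpha in (("small alpha", small_alpha), ("large alpha", large_alpha)):
    print(name,
          dcm_log_likelihood(bursty_doc, alpha),
          dcm_log_likelihood(even_doc, alpha))
```

With small α values the repeated-word document is the more probable of the two, while with large α values the ordering reverses. Training would sum this quantity over all documents and adjust α to increase the total.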
Experiments
The Dirichlet model is compared with the standard multinomial model as well as with heuristically modified versions of the multinomial model. The authors use three standard corpora for their experiments: Industry Sector, 20 Newsgroups, and Reuters-21578.
Study Plan
This was a simple but interesting standalone paper to read. Not much background was needed. The following may still help: