Madsen et al Modeling Word Burstiness

Citation

author = {Madsen, Rasmus E. and Kauchak, David and Elkan, Charles},
title = {Modeling word burstiness using the Dirichlet distribution},
booktitle = {Proceedings of the 22nd international conference on Machine learning},
series = {ICML '05},
year = {2005},
isbn = {1-59593-180-5},
location = {Bonn, Germany},
pages = {545--552},
numpages = {8},
url = {http://doi.acm.org/10.1145/1102351.1102420},
doi = {10.1145/1102351.1102420},
acmid = {1102420},
publisher = {ACM},
address = {New York, NY, USA},

Online Version

Modeling Word Burstiness

Summary

This paper addresses the problem of document modeling in the presence of word burstiness.

Multinomial model

This model allows you to control the length of the document. The generation of a document can be viewed as a sequence of steps:

1. You choose the number of words $l_{doc}$ in the document by drawing from a distribution $Pr(L)$

2. Take a die with $|W|$ sides, where $|W|$ is the size of the vocabulary and the probability of landing on side $w$ is $\theta _{w}$ (all $\theta _{w}$ add up to 1). Roll it to decide the word at one position in the document.

3. Step 2 is repeated $l_{doc}$ times to get the document.
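The three steps above can be sketched in a few lines of Python. This is only an illustration: the function name `generate_document`, the toy vocabulary, and the `length_sampler` callable standing in for $Pr(L)$ are assumptions made here, not something taken from the paper.

```python
import random

def generate_document(vocab, theta, length_sampler, rng=random):
    """Generate one document under the multinomial model.

    vocab          -- list of words (the |W| sides of the die)
    theta          -- word probabilities, assumed to sum to 1
    length_sampler -- hypothetical callable standing in for Pr(L)
    """
    # Step 1: choose the document length l_doc.
    l_doc = length_sampler()
    # Steps 2 and 3: roll the |W|-sided die l_doc times, independently.
    return rng.choices(vocab, weights=theta, k=l_doc)

# Example with a fixed length of 10 words.
toy_vocab = ["model", "word", "burst"]
toy_theta = [0.5, 0.3, 0.2]
toy_doc = generate_document(toy_vocab, toy_theta, lambda: 10, random.Random(0))
```

Because every roll is independent, seeing a word once does not change its probability of appearing again, which is exactly the limitation that burstiness exposes.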

Burstiness

Burstiness captures the notion that you should be less surprised by the second occurrence of a word in a document than by its first. In other words, even if a word's probability of occurrence is low, that probability should increase with the number of times the word has already occurred in the document.

Dirichlet Modeling

This framework gives the model an additional degree of freedom, which allows it to capture burstiness. The $\alpha$ vector becomes the parameter of the model, and the $\theta _{w}$ are drawn from a Dirichlet( $\alpha$ ) distribution instead of being fixed. The extra degree of freedom comes from the fact that the $\alpha _{i}$ do not need to add up to 1, unlike the $\theta _{i}$ in the multinomial. It is important to note that this does not model the individual burstiness of each word, only the overall burstiness of all words in the vocabulary.
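To see how a Dirichlet prior produces burstiness, it may help to look at the standard urn view of a Dirichlet-multinomial: once $\theta$ is integrated out, the probability of the next word being $w$ is $(\alpha _{w} + n_{w}) / (\sum_{i} \alpha _{i} + n)$, where $n_{w}$ is the number of times $w$ has occurred so far and $n$ is the number of words so far. The sketch below uses that identity; the function names and the toy $\alpha$ values are assumptions made for illustration, not taken from the paper.

```python
import random

def predictive_prob(word_index, counts, alpha):
    """P(next word = word_index | counts so far), with theta integrated out."""
    return (alpha[word_index] + counts[word_index]) / (sum(alpha) + sum(counts))

def generate_dcm_document(alpha, length, rng=random):
    """Draw word indices one at a time using the urn scheme above."""
    counts = [0] * len(alpha)
    doc = []
    for _ in range(length):
        # Each word's weight grows with the number of times it was already drawn.
        weights = [a + c for a, c in zip(alpha, counts)]
        w = rng.choices(range(len(alpha)), weights=weights, k=1)[0]
        counts[w] += 1
        doc.append(w)
    return doc

small_alpha = [0.1, 0.1, 0.1, 0.1]      # small sum: strongly bursty
large_alpha = [100.0, 100.0, 100.0, 100.0]  # large sum: close to a multinomial

p_first = predictive_prob(0, [0, 0, 0, 0], small_alpha)
p_second = predictive_prob(0, [1, 0, 0, 0], small_alpha)
p_second_large = predictive_prob(0, [1, 0, 0, 0], large_alpha)
```

With the small $\alpha$ values, the probability of word 0 rises from 0.25 to about 0.79 after a single occurrence; with the large values it barely moves (0.25 to about 0.252). The sum of the $\alpha _{i}$ is what controls this behavior, which is consistent with the model capturing overall rather than per-word burstiness.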

The $\alpha$ parameters are trained with the standard method of maximizing the log-likelihood of the data. An iterative gradient ascent method is used to estimate them.
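As a rough sketch of what this training objective looks like, the code below evaluates the log-likelihood of word-count vectors under a Dirichlet-multinomial and takes one gradient ascent step. The closed form is the standard one for this distribution (with the $\alpha$-independent multinomial coefficient dropped). The finite-difference gradient, the step size, and the positivity floor are assumptions chosen for simplicity; they are not the update rule used by the authors.

```python
import math

def dcm_log_likelihood(docs, alpha):
    """Log-likelihood of count vectors under Dirichlet-multinomial(alpha).

    The multinomial coefficient is omitted because it does not depend on alpha.
    """
    a_sum = sum(alpha)
    total = 0.0
    for counts in docs:
        n = sum(counts)
        total += math.lgamma(a_sum) - math.lgamma(a_sum + n)
        for a_w, x_w in zip(alpha, counts):
            total += math.lgamma(a_w + x_w) - math.lgamma(a_w)
    return total

def numeric_gradient(docs, alpha, eps=1e-5):
    """Central-difference gradient (a stand-in for the analytic gradient)."""
    grad = []
    for i in range(len(alpha)):
        up = list(alpha)
        down = list(alpha)
        up[i] += eps
        down[i] -= eps
        diff = dcm_log_likelihood(docs, up) - dcm_log_likelihood(docs, down)
        grad.append(diff / (2 * eps))
    return grad

def gradient_ascent_step(docs, alpha, step=0.01, floor=1e-3):
    """One ascent step; alpha is clipped at `floor` to stay positive."""
    grad = numeric_gradient(docs, alpha)
    return [max(floor, a + step * g) for a, g in zip(alpha, grad)]

# Two toy "bursty" documents over a four-word vocabulary.
toy_docs = [[5, 0, 0, 0], [0, 4, 0, 0]]
alpha0 = [1.0, 1.0, 1.0, 1.0]
alpha1 = gradient_ascent_step(toy_docs, alpha0)
```

Repeating `gradient_ascent_step` until the log-likelihood stops improving gives a simple iterative estimator. On bursty data such as `toy_docs`, the estimate moves toward a smaller total $\sum_{i} \alpha _{i}$.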

Experiments

The Dirichlet model is compared with the standard multinomial model as well as with heuristically modified versions of the multinomial model. The authors use three standard corpora for their experiments: Industry Sector, 20 Newsgroups, and Reuters-21578.

Study Plan

This is a simple but interesting standalone paper to read, and not much background is needed.
