Madsen et al Modeling Word Burstiness

From Cohen Courses
Revision as of 02:09, 27 September 2012 by Nkatariy (talk | contribs)

Citation

author = {Madsen, Rasmus E. and Kauchak, David and Elkan, Charles},
title = {Modeling word burstiness using the Dirichlet distribution},
booktitle = {Proceedings of the 22nd international conference on Machine learning},
series = {ICML '05},
year = {2005},
isbn = {1-59593-180-5},
location = {Bonn, Germany},
pages = {545--552},
numpages = {8},
url = {http://doi.acm.org/10.1145/1102351.1102420},
doi = {10.1145/1102351.1102420},
acmid = {1102420},
publisher = {ACM},
address = {New York, NY, USA},

Online Version

Modeling Word Burstiness

Summary

This paper addresses the problem of document modeling, and in particular word burstiness: the tendency of a word that has already appeared in a document to appear again.

Multinomial model

This model allows you to control the length of the document. The generation of a document can be viewed as a sequence of steps:

1. You choose the number of words n in the document by drawing from a distribution over document lengths.

2. Now a die with W sides is taken, where W is the size of the vocabulary and the probability of landing on side w is θ_w (all the θ_w add up to 1). The die is rolled to decide the word at a position in the document.

3. Step 2 is repeated n times to get the document.
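The three steps above can be sketched in a few lines of Python. This is only an illustration: the summary does not say which distribution the document length is drawn from, so the uniform draw between `min_len` and `max_len` is an assumption, and the names `generate_document`, `theta` and `vocab` are hypothetical.

```python
import random

def generate_document(theta, vocab, min_len, max_len, rng=None):
    """Sample one document from a multinomial (unigram) model."""
    if len(theta) != len(vocab):
        raise ValueError("theta needs one probability per vocabulary word")
    if abs(sum(theta) - 1.0) > 1e-9:
        raise ValueError("theta must add up to 1")
    rng = rng or random.Random()
    # Step 1: draw the document length n.
    # ASSUMPTION: uniform over [min_len, max_len]; the source does not
    # specify the length distribution.
    n = rng.randint(min_len, max_len)
    # Steps 2 and 3: roll the W-sided die n times, one roll per position.
    return rng.choices(vocab, weights=theta, k=n)

doc = generate_document([0.5, 0.3, 0.2], ["the", "model", "burst"],
                        min_len=5, max_len=10, rng=random.Random(0))
print(doc)
```

Every roll is independent of the previous ones, so seeing a word once does nothing to the probability of seeing it again.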

Burstiness

Burstiness captures the notion that you should be less surprised by the second occurrence of a word in a document than by its first. In other words, even if the probability of occurrence of a word is low, it should increase with the number of times the word has already occurred in the document.

Dirichlet Modeling

This framework lends an additional degree of freedom to the model, which allows it to capture burstiness. The vector α becomes the parameter of the model, and the multinomial probabilities θ are drawn from a Dirichlet(α) distribution. The extra degree of freedom comes from the fact that the α_w do not need to add up to 1, unlike the θ_w in the multinomial. It is important to note that this does not model the individual burstiness of each word, only the overall burstiness of all words in the vocabulary.
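To see how the Dirichlet parameters produce burstiness, it may help to look at the word-by-word view of the same model. Integrating out the multinomial probabilities gives the standard Dirichlet-multinomial (Pólya urn) predictive rule: the next word is w with probability (α_w + count of w so far) / (sum of all α + number of words so far). The sketch below uses that rule; it is not code from the paper, and the function names and the α values are made up for illustration.

```python
import random

def predictive_prob(alpha, counts, w):
    """P(next word = w | counts so far) under a Dirichlet-multinomial."""
    return (alpha[w] + counts[w]) / (sum(alpha) + sum(counts))

def generate_bursty_document(alpha, n, rng=None):
    """Draw n word indices one at a time using the urn rule above."""
    rng = rng or random.Random()
    counts = [0] * len(alpha)
    doc = []
    for _ in range(n):
        # Each word's weight grows with how often it was already drawn.
        weights = [a + c for a, c in zip(alpha, counts)]
        w = rng.choices(range(len(alpha)), weights=weights, k=1)[0]
        counts[w] += 1
        doc.append(w)
    return doc

# ASSUMPTION: illustrative parameters; a small sum of alphas means strong burstiness.
alpha = [0.1, 0.1, 0.1, 0.1]
first = predictive_prob(alpha, [0, 0, 0, 0], 0)   # 0.1 / 0.4
second = predictive_prob(alpha, [1, 0, 0, 0], 0)  # 1.1 / 1.4
print(first, second)
print(generate_bursty_document(alpha, 10, random.Random(0)))
```

With these values the probability of word 0 goes from 0.25 before it has been seen to about 0.79 after one occurrence. Multiplying every α_w by a large constant keeps the same first-occurrence probabilities but makes the increase much smaller, which is the sense in which the sum of the α_w controls the overall burstiness.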

The α_w are trained using the standard method of maximizing the log-likelihood of the data. An iterative gradient ascent method is used to estimate the parameters.
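A minimal sketch of this kind of training is given below. It uses the standard Dirichlet-multinomial log-likelihood (dropping the multinomial coefficient, which does not depend on the parameters) and climbs it with a finite-difference gradient in log-parameter space plus step halving. That optimizer is an assumption chosen to keep the example dependency-free; it is not the update rule used by the authors, and `fit_alpha` and the toy counts are hypothetical.

```python
import math

def dcm_log_likelihood(alpha, docs):
    """Sum over documents of log p(counts | alpha), up to a constant in alpha."""
    total_alpha = sum(alpha)
    ll = 0.0
    for counts in docs:              # counts[w] = occurrences of word w
        n = sum(counts)
        ll += math.lgamma(total_alpha) - math.lgamma(total_alpha + n)
        for a, c in zip(alpha, counts):
            ll += math.lgamma(a + c) - math.lgamma(a)
    return ll

def fit_alpha(docs, steps=200, step_size=0.1, eps=1e-5):
    """Gradient ascent on log(alpha), which keeps every alpha_w positive."""
    def objective(log_alpha):
        return dcm_log_likelihood([math.exp(v) for v in log_alpha], docs)

    log_alpha = [0.0] * len(docs[0])     # start from alpha_w = 1
    best = objective(log_alpha)
    for _ in range(steps):
        # ASSUMPTION: central finite differences stand in for the
        # analytic gradient, to avoid needing a digamma function.
        grad = []
        for w in range(len(log_alpha)):
            up = list(log_alpha)
            down = list(log_alpha)
            up[w] += eps
            down[w] -= eps
            grad.append((objective(up) - objective(down)) / (2 * eps))
        step = step_size
        while step > 1e-8:               # halve the step until it improves
            candidate = [v + step * g for v, g in zip(log_alpha, grad)]
            value = objective(candidate)
            if value > best:
                log_alpha, best = candidate, value
                break
            step /= 2
        else:
            break                        # no improving step: stop
    return [math.exp(v) for v in log_alpha]

# Toy word-count vectors (made up): each document repeats one word heavily.
docs = [[5, 0, 0], [0, 4, 1], [0, 0, 6], [3, 0, 0]]
alpha_hat = fit_alpha(docs)
print(alpha_hat, dcm_log_likelihood(alpha_hat, docs))
```

Because a step is only accepted when it raises the log-likelihood, the fitted parameters are never worse than the starting point of all ones.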

Experiments

The Dirichlet model is compared with the standard multinomial model as well as with heuristically modified versions of the multinomial model. The authors use three standard corpora for their experiments: Industry Sector, 20 Newsgroups, and Reuters-21578.

Study Plan

This was a simple but interesting standalone paper to read, and not much background was needed. The following may still help: