Segmented Topic Model

From Cohen Courses

Segmented Topic Model is a topic model that takes into account the internal structure of documents, such as their division into sentences or paragraphs.

Basic Ideas

  • As in LDA, each document d has a multinomial distribution v(d) over latent topics.
  • Each segment s of document d (a sentence or paragraph) also has its own multinomial distribution r(d,s) over topics. This distribution is generated from a two-parameter Poisson-Dirichlet process with v(d) as the base distribution: r(d,s) ~ Poisson-Dirichlet(v(d), a, b).
  • The topic label of each word is drawn from the topic distribution r(d,s) of its segment.
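
The generative steps above can be illustrated with a small simulation. The following is a minimal sketch, not the authors' implementation: it assumes a symmetric Dirichlet prior on v(d) (by analogy with LDA), approximates the Poisson-Dirichlet draw with truncated stick-breaking, and takes the topic-word matrix as given. The names sample_pdp, generate_document, alpha, truncation and topic_word are hypothetical and chosen only for illustration.

```python
import numpy as np


def sample_pdp(base, a, b, rng, truncation=200):
    """Approximate draw from Poisson-Dirichlet(base, a, b) over a finite topic set.

    Uses stick-breaking truncated at `truncation` sticks (an approximation).
    Assumes 0 <= a < 1 and b > -a (and b > 0 when a == 0).
    """
    k = np.arange(1, truncation + 1)
    # Stick proportions V_k ~ Beta(1 - a, b + k * a)
    sticks = rng.beta(1.0 - a, b + k * a)
    # Mass left before each stick is broken: prod_{j<k} (1 - V_j)
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - sticks[:-1])))
    weights = sticks * remaining
    # Each stick is assigned a topic drawn from the base distribution
    atoms = rng.choice(len(base), size=truncation, p=base)
    # Add up stick weights per topic, then renormalise to absorb truncated mass
    probs = np.bincount(atoms, weights=weights, minlength=len(base))
    return probs / probs.sum()


def generate_document(segment_lengths, topic_word, alpha, a, b, rng):
    """Simulate one document: v(d), then r(d,s), topics and words per segment."""
    num_topics, vocab_size = topic_word.shape
    # Document-level topic distribution v(d); the Dirichlet prior is an assumption
    v_d = rng.dirichlet(np.full(num_topics, alpha))
    segments = []
    for length in segment_lengths:
        # Segment-level topic distribution r(d,s) ~ Poisson-Dirichlet(v(d), a, b)
        r_ds = sample_pdp(v_d, a, b, rng)
        # Topic label of each word is drawn from its segment's distribution
        topics = rng.choice(num_topics, size=length, p=r_ds)
        # Each word is drawn from the word distribution of its topic
        words = np.array(
            [rng.choice(vocab_size, p=topic_word[z]) for z in topics], dtype=int
        )
        segments.append((r_ds, topics, words))
    return v_d, segments


rng = np.random.default_rng(0)
# Hypothetical setup: 4 topics over a 30-word vocabulary
topic_word = rng.dirichlet(np.full(30, 0.1), size=4)
v_d, segments = generate_document([8, 5, 12], topic_word, alpha=0.5, a=0.2, b=10.0, rng=rng)
```

In this sketch a larger b keeps each r(d,s) closer to v(d), while a smaller b lets segments deviate more from the document-level distribution.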

Citation

A Segmented Topic Model based on the Two-Parameter Poisson-Dirichlet Process. Lan Du, Wray Buntine, Huidong Jin. In Machine Learning, Volume 81, Issue 1, Pages 5-19, 2010.