Segmented Topic Model
From Cohen Courses
Latest revision as of 14:54, 29 September 2012
The Segmented Topic Model (STM) is a topic model that takes the internal structure of a document, such as its division into paragraphs or sentences, into account.
Basic Ideas
- As in LDA, each document d has a multinomial distribution v(d) over latent topics
- Each segment s of document d (a sentence or paragraph) also has its own multinomial distribution r(d,s) over topics. This distribution is drawn from a two-parameter Poisson-Dirichlet process whose base distribution is v(d): r(d,s) ~ Poisson-Dirichlet(v(d), a, b), where a is the discount parameter and b is the strength parameter
- The topic label of each word is drawn from the topic distribution r(d,s) of the segment that contains it
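The three steps above can be read as a generative process. The following is a minimal Python sketch of it, not the authors' implementation. Several pieces are assumptions that this page does not state: v(d) is given a Dirichlet prior as in LDA, each word is drawn from a per-topic word distribution `topic_word`, and the Poisson-Dirichlet draw is approximated by truncated stick-breaking. All function and parameter names (`sample_dirichlet`, `sample_pdp`, `generate_document`, `truncation`) are hypothetical.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw a probability vector from Dirichlet(alpha) via normalized Gamma draws."""
    draws = [rng.gammavariate(a_k, 1.0) for a_k in alpha]
    total = sum(draws)
    return [x / total for x in draws]


def sample_pdp(base, a, b, rng, truncation=200):
    """Approximate draw r ~ Poisson-Dirichlet(base, a, b).

    Uses truncated stick-breaking (an approximation chosen for this sketch):
    stick i takes a Beta(1 - a, b + i * a) fraction of the remaining mass and
    is assigned to a topic drawn from `base`. Requires 0 <= a < 1 and b > -a.
    """
    num_topics = len(base)
    r = [0.0] * num_topics
    remaining = 1.0
    for i in range(1, truncation + 1):
        stick = rng.betavariate(1.0 - a, b + i * a)
        topic = rng.choices(range(num_topics), weights=base)[0]
        r[topic] += remaining * stick
        remaining *= 1.0 - stick
    # Spread the truncated tail over topics in proportion to base, so r sums to 1.
    return [r[k] + remaining * base[k] for k in range(num_topics)]


def generate_document(segment_lengths, topic_word, alpha, a, b, rng):
    """Generate one document as a list of segments of (topic, word) pairs."""
    v_d = sample_dirichlet(alpha, rng)        # document-level topic distribution v(d)
    document = []
    for length in segment_lengths:
        r_ds = sample_pdp(v_d, a, b, rng)     # segment-level topic distribution r(d,s)
        segment = []
        for _ in range(length):
            z = rng.choices(range(len(r_ds)), weights=r_ds)[0]  # topic label of the word
            # Assumed LDA-style emission: the word is drawn from its topic's word distribution.
            w = rng.choices(range(len(topic_word[z])), weights=topic_word[z])[0]
            segment.append((z, w))
        document.append(segment)
    return v_d, document


# Usage: 3 topics over a 5-word vocabulary, one document with two segments.
rng = random.Random(0)
topic_word = [sample_dirichlet([0.5] * 5, rng) for _ in range(3)]
v_d, doc = generate_document([6, 4], topic_word, [1.0, 1.0, 1.0], a=0.2, b=10.0, rng=rng)
print(v_d)
print(doc)
```

In this sketch, a larger b keeps each r(d,s) closer to v(d), so the segments of a document share more of their topic mix. A smaller b lets segments deviate more from the document-level distribution.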
Citation
A Segmented Topic Model based on the Two-Parameter Poisson-Dirichlet Process. Lan Du, Wray Buntine, Huidong Jin. In Machine Learning, Volume 81, Issue 1, Pages 5-19, 2010.