'''Midterm Report: Nitin, Yandong, Ming, Yanbo'''

== Team members ==

* Nitin Agarwal
* Yandong Liu
* Yanbo Xu
* Ming Sun

== LDA results ==
* Used the ACL 2008 corpus for experimentation
* For an exploratory analysis of the corpus, we ran the LDA model (a minimal sampler with the same settings is sketched after the topic list below)
* Parameters of the LDA model:
** Number of topics: 100
** Gibbs iterations: 2000
** Beta prior: 0.5
** Alpha prior: 1.0

Some of the topics obtained after training:

* '''Error Detection (Topic 6)'''
** errors, error, correct, rate, correction, spelling, detection, based, detect, types, detecting
* '''Evaluation (Topic 10)'''
** evaluation, human, performance, automatic, quality, evaluate, study, results, task, metrics
* '''Entity Coreference (Topic 13)'''
** names, entity, named, entities, person, coreference, task, ne, recognition, proper, location
* '''Parsing (Topic 18)'''
** parsing, parser, parse, grammar, parsers, parses, input, chart, partial, syntactic, parsed, algorithm
== ATM results ==

== Gibbs Sampling for Collaboration Influence Model ==

We want <math>P(Z,X,R|W)</math>, the posterior distribution over the topics Z, the (author, collaborator) pairs X, and the indicators R of whether each token favors collaboration over influence, given the words W in the corpus:
<math>P(Z,X,R|W) = \frac{P(Z,X,R,W)}{\sum_{Z,X,R} P(Z,X,R,W)}</math>

The sum in the denominator ranges over every joint assignment of topics, pairs, and indicators, so the posterior cannot be computed directly; this is what motivates the Gibbs sampler below.
We begin by calculating <math>P(W|Z,X,R)</math> and <math>P(Z,X,R)</math>:

<math>P(W|Z,X,R) = P(W|Z) = \prod_{z = 1}^{T} (\frac{\Gamma (\sum_{v = 1}^{V} \beta_{v})}{\prod_{v=1}^{V} \Gamma (\beta_v)} ( \frac{\prod_{v=1}^{V} \Gamma (n_{z}^{w_v} + \beta_v)}{\Gamma (\sum_{v=1}^{V} \beta_v + \sum_{v=1}^{V} n_{z}^{w_v})}))</math>
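
This is the standard Dirichlet-multinomial integral: writing <math>\phi_z</math> for topic z's word distribution, with a Dirichlet prior with parameters <math>\beta</math>, each <math>\phi_z</math> integrates out in closed form,

<math>P(W|Z) = \prod_{z=1}^{T} \int \mathrm{Dir}(\phi_z | \beta) \prod_{v=1}^{V} \phi_{z,v}^{n_{z}^{w_v}} \, d\phi_z,</math>

and each integral is a ratio of Dirichlet normalizing constants, which gives the Gamma-function form above.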
− | |||
− | <math>P(Z,X,R) = ()()</math> | + | <math>P(Z,X,R) = (\prod_{i_w = 1}^{W} \frac{1}{n_{r_{i_w}} (a_{i_w}) + \eta_{r_{i_w}}}) \prod_{p=1}^{P} (\frac{\Gamma (\sum_z \alpha_z)}{\prod_{z=1}^{T} \Gamma (\alpha_z)} \frac{\prod_z \Gamma (n_p^z + \alpha_z)}{\Gamma (\sum_z \alpha_z + \sum_z n_p^z)})</math>, |
+ | |||
+ | where P is the number of all the different author-collaborator-favor of collaboration combination (a,a',r). | ||
+ | |||
+ | So the Gibbs sampling of <math>P(z_i, x_i, r_i, w_i | Z_{-i}, X_{-i}, R_{-i}, W_{-i})</math> : | ||
+ | |||
+ | |||
+ | <math>P(z_i, x_i, r_i, w_i | Z_{-i}, X_{-i}, R_{-i}, W_{-i})</math> | ||
+ | |||
+ | <math>= \frac{P(Z,X,R,W)}{P(Z_{-i}, X_{-i}, R_{-i}, W_{-i})}</math> | ||
+ | |||
+ | <math>= \frac{1}{n_{r_i} + \eta_{r_i}} \frac{n_{p,-i}^{t} + \alpha_t}{\sum_z n_{p,-i}^z + \sum_z \alpha_z} \frac{n_{t,-i}^{w_v} + \beta_v}{\sum_v n_{t,-i} + \sum_v \beta_v}</math> | ||
+ | |||
+ | |||
+ | Further manipulation can turn the above equation into update equations for the topic and author-collaboration of each corpus token: | ||
+ | |||
+ | |||
+ | <math>P(z_i | Z_{-i}, X, W,R) \propto \frac{n_{z_i}^{w_v} + \beta_v}{\sum_v n_{z_i}^{w_v} + \beta_v} \frac{n_{x_i}^{z_i} + \alpha_{z_i}}{\sum_{z'} n_{x_i}^{z'} + \alpha_{z'}} \frac{n_{r_i} + \eta_{r_i}}{\sum_{r_i} (n_{r_i} + \eta_{r_i})}</math> | ||
+ | |||
+ | |||
+ | <math>P(x_i,r_i | Z,X_{-i},W,R_{-i}) \propto \frac{n_{x_i, r_i}^{z_i} +\alpha_{z_i}}{\sum_{z'} n_{x_i,r_i}^{z'} + \alpha_{z'}} \frac{n_{r_i} + \eta_{r_i}}{\sum_{r_i} (n_{r_i} + \eta_{r_i})}</math> | ||
+ | |||
+ | |||
== Applications ==

After sampling, the counts yield point estimates for several useful distributions:

<math>P(w_v|z) = \frac{n_{z}^{w_v} + \beta_v}{\sum_{v'} n_{z}^{w_{v'}} + \beta_{v'}}</math>

<math>P(z|a,a',r) = \frac{n_{a,a',r}^{z} + \alpha_{z}}{\sum_{z'} n_{a,a',r}^{z'} + \alpha_{z'}} \frac{n_{r} + \eta_{r}}{\sum_{r'} (n_{r'} + \eta_{r'})}</math>

<math>P(a,a',r|z) \propto P(z|a,a',r) P(a,a',r)</math>

<math>P(a'|r,a,z) \propto \frac{P(a,a',r|z)}{P(r,a|z)} \propto P(a,a',r|z)</math>

<math>P(z|a) = \frac{\sum_{a',r} P(z|a,a',r) P(a,a',r)}{\sum_{a',r,z} P(z|a,a',r) P(a,a',r)}</math>
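
The first two quantities come directly from the count matrices; a sketch (array names and layout are again our own, with pair_r[p] storing the r value of combination p):

<pre>
import numpy as np

def posterior_estimates(n_zw, n_pz, n_r, pair_r, alpha, beta, eta):
    """Point estimates from the final Gibbs counts.

    Returns:
      phi[z, v]   ~ P(w_v | z)
      theta[p, z] ~ P(z | a, a', r) for combination p = (a, a', r)
    """
    T, V = n_zw.shape

    # P(w_v | z): smoothed topic-word distribution.
    phi = (n_zw + beta) / (n_zw.sum(axis=1, keepdims=True) + V * beta)

    # P(z | a, a', r): smoothed combination-topic distribution, scaled by
    # the smoothed frequency of the combination's r value (that factor is
    # constant in z, exactly as in the formula above).
    r_term = (n_r[pair_r] + eta) / (n_r.sum() + len(n_r) * eta)
    theta = ((n_pz + alpha) / (n_pz.sum(axis=1, keepdims=True) + T * alpha)
             * r_term[:, None])
    return phi, theta
</pre>

The remaining quantities follow by Bayes' rule: P(a,a',r|z) combines theta with an empirical prior P(a,a',r) (for instance the combination counts normalized to sum to one), P(a'|r,a,z) renormalizes it over collaborators, and P(z|a) mixes theta over all of a's combinations.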