'''Midterm Report: Nitin, Yandong, Ming, Yanbo'''

== Team members ==

* Nitin Agarwal
* Yandong Liu
* Yanbo Xu
* Ming Sun

== LDA results ==
* Used the ACL 2008 corpus for experimentation
* For an exploratory analysis of the corpus, we ran the LDA model (a minimal sampler with the same settings is sketched after the topic list below)
* Parameters of the LDA model:
** Number of topics: 100
** Gibbs iterations: 2000
** Beta prior: 0.5
** Alpha prior: 1.0

Some of the topics obtained after training:

* '''Error Detection (Topic 6)'''
** errors, error, correct, rate, correction, spelling, detection, based, detect, types, detecting
* '''Evaluation (Topic 10)'''
** evaluation, human, performance, automatic, quality, evaluate, study, results, task, metrics
* '''Entity Coreference (Topic 13)'''
** names, entity, named, entities, person, coreference, task, ne, recognition, proper, location
* '''Parsing (Topic 18)'''
** parsing, parser, parse, grammar, parsers, parses, input, chart, partial, syntactic, parsed, algorithm
== ATM results ==

== Gibbs Sampling for Collaboration Influence Model ==

We want <math>P(Z,X,R|W)</math>, the posterior distribution over the topics Z, the (author, collaborator) pairs X, and the indicators R of whether each token favors collaboration over influence, given the words W in the corpus:
<math>P(Z,X,R|W) = \frac{P(Z,X,R,W)}{\sum_{Z,X,R} P(Z,X,R,W)}</math>

The sum in the denominator ranges over every joint assignment of topics, pairs, and indicators, so the posterior cannot be computed directly; this is what motivates the Gibbs sampler below.
We begin by calculating <math>P(W|Z,X,R)</math> and <math>P(Z,X,R)</math>:

<math>P(W|Z,X,R) = P(W|Z) = \prod_{z = 1}^{T} (\frac{\Gamma (\sum_{v = 1}^{V} \beta_{v})}{\prod_{v=1}^{V} \Gamma (\beta_v)} ( \frac{\prod_{v=1}^{V} \Gamma (n_{z}^{w_v} + \beta_v)}{\Gamma (\sum_{v=1}^{V} \beta_v + \sum_{v=1}^{V} n_{z}^{w_v})}))</math>
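
This is the standard Dirichlet-multinomial integral: writing <math>\phi_z</math> for topic z's word distribution, with a Dirichlet prior with parameters <math>\beta</math>, each <math>\phi_z</math> integrates out in closed form,

<math>P(W|Z) = \prod_{z=1}^{T} \int \mathrm{Dir}(\phi_z | \beta) \prod_{v=1}^{V} \phi_{z,v}^{n_{z}^{w_v}} \, d\phi_z,</math>

and each integral is a ratio of Dirichlet normalizing constants, which gives the Gamma-function form above.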
− | |||
− | <math>P(Z,X,R) = ()()</math> | + | <math>P(Z,X,R) = (\prod_{i_w = 1}^{W} \frac{1}{n_{r_{i_w}} (a_{i_w}) + \eta_{r_{i_w}}}) \prod_{p=1}^{P} (\frac{\Gamma (\sum_z \alpha_z)}{\prod_{z=1}^{T} \Gamma (\alpha_z)} \frac{\prod_z \Gamma (n_p^z + \alpha_z)}{\Gamma (\sum_z \alpha_z + \sum_z n_p^z)})</math>, |
+ | |||
+ | where P is the number of all the different author-collaborator-favor of collaboration combination (a,a',r). | ||
+ | |||
+ | So the Gibbs sampling of <math>P(z_i, x_i, r_i, w_i | Z_{-i}, X_{-i}, R_{-i}, W_{-i})</math> : | ||
+ | |||
+ | |||
+ | <math>P(z_i, x_i, r_i, w_i | Z_{-i}, X_{-i}, R_{-i}, W_{-i})</math> | ||
+ | |||
+ | <math>= \frac{P(Z,X,R,W)}{P(Z_{-i}, X_{-i}, R_{-i}, W_{-i})}</math> | ||
+ | |||
+ | <math>= \frac{1}{n_{r_i} + \eta_{r_i}} \frac{n_{p,-i}^{t} + \alpha_t}{\sum_z n_{p,-i}^z + \sum_z \alpha_z} \frac{n_{t,-i}^{w_v} + \beta_v}{\sum_v n_{t,-i} + \sum_v \beta_v}</math> | ||
+ | |||
+ | |||
+ | Further manipulation can turn the above equation into update equations for the topic and author-collaboration of each corpus token: | ||
+ | |||
+ | |||
+ | <math>P(z_i | Z_{-i}, X, W,R) \propto \frac{n_{z_i}^{w_v} + \beta_v}{\sum_v n_{z_i}^{w_v} + \beta_v} \frac{n_{x_i}^{z_i} + \alpha_{z_i}}{\sum_{z'} n_{x_i}^{z'} + \alpha_{z'}} \frac{n_{r_i} + \eta_{r_i}}{\sum_{r_i} (n_{r_i} + \eta_{r_i})}</math> | ||
+ | |||
+ | |||
+ | <math>P(x_i,r_i | Z,X_{-i},W,R_{-i}) \propto \frac{n_{x_i, r_i}^{z_i} +\alpha_{z_i}}{\sum_{z'} n_{x_i,r_i}^{z'} + \alpha_{z'}} \frac{n_{r_i} + \eta_{r_i}}{\sum_{r_i} (n_{r_i} + \eta_{r_i})}</math> | ||
+ | |||
+ | |||
== Applications ==

After sampling, the counts yield point estimates for several useful distributions:

<math>P(w_v|z) = \frac{n_{z}^{w_v} + \beta_v}{\sum_{v'} n_{z}^{w_{v'}} + \beta_{v'}}</math>

<math>P(z|a,a',r) = \frac{n_{a,a',r}^{z} + \alpha_{z}}{\sum_{z'} n_{a,a',r}^{z'} + \alpha_{z'}} \frac{n_{r} + \eta_{r}}{\sum_{r'} (n_{r'} + \eta_{r'})}</math>

<math>P(a,a',r|z) \propto P(z|a,a',r) P(a,a',r)</math>

<math>P(a'|r,a,z) \propto \frac{P(a,a',r|z)}{P(r,a|z)} \propto P(a,a',r|z)</math>

<math>P(z|a) = \frac{\sum_{a',r} P(z|a,a',r) P(a,a',r)}{\sum_{a',r,z} P(z|a,a',r) P(a,a',r)}</math>
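
The first two quantities come directly from the count matrices; a sketch (array names and layout are again our own, with pair_r[p] storing the r value of combination p):

<pre>
import numpy as np

def posterior_estimates(n_zw, n_pz, n_r, pair_r, alpha, beta, eta):
    """Point estimates from the final Gibbs counts.

    Returns:
      phi[z, v]   ~ P(w_v | z)
      theta[p, z] ~ P(z | a, a', r) for combination p = (a, a', r)
    """
    T, V = n_zw.shape

    # P(w_v | z): smoothed topic-word distribution.
    phi = (n_zw + beta) / (n_zw.sum(axis=1, keepdims=True) + V * beta)

    # P(z | a, a', r): smoothed combination-topic distribution, scaled by
    # the smoothed frequency of the combination's r value (that factor is
    # constant in z, exactly as in the formula above).
    r_term = (n_r[pair_r] + eta) / (n_r.sum() + len(n_r) * eta)
    theta = ((n_pz + alpha) / (n_pz.sum(axis=1, keepdims=True) + T * alpha)
             * r_term[:, None])
    return phi, theta
</pre>

The remaining quantities follow by Bayes' rule: P(a,a',r|z) combines theta with an empirical prior P(a,a',r) (for instance the combination counts normalized to sum to one), P(a'|r,a,z) renormalizes it over collaborators, and P(z|a) mixes theta over all of a's combinations.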