Midterm Report Nitin Yandong Ming Yanbo

From Cohen Courses

Latest revision as of 20:26, 25 April 2011

Team members

Nitin Agarwal

Yandong Liu

Yanbo Xu

Ming Sun

LDA results

  • We used the ACL 2008 corpus for experimentation.
  • For an exploratory analysis of the corpus, we ran LDA.
  • LDA parameters:
    • Number of topics: 100
    • Gibbs sampling iterations: 2000
    • Beta prior: 0.5
    • Alpha prior: 1.0
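The report does not show how the LDA model was trained. Below is a minimal sketch of a standard collapsed Gibbs sampler with symmetric priors. This is the usual approach for LDA, not necessarily the team's code. Each sweep resamples every token's topic from P(z_i = k | z_-i, w) ∝ (n_d^k + α)(n_k^w + β) / (n_k + Vβ). The function name `lda_gibbs` and the toy corpus are hypothetical. A run on the ACL corpus with the settings above would pass num_topics=100 and iterations=2000.

```python
import random


def lda_gibbs(docs, vocab_size, num_topics, alpha, beta, iterations, seed=0):
    """Collapsed Gibbs sampling for LDA with symmetric Dirichlet priors.

    docs: list of documents, each a list of word ids in range(vocab_size).
    Returns the count tables (doc_topic, topic_word, topic_total).
    """
    rng = random.Random(seed)
    doc_topic = [[0] * num_topics for _ in docs]                # n_d^k
    topic_word = [[0] * vocab_size for _ in range(num_topics)]  # n_k^w
    topic_total = [0] * num_topics                              # n_k
    assignments = []

    # Initialise every token with a uniformly random topic.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(z_doc)

    topics = range(num_topics)
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = assignments[d][i]
                # Take token i out of the counts before resampling it.
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # P(z_i = k | z_-i, w) up to a constant; the document-length
                # denominator is the same for every k, so it is dropped.
                weights = [(doc_topic[d][t] + alpha)
                           * (topic_word[t][w] + beta)
                           / (topic_total[t] + vocab_size * beta)
                           for t in topics]
                k = rng.choices(topics, weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    return doc_topic, topic_word, topic_total


# Toy run with the priors listed above but far fewer topics and sweeps.
docs = [[0, 1, 2, 0], [3, 4, 3, 5], [0, 2, 1, 1]]
doc_topic, topic_word, topic_total = lda_gibbs(
    docs, vocab_size=6, num_topics=2, alpha=1.0, beta=0.5, iterations=200)
```

Decrementing the counts before computing the weights is what makes this the collapsed sampler: each token is conditioned only on the other tokens' assignments.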

Some of the topics obtained after training:

  • Error Detection (Topic 6)
    • errors, error, correct, rate, correction, spelling, detection, based, detect, types, detecting
  • Evaluation (Topic 10)
    • evaluation, human, performance, automatic, quality, evaluate, study, results, task, metrics
  • Entity Coreference (Topic 13)
    • names, entity, named, entities, person, coreference, task, ne, recognition, proper, location
  • Parsing (Topic 18)
    • parsing, parser, parse, grammar, parsers, parses, input, chart, partial, syntactic, parsed, algorithm
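Topic word lists like these are typically read off the topic-word count table. The sketch below assumes the counts n_z^w come from a sampler state. It ranks words by the smoothed estimate (n_z^w + β) / (n_z + Vβ). Smoothing keeps the within-topic order of the raw counts but also yields a probability for each word. The function `top_words`, the vocabulary and the counts are hypothetical illustrations, not the team's output.

```python
def top_words(topic_word, vocab, beta, n=10):
    """Return the n most probable words of each topic with their probabilities.

    topic_word[z][v] counts tokens of word v assigned to topic z; words are
    ranked by the smoothed estimate (n_z^v + beta) / (n_z + V * beta).
    """
    V = len(vocab)
    result = []
    for counts in topic_word:
        total = sum(counts) + V * beta
        probs = [(c + beta) / total for c in counts]
        ranked = sorted(range(V), key=lambda v: probs[v], reverse=True)
        result.append([(vocab[v], probs[v]) for v in ranked[:n]])
    return result


# Hypothetical counts for two topics over a six-word vocabulary.
vocab = ["parsing", "parser", "grammar", "errors", "error", "spelling"]
counts = [[9, 6, 4, 0, 1, 0],
          [0, 1, 0, 8, 7, 3]]
for z, words in enumerate(top_words(counts, vocab, beta=0.5, n=3)):
    print(z, [w for w, _ in words])
# 0 ['parsing', 'parser', 'grammar']
# 1 ['errors', 'error', 'spelling']
```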


ATM results

Gibbs Sampling for Collaboration Influence Model

We want <math>P(Z,X,R|W)</math>, the posterior distribution of the topics Z, the (author, collaborator) pairs X, and the indicators R of whether a token favors collaboration over influence, given the words W in the corpus:

We begin by calculating and :



,

where P is the number of distinct (author, collaborator, collaboration-favor) combinations (a, a', r).

So the Gibbs sampling of  :



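Further manipulation turns the above equation into update equations for the topic and for the author-collaborator pair of each corpus token. The update for the pair <math>x_i</math> and favor <math>r_i</math> of token i is:

<math>P(x_i,r_i | Z,X_{-i},W,R_{-i}) \propto \frac{n_{x_i, r_i}^{z_i} +\alpha_{z_i}}{\sum_{z'} (n_{x_i,r_i}^{z'} + \alpha_{z'})} \frac{n_{r_i} + \eta_{r_i}}{\sum_{r'} (n_{r'} + \eta_{r'})}</math>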

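The pair-and-favor update for token i multiplies two terms. The topic term is (n_{x,r}^{z_i} + α_{z_i}) / Σ_{z'} (n_{x,r}^{z'} + α_{z'}). The favor term is (n_r + η_r) / Σ_{r'} (n_{r'} + η_{r'}). The update then draws (x_i, r_i) in proportion to their product.

Below is a minimal sketch of that step. It rests on several assumptions, none of which the source specifies:

* The counts have already had token i removed.
* The caller supplies the candidate (author, collaborator) pairs for the token.
* The favor indicator r takes the values 0 and 1.

All function names and toy values are hypothetical.

```python
import random


def pair_favor_weights(z_i, candidates, n_xrz, n_r, alpha, eta):
    """Unnormalised P(x_i, r_i | Z, X_-i, W, R_-i) for every candidate (x, r).

    z_i:        current topic of token i
    candidates: (author, collaborator) pairs x allowed for token i
    n_xrz:      n_xrz[(x, r)][z] = tokens with pair x, favor r and topic z
    n_r:        n_r[r] = tokens with favor r
    alpha, eta: Dirichlet hyperparameters, indexed by topic and by favor
    All counts must already exclude token i.
    """
    favor_norm = sum(n_r[r] + eta[r] for r in range(len(eta)))
    weights = {}
    for x in candidates:
        for r in range(len(eta)):
            counts = n_xrz[(x, r)]
            topic_norm = sum(c + a for c, a in zip(counts, alpha))
            topic_term = (counts[z_i] + alpha[z_i]) / topic_norm
            favor_term = (n_r[r] + eta[r]) / favor_norm
            weights[(x, r)] = topic_term * favor_term
    return weights


def sample_pair_and_favor(z_i, candidates, n_xrz, n_r, alpha, eta, rng):
    """Draw a new (x_i, r_i) in proportion to pair_favor_weights."""
    weights = pair_favor_weights(z_i, candidates, n_xrz, n_r, alpha, eta)
    options = list(weights)
    return rng.choices(options, weights=[weights[o] for o in options])[0]


# Toy state: two topics, favor r in {0, 1}, two candidate pairs.
pairs = [("A", "B"), ("A", "C")]
n_xrz = {(("A", "B"), 0): [3, 1], (("A", "B"), 1): [0, 2],
         (("A", "C"), 0): [1, 1], (("A", "C"), 1): [2, 0]}
n_r = [6, 4]
w = pair_favor_weights(0, pairs, n_xrz, n_r, alpha=[0.5, 0.5], eta=[1.0, 1.0])
new_x, new_r = sample_pair_and_favor(0, pairs, n_xrz, n_r, [0.5, 0.5],
                                     [1.0, 1.0], random.Random(0))
```

The favor denominator is the same for every candidate, so it could be dropped. It is kept here only so the code mirrors the formula term by term.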
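Applications

<math>P(w_v|z) = \frac{n_{z}^{w_v} + \beta_v}{\sum_{v'} (n_{z}^{w_{v'}} + \beta_{v'})}</math>

<math>P(z|a,a',r) = \frac{n_{a,a',r}^{z} +\alpha_{z}}{\sum_{z'} (n_{a,a',r}^{z'} + \alpha_{z'})} \frac{n_{r} + \eta_{r}}{\sum_{r'} (n_{r'} + \eta_{r'})}</math>

<math>P(a,a',r|z) \propto P(z|a,a',r) P(a,a',r)</math>

<math>P(a'|r,a,z) \propto \frac{P(a,a',r|z)}{P(r,a|z)} \propto P(a,a',r|z)</math>

<math>P(z|a) = \frac{\sum_{a',r} P(z|a,a',r)P(a,a',r)}{\sum_{a',r,z} P(z|a,a',r)P(a,a',r)}</math>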
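The application formulas reduce to two operations on estimated tables:

* Marginalising P(z | a, a', r) P(a, a', r) over a' and r, then normalising over z, gives P(z | a).
* Ranking collaborators a' by P(a' | r, a, z), which is proportional to P(z | a, a', r) P(a, a', r).

A minimal sketch follows. It assumes both tables have already been estimated and are stored as dictionaries keyed by (a, a', r). The function names and numbers are hypothetical.

```python
def topic_given_author(a, p_z_given_aar, p_aar):
    """P(z | a): marginalise P(z | a, a', r) P(a, a', r) over a' and r."""
    num_topics = len(next(iter(p_z_given_aar.values())))
    scores = [0.0] * num_topics
    for (a1, a2, r), dist in p_z_given_aar.items():
        if a1 == a:
            for z, p in enumerate(dist):
                scores[z] += p * p_aar[(a1, a2, r)]
    total = sum(scores)  # the sum over a', r and z in the denominator
    return [s / total for s in scores]


def rank_collaborators(a, r, z, p_z_given_aar, p_aar):
    """Order candidate a' by P(a' | r, a, z), proportional to P(z|a,a',r) P(a,a',r)."""
    scores = {a2: dist[z] * p_aar[(a1, a2, r1)]
              for (a1, a2, r1), dist in p_z_given_aar.items()
              if a1 == a and r1 == r}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical estimates for two topics; keys are (a, a', r).
p_z_given_aar = {("A", "B", 0): [0.8, 0.2], ("A", "C", 0): [0.3, 0.7],
                 ("A", "B", 1): [0.5, 0.5], ("D", "B", 0): [0.1, 0.9]}
p_aar = {("A", "B", 0): 0.4, ("A", "C", 0): 0.2,
         ("A", "B", 1): 0.2, ("D", "B", 0): 0.2}
print(topic_given_author("A", p_z_given_aar, p_aar))        # approximately [0.6, 0.4]
print(rank_collaborators("A", 0, 1, p_z_given_aar, p_aar))  # ['C', 'B']
```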