Midterm Report Nitin Yandong Ming Yanbo

From Cohen Courses
Revision as of 19:12, 25 April 2011 by Yanbox

Team members

Nitin Agarwal

Yandong Liu

Yanbo Xu

Ming Sun

LDA results

  • Used the ACL 2008 corpus for experimentation
  • For exploratory analysis of the corpus, we ran the LDA model
  • Parameters of the LDA model
    • Number of topics: 100
    • Gibbs iterations: 2000
    • Beta prior: 0.5
    • Alpha prior: 1.0
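
To make the role of these parameters concrete, here is a minimal sketch of collapsed Gibbs sampling for LDA in plain Python. It is not the code used for the experiments (the report does not say which implementation was run). The function name `lda_gibbs`, the toy documents, and the reduced topic and iteration counts are all assumptions for illustration; only the alpha and beta values are taken from the list above.

```python
import random

def lda_gibbs(docs, num_topics, alpha, beta, iterations, seed=0):
    """Collapsed Gibbs sampling for LDA over a list of tokenized documents."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)

    doc_topic = [[0] * num_topics for _ in docs]        # tokens of doc d in topic t
    topic_word = [[0] * V for _ in range(num_topics)]   # tokens of word v in topic t
    topic_total = [0] * num_topics                      # tokens in topic t
    assignments = []

    # Random initial topic for every token.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            t = rng.randrange(num_topics)
            z_doc.append(t)
            doc_topic[d][t] += 1
            topic_word[t][word_id[w]] += 1
            topic_total[t] += 1
        assignments.append(z_doc)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                t_old = assignments[d][i]
                # Remove the current token from all counts.
                doc_topic[d][t_old] -= 1
                topic_word[t_old][v] -= 1
                topic_total[t_old] -= 1
                # Unnormalized conditional for each topic.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta) / (topic_total[t] + V * beta)
                    for t in range(num_topics)
                ]
                t_new = rng.choices(range(num_topics), weights=weights)[0]
                # Add the token back under its new topic.
                assignments[d][i] = t_new
                doc_topic[d][t_new] += 1
                topic_word[t_new][v] += 1
                topic_total[t_new] += 1
    return vocab, topic_word, doc_topic

# Toy documents (made up); the report's run used 100 topics and 2000 iterations.
toy_docs = [
    ["parser", "grammar", "parse", "parser"],
    ["error", "spelling", "correction", "error"],
    ["grammar", "parse", "chart"],
]
vocab, topic_word, doc_topic = lda_gibbs(
    toy_docs, num_topics=2, alpha=1.0, beta=0.5, iterations=50
)
```

On a real corpus the same call would use `num_topics=100` and `iterations=2000`; the small values here only keep the sketch quick to run.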

Some of the topics obtained after training

  • Error Detection (Topic 6)
    • errors, error, correct, rate, correction, spelling, detection, based, detect, types, detecting
  • Evaluation (Topic 10)
    • evaluation, human, performance, automatic, quality, evaluate, study, results, task, metrics
  • Entity Coreference (Topic 13)
    • names, entity, named, entities, person, coreference, task, ne, recognition, proper, location
  • Parsing (Topic 18)
    • parsing, parser, parse, grammar, parsers, parses, input, chart, partial, syntactic, parsed, algorithm
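
Word lists like these are typically read off the topic-word counts by ranking words within each topic. The sketch below shows one way to do that; the helper `top_words` and the toy counts are hypothetical and are not the report's data.

```python
def top_words(topic_word, vocab, k=10):
    """Return the k highest-count words of each topic, most frequent first."""
    result = []
    for counts in topic_word:
        # Sort by descending count, breaking ties alphabetically.
        ranked = sorted(range(len(vocab)), key=lambda v: (-counts[v], vocab[v]))
        result.append([vocab[v] for v in ranked[:k]])
    return result

# Made-up counts for two topics over a five-word vocabulary.
toy_vocab = ["chart", "error", "grammar", "parser", "spelling"]
toy_counts = [
    [0, 9, 0, 1, 7],
    [5, 0, 6, 8, 0],
]
toy_top = top_words(toy_counts, toy_vocab, k=3)
```

The topic labels (for example "Parsing") are assigned by hand after inspecting such lists; the model itself only produces the numbered topics.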


ATM results

Gibbs Sampling for Collaboration Influence Model

We want $P(Z, X, R \mid W)$, the posterior distribution of the topic assignments Z, the (author, collaborator) pairs X, and the indicators R of whether collaboration is favored over influence, given the words W in the corpus:

We begin by calculating $P(W \mid Z)$ and $P(Z \mid X, R)$:



,

where P is the number of distinct (author, collaborator, collaboration-favor) combinations (a, a', r).
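
As a point of reference, collapsed samplers in the LDA and author-topic family usually factor the joint distribution into a word term and a topic term, each a product of Dirichlet-multinomial integrals. The sketch below is an assumption modelled on those standard derivations, not a recovery of the report's own equations. It assumes symmetric Dirichlet priors $\alpha$ and $\beta$, $T$ topics, a vocabulary of size $V$, $n_{tv}$ as the number of tokens of word $v$ assigned to topic $t$, and $m_{pt}$ as the number of tokens assigned to topic $t$ under combination $p$; these symbols are introduced here for illustration.

```latex
P(W \mid Z) = \prod_{t=1}^{T}
  \frac{\Gamma(V\beta)}{\Gamma(\beta)^{V}}
  \frac{\prod_{v=1}^{V} \Gamma(n_{tv} + \beta)}
       {\Gamma\!\left(\sum_{v=1}^{V} n_{tv} + V\beta\right)},
\qquad
P(Z \mid X, R) = \prod_{p=1}^{P}
  \frac{\Gamma(T\alpha)}{\Gamma(\alpha)^{T}}
  \frac{\prod_{t=1}^{T} \Gamma(m_{pt} + \alpha)}
       {\Gamma\!\left(\sum_{t=1}^{T} m_{pt} + T\alpha\right)}
```

Under these assumptions the second product runs over the $P$ combinations $(a, a', r)$, each of which owns its own distribution over topics.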

So the Gibbs sampling distribution for $P(Z, X, R \mid W)$ is:



Further manipulation turns the above equation into update equations for the topic and the (author, collaborator) assignment of each corpus token:
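
A per-token update of this kind can be sketched in Python as follows. This is a hedged illustration, not the report's update equation: it assumes symmetric priors, a uniform prior over the (a, a', r) combinations allowed for the token's document, and counts that already exclude the token being resampled. The names `token_weights`, `sample_token`, `combo_topic`, and `combo_total` are hypothetical.

```python
import random

def token_weights(v, candidates, topic_word, topic_total,
                  combo_topic, combo_total, alpha, beta):
    """Unnormalized weights over (topic, combination) pairs for word id v.

    candidates lists the ids of the (a, a', r) combinations allowed for
    this token's document. All counts must exclude the current token.
    """
    T = len(topic_word)
    V = len(topic_word[0])
    outcomes, weights = [], []
    for p in candidates:
        for t in range(T):
            # How well topic t explains word v.
            word_term = (topic_word[t][v] + beta) / (topic_total[t] + V * beta)
            # How much combination p already uses topic t.
            topic_term = (combo_topic[p][t] + alpha) / (combo_total[p] + T * alpha)
            outcomes.append((t, p))
            weights.append(word_term * topic_term)
    return outcomes, weights

def sample_token(v, candidates, topic_word, topic_total,
                 combo_topic, combo_total, alpha, beta, rng):
    """Draw a new (topic, combination) pair for one token."""
    outcomes, weights = token_weights(
        v, candidates, topic_word, topic_total,
        combo_topic, combo_total, alpha, beta)
    return rng.choices(outcomes, weights=weights)[0]

# Toy counts (made up): 2 topics, 2 words, 3 combinations.
topic_word = [[2, 0], [0, 2]]
topic_total = [2, 2]
combo_topic = [[1, 0], [0, 1], [1, 1]]
combo_total = [1, 1, 2]
outcomes, weights = token_weights(
    0, [0, 2], topic_word, topic_total, combo_topic, combo_total,
    alpha=1.0, beta=0.5)
new_t, new_p = sample_token(
    0, [0, 2], topic_word, topic_total, combo_topic, combo_total,
    alpha=1.0, beta=0.5, rng=random.Random(0))
```

Sampling the topic and the combination jointly, rather than one after the other, mirrors the blocked update commonly used for the author-topic model; whether the Collaboration Influence Model was sampled this way is not stated in the report.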




Applications