Team members
Nitin Agarwal
Yandong Liu
Yanbo Xu
Ming Sun
LDA results
- Used the ACL 2008 corpus for experimentation
- For exploratory analysis of the corpus, we ran the LDA model
- Parameters of the LDA model:
  - Number of topics: 100
  - Gibbs iterations: 2000
  - Beta prior: 0.5
  - Alpha prior: 1.0
Some of the topics obtained after training:
- Error Detection (Topic 6)
  - errors, error, correct, rate, correction, spelling, detection, based, detect, types, detecting
- Evaluation (Topic 10)
  - evaluation, human, performance, automatic, quality, evaluate, study, results, task, metrics
- Entity Coreference (Topic 13)
  - names, entity, named, entities, person, coreference, task, ne, recognition, proper, location
- Parsing (Topic 18)
  - parsing, parser, parse, grammar, parsers, parses, input, chart, partial, syntactic, parsed, algorithm
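The code used for the run above is not shown on this page. As a rough illustration only, the following is a minimal sketch of collapsed Gibbs sampling for LDA with symmetric alpha and beta priors, which is one common way to fit such a model. The function names (`lda_gibbs`, `top_words`) and the toy corpus in the usage note are hypothetical and are not taken from the experiment.

```python
import random


def lda_gibbs(docs, num_topics, alpha, beta, iterations, seed=0):
    """Collapsed Gibbs sampling for LDA over a list of tokenized documents.

    Assumes symmetric Dirichlet priors: alpha on document-topic
    distributions, beta on topic-word distributions.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)

    doc_topic = [[0] * num_topics for _ in docs]        # n(d, k)
    topic_word = [[0] * V for _ in range(num_topics)]   # n(k, w)
    topic_total = [0] * num_topics                      # n(k)
    assignments = []

    # Give every token a random initial topic and record the counts.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            k = rng.randrange(num_topics)
            zs.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(zs)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                k = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # p(z = t | rest) is proportional to
                # (n(d,t) + alpha) * (n(t,w) + beta) / (n(t) + V * beta)
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta)
                    / (topic_total[t] + V * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                # Record the newly sampled assignment.
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1
    return vocab, topic_word


def top_words(vocab, topic_word, n=10):
    """Return the n highest-count words for each topic."""
    return [
        [vocab[v] for v in sorted(range(len(vocab)), key=lambda v: -row[v])[:n]]
        for row in topic_word
    ]
```

With the settings listed above, a call would look like `lda_gibbs(docs, num_topics=100, alpha=1.0, beta=0.5, iterations=2000)`, where `docs` is a list of token lists; `top_words` then gives word lists of the kind shown for Topics 6, 10, 13 and 18. A pure-Python sampler like this is only practical on small corpora.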
ATM results
Gibbs Sampling for Collaboration Influence Model
We want P(Z, X, R | W), the posterior distribution of the topics Z, the (author, collaborator) pairs X, and the indicators R of whether collaboration is favored over influence, given the words W in the corpus:
We begin by calculating and :
,
where P is the number of distinct (author, collaborator, collaboration-favor) combinations (a, a', r).
So the Gibbs sampling of :
Further manipulation turns the above equation into update equations for the topic and the author-collaborator assignment of each corpus token: