Difference between revisions of "Compare Yano et al NAACL 2009 Link PLSA LDA"
From Cohen Courses
Revision as of 21:00, 1 December 2012
Papers
The papers are:
Comparison
Both of these papers extend Link-LDA and use a blog dataset. Yano et al. try to predict which users will comment on a blog posting, whereas Nallapati and Cohen try to predict which blog will link to another blog.
Method
Link-LDA is the basis for both papers and the graphical model is represented here:
Nallapati and Cohen extend it with this model:
The biggest difference is that this models the text of the cited documents as well. It is worth noting that the same priors <math>\Omega</math> and <math>\Beta</math> are used for <math>w, w', d,</math> and <math>d'</math>.
Yano et al. extend Link-LDA this way:
<math>u</math> means that a user commented on the blog posting. <math>w'</math> denotes the words in the comment. The extension from Link-LDA is that the words in the comment are also modeled (not just who will comment).
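To make the description above concrete, here is a minimal generative sketch of a comment-aware topic model. It is not the authors' implementation. It assumes that each post has one topic mixture, that each post word draws a topic and then a word, and that each comment token draws a topic, then a user and a comment word from per-topic distributions. The names `alpha`, `beta`, `beta_prime`, and `gamma` are illustrative and do not come from this page.

```python
import random

def sample_dirichlet(alpha, rng):
    # Draw from a Dirichlet by normalizing independent Gamma draws.
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def sample_categorical(probs, rng):
    # Draw one index according to the given probabilities.
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

def generate_post(n_words, n_comment_words, alpha, beta, beta_prime, gamma, rng):
    # Assumption: one topic mixture per post, shared by post and comments.
    theta = sample_dirichlet(alpha, rng)
    words, users, comment_words = [], [], []
    for _ in range(n_words):
        z = sample_categorical(theta, rng)          # topic of a post word
        words.append(sample_categorical(beta[z], rng))
    for _ in range(n_comment_words):
        z = sample_categorical(theta, rng)          # topic of a comment token
        users.append(sample_categorical(gamma[z], rng))               # u
        comment_words.append(sample_categorical(beta_prime[z], rng))  # w'
    return words, users, comment_words
```

Dropping the `comment_words` line leaves a model that only generates who comments, which is the Link-LDA-style behavior the extension starts from.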
Datasets Used
- Yano et al. use a corpus of blog posts from 40 different blog sites focusing on American politics, collected from November 2007 to October 2008 (right up to a presidential election). Five blogs were chosen for the final selection, with an emphasis on diversity in political leanings.
- Nallapati and Cohen also use a corpus of blogs, but these were collected from July 2004 to July 2005. Initially, it was a noisy dataset with lots of broken links and useless information. The authors restricted the data to blogs with a minimum of 2 ingoing or 2 outgoing links within the corpus. Unlike Yano et al., who used only 5 blog sites, they did not rely on specific blog sites.
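The link-count constraint in the second item can be sketched as a simple degree filter. This is only an illustration under stated assumptions: links are given as hypothetical `(source, target)` pairs, the filter is applied in a single pass, and duplicate links are counted as-is. The page does not say how the authors actually implemented the filtering.

```python
from collections import Counter

def filter_blogs(links, min_links=2):
    """Keep blogs with at least `min_links` ingoing or outgoing links.

    `links` is a list of (source, target) pairs (hypothetical format).
    """
    out_deg = Counter(src for src, _ in links)
    in_deg = Counter(dst for _, dst in links)
    blogs = set(out_deg) | set(in_deg)
    return {b for b in blogs
            if in_deg[b] >= min_links or out_deg[b] >= min_links}
```

For example, with links `a→b`, `a→c`, `d→b`, and `e→f`, only `a` (two outgoing links) and `b` (two ingoing links) are kept.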
Problem
Big Idea
Other
Questions
- How much time did you spend reading the (new, non-wikified) paper you summarized? About 2 hours
- How much time did you spend reading the old wikified paper? About 2 hours
- How much time did you spend reading the summary of the old paper? About 15 min
- How much time did you spend reading background material? N/A. My final project for the class is in this area, so I've read a lot of background papers.
- Was there a study plan for the old paper? Yes
- If so, did you read any of the items suggested by the study plan, and how much time did you spend reading them? I had actually read the papers before, as they are directly related to my research with my advisor. I do a lot of Gibbs sampling on graphical models (in particular topic-model derivatives), and that fits into the study plan.
- Give us any additional feedback you might have about this assignment. I like this comparison. It was a nice way to view the papers in a different light and really made it stick in my memory. In general, I like the wikifying and used it extensively for the project (and probably will use this for my research in the future after the class is over).