Difference between revisions of "Compare Yano et al NAACL 2009 Link PLSA LDA"

Revision as of 21:32, 1 December 2012

Papers

The papers are:

Comparison

Both papers extend Link-LDA and use blog datasets. Yano et al. try to predict which users will comment on a blog post, whereas Nallapati and Cohen try to predict which blog will link to another blog.

Method

Link-LDA is the basis for both papers and the graphical model is represented here:

Nallapati and Cohen extend it with this model:

The biggest difference is that this model also generates the text of the cited documents. It is worth noting that the same priors $\Omega$ and $\beta$ are used for $w$, $w'$, $d$, and $d'$.

Yano et al. extend Link-LDA this way:

$u$ indicates that a user commented on the blog post, and $w'$ denotes the words in the comment. The extension over Link-LDA is that the words in the comments are also modeled (not just who will comment).

The differences between the two models are shown below (highlighted in orange). Note that $\beta'$ models the words in comments and is a different hyper-parameter and prior from the one used for the text of the blog posts. Link-PLSA-LDA does not do this because it models only the words in blog posts.
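To make the role of the separate $\beta'$ concrete, the following is a minimal sketch of a Link-LDA-style generative process with an extra emission table for comment words. It is an illustration under stated assumptions, not the exact model from either paper: the names `sample_document`, `emit`, and `user_table` are hypothetical, all three kinds of observations are assumed to share one topic mixture per post, and inference (for example Gibbs sampling) is not covered.

```python
import random

def sample_dirichlet(rng, alpha):
    """Draw a probability vector from Dirichlet(alpha) via normalized Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def emit(rng, theta, table, n):
    """Draw n items: pick a topic z from theta, then an item from row table[z]."""
    items = []
    for _ in range(n):
        z = rng.choices(range(len(theta)), weights=theta)[0]
        items.append(rng.choices(range(len(table[z])), weights=table[z])[0])
    return items

def sample_document(rng, alpha, beta, beta_prime, user_table,
                    n_words, n_users, n_comment_words):
    """Sample one blog post with its commenters and comment words.

    beta:       per-topic distributions over post words (w)
    beta_prime: per-topic distributions over comment words (w'), kept
                separate from beta, as in the Yano et al. extension
    user_table: per-topic distributions over commenting users (u);
                hypothetical name, not taken from the papers
    """
    theta = sample_dirichlet(rng, alpha)  # topic mixture shared by the post
    return {
        "theta": theta,
        "w": emit(rng, theta, beta, n_words),
        "u": emit(rng, theta, user_table, n_users),
        "w_prime": emit(rng, theta, beta_prime, n_comment_words),
    }
```

Passing the same table for `beta` and `beta_prime` corresponds to sharing one word distribution across both kinds of text, which is the kind of sharing noted above for Link-PLSA-LDA.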

Datasets Used

Yano et al. use a corpus of blog posts from 40 different blog sites focusing on American politics, covering November 2007 to October 2008 (right up to a presidential election). Diversity in political leanings was emphasized in the selection, and five blogs were chosen for the final dataset.
Nallapati and Cohen also use a corpus of blogs, but theirs was collected from July 2004 to July 2005. Initially, it was a noisy dataset with many broken links and a lot of useless information. The authors restricted the corpus to blogs with at least 2 incoming or 2 outgoing links within the corpus. Unlike Yano et al., who worked with only 5 blog sites, they did not rely on specific blog sites.
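The link-count constraint in the Nallapati and Cohen corpus can be illustrated with a short sketch. This is not the authors' preprocessing code: the function name `filter_blogs` and the `(source, target)` pair representation are assumptions, and the filter is applied in a single pass because the summary does not say whether the original filtering was repeated after removing blogs.

```python
from collections import Counter

def filter_blogs(links, min_in=2, min_out=2):
    """Keep blogs with at least min_in incoming or min_out outgoing links.

    links: iterable of (source, target) pairs where both blogs are in the corpus.
    """
    links = set(links)  # count each distinct link only once
    out_deg = Counter(src for src, _ in links)
    in_deg = Counter(dst for _, dst in links)
    blogs = set(out_deg) | set(in_deg)
    return {b for b in blogs if in_deg[b] >= min_in or out_deg[b] >= min_out}
```

For example, with links `a→b`, `a→c`, `d→b`, and `e→f`, only `a` (two outgoing links) and `b` (two incoming links) are kept.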

Problem

Big Idea

Other

Questions

How much time did you spend reading the (new, non-wikified) paper you summarized? About 2 hours
How much time did you spend reading the old wikified paper? About 2 hours
How much time did you spend reading the summary of the old paper? About 15 min
How much time did you spend reading background material? N/A My final project for the class is on this area so I've read a lot of background papers
Was there a study plan for the old paper? Yes
1. if so, did you read any of the items suggested by the study plan? and how much time did you spend with reading them? I had actually read the papers before as it is directly related to my research with my advisor. I do a lot of Gibbs Sampling on graphical models (in particular topic-model derivatives) and that fits into the study plan
Give us any additional feedback you might have about this assignment. I like this comparison. It was a nice way to view the papers in a different light and really made it stick in my memory. In general, I like the wikifying and used it extensively for the project (and probably will use this for my research in the future after the class is over).

@@ Line 26: / Line 26: @@
 <math>u</math> means that a user commented on the blog posting. <math>w'</math> is the words in the comment. The extension from [[Mixed_membership_models_of_scientiﬁc_publication | Link-LDA]] is that the words in the comment are also modeled (not just who will comment).
+The differences between the two models is shown below (highlighted in Orange). Note that <math>\beta'</math> is modeling the words in comments and is a different hyper-parameter and prior than the text in the blogs. Link-PLSA-LDA does not do this because they are only modeling the words in blog posts.
+[[File:link_differences.png]]
 === Datasets Used ===
