Difference between revisions of "Compare Yano et al NAACL 2009 Link PLSA LDA"

Revision as of 20:45, 1 December 2012

Papers

The papers are:

Comparison

Both papers present extensions of Link-LDA and use a blog dataset. Yano et al. try to predict which users will comment on a blog post, whereas Nallapati and Cohen try to predict which blog will link to another blog.

Method

Datasets Used

Yano et al. use a corpus of blog posts from 40 different blog sites focusing on American politics, collected from November 2007 to October 2008 (right up to a presidential election). Five blogs were chosen for the final selection, with an emphasis on diversity in political leanings.
Nallapati and Cohen also use a corpus of blogs, but theirs was collected from July 2004 to July 2005. Initially, it was a noisy dataset with many broken links and much irrelevant information. The authors restricted the corpus to blogs with a minimum of 2 incoming or 2 outgoing links within the corpus. Unlike Yano et al., who worked with only 5 specific blog sites, they did not rely on any particular blog site.
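
The link-count constraint described above can be illustrated with a minimal sketch. It is not the authors' preprocessing code: the function name `filter_by_link_count`, the `(source, target)` pair representation, and the choice of a single filtering pass (rather than repeating until stable) are all assumptions made for illustration.

```python
from collections import Counter


def filter_by_link_count(corpus, links, min_links=2):
    """Keep documents with >= min_links incoming OR outgoing in-corpus links.

    corpus: iterable of document ids (hypothetical representation).
    links:  iterable of (source, target) id pairs (hypothetical representation).
    """
    corpus = set(corpus)
    # Only links whose both endpoints are inside the corpus count;
    # a set also drops duplicate edges.
    internal = {(s, t) for s, t in links if s in corpus and t in corpus}
    out_deg = Counter(s for s, _ in internal)
    in_deg = Counter(t for _, t in internal)
    # Single pass (an assumption): degrees are not recomputed after removal.
    return {d for d in corpus
            if in_deg[d] >= min_links or out_deg[d] >= min_links}


example_corpus = {"a", "b", "c", "d"}
example_links = [("a", "b"), ("a", "c"), ("d", "b"), ("a", "x")]
kept = filter_by_link_count(example_corpus, example_links)
```

In this toy example, the link to "x" is ignored because "x" is outside the corpus; "a" is kept for its two outgoing links and "b" for its two incoming links, while "c" and "d" are dropped.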

Problem

Big Idea

Other

Questions

How much time did you spend reading the (new, non-wikified) paper you summarized? About 2 hours
How much time did you spend reading the old wikified paper? About 2 hours
How much time did you spend reading the summary of the old paper? About 15 min
How much time did you spend reading background material? N/A. My final project for the class is in this area, so I have already read a lot of background papers.
Was there a study plan for the old paper? Yes
1. If so, did you read any of the items suggested by the study plan, and how much time did you spend reading them? I had actually read the papers before, as they are directly related to my research with my advisor. I do a lot of Gibbs sampling on graphical models (in particular topic-model derivatives), and that fits into the study plan.
Give us any additional feedback you might have about this assignment. I like this comparison. It was a nice way to view the papers in a different light, and it really made them stick in my memory. In general, I like the wikifying and used it extensively for the project (and will probably use it for my research after the class is over).

@@ Line 13: / Line 13: @@
 === Datasets Used ===
 * Yano et al. uses a corpus of blog posts from 40 different blog sites focusing on American politics during from November 2007 to October 2008 (right up to a presidential election). Diversity in political leanings was emphasized for the final selection. Five blogs were chosen for the final selection.
-* Nallapati and Cohen also use a corpus of blogs, but these were collected from July 2004 - July 2005. Initially, it was a noisy dataset with lots of broken links and useless information. The authors constrained blogs used to have a minimum of 2 ingoing or 2 outgoing links.
+* Nallapati and Cohen also use a corpus of blogs, but these were collected from July 2004 - July 2005. Initially, it was a noisy dataset with lots of broken links and useless information. The authors constrained blogs used to have a minimum of 2 ingoing or 2 outgoing links within the corpora. Unlike Yano et al., there was no reliance on it being a specific blog site (as they only had 5).
 === Problem ===


