Proposal 2nd Draft Nitin Yandong Ming Yanbo

From Cohen Courses

Revision as of 19:24, 15 February 2011

Modeling Academic Collaboration and Influence in Scholarly Literature

Team members

Nitin Agarwal

Yandong Liu

Yanbo Xu

Ming Sun

The Problem

The number of new research papers is growing rapidly, especially in computer science, which makes the literature hard to follow. Instead of spending time reading every paper, we want our computers to answer the following questions:

  • Who to collaborate with?
  • Which work to cite?
  • Who should review this paper (for conference organizers)?

Essentially, we'd like to capture the interactions and relationships between people. In academia, these are mainly collaboration and citation. Existing approaches rely on content analysis, connectivity analysis, or both.

Related work

Author-Topic Model

[Figure: Author-Topic model]

The Author-Topic model describes the following generative process. For each word in a document:

  • Choose an author uniformly from the document's authors
  • Choose a topic from that author's topic distribution
  • Choose a word from that topic's word distribution
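As a rough illustration, the Author-Topic generative story can be sketched in Python, repeating the three steps once per word. The names `generate_document`, `theta` (per-author topic distributions) and `phi` (per-topic word distributions) are our own hypothetical choices, and the sketch only samples from fixed, made-up parameters; it does not perform inference.

```python
import random

def generate_document(authors, theta, phi, n_words, rng):
    """Sample (author, topic, word) triples following the Author-Topic story.

    authors: non-empty list of author ids on this document
    theta:   hypothetical dict, author -> list of topic probabilities
    phi:     hypothetical list, topic index -> dict of word -> probability
    """
    words = []
    for _ in range(n_words):
        # 1. Choose an author uniformly from the document's authors.
        x = rng.choice(authors)
        # 2. Choose a topic from that author's topic distribution.
        z = rng.choices(range(len(theta[x])), weights=theta[x])[0]
        # 3. Choose a word from that topic's word distribution.
        vocab = list(phi[z])
        w = rng.choices(vocab, weights=[phi[z][v] for v in vocab])[0]
        words.append((x, z, w))
    return words
```

For example, with `theta = {"alice": [1.0, 0.0], "bob": [0.0, 1.0]}` every word attributed to `alice` comes from topic 0 and every word attributed to `bob` from topic 1.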

The learned model gives a topic distribution for each author and a word distribution for each topic. One application suggested by the paper is finding related authors by computing the KL divergence between the topic distributions of different authors.
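A minimal sketch of the related-author idea, assuming each author's topic distribution is already available as a list of probabilities (the dict `theta` and the function names are hypothetical). Because KL divergence is asymmetric, this sketch sums both directions as one common choice, and adds a small `eps` to avoid division by zero; both choices are ours, not necessarily the paper's.

```python
import math

def kl(p, q, eps=1e-12):
    """Approximate D(p || q) for two discrete distributions of equal length.

    Terms with p_i == 0 contribute 0 by convention; eps is an
    illustrative smoothing constant that guards against log(0).
    """
    return sum(pi * math.log((pi + eps) / (qi + eps))
               for pi, qi in zip(p, q) if pi > 0)

def symmetric_kl(p, q):
    # Sum of both directions, so the score does not depend on argument order.
    return kl(p, q) + kl(q, p)

def related_authors(target, theta, k=3):
    """Return up to k authors whose topic distributions are closest to target's.

    Smaller symmetric KL means more similar; ties are broken by author id.
    """
    scores = sorted((symmetric_kl(theta[target], theta[a]), a)
                    for a in theta if a != target)
    return [a for _, a in scores[:k]]
```

With toy distributions `{"alice": [0.8, 0.2], "bob": [0.7, 0.3], "carol": [0.1, 0.9]}`, `bob` ranks ahead of `carol` as related to `alice`.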

Author-Recipient-Topic Model

[Figure: Author-Recipient-Topic model]

This model assumes that nodes play different roles (in email data, for example, there are senders and recipients) and that these roles should be treated differently. Instead of modeling individuals, it therefore models the pair relationship directly: an author and a set of recipients are observed, and topics are conditioned on the (author, recipient) pair.
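To make the pair conditioning concrete, here is a hedged sketch of a per-word generative story in the spirit of the Author-Recipient-Topic model: for each word, a recipient is picked uniformly from the message's recipients, and the topic is drawn from the distribution of the (author, recipient) pair. The function name, the dict `theta` keyed by pairs, and `phi` are hypothetical illustrations with made-up parameters; no inference is performed.

```python
import random

def generate_message(author, recipients, theta, phi, n_words, rng):
    """Sample (recipient, topic, word) triples for one message.

    theta: hypothetical dict, (author, recipient) -> list of topic probabilities
    phi:   hypothetical list, topic index -> dict of word -> probability
    """
    words = []
    for _ in range(n_words):
        # Choose one of the observed recipients uniformly.
        r = rng.choice(recipients)
        # The topic depends on the (author, recipient) pair, not on the author alone.
        pair_dist = theta[(author, r)]
        z = rng.choices(range(len(pair_dist)), weights=pair_dist)[0]
        # Choose a word from that topic's word distribution.
        vocab = list(phi[z])
        w = rng.choices(vocab, weights=[phi[z][v] for v in vocab])[0]
        words.append((r, z, w))
    return words
```

The only structural change from the Author-Topic sketch is that the topic distribution is looked up by pair, which is the "model the pair relationship directly" idea we plan to reuse for (author, author) and (author, citation) pairs.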

As these models show, much previous work is based either on content analysis or on graph connectivity analysis. Because the text carries a great deal of information, we will build on a topic model and derive a hybrid model that uses both kinds of knowledge. Like the Author-Recipient-Topic model, our model treats pair relationships directly, such as (author, author) or (author, citation).

Application

Who to collaborate with?

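  • Given a professor's name and research topic, we want the computer to list the researchers he or she is most likely to collaborate with.
  • This can be stated as $a_r = \arg\max_{a_r^*} P(a_r^*, r = \text{co-author} \mid a, z)$, where $a$ is the author of a paper, $a_r$ is either a co-author or an author in the references (its role is given by the parameter $r$), and $z$ is the research topic.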
  • In this way, we can recommend a faculty member for you to collaborate with.
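The argmax can be illustrated with a deliberately simple stand-in for the probability: a relative-frequency estimate of $P(a_r, r \mid a, z)$ from hypothetical (author, related author, role, topic) observations. This is not the hybrid model we plan to derive; the observation format, the role labels `"co-author"` and `"reference"`, and `recommend_collaborators` are all assumptions made for the sketch. The estimate normalizes over every (related author, role) pair for the given author and topic, then keeps only the co-author role before ranking.

```python
from collections import Counter

def recommend_collaborators(observations, a, z, k=3):
    """Rank candidates a_r by an estimate of P(a_r, r = "co-author" | a, z).

    observations: iterable of (author, related_author, role, topic) tuples
                  (a hypothetical format; role is "co-author" or "reference").
    Returns up to k (candidate, probability) pairs, best first.
    """
    # Count (related_author, role) pairs seen with this author on this topic.
    joint = Counter((a_r, role) for (au, a_r, role, topic) in observations
                    if au == a and topic == z)
    total = sum(joint.values())
    if total == 0:
        return []  # no evidence for this (author, topic) combination
    # Relative frequency of each pair; keep only the co-author role.
    probs = {a_r: n / total for (a_r, role), n in joint.items() if role == "co-author"}
    # Highest probability first; the first entry is the argmax a_r.
    ranked = sorted(probs.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:k]
```

In a real system the counts would be replaced by the probability the fitted topic model assigns, but the ranking step stays the same.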