Link Prediction in Relational Data

From Cohen Courses
Revision as of 04:55, 4 October 2012 by Zhua (talk | contribs) (→‎Study Plan)
Jump to navigationJump to search

Link Prediction in Relational Data

Citation

Ben Taskar and Ming-fai Wong and Pieter Abbeel and Daphne Koller, Link prediction in relational data, NIPS 2003

Online version

PDF

Summary

This paper focuses on Link Prediction and develops a framework which supports multiple link types and both link features and node features. The key idea is to use relational Markov network and to define the probabilistic patterns over subgraph structures for each application data sets to capture some type of feature.

Problem and Intuition

The problem is not exactly the traditional relationship prediction or recommendation over social network, but in a broader sense. Given some data in a relational format, say hyper-linked university web pages, the task can be to find who is whose adviser. This is compatible with the traditional link prediction problem, as every node feature can be mapped into a relational format. To predict whether a link exist, the information of both the two nodes and the link is not enough. For example, the fact that a professor and a student often show up in the same research project pages is a strong indicator. And this paper tries to use a subgraph structure to capture these kind of graph features in a relational Markov Network framework.

Relational Markov Network

be an undirected graph with a set of cliques . Each is associated with a set of nodes and a clique potential , which is a non-negative function defined on the joint domain of . The Markov net defines the distribution

To extend it to a relational setting, a relational Markov Network specifies a conditional distribution over all of the labels of all of the entities in an instantiation given the relational structure and the content attributes.

To specify what cliques should be constructed in an instantiation, we will define a notion of a relational clique template. A relational clique template specifies tuples of variables in the instantiation by using a relational query language.

For more details, please refer to B. Taskar, P. Abbeel, and D. Koller. Discriminative probabilistic models for relational data. In Proc. UAI, 2002.

Data Sets

The paper uses two data sets, as they are both collected by the authors and not shown publicly, there is no source to find them now.

  • The paper collected and manually labeled Computer Science department webpages from 3 schools: Stanford, Berkeley, and MIT.
  • A social network data set they collected by a portal website at a large university that hosts an online community for students.

Related Papers

  • This paper talks about the relational Markov Networks

B. Taskar, P. Abbeel, and D. Koller. Discriminative probabilistic models for relational data. In Proc. UAI, 2002.

  • This paper talks about a similar problem and also inspires the authors in the experiments for data collection

M. Craven, D. DiPasquo, D. Freitag, A. McCallum, T. Mitchell, K. Nigam, and S. Slattery. Learning to extract symbolic knowledge from the world wide web. In Proc. AAAI, 1998.

  • This paper adopts a similar problem settings but deals with more noisy citation data

A. Popescul, R. Popescul, and L. H. Ungar. Statistical relational learning for link prediction, 2003.

Study Plan

And this talks about more on general random fields