Link Prediction in Relational Data
Link Prediction in Relational Data
Contents
Citation
Ben Taskar and Ming-fai Wong and Pieter Abbeel and Daphne Koller, Link prediction in relational data, NIPS 2003
Online version
Summary
This paper focuses on Link Prediction and develops a framework which supports multiple link types and both link features and node features. The key idea is to use relational Markov network and to define the probabilistic patterns over subgraph structures for each application data sets to capture some type of feature.
Problem and Intuition
The problem is not exactly the traditional relationship prediction or recommendation over social network, but in a broader sense. Given some data in a relational format, say hyper-linked university web pages, the task can be to find who is whose adviser. This is compatible with the traditional link prediction problem, as every node feature can be mapped into a relational format. To predict whether a link exist, the information of both the two nodes and the link is not enough. For example, the fact that a professor and a student often show up in the same research project pages is a strong indicator. And this paper tries to use a subgraph structure to capture these kind of graph features in a relational Markov Network framework.
Relational Markov Network
be an undirected graph with a set of cliques . Each is associated with a set of nodes and a clique potential , which is a non-negative function defined on the joint domain of . The Markov net defines the distribution
To extend it to a relational setting, a relational Markov Network specifies a conditional distribution over all of the labels of all of the entities in an instantiation given the relational structure and the content attributes.
To specify what cliques should be constructed in an instantiation, we will define a notion of a relational clique template. A relational clique template specifies tuples of variables in the instantiation by using a relational query language.
For more details, please refer to B. Taskar, P. Abbeel, and D. Koller. Discriminative probabilistic models for relational data. In Proc. UAI, 2002.
Data Sets
The paper uses two data sets, as they are both collected by the authors and not shown publicly, there is no source to find them now.
- The paper collected and manually labeled Computer Science department webpages from 3 schools: Stanford, Berkeley, and MIT.
- A social network data set they collected by a portal website at a large university that hosts an online community for students.
Related Papers
- This paper talks about the relational Markov Networks
- This paper talks about a similar problem and also inspires the authors in the experiments for data collection
- This paper adopts a similar problem settings but deals with more noisy citation data
Study Plan
- To understand Relational Markov Networks
And this talks about more on general random fields