Project Sriram

From Cohen Courses
Revision as of 18:05, 1 February 2011 by Ssomanch (talk | contribs)
(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)
Jump to navigationJump to search

Fast Learning of Graph Structure for Anomalous Pattern Detection

Team Members

Sriram Somanchi (somanchi@cmu.edu)

Looking for a team mate.

Introduction

Information diffusion or disease propagation often happens over an unknown network. Only thing we observe directly is nodes getting affected. We want to investigate a method to learn the underlying graph structure that best represents a given time series of data for each node. The learnt graph structure should have better performance when used as input to graph-based anomaly detection algorithms like GraphScan[1]. Previously we have proposed preliminary approach based on thresholds of correlation coefficient between two nodes for learning the graph in disease surveillance domain. However, in this work we would like to investigate new approaches and possible improvements to the previous method. Using the book marking dataset we would like to investigate the subset of websites that are most anomalous connected subsets that are emerging in the dataset. The anomalousness or interestingness is defined by the number of book marks that the web site receives compared to the past book markings.

Dataset

Currently we have Delicious book marking data set from Prof. William Cohen, which has one year worth of book marking on each day of the year 2007. We have the date; userid;website; tag tuple. This is a very large dataset, and we need to investigate methods to extract the right set of tuples from the data to be used in our method.

Proposed Work

We want to learn the graph structure over the web pages as nodes and identify the emerging web pages that are book marked higher than normal. There are couple of ideas that we wanted to investigate in this work. Firstly, we want to investigate about the problem of identifying the most interesting connected subset, where the connectivity is assumed by the relationship between the category or tag of the web pages. Second, instead of assuming the connectivity based on category, we would like to learn the connectivity between the web pages based on book marking process by users.

Related Work

Several heuristic methods for network structure learning are proposed like estimating dependency structure of directed graphical models and probabilistic relational models([2]). Also, there have been many link prediction algorithms ([3], [4], [5], [6]), which assume that there is network already existing and estimate how network looks in future based on the current structure. However, it is different from our case as we do not assume any graph for predicting the future graphs. Work that is closely related to our work have been algorithms like NetInf, ConNIe ([7], [8]) proposed very recently. NetInf, is an approximation algorithm for predicting the latent network and assumes all connected nodes influence their neighbors with equal probability. ConNIe is generalization of 1 this with out this assumption and faster in finding optimal latent network. However, our goal here to identify the graph structure which performs well for detecting anomalous connected subset rather than best representing the underlying graph structure. One of our future goals is to compare our proposed method with these algorithms both in terms of accuracy of detecting (precision and recall measures) and in terms of performance in detecting the emerging subsets faster (number of days of data to detect with a fixed false positive rate).

References

1. S. Speakman, E. McFowland III and D.B. Neill (2010) Scalable Detection of Anomalous Patterns with Connectivity Constraints. Technical Report

2. L. Getoor, N. Friedman, D. Koller, and B. Taskar (2003) Learning probabilistic models of link structure. JMLR, 3:707.

3. R. Jansen, H. Yu, D. Greenbaum, et al. (2003) A bayesian networks approach for predicting protein-protein interactions from genomic data. Science, 302(5644):449-453.

4. D. Liben-Nowell and J. Kleinberg (2003) The link prediction problem for social networks. CIKM 03, pages 556-559.

5. B. Taskar, M. F. Wong, P. Abbeel, and D. Koller (2003) Link prediction in relational data. Neural Information Processing Systems (NIPS).

6. J. Vert and Y. Yamanishi (2005) Supervised graph inference. Neural Information Processing Systems (NIPS).

7. M. Gomez-Rodriguez, J. Leskovec, and A. Krause (2010) Inferring networks of diffusion and influence. KDD 10.

8. S. A. Myers and J. Leskovec (2010) On the Convexity of Latent Social Network Inference. Neural Information Processing Systems (NIPS).