Roth et al.: KDD 2010

From Cohen Courses
Jump to navigationJump to search

Citation

M. Roth, A. Ben-David, D. Deutscher, G. Flysher, I. Horn, A. Leichtberg, N. Leiser, Y. Matias, and R. Merom, "Suggesting Friends Using the Implicit Social Graph", Proceedings of the 16th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2010

Online version

[1]

Summary

This paper aims at friend suggestion by identifying implicit social graph: A weighted graph formed by users' interactions with contacts and groups of contacts, edges determined by frequency, recency and direction of interactions. This paper introduces an interaction-based metric for estimating a user's affinity to his contacts and groups and defines a friend suggestion algorithm that generates a friend group, given a small seed set of contacts. The implicit graph used here is the egocentric network of email interactions among different people.

Interactions Rank: weights between a user and his implicit groups

To calculate the relationship strength between a user and a group of his implicit groups, frequency, recency, and direction of interaction are considered.


is a half-time tunable parameter representing exponential decay over time and can be tuned for the relative importance of outgoing versus incoming emails.

The Core Routine

The core routine to expand the seed is as follows:

Scoring Functions

This paper introduces multiple simple scoring functions to see their effects and performance. The first one is IntersectingGroupScore in which the Interactions Rank of all the contexts the contact exchanged emails with at least one seed group member is considered. The second scoring function is IntersectionWeightedScore in which the size of intersection of the contact's group with the seed group is multiplied by Interactions Rank. In IntersectingGroupCount, the number of groups in which contact belongs to and have a non-empty intersection with the seed is considered as score and in TopContactScore, sum of the scores of all of the groups the contact belongs to is returned as the score of each contact.


Data

They used 10’000 randomly sampled Gmail email interactions generated by active human users. They Sampled a few contacts from each group and measured the ability to recreate it. To train, they used the snapshot of user’s egocentric network based on earlier interactions. Two applications: "Don't forget Bob!" and "Got the wrong Bob?" are built based on this model and used by many users.


Evaluation

Evaluations show that IntersectionWeightedScore is the most successful method for scoring the contacts based on precision and recall. Also based on user feed back for the two applications "Don't forget Bob!" and "Got the wrong Bob?", they noticed that users are satisfied with the approach even if they don't bother to click on the suggestions and only type them in the recipients box.