Zheleva and Getoor, WWW2009
Contents
Citation
Zheleva, E., and L. Getoor. 2009. To join or not to join: the illusion of privacy in social networks with mixed public and private user profiles. In Proceedings of the 18th International Conference on World Wide Web, 531–540.
Online version
Summary
This paper applies several practical models that use friendship and group-membership information to attack a relational classification problem in social networks: inferring the private attributes of users. The key novel idea is that, in addition to friendship links, groups can be carriers of significant information.
Methodology
* Privacy attacks using links
Link-based privacy attacks take advantage of autocorrelation, the property that the attribute values of linked objects are correlated. An example of autocorrelation is that people who are friends often share common characteristics (as in the proverb "Tell me who your friends are, and I'll tell you who you are").
- Friend-aggregate model (AGG)
The nodes and their links produce a graph structure in which one can identify circles of close friends. The AGG model looks at the sensitive-attribute distribution among the friends of the person in question. The adversary using this model picks the most probable attribute value (i.e., the mode of the friends' attribute distribution).
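The AGG prediction can be sketched in a few lines. This is a minimal illustration; the function name and input format are my own, not from the paper:

```python
from collections import Counter

def agg_predict(friends_attrs):
    """Friend-aggregate (AGG): predict the mode of the sensitive
    attribute among a user's friends. Friends with private profiles
    (attribute unknown, represented as None) are skipped."""
    known = [a for a in friends_attrs if a is not None]
    if not known:
        return None  # no public friends to learn from
    return Counter(known).most_common(1)[0][0]
```

For example, `agg_predict(["CS", "CS", "Math", None])` returns `"CS"`, the most common value among the public friends.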
- Collective classification model (CC)
Collective classification also takes advantage of autocorrelation between linked objects. Unlike more traditional methods, in which each instance is classified independently of the rest, collective classification aims at learning and inferring the class labels of linked objects together. It therefore makes use not only of the public profiles but also of the inferred values for connected private profiles. The approximate inference algorithm used in this paper is iterative classification (ICA).
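A minimal sketch of the ICA idea: repeatedly relabel each private node from the current labels of its neighbors until the assignment stabilizes. The dictionary-based graph representation and helper names are illustrative assumptions, not the paper's implementation:

```python
from collections import Counter

def mode(labels):
    """Most common non-missing label among a list, or None."""
    known = [l for l in labels if l is not None]
    return Counter(known).most_common(1)[0][0] if known else None

def ica(graph, labels, predict=mode, max_iters=10):
    """Iterative classification (ICA), a simple approximate
    collective-inference scheme.
    graph:  {node: [neighbors]}
    labels: {node: label or None for private profiles}
    predict: maps the neighbors' current labels to a guess."""
    current = dict(labels)
    hidden = [n for n, l in labels.items() if l is None]
    for _ in range(max_iters):
        changed = False
        for n in hidden:
            guess = predict([current[m] for m in graph[n]])
            if guess != current[n]:
                current[n] = guess
                changed = True
        if not changed:  # labels stabilized
            break
    return current
```

Inferred labels feed back into later iterations, which is what distinguishes this from classifying each node independently.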
- Flat-link model (LINK)
This approach flattens the link data by considering the adjacency matrix of the graph. In this model, each row in the matrix is a user instance, and each user has a list of binary features of the size of the network, where each feature has a value of 1 if the user is friends with the person who corresponds to that feature and 0 otherwise. The classifier can be any traditional one, such as Naive Bayes, logistic regression, or an SVM.
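The featurization step can be sketched as follows (the function name and edge representation are my own; the resulting rows could be fed to any off-the-shelf classifier):

```python
def link_features(users, friendships):
    """Flat-link (LINK) featurization: each user becomes a binary
    vector of length |users|, with a 1 in position j iff the user
    is friends with the j-th user.
    friendships: set of frozenset pairs (undirected edges)."""
    index = {u: i for i, u in enumerate(users)}
    rows = {}
    for u in users:
        row = [0] * len(users)
        for v in users:
            if u != v and frozenset((u, v)) in friendships:
                row[index[v]] = 1
        rows[u] = row
    return rows
```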
- Blockmodeling attack (BLOCK)
Users form natural clusters or blocks, and their interactions can be explained by the blocks they belong to. In particular, the link probability between two users is the same as the link probability between their corresponding blocks. If sensitive attribute values separate users into blocks, then based on the observed interactions of a private-profile user with public-profile users, one can predict the most likely block the user belongs to and thus discover the attribute value. This model is similar to the class-distribution relational-neighbor classifier.
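The following sketch illustrates the idea: estimate per-block-pair link probabilities from the public users, then assign the private user to the block under which its observed links are most likely. The function, smoothing, and data layout are my own simplifications, not the paper's estimator:

```python
import math
from itertools import combinations

def block_predict(public_labels, links, private_links, blocks):
    """Blockmodeling attack (BLOCK), a minimal sketch.
    public_labels: {user: block}; links: set of frozenset edges
    among public users; private_links: set of public users the
    private user links to; blocks: candidate block labels."""
    # Estimate block-pair link probabilities with add-one smoothing.
    pairs = {(b, c): [1, 2] for b in blocks for c in blocks}  # [links, total]
    for u, v in combinations(public_labels, 2):
        b, c = public_labels[u], public_labels[v]
        linked = frozenset((u, v)) in links
        for key in {(b, c), (c, b)}:
            pairs[key][1] += 1
            pairs[key][0] += linked
    # Log-likelihood of the private user's (non-)links under block b.
    def loglik(b):
        total = 0.0
        for v, c in public_labels.items():
            p = pairs[(b, c)][0] / pairs[(b, c)][1]
            total += math.log(p if v in private_links else 1 - p)
        return total
    return max(blocks, key=loglik)
```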
* Privacy attacks using groups
- Groupmate-link model (CLIQUE)
One can think of groupmates as friends to whom users are implicitly linked, so this model assumes that each group is a clique of friends, creating a friendship link between users who belong to at least one group together. The advantage of this model is that it reduces the problem to a link-based classification problem, which has been studied more thoroughly. One of its disadvantages is that it does not account for the strength of the relationship between two people, e.g., the number of common groups.
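The clique expansion itself is straightforward (a minimal sketch with an assumed input format):

```python
from itertools import combinations

def clique_links(group_members):
    """Groupmate-link (CLIQUE): treat every group as a clique,
    creating an edge between any two users who share at least one
    group. Note this discards relationship strength, e.g. the
    number of shared groups.
    group_members: {group: iterable of users}."""
    edges = set()
    for members in group_members.values():
        for u, v in combinations(sorted(set(members)), 2):
            edges.add(frozenset((u, v)))
    return edges
```

The resulting edge set can then be handed to any of the link-based attacks above.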
- Group-based classification model (GROUP)
This approach considers each group as a feature in a classifier. While some groups may be useful in inferring the sensitive attribute, a problem in many of the datasets the authors encountered was that users were members of a very large number of groups, so identifying which groups are likely to be predictive is key.
The group-based classification approach contains three main steps, as Algorithm 1 shows. In the first step, the algorithm performs feature selection: it selects the groups that are relevant to the node classification task. This can be done either automatically or by a domain expert; ideally, when the number of groups is high, the feature selection should be automated. For example, the function isRelevant(h) can return true if the entropy of group h is low. In the second step, the algorithm learns a global function f, e.g., trains a classifier, that takes the relevant groups of a node as features and returns the sensitive-attribute value. This step uses only the nodes from the observed set whose sensitive attributes are known. Each node v is represented as a binary vector in which each dimension corresponds to a unique group, {groupId : isMember}, together with its class label v.a, which comes from a multinomial distribution over the sensitive-attribute values; only memberships in relevant groups are considered. In the third step, the classifier returns the predicted sensitive attribute for each private profile.
Algorithm 1 Group-based classification model
1. R ← set of relevant groups, initially empty
2. for each group h do: if isRelevant(h) then add h to R end if end for
3. Train a classifier f on the observed nodes, using memberships in the groups of R as features
4. for each sensitive node v do: v.a ← f(v's memberships in the groups of R) end for
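Algorithm 1 can be sketched concretely as below. As assumptions not spelled out in the paper, isRelevant is implemented as an entropy threshold, and the learned function f is simplified to a majority vote over the relevant groups a node belongs to:

```python
from collections import Counter
import math

def entropy(labels):
    """Shannon entropy of a label multiset (used by isRelevant)."""
    counts = Counter(labels)
    n = sum(counts.values())
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def group_classify(groups, observed, private_nodes, max_entropy=0.5):
    """Group-based classification (GROUP), a minimal sketch of
    Algorithm 1.
    groups: {group: set of members}; observed: {node: label}."""
    # Step 1: feature selection -- keep groups whose observed
    # members have low label entropy, and remember their majority label.
    relevant = {}
    for h, members in groups.items():
        labels = [observed[v] for v in members if v in observed]
        if labels and entropy(labels) <= max_entropy:
            relevant[h] = Counter(labels).most_common(1)[0][0]
    # Steps 2-3: predict each private node by majority vote over
    # the relevant groups it joined.
    predictions = {}
    for v in private_nodes:
        votes = [lab for h, lab in relevant.items() if v in groups[h]]
        predictions[v] = Counter(votes).most_common(1)[0][0] if votes else None
    return predictions
```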
* Privacy attacks using links and groups