Group-based classification model

This approach treats each group as a feature in a classifier. While some groups may be useful for inferring the sensitive attribute, a problem in many of the datasets that we encountered was that users were members of a very large number of groups, so identifying which groups are likely to be predictive is key.

The group-based classification approach consists of three main steps, as Algorithm 1 shows. In the first step, the algorithm performs feature selection: it selects the groups that are relevant to the node classification task. This can be done either automatically or by a domain expert; when the number of groups is large, the selection should ideally be automated. For example, the function isRelevant(h) can return true if the entropy of group h is low, i.e., if the known attribute values of its members are concentrated on few values. In the second step, the algorithm learns a global function f, e.g., trains a classifier, that takes the relevant groups of a node as features and returns the sensitive attribute value. This step uses only the nodes from the observed set, whose sensitive attributes are known. Each node v is represented as a binary vector in which each dimension corresponds to a unique group, {groupId : isMember}, together with its class label v.a. Only memberships in relevant groups are considered, and v.a is the class, drawn from a multinomial distribution, that denotes the sensitive attribute value. In the third step, the classifier returns the predicted sensitive attribute for each private profile.
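As an illustration of the entropy-based relevance test mentioned above, the following is a minimal sketch, not the authors' implementation. It assumes that the entropy of a group is computed over the known attribute values of its members; the names `group_entropy` and `is_relevant` and the thresholds `max_entropy` and `min_members` are hypothetical choices made for this example.

```python
import math
from collections import Counter


def group_entropy(labels):
    """Shannon entropy (in bits) of the known attribute values of a group's members."""
    counts = Counter(labels)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def is_relevant(labels, max_entropy=0.5, min_members=2):
    """Hypothetical relevance test: enough labeled members and low entropy.

    Both thresholds are assumptions for illustration, not values from the text.
    """
    return len(labels) >= min_members and group_entropy(labels) <= max_entropy
```

A group whose labeled members all share one value has entropy 0 and passes the test, while a group split evenly between two values has entropy 1 bit and is rejected under the default threshold. The `min_members` guard is an added assumption that keeps a single-member group from looking perfectly pure.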

Algorithm 1 Group-based classification model

 1. Set of relevant groups  $H_{relevant}=\emptyset$ 
 2. for each group  $h\in H$  do
       if  $isRelevant(h)$  then
            $H_{relevant}=H_{relevant}\cup \{h\}$ 
       end if
    end for
 3.  $trainClassifier(f,V_{0},H_{relevant})$ 
 4. for each sensitive node  $v\in V_{s}$  do
        $v.a=f(v.H_{relevant})$ 
     end for
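
The steps of Algorithm 1 can be sketched in Python as follows. This is one possible reading under stated assumptions, not a definitive implementation: the text does not fix a particular classifier, so a small Bernoulli naive Bayes model with add-one smoothing is swapped in here for f, and the function names, argument names, and data layout (nodes as sets of group ids) are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_classifier(observed, relevant):
    """Fit a Bernoulli naive Bayes model (an assumed choice of classifier).

    observed: list of (set_of_group_ids, label) pairs for nodes with known labels.
    relevant: set of group ids kept by feature selection.
    Returns a function f mapping a set of group ids to a predicted label.
    """
    features = sorted(relevant)
    class_counts = Counter(label for _, label in observed)
    member_counts = defaultdict(Counter)  # label -> group id -> member count
    for groups, label in observed:
        for h in groups:
            if h in relevant:
                member_counts[label][h] += 1

    def f(groups):
        best_label, best_score = None, -math.inf
        for label, n in class_counts.items():
            score = math.log(n / len(observed))  # log prior
            for h in features:
                # Add-one smoothed probability of membership given the class.
                p = (member_counts[label][h] + 1) / (n + 2)
                score += math.log(p if h in groups else 1 - p)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

    return f


def group_based_classification(all_groups, observed, sensitive, is_relevant):
    """Steps 1-4 of Algorithm 1 with a caller-supplied relevance test."""
    # Steps 1-2: feature selection over all groups.
    relevant = {h for h in all_groups if is_relevant(h)}
    # Step 3: train on observed nodes, using only relevant groups.
    f = train_classifier(observed, relevant)
    # Step 4: predict the sensitive attribute of each private profile.
    return {v: f(groups & relevant) for v, groups in sensitive.items()}
```

For example, with observed nodes `[({"g1"}, "A"), ({"g1", "g3"}, "A"), ({"g2"}, "B"), ({"g2", "g3"}, "B")]` and a relevance test that discards `g3`, a private profile belonging to `g1` and `g3` is assigned `"A"` and one belonging only to `g2` is assigned `"B"`.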
