Group-based classification model

From Cohen Courses
Jump to navigationJump to search

This approach considers each group as a feature in a classifier. While some groups may be useful in inferring the sensitive attribute, a problem in many of the datasets that we encountered was that users were members of a very large number of groups, so identifying which groups are likely to be predictive is a key.

The group-based classification approach contains three main steps as Algorithm 1 shows. In the first step, the algorithm performs feature selection: it selects the groups that are relevant to the node classification task. This can either be done automatically or by a domain expert. Ideally, when the number of groups is high, the feature selection should be automated. For example, the function isRelevant(h) can return true if the entropy of group h is low. In the second step, the algorithm learns a global function f, e.g., trains a classifier, that takes the relevant groups of a node as features and returns the sensitive attribute value. This step uses only the nodes from the observed set whose sensitive attributes are known. Each node v is represented as a binary vector where each dimension corresponds to a unique group: {groupId : isMember}, v.a. Only memberships to relevant groups are considered and v.a is the class coming from a multinomial distribution which denotes the sensitiveattribute value. In the third step, the classifier returns the predicted sensitive attribute for each private profile.



Algorithm 1 Group-based classification model

 1. Set of relevant groups 
 2. for each group  do
       if  then
           
       end if
    end for
 3. 
 4. for each sensitive node  do
       
     end for