Project Draft 1 - diliu, dperciva

From Cohen Courses

Project Team

Di Liu

Daniel Percival

Introduction

We propose to explore two aspects of a large collection of SMS (text message) data. First, we will explore methods to measure the reciprocity of ties, taking into account the timing and possibly the length of the text messages. Second, we will focus on detecting abnormal behavior. These data, which consist of both phone call and SMS records, have been used previously in several studies, mostly focused on the phone call records.

Dataset

The dataset was collected by an anonymous mobile phone operator over the six-month period from December 1, 2007 to May 31, 2008. The data include text messages sent to and from users within the network. For each text message, we have the sender, the recipient, and the timestamp of the message. We also have the length of the message in characters; the full text is not available for privacy reasons. The dataset consists of text message records among 4,545,744 distinct phone numbers. At least one party to each message is within the mobile carrier's network, so the data also contain customers outside the network. Due to privacy and confidentiality concerns, these data are not publicly available; we have access to them through the iLab at the Heinz School.
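As a rough illustration, each record described above could be held in a small typed structure. This is a minimal sketch only: the field names (sender, recipient, timestamp, length), the use of strings for anonymized numbers, and the helper is_valid are hypothetical choices for illustration, not the actual file format provided by the carrier.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class SmsRecord:
    """One text message record (hypothetical schema); the body is unavailable."""
    sender: str          # anonymized phone number of the sender
    recipient: str       # anonymized phone number of the recipient
    timestamp: datetime  # time the message was sent
    length: int          # message length in characters


def is_valid(record: SmsRecord, in_network: set) -> bool:
    """Hypothetical check: at least one party must belong to the carrier's network."""
    return record.sender in in_network or record.recipient in in_network
```

A record such as SmsRecord("A", "B", datetime(2008, 1, 1, 12, 0), 42) would pass is_valid when "A" is in the set of in-network numbers, even though "B" may belong to another carrier.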

Related Work

  • On these data:
    • Nanavati et al. (2006) -- Some exploratory data analysis (power laws, degree distributions, ...); graph structure of the phone call networks
    • Seshadri et al. (2008) -- Models other than power laws for the phone call distributions
    • De Melo et al. (2010) -- Looked at the duration of phone calls; group behavior models
  • On reciprocity:
    • Zhang, Dantu, and Cangussu (2009) -- some reciprocity measures that include time (using phone call data as well)

Note: details on these papers will be filled in for the next phase.

Proposed Work

Reciprocity

Reciprocity is a property of a communication network which measures how balanced relationships are in the network. In communication terms, it simply means: if I talk to you, do you talk to me as much? Reciprocity can characterize the nature of relationships and the overall flavor of the network. For example, a rigidly hierarchical network derived from a corporation may display less reciprocity than a network derived from a group of high school students.

Previous studies have focused on reciprocity in phone calls. However, there is a fundamental problem with applying reciprocity measures to this sort of data. When one person calls another, the call itself may already be a reciprocal exchange, since both people talk during the call. A call back from the second person may not be necessary to complete the communication. All we can hope to measure with such data is whether a pair of people alternate in initiating communication; we cannot truly measure the depth of their communication. Text messages do not have this problem: a single text message is one-way, so a reply message is necessary for the exchange to be reciprocal.
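Because a reply is what makes an SMS exchange reciprocal, one simple way to bring time into the measure is to ask how many messages are answered by their recipient within some time window. The sketch below is hypothetical, not a method from the cited work: it assumes messages are (sender, recipient, t) tuples with t in seconds, and it treats a message as answered if the recipient sends any message back to the sender within `window` seconds afterward. The window length is a free parameter.

```python
from bisect import bisect_right
from collections import defaultdict


def reply_fraction(messages, window):
    """Fraction of messages answered by the recipient within `window` seconds.

    `messages` is an iterable of (sender, recipient, t) tuples, t in seconds.
    """
    msgs = list(messages)
    if not msgs:
        return 0.0
    # Index send times by directed pair and sort them for binary search.
    times = defaultdict(list)
    for s, r, t in msgs:
        times[(s, r)].append(t)
    for ts in times.values():
        ts.sort()
    answered = 0
    for s, r, t in msgs:
        replies = times.get((r, s), [])
        # Position of the first message from r to s sent strictly after t.
        i = bisect_right(replies, t)
        if i < len(replies) and replies[i] - t <= window:
            answered += 1
    return answered / len(msgs)
```

For example, with [("A", "B", 0), ("B", "A", 60), ("A", "C", 100)] and a 300-second window, only the first message counts as answered, giving 1/3. Note that the reply itself is counted as unanswered unless A writes back again; whether replies should be excluded from the denominator is a design choice left open here.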

In a phone call or text message network, we may wish to measure the reciprocity of the overall network, or of individual pairs (dyads). The simplest approach to the first is to use dyad-based counts to measure overall network reciprocity, for example the fraction of connected pairs in which communication flows in both directions. Alternatively, we could measure the reciprocity of individual dyads using a ratio of the counts of communications in each direction.
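Both count-based measures can be sketched in a few lines. This is an illustration under stated assumptions, not a settled definition: the network-level measure is taken to be the fraction of connected dyads with at least one message in each direction, and the dyad-level measure is taken to be min/max of the two directed counts, so 1 means perfectly balanced and 0 means one-way. The function names are hypothetical, and self-messages are ignored.

```python
from collections import Counter


def directed_counts(messages):
    """Count messages per directed pair; `messages` holds (sender, recipient) tuples."""
    return Counter((s, r) for s, r in messages if s != r)


def network_reciprocity(counts):
    """Fraction of connected dyads with messages flowing in both directions."""
    dyads = {tuple(sorted(pair)) for pair in counts}
    if not dyads:
        return 0.0
    mutual = sum(1 for a, b in dyads if (a, b) in counts and (b, a) in counts)
    return mutual / len(dyads)


def dyad_ratio(counts, a, b):
    """Balance of one dyad as min/max of its directed counts; None if unconnected."""
    ab, ba = counts[(a, b)], counts[(b, a)]  # Counter returns 0 for missing keys
    if max(ab, ba) == 0:
        return None
    return min(ab, ba) / max(ab, ba)
```

For instance, if A texts B twice, B texts A once, A texts C once, and C and D text each other once, then two of the three connected dyads are mutual (2/3), and the A-B dyad has ratio 0.5. Summing message lengths instead of counting messages would give a length-weighted variant of the same measures.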

Abnormal Behavior