Project Draft 1 - diliu, dperciva

From Cohen Courses

Project Team

Di Liu

Daniel Percival

Introduction

We propose to explore two aspects of a large collection of SMS (text message) data. First, we will explore methods to measure the reciprocity of ties, taking into account the timing and possibly the length of the text messages. Second, we will focus on detecting abnormal behavior. These data, which consist of both phone call and SMS records, have been used previously in several studies, most of which focused on the phone records.

Dataset

The dataset was collected by an anonymous mobile phone operator over the six-month period between December 1, 2007 and May 31, 2008. The data include text messages to and from users within the network. For each text message, we have the sender, the recipient, and the timestamp of the message. We also have the length of the message in characters; the full text is not available for privacy reasons. The dataset consists of text message records between 4,545,744 distinct phone numbers. At least one party to each message is within the network of the mobile carrier; that is, the data also contain users outside the network. Due to privacy and confidentiality concerns, these data are not publicly available; we have access to them through iLab at the Heinz School.
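As a rough illustration of the record structure described above, the following Python sketch represents one message as a record with the four available fields and computes a simple count-based reciprocity score for a pair of phone numbers. The names `SmsRecord`, `directed_counts`, `count_reciprocity`, and `distinct_numbers` are hypothetical, and the min/max ratio is only an assumed baseline for illustration, not the measure this project proposes; it ignores both the timing and the length of messages.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class SmsRecord:
    """One text message record (hypothetical field names)."""
    sender: str          # anonymized phone number of the sender
    recipient: str       # anonymized phone number of the recipient
    timestamp: datetime  # time the message was sent
    length: int          # message length in characters (no text available)


def directed_counts(records):
    """Count messages for each ordered (sender, recipient) pair."""
    counts = Counter()
    for r in records:
        counts[(r.sender, r.recipient)] += 1
    return counts


def count_reciprocity(counts, a, b):
    """Assumed baseline: ratio of the smaller to the larger directed count.

    Returns 1.0 for a perfectly balanced tie, 0.0 for a one-way tie
    or for a pair that never exchanged messages.
    """
    ab, ba = counts[(a, b)], counts[(b, a)]
    if max(ab, ba) == 0:
        return 0.0
    return min(ab, ba) / max(ab, ba)


def distinct_numbers(records):
    """Set of all phone numbers appearing as sender or recipient."""
    numbers = set()
    for r in records:
        numbers.add(r.sender)
        numbers.add(r.recipient)
    return numbers


# Toy records, invented purely for illustration.
toy = [
    SmsRecord("A", "B", datetime(2007, 12, 1, 9, 0), 40),
    SmsRecord("A", "B", datetime(2007, 12, 1, 9, 5), 12),
    SmsRecord("B", "A", datetime(2007, 12, 1, 9, 7), 25),
    SmsRecord("A", "C", datetime(2007, 12, 2, 18, 30), 160),
]
toy_counts = directed_counts(toy)
```

On the toy records, the pair A and B has two messages in one direction and one in the other, so the score is 0.5, while A and C form a one-way tie with score 0.0. A time-aware measure would additionally use the `timestamp` field, for example by looking at reply delays.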

Related Work

  • On this data:
    • Nanavati et al. (2006) -- Some exploratory data analysis (power laws, degree distributions, ...); graph structure for the phone call networks
    • Seshadri et al. (2008) -- Other models besides power laws for the phone call distributions
    • De Melo et al. (2010) -- Looked at the duration of phone calls; Group behavior models
  • On reciprocity:
    • Zhang, Dantu, and Cangussu (2009) -- some reciprocity measures that include time (using phone call data as well)

Note: details on these papers will be filled in for the next phase.

Proposed Work

subsection