# SurvivalAnalysis

## Brief Summary

Survival analysis refers to a collection of statistical methods for modeling the time until a given event occurs. When studying survival in a biological sense, this event is typically death. In engineering, the event may be the failure of a mechanical system. In the social sciences, events can include phenomena as varied as divorce, leaving a job, or graduation. Survival analysis is used to answer questions about expected lifespan, the proportion of subjects surviving past a given time, the rate of death or failure, and the factors contributing to longevity, among others.

## Survival Function

The survival function *S(t)* gives the probability that death occurs later than a given time *t*; that is, *S(t)* = Pr(*T* > *t*), where *T* is the time of death. Generally, *S(0)* is assumed to be 1, unless death can occur instantly. *S(t)* never increases with *t*, and as time passes it approaches 0, except where there is a possibility that death will not occur at all. In most applications of survival analysis to social media, *S(t)* will likely approach 0 over time: even a person who participates actively in a social media environment until their own physical death must stop participating at that point.
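As a minimal illustration, assuming every lifetime is fully observed (no censoring), *S(t)* can be estimated as the fraction of subjects whose lifetime exceeds *t*. The function name `empirical_survival` and the activity durations below are hypothetical, chosen only to show the shape of the estimate.

```python
from bisect import bisect_right


def empirical_survival(lifetimes, t):
    """Estimate S(t) = Pr(T > t) as the fraction of lifetimes greater than t.

    Assumes every lifetime was fully observed (no censoring).
    """
    if not lifetimes:
        raise ValueError("need at least one lifetime")
    ordered = sorted(lifetimes)
    ended_by_t = bisect_right(ordered, t)  # number of lifetimes <= t
    return (len(ordered) - ended_by_t) / len(ordered)


# Hypothetical data: days each of five users stayed active on a service.
days_active = [3, 10, 10, 45, 120]
for t in (0, 10, 100, 200):
    print(t, empirical_survival(days_active, t))
```

With these made-up durations the estimate starts at 1.0 at *t* = 0, never increases, and reaches 0.0 once every user has stopped participating.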

## Censoring

It is not always possible to observe individuals until the event of interest happens. For example, if we study user participation patterns on a social networking service over a two-year period, users who are still active at the end of the study may well stop participating at some point in the future. We have no way of knowing if and when this will occur; all we know is that they were still participating when the study ended. The survival times of these users are *right censored*: we know only that each one exceeds the time at which observation stopped.

Data can also be *left censored*, when we know the event of interest happened before some observed time but not exactly when. For example, if the event of interest is infection with the flu virus and we observe only when each person first tests positive, we know the infection occurred before that test but not the exact time of infection. Censoring should be taken into account when calculating expected survival time. Ignoring right censoring, for example by dropping censored subjects or by treating the end of observation as the event time, typically underestimates survival.
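One widely used way to account for right censoring (though not left censoring) is the Kaplan-Meier estimator. The sketch below is a plain-Python version, assuming each subject contributes a follow-up time and a flag saying whether the event was actually observed; the function name `kaplan_meier` and the data are hypothetical. At each distinct event time it multiplies the running estimate by the fraction of at-risk subjects who did not experience the event. Censored subjects count as at risk up to their censoring time and then leave the risk set without ever lowering the estimate.

```python
from collections import Counter


def kaplan_meier(durations, observed):
    """Kaplan-Meier estimate of S(t) from right-censored data (hypothetical helper).

    durations[i]: how long subject i was followed
    observed[i]:  True if the event happened at durations[i],
                  False if subject i was right censored at that time
    Returns (t, S(t)) pairs at each distinct observed event time.
    """
    if len(durations) != len(observed):
        raise ValueError("durations and observed must have the same length")
    events = Counter(t for t, e in zip(durations, observed) if e)  # events per time
    exits = Counter(durations)  # subjects leaving the risk set per time (events + censored)
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = events[t]
        if d:
            # Conditional probability of surviving past t, given at risk at t.
            # Subjects censored at t still count as at risk at t (usual convention).
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]
    return curve


# Hypothetical two-year (730-day) study: days until a user stopped participating.
# False marks users still active at their last observation (right censored),
# e.g. users who joined partway through the study.
durations = [30, 90, 90, 200, 400, 730, 730]
observed = [True, True, False, True, True, False, False]

for t, s in kaplan_meier(durations, observed):
    print(f"day {t:>3}: S = {s:.3f}")
```

For these made-up data the estimate drops to about 0.357 after day 400 and stays there, rather than falling to 0, because the two remaining users were still active when observation stopped at day 730. For real analyses, a maintained library such as `lifelines` (its `KaplanMeierFitter` class) also provides confidence intervals and plotting.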

## Additional Information

A starting point for additional information about functions and distributions used in survival analysis is the Wikipedia survival analysis article.

## Examples in Social Media Analysis

Yang et al. used survival analysis in their 2010 paper to assess user participation patterns in online Q&A communities.