Lin et al. KDD 2011

From Cohen Courses

PET: A Statistical Model for Popular Events Tracking in Social Communities

Problem: The paper addresses how to observe and track popular events or topics as they evolve over time in online communities. Existing methods model textual topics and network structure separately. This paper combines them in a single model, so that both what users write and how users are connected inform the tracking.

Method: The authors first define the task of Popular Event Tracking (PET) in online communities, which covers the popularity of events over time, the burstiness of user interest, information diffusion through the network structure, and the evolution of topics.

PET uses a Gibbs Random Field to model each user's interest as depending on the user's historical status as well as the influence from the user's social connections. The intuition is that a user's current interest is strongly related to that user's previous interest, and is also influenced by the user's friends, i.e., the user's connections in the social network.
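A minimal sketch of such a field, assuming a quadratic energy (the summary does not give PET's exact potentials): one term pulls each user's current interest toward the user's previous interest with weight `alpha`, and one term pulls connected users toward each other in proportion to tie strength `w_ij`. The names `energy`, `icm_update`, and `alpha` are hypothetical. The update uses iterated conditional modes (ICM), which repeatedly sets each user's interest to the energy-minimizing value given everyone else; this finds a mode of the field rather than sampling from it.

```python
from typing import Dict, List, Tuple

Edge = Tuple[int, int, float]  # (user i, user j, tie strength w_ij), undirected, w_ij >= 0


def energy(h: List[float], h_prev: List[float], edges: List[Edge], alpha: float) -> float:
    """Assumed quadratic energy; P(H_t | G_t, H_{t-1}) is taken as proportional to exp(-energy)."""
    history = alpha * sum((hi - hp) ** 2 for hi, hp in zip(h, h_prev))  # stay close to h_{i,t-1}
    social = sum(w * (h[i] - h[j]) ** 2 for i, j, w in edges)           # agree with connections
    return history + social


def icm_update(h_prev: List[float], edges: List[Edge], alpha: float, sweeps: int = 20) -> List[float]:
    """Iterated conditional modes: repeatedly set each h_i to its energy minimiser given the rest."""
    if alpha <= 0:
        raise ValueError("alpha must be positive so every user has a well-defined update")
    neighbours: Dict[int, List[Tuple[int, float]]] = {i: [] for i in range(len(h_prev))}
    for i, j, w in edges:
        neighbours[i].append((j, w))
        neighbours[j].append((i, w))
    h = list(h_prev)  # initialise H_t with H_{t-1}
    for _ in range(sweeps):
        for i in range(len(h)):
            # Setting dE/dh_i = 0 gives a weighted average of h_{i,t-1} and the neighbours' interest.
            num = alpha * h_prev[i] + sum(w * h[j] for j, w in neighbours[i])
            den = alpha + sum(w for _, w in neighbours[i])
            h[i] = num / den
    return h
```

For example, with `h_prev = [1.0, 0.0, 0.0]` and a chain of ties 0-1-2, the update raises user 1's interest above zero and lowers the total energy. Under this assumed energy, each user's update is a tie-weighted average, so a stronger tie has a larger pull.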

Definitions:

1. Network Stream: $\mathcal{G} = \{G_1, G_2, \ldots, G_T\}$ is a stream of network structures. Each element $G_t$ is a snapshot of the network at time $t$.

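2. Document Stream: $\mathcal{D} = \{D_1, D_2, \ldots, D_T\}$ is a stream of document collections. $D_t = \{d_{1,t}, d_{2,t}, \ldots, d_{N,t}\}$ is the set of documents published between time $t-1$ and $t$, where $d_{i,t}$ is the text document associated with user $i$ at time $t$.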

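3. Topic: A semantically coherent topic $\theta$ is a multinomial distribution over the words of a vocabulary $W$, $\{p(w \mid \theta)\}_{w \in W}$, with $\sum_{w \in W} p(w \mid \theta) = 1$.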

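4. Event: An event $\Theta = \{\theta_0, \theta_1, \ldots, \theta_T\}$ is a stream of topics, where $\theta_t$ is the topic of the event at time $t$. Among these, the initial topic $\theta_0$ is either specified by users or discovered by an event detection algorithm.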

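5. Interest: For each event, at each time point, each user has a certain level of interest in the event, expressed as $h_{i,t}$ for user $i$ at time $t$. The interest status of all users at time $t$ is $H_t = \{h_{1,t}, \ldots, h_{N,t}\}$.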
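The definitions above map naturally onto a few plain containers. This is a hypothetical sketch (the class `TimeSlice` and the helpers `is_multinomial` and `event_of` are illustrative names, not from the paper) that keeps one time point's network, documents, topic, and interest together and checks that a topic is a valid multinomial.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Topic = Dict[str, float]  # theta: word -> p(w | theta)


@dataclass
class TimeSlice:
    """What the model sees or infers at one time point t (hypothetical container)."""
    network: Dict[Tuple[int, int], float]  # G_t: (user i, user j) -> tie strength
    documents: Dict[int, List[str]]        # D_t: user i -> tokens of d_{i,t}
    topic: Topic                           # theta_t: event topic at time t
    interest: Dict[int, float] = field(default_factory=dict)  # H_t: user i -> h_{i,t}


def is_multinomial(topic: Topic, tol: float = 1e-9) -> bool:
    """True if the word probabilities are non-negative and sum to 1."""
    return all(p >= 0.0 for p in topic.values()) and abs(sum(topic.values()) - 1.0) <= tol


def event_of(stream: List[TimeSlice]) -> List[Topic]:
    """The event viewed as the list of per-time-point topics theta_t in the stream."""
    return [s.topic for s in stream]
```

Keeping all per-time-point quantities in one record mirrors the streaming definitions: every stream is indexed by the same time point $t$.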

Event Tracking Model:

The model relies on three observations:

1. Interest & Connection: user i's current interest is influenced by i's connections, and a stronger tie has a larger impact.

2. Interest & History: interest values are generally consistent over time.

3. Content & Interest: if user i has a higher level of interest in an event, the content i generates is more likely to be related to the event.
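Observation 3 can be made concrete with a simple mixture. This is an illustrative sketch, not PET's exact topic model: each word of a user's document is assumed to come from the event topic with probability equal to the user's interest, and otherwise from a background word distribution. The background distribution and the name `doc_log_likelihood` are assumptions introduced for illustration.

```python
import math
from typing import Dict, List


def doc_log_likelihood(
    tokens: List[str],
    h: float,
    event_topic: Dict[str, float],
    background: Dict[str, float],
    floor: float = 1e-12,
) -> float:
    """Log-likelihood of d_{i,t} under a two-component mixture weighted by interest h.

    Assumption (illustrative, not from the summary):
        p(w | d_{i,t}) = h * p(w | theta_t) + (1 - h) * p(w | background)
    """
    if not 0.0 <= h <= 1.0:
        raise ValueError("interest h must lie in [0, 1] to act as a mixture weight")
    total = 0.0
    for w in tokens:
        p = h * event_topic.get(w, 0.0) + (1.0 - h) * background.get(w, 0.0)
        total += math.log(max(p, floor))  # floor avoids log(0) for words neither component covers
    return total
```

With an event topic over "flu" and "vaccine", a document made of those words scores higher at `h = 0.9` than at `h = 0.1`, while an off-topic document such as `["lunch"]` shows the reverse.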

General Model: The authors introduce two conditional independence assumptions:

1. Given the current network structure and the previous interest status, the current interest status is independent of the current document collection. This is a cause-and-effect assumption: the documents a user produces at a given moment are a result of the current interest, not a cause of it.

2. Given the current interest status and the documents, the current topic model is independent of the network structure and the previous interest status. Once a user has a certain level of interest in an event, the content the user produces depends only on the event and that interest level.
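Written as conditional independence statements, with $G_t$, $D_t$, $H_t$ and $\theta_t$ denoting the network, documents, interest status, and topic at time $t$ (this formalization paraphrases the two assumptions rather than quoting the paper):

```latex
\begin{aligned}
P(H_t \mid G_t, H_{t-1}, D_t) &= P(H_t \mid G_t, H_{t-1}) && \text{(assumption 1)} \\
P(\theta_t \mid H_t, D_t, G_t, H_{t-1}) &= P(\theta_t \mid H_t, D_t) && \text{(assumption 2)}
\end{aligned}
```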

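Writing $G_t$, $D_t$, $H_t$ and $\theta_t$ for the network, documents, interest status, and topic at time $t$, the chain rule gives $P(H_t, \theta_t \mid G_t, H_{t-1}, D_t) = P(H_t \mid G_t, H_{t-1}, D_t) \, P(\theta_t \mid H_t, G_t, H_{t-1}, D_t)$. Applying the first assumption to the first factor and the second assumption to the second factor, the inference target becomes:

$$P(H_t, \theta_t \mid G_t, H_{t-1}, D_t) = P(H_t \mid G_t, H_{t-1}) \, P(\theta_t \mid H_t, D_t)$$

The first factor corresponds to the first assumption and is called the interest model, because it describes the distribution of user interest. The second factor corresponds to the second assumption and is called the topic model, because it describes the topic distribution.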

Dataset:

The authors use Twitter as their data source. They select 5,000 users with follower = followee (reciprocal) relationships and crawl 1,438,826 tweets displayed by these users from Oct. 2009 to early Jan. 2010. Each day is treated as one time point, and a user's document for a day is obtained by simply concatenating all tweets displayed by that user on that day. The strength of a connection is defined as the number of tweets displayed by a user through following another user during a 30-day period.
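The preprocessing can be sketched as below. The record layouts are hypothetical: a tweet is `(user, day, text)`, and an exposure record `(i, j, day)` is read as "a tweet by followee j was displayed by follower i on that day". Aligning the 30-day window so that it ends on the time point being modelled is also an assumption.

```python
from collections import defaultdict
from datetime import date, timedelta
from typing import Dict, Iterable, List, Tuple

Tweet = Tuple[int, date, str]     # hypothetical: (user id, day, tweet text)
Exposure = Tuple[int, int, date]  # hypothetical: (follower i, followee j, day displayed)


def daily_documents(tweets: Iterable[Tweet]) -> Dict[Tuple[int, date], str]:
    """d_{i,t}: concatenate all tweets of user i on day t (one day = one time point)."""
    buckets: Dict[Tuple[int, date], List[str]] = defaultdict(list)
    for user, day, text in tweets:
        buckets[(user, day)].append(text)
    return {key: " ".join(texts) for key, texts in buckets.items()}


def tie_strengths(exposures: Iterable[Exposure], end_day: date, window_days: int = 30) -> Dict[Tuple[int, int], int]:
    """w_ij: number of followee j's tweets displayed by follower i in (end_day - window, end_day]."""
    start = end_day - timedelta(days=window_days)
    counts: Dict[Tuple[int, int], int] = defaultdict(int)
    for i, j, day in exposures:
        if start < day <= end_day:
            counts[(i, j)] += 1
    return dict(counts)
```

Counting raw tweets gives heavier weight to followees who post often, which is one plausible reading of "a stronger tie brings a larger impact".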

Comparison with baseline models:

The authors compare PET with several baseline models, JonK, Cont, BOM, and GInt, by applying each model to analyze the popularity trend of events. PET produces the trends most consistent with the gold standard, which the authors attribute to PET estimating popularity by jointly considering historical, textual, and structural information in a unified model.
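The summary does not say how a popularity trend is derived from interest values or how consistency with the gold standard is scored. As a hypothetical sketch only, one could take an event's popularity at time $t$ to be the mean interest over users and compare the resulting curve with a gold-standard series using Pearson correlation. Both choices are assumptions, and `popularity_curve` and `pearson` are illustrative names.

```python
import math
from typing import Dict, List, Sequence


def popularity_curve(interest_by_time: Sequence[Dict[int, float]]) -> List[float]:
    """Assumed popularity at each time point: mean of h_{i,t} over users (0.0 if no users)."""
    return [sum(h.values()) / len(h) if h else 0.0 for h in interest_by_time]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between a model's trend and a reference trend."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length series with at least two points")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0.0 or sy == 0.0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (sx * sy)
```

Correlation rewards matching the shape of the trend (rises and bursts) rather than its absolute scale, which suits comparing interest levels against a differently scaled reference.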

The popularity trend is shown in Fig. 1, and network diffusion is shown in Fig. 2.