Popular Event Tracking
Latest revision as of 14:15, 4 February 2011
Popular Event Tracking is a method that takes both user interest and network structure into account. It also introduces independence assumptions that are based on several important observations.
Definitions
1. Network stream: a stream of network structures, one per time point. Each element <math>G_k</math> is a snapshot of the network at time point k.
2. Document stream: a stream of document collections. <math>D_k</math> is the set of documents published between time points k-1 and k, and <math>d_k(i)</math> is the text document associated with user i at time point k.
3. Topic: a semantically coherent topic <math>\theta</math> is a multinomial distribution over words.
4. Event: an event is a stream of topics <math>\theta_k</math>, one per time point. Among these, the initial topic is either specified by users or discovered by an event detection algorithm.
5. Interest: for each event, at each time point k, each user i has a certain level of interest in the event, expressed as <math>h_k(i)</math>.
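The definitions above can be pictured as a simple data structure. The following is only a minimal sketch: the class name `Snapshot`, its field names, and the toy data are hypothetical choices made for illustration, and tie strengths are assumed to be stored as edge weights.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Snapshot:
    """One time point k (hypothetical container, not from the paper)."""
    # Network snapshot G_k: (user, user) -> tie strength (assumed edge weight).
    edges: Dict[Tuple[int, int], float]
    # Document collection D_k: user i -> tokens of the document d_k(i).
    documents: Dict[int, List[str]]
    # Interest values H_k: user i -> h_k(i); empty until inferred.
    interest: Dict[int, float] = field(default_factory=dict)


def neighbors(snapshot: Snapshot, user: int) -> Dict[int, float]:
    """Return {neighbor: tie strength} for `user`, treating edges as undirected."""
    result = {}
    for (a, b), weight in snapshot.edges.items():
        if a == user:
            result[b] = weight
        elif b == user:
            result[a] = weight
    return result


# A toy stream with two time points and three users.
stream = [
    Snapshot(
        edges={(0, 1): 1.0, (1, 2): 0.5},
        documents={0: ["earthquake", "news"], 1: ["lunch"], 2: []},
        interest={0: 0.8, 1: 0.1, 2: 0.0},
    ),
    Snapshot(
        edges={(0, 1): 1.0, (1, 2): 0.5, (0, 2): 0.2},
        documents={0: ["earthquake"], 1: ["earthquake", "help"], 2: ["weather"]},
    ),
]
```

Here the second snapshot has no interest values yet; inferring them from the network and the previous interest values is the task the model addresses.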
Observations
The model in this paper relies on the following three important observations:
1. Interest & Connection: user i's current interest is influenced by i's connections and a stronger tie brings a larger impact.
2. Interest & History: interest values are generally consistent over time.
3. Content & Interest: if user i has a higher level of interest in an event, the content he generates should be more likely to be related to the event.
Assumptions
First, the authors introduce two reasonable independence assumptions:
1. Given the current network structure and the previous interest status, the current interest status is independent of the document collection. This is a cause-effect assumption: people generate documents at this moment as a result of their current interest, not as a cause of it.
2. Given the current interest status and the document collection, the current topic model is independent of the network structure and the previous interest status. The reason is that once a user has a certain level of interest in an event, the content he produces depends only on the event and the interest level.
Details about the Model
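Based on the above assumptions, the inference target becomes: <math>P(H_k,\theta_k|G_k,D_k,H_{k-1})= P(H_k|G_k,H_{k-1})P(\theta_k|H_k,D_k)</math>. The first part, <math>P(H_k|G_k,H_{k-1})</math>, corresponds to the first assumption and is called the interest model because it measures the distribution of interest. The second part, <math>P(\theta_k|H_k,D_k)</math>, corresponds to the second assumption and is called the topic model, which deals with the topic distribution.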
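The factorization of the inference target can be retraced step by step. The derivation below is a reconstruction from the two assumptions as stated in this article, not a quotation from the paper: the chain rule splits the joint distribution, and each assumption then removes the conditioning variables it declares irrelevant.

```latex
\begin{align*}
P(H_k,\theta_k \mid G_k,D_k,H_{k-1})
  &= P(H_k \mid G_k,D_k,H_{k-1})\,P(\theta_k \mid H_k,G_k,D_k,H_{k-1})
     && \text{(chain rule)}\\
  &= P(H_k \mid G_k,H_{k-1})\,P(\theta_k \mid H_k,G_k,D_k,H_{k-1})
     && \text{(assumption 1)}\\
  &= P(H_k \mid G_k,H_{k-1})\,P(\theta_k \mid H_k,D_k)
     && \text{(assumption 2)}
\end{align*}
```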
The authors use a Gibbs Random Field as the interest model. In the Gibbs Random Field, the authors define the energy function <math>U(H_k)=\sum^N_{i=1}V_i(h_k(i))+\sum^N_{i=1}V'_i(h_k(i),h_k(-i))</math>, where -i refers to all vertices except i.
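A Gibbs Random Field assigns each configuration a probability proportional to exp(-U), so configurations with lower energy are more probable. The sketch below illustrates only this general mechanism on a tiny discrete example. The discretized interest levels, the `temperature` parameter, and `toy_energy` are assumptions made for illustration and are not the energy function defined by the authors.

```python
import itertools
import math


def gibbs_distribution(states, energy, temperature=1.0):
    """Return P(s) = exp(-energy(s) / temperature) / Z for every state s."""
    weights = [math.exp(-energy(s) / temperature) for s in states]
    partition = sum(weights)  # normalizing constant Z
    return {s: w / partition for s, w in zip(states, weights)}


# Assumption: interest is restricted to three levels so that all
# configurations of two users can be enumerated exhaustively.
levels = (0.0, 0.5, 1.0)
states = list(itertools.product(levels, repeat=2))


def toy_energy(state):
    """Hypothetical energy: penalize disagreement between the two users."""
    return (state[0] - state[1]) ** 2


probs = gibbs_distribution(states, toy_energy)
# Agreeing configurations such as (0.0, 0.0) get more probability mass
# than disagreeing ones such as (0.0, 1.0).
```

Exhaustive enumeration is only feasible for toy sizes; with N users the number of configurations grows exponentially, which is why this is a sketch of the definition rather than an inference procedure.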
<math>V_i(h_k(i))=(h_k(i)-h_{k-1}(i))^2</math>. This is regarded as the transition energy. The authors would like to minimize this cost so that the interest values stay generally consistent over time. This definition corresponds to observation 2 above.
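The transition energy is the squared change of each user's interest between consecutive time points, (h_k(i) - h_{k-1}(i))^2, summed over users. A minimal sketch follows; the function name and the toy interest values are hypothetical.

```python
def transition_energy(current, previous):
    """Sum over users i of (h_k(i) - h_{k-1}(i))^2."""
    return sum((current[i] - previous[i]) ** 2 for i in current)


# Toy interest values for three users (illustrative only).
previous = {0: 0.8, 1: 0.1, 2: 0.0}
smooth = {0: 0.7, 1: 0.2, 2: 0.0}   # small changes from `previous`
jumpy = {0: 0.0, 1: 1.0, 2: 0.9}    # large changes from `previous`

smooth_cost = transition_energy(smooth, previous)
jumpy_cost = transition_energy(jumpy, previous)
# The smooth trajectory has the lower energy, so the model prefers it.
```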
The second term, <math>V'_i(h_k(i),h_k(-i))</math>, depends on the expectation of <math>h_k(i)</math> estimated from user i's neighbors. This corresponds exactly to observation 1.
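The exact form of this neighbor term is not given here, so the following sketch rests on two explicit assumptions: the expectation is a tie-strength-weighted average of the neighbors' interest values, and the energy is the squared deviation from that expectation (by analogy with the transition energy). All names are hypothetical.

```python
def expected_interest(user, interest, ties):
    """Tie-weighted mean of the neighbors' interest (assumed estimator)."""
    total = sum(ties[user].values())
    if total == 0:
        # Assumption: an isolated user gets no signal from neighbors.
        return interest[user]
    return sum(w * interest[j] for j, w in ties[user].items()) / total


def structure_energy(interest, ties):
    """Sum over users of the squared deviation from the neighbor
    expectation (assumed form of the second energy term)."""
    return sum(
        (interest[i] - expected_interest(i, interest, ties)) ** 2
        for i in interest
    )


# ties[i][j] is the strength of the tie between users i and j.
ties = {0: {1: 3.0, 2: 1.0}, 1: {0: 3.0}, 2: {0: 1.0}}
interest = {0: 0.5, 1: 1.0, 2: 0.0}

# User 0 is tied more strongly to user 1 than to user 2, so the
# expectation is pulled toward user 1's interest.
pulled = expected_interest(0, interest, ties)
```

Under these assumptions a stronger tie contributes more to the expectation, which is the behavior observation 1 describes.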
Then, based on observation 3, the authors define the topic model so that a higher interest of user i toward a certain event leads to a higher proportion of his text belonging to that event.
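One way to picture this relationship is a two-component mixture in which the interest value acts as the mixing weight between an event topic and a background word distribution. This is an illustrative assumption, not the authors' definition, which is not reproduced here. The function names, both word distributions, and the requirement that interest lies in [0, 1] are hypothetical.

```python
import math


def word_probability(word, interest, event_topic, background):
    """Assumed mixture: with probability `interest` the word comes from
    the event topic, otherwise from the background distribution."""
    return (interest * event_topic.get(word, 0.0)
            + (1.0 - interest) * background.get(word, 0.0))


def log_likelihood(doc, interest, event_topic, background):
    """Log-likelihood of a document's words under the assumed mixture."""
    return sum(
        math.log(word_probability(w, interest, event_topic, background))
        for w in doc
    )


# Toy multinomial distributions over a three-word vocabulary.
event_topic = {"earthquake": 0.6, "rescue": 0.3, "lunch": 0.1}
background = {"earthquake": 0.1, "rescue": 0.1, "lunch": 0.8}

doc = ["earthquake", "rescue", "earthquake"]
high = log_likelihood(doc, 0.9, event_topic, background)
low = log_likelihood(doc, 0.1, event_topic, background)
# An event-heavy document is better explained by a high interest value.
```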