Influentials, Networks, and Public Opinion Formation

From Cohen Courses
Revision as of 20:20, 5 November 2012 by Lujiang

Citation

Watts, Duncan J., and Peter Sheridan Dodds. "Influentials, networks, and public opinion formation." Journal of consumer research 34.4 (2007): 441-458.

Online version

[1]

Problem

Idea

Method

When a phrase is quoted, some of its words may be altered (called textual mutation), which hinders accurate tracking. To solve this problem, the authors propose a robust method for clustering the textual variants of quotes. It consists of two stages: phrase graph construction and clustering.

Graph construction

Each node in the phrase graph represents a phrase extracted from the corpus. An edge (p, q) always points from a shorter phrase p to a longer phrase q. It is included for a pair of phrases if either the directed edit distance from p to q (treating each word as a token) is smaller than 1, or the two phrases share a consecutive overlap of at least 10 words. In other words, an edge indicates that p is (approximately) included in q, and since every edge points strictly toward a longer phrase, the graph is a directed acyclic graph (DAG).
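The edge rule above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation. It assumes that the "directed" edit distance from p to q is the smallest word-level edit distance between p and any contiguous span of q, and that phrases are tokenized by lowercasing and splitting on whitespace. The function names and the parameters `delta` and `k` are hypothetical, chosen to match the thresholds 1 and 10 quoted above.

```python
from itertools import combinations


def directed_edit_distance(p, q):
    """Fewest word-level edits turning p into some contiguous span of q.

    Assumption: this is one plausible reading of "directed edit distance".
    """
    # Approximate substring matching: a match may start anywhere in q for free,
    # so the first row of the dynamic-programming table is all zeros.
    prev = [0] * (len(q) + 1)
    for i, pw in enumerate(p, start=1):
        curr = [i] + [0] * len(q)
        for j, qw in enumerate(q, start=1):
            cost = 0 if pw == qw else 1
            curr[j] = min(prev[j] + 1,         # drop a word of p
                          curr[j - 1] + 1,     # insert a word of q
                          prev[j - 1] + cost)  # match or substitute
        prev = curr
    # The match may also end anywhere in q.
    return min(prev)


def longest_common_run(p, q):
    """Length of the longest run of consecutive words shared by p and q."""
    best = 0
    prev = [0] * (len(q) + 1)
    for pw in p:
        curr = [0] * (len(q) + 1)
        for j, qw in enumerate(q, start=1):
            if pw == qw:
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    return best


def build_phrase_graph(phrases, delta=1, k=10):
    """Return directed edges (shorter phrase, longer phrase)."""
    # Assumed tokenization: lowercase, split on whitespace.
    tokens = {ph: ph.lower().split() for ph in phrases}
    edges = []
    for a, b in combinations(tokens, 2):
        if len(tokens[a]) == len(tokens[b]):
            continue  # edges only point to a strictly longer phrase
        p, q = (a, b) if len(tokens[a]) < len(tokens[b]) else (b, a)
        if (directed_edit_distance(tokens[p], tokens[q]) < delta
                or longest_common_run(tokens[p], tokens[q]) >= k):
            edges.append((p, q))
    return edges


edges = build_phrase_graph([
    "lipstick on a pig",
    "you can put lipstick on a pig",
    "the fundamentals of our economy are strong",
])
print(edges)
```

With these toy phrases, only the first two are linked, because the shorter one appears verbatim inside the longer one. Under the assumed definition, a threshold of 1 with a strict comparison admits only exact word-level containment, and the 10-word overlap rule covers long phrases that differ at the ends.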

The authors do not explain how the weight on each edge is calculated. They state only that the weight decreases as the directed edit distance from p to q grows and increases as the frequency of q in the corpus grows.


Data set

90 million news and blog articles (390 GB) collected over the final three months of the 2008 U.S. presidential election (August 1 to October 31, 2008).

Result

From the 35,800 non-trivial clusters (those containing at least two phrases), the authors extracted the 50 largest threads. Each thread can be regarded as a cluster of clusters that contain some of the phrases. The threads are depicted in the well-known figure below.

Fig.2 Tracking 50 largest threads

The figure gives a sense of the news cycle and shows which stories were popular in each period. The authors also report findings from a global analysis and a local analysis.


Notes

[2] Support website

[3] J. Leskovec, M. McGlohon, C. Faloutsos, N. Glance, M. Hurst. Cascading behavior in large blog graphs. SDM, 2007.

[4] X. Wang and A. McCallum. Topics over time: a non-Markov continuous-time model of topical trends. KDD, 2006.

[5] X. Wang, C. Zhai, X. Hu, R. Sproat. Mining correlated bursty topic patterns from coordinated text streams. KDD, 2007.