Macskassy: Contextual linking behavior of bloggers


Citation

S. Macskassy. 2011. Contextual linking behavior of bloggers: leveraging text mining to enable topic-based analysis. Soc. Netw. Anal. Min. 1:355-375.


Online Version

[1]


Summary

Following up on previous work on topic discovery in blogs, Macskassy demonstrates a method for tagging links between blogs with topics and forming topic-specific graphs. These graphs support a temporal analysis of how graph components form and of how bloggers behave after the links form. Such behaviors are not easily visible in the general graph over all blogs. The topic-specific graphs also help locate blogs that are centrally influential on a given topic, an influence that would otherwise be hidden within the larger graph.


Background

Substantial prior research has analyzed the blogosphere and social media, including the structure, demographics, and evolution of the blogosphere. The flow of information through the blogosphere is an important topic that has received increased scrutiny in work that applies information cascade theory.


Data Used

Macskassy uses a dataset of more than 1.13 million blog posts gathered between 5/23/2009 and 6/12/2009, a period of three weeks. The bloggers selected for monitoring were a random sample of commercially monitored bloggers, from sites such as LiveJournal, WordPress, and Blogger.


Methodology

LDA[2] was used to identify topics within the data set. Of the ~1000 topics discovered, most were found to be legitimate and could be categorized under larger topic headings (music, religion, finance, etc.).

Hyperlinks connecting blogs were tagged with the topics extracted from the blog posts that contain them. Each topic can then be used to create a topically relevant network. The resulting topic-graphs were compared with several alternative methods of creating smaller networks, and the topic-graphs were found to extract a different network topology than the alternative methods.
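
The link-tagging step can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: it assumes each post already carries per-topic weights (for example from LDA), and the `Post` structure, the `threshold` cutoff, and the function name `build_topic_graphs` are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Post:
    """A blog post with outgoing links and per-topic weights (hypothetical structure)."""
    blog: str    # blog that published the post
    links: list  # blogs this post links to
    topic_weights: dict = field(default_factory=dict)  # topic -> weight, e.g. from LDA


def build_topic_graphs(posts, threshold=0.2):
    """Tag each link with the topics of its containing post.

    Returns a dict mapping topic -> set of (source_blog, target_blog) edges.
    The threshold is an assumed cutoff for deciding a post is "about" a topic.
    """
    graphs = defaultdict(set)
    for post in posts:
        topics = [t for t, w in post.topic_weights.items() if w >= threshold]
        for target in post.links:
            if target == post.blog:
                continue  # ignore self-links
            for topic in topics:
                graphs[topic].add((post.blog, target))
    return dict(graphs)


posts = [
    Post("a.example", ["b.example"], {"music": 0.7, "finance": 0.1}),
    Post("b.example", ["c.example", "a.example"], {"finance": 0.6, "music": 0.3}),
    Post("c.example", ["c.example"], {"religion": 0.9}),
]
topic_graphs = build_topic_graphs(posts)
```

In this toy input a single link can appear in more than one topic graph, because its containing post exceeds the assumed threshold for several topics. Whether the paper assigns one or several topics per link is not stated in this summary.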

Macskassy then examined the temporal behavior of bloggers after the formation of these topic-graphs: namely, did bloggers deviate from ‘local posting’ (i.e., linking within the same graph component)? The author found that in this respect the topic-graphs behaved differently from the overall graph. The main graph showed similar levels of linking across component sizes, both from larger components to smaller ones and from smaller to larger, whereas the topic-graphs showed much less linking from large components to small components.
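
The local-versus-cross-component question can be illustrated with a small sketch. The assumptions here are not taken from the paper: links are treated as undirected when forming components, components are built from an "early" set of links, and each "later" link is then labeled by comparing the sizes of the components it connects. The function names and label strings are hypothetical.

```python
from collections import Counter


def find(parent, node):
    """Return the component representative of node, with path compression."""
    parent.setdefault(node, node)
    while parent[node] != node:
        parent[node] = parent[parent[node]]
        node = parent[node]
    return node


def components_from_edges(edges):
    """Union-find over undirected links; returns (parent map, size per representative)."""
    parent = {}
    for src, dst in edges:
        root_src, root_dst = find(parent, src), find(parent, dst)
        if root_src != root_dst:
            parent[root_src] = root_dst
    sizes = Counter(find(parent, node) for node in list(parent))
    return parent, sizes


def classify_later_links(early_edges, later_edges):
    """Label each later link as local or by the relative size of the components it joins."""
    parent, sizes = components_from_edges(early_edges)
    labels = Counter()
    for src, dst in later_edges:
        if src not in parent or dst not in parent:
            labels["new_blog"] += 1  # endpoint not seen in the early graph
            continue
        root_src, root_dst = find(parent, src), find(parent, dst)
        if root_src == root_dst:
            labels["local"] += 1
        elif sizes[root_src] > sizes[root_dst]:
            labels["large_to_small"] += 1
        elif sizes[root_src] < sizes[root_dst]:
            labels["small_to_large"] += 1
        else:
            labels["equal_size"] += 1
    return labels


early = [("a", "b"), ("b", "c"), ("d", "e")]
later = [("a", "c"), ("a", "d"), ("e", "b"), ("f", "a")]
labels = classify_later_links(early, later)
```

Running the same classification once on the full graph and once on each topic graph would allow the label counts to be compared, which is the kind of contrast the paragraph above describes.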


Related Papers

D. Blei, A. Ng, and M. Jordan. 2003. Latent Dirichlet allocation. Journal of Machine Learning Research 3: 993–1022. [3]

R. Ghosh, K. Lerman. 2011. A framework for quantitative analysis of cascades on networks. In: Proceedings of web search and data mining conference (WSDM).

R. Nallapati and W. Cohen. 2008. Link-PLSA-LDA: A new unsupervised model for topics and influence of blogs. In International Conference for Weblogs and Social Media. [4]