Yang et al., ICWSM 2010

From Cohen Courses
Revision as of 10:42, 3 September 2010 by WikiAdmin (talk | contribs) (1 revision)

Citation

Yang, J., Wei, X., Ackerman, M. and Adamic, L. 2010. "Activity Lifespan: An Analysis of User Survival Patterns in Online Knowledge Sharing Communities." In Proceedings of the 4th International AAAI Conference on Weblogs and Social Media.

Online version

ICWSM 2010 (Note: As of April 2010, this paper is not available online. However, the paper should be available at this page after ICWSM in May 2010.)

Summary

This paper applies survival analysis, a method borrowed from statistics, to the problem of analyzing and predicting user participation patterns in online knowledge sharing communities, specifically question and answer (Q&A) sites. Although Q&A sites have grown rapidly over the past few years, little is known about what sustains user participation in these communities in the long term. Previous work on similar communities and Usenet newsgroups has focused only on the first and second posts a user makes. In this paper, Yang et al. examine user participation over a period of two years to obtain a more detailed assessment of the factors associated with participation over an extended period of time.

The analysis focused on data from three popular question and answer communities: Yahoo! Answers (YA, USA), Baidu Knows (BK, China), and Naver Knowledge-iN (NK, Korea). These communities are similar in number of users, number of questions and answers, and site mechanics (e.g., point accrual based on answers given). For all three sites, the data included a complete first-year history and either a complete second-year history (Baidu and Naver) or, for Yahoo!, a complete history of questions posted in the second year by a random sample of 150,000 users (a restriction imposed by limitations of the Yahoo! Answers API).

In terms of the survival analysis, "lifespan" is defined with respect to participation. Since it's impossible to tell whether a user has in fact ceased participation entirely (as opposed to just not posting for a while and coming back months or years later), the authors defined "death" as "a period of inactivity exceeding 100 days."
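The 100-day rule can be illustrated with a minimal sketch. The function names, the `observation_end` parameter, and the choice to measure lifespan from first post to last post before "death" are assumptions made for illustration, not details confirmed by the paper. The Kaplan-Meier estimator shown is the standard survival-curve estimator and may differ from the authors' exact procedure.

```python
from datetime import datetime, timedelta

# "Death" threshold quoted in the summary: inactivity exceeding 100 days.
INACTIVITY_THRESHOLD = timedelta(days=100)


def lifespan(post_times, observation_end):
    """Return (lifespan_days, died) for one user.

    post_times: non-empty list of datetimes of the user's posts.
    observation_end: when data collection stopped (an assumed parameter).
    Lifespan is measured from the first post to the last post before the
    first gap longer than the threshold (an assumed convention).
    """
    times = sorted(post_times)
    first = times[0]
    last_active = first
    for t in times[1:]:
        if t - last_active > INACTIVITY_THRESHOLD:
            # A gap longer than the threshold: the user "died" at last_active.
            return (last_active - first).days, True
        last_active = t
    # No long gap between posts; check the trailing gap up to the end of data.
    died = observation_end - last_active > INACTIVITY_THRESHOLD
    return (last_active - first).days, died


def kaplan_meier(samples):
    """Kaplan-Meier survival estimate.

    samples: list of (duration, died) pairs; died=False means censored.
    Returns a list of (t, S(t)) pairs, one per distinct death time.
    """
    n_at_risk = len(samples)
    survival = 1.0
    curve = []
    for t in sorted({d for d, _ in samples}):
        deaths = sum(1 for d, died in samples if d == t and died)
        leaving = sum(1 for d, _ in samples if d == t)
        if deaths:
            survival *= 1 - deaths / n_at_risk
            curve.append((t, survival))
        n_at_risk -= leaving  # deaths and censored users both leave the risk set
    return curve
```

Users whose final gap has not yet exceeded 100 days when observation ends are treated as censored rather than dead, which is why the estimator takes a `died` flag alongside each duration.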

Findings

  • Many users (between 30% and 70%, depending on the site) leave after only one post. However, the longer a user stays active, the more likely they are to remain active in the future. In terms of asking versus answering questions, answering behavior persists longer than asking behavior.
  • The first post a user makes is likely to be a question. Between 2/3 and 3/4 of all first posts were questions, with the only exception occurring during Naver's first year - then, only 35% of all first posts were questions. However, during the second year, Naver's first post statistics were comparable to the other two sites.
  • The way a first post is received seems to have an effect on how likely a user is to stay. With regard to a first question, user longevity is positively correlated with receiving a larger number of replies (on both YA and BK), as well as with choosing a best answer and receiving longer replies (BK only). For users who started by answering questions, having an answer selected as "best answer" is positively correlated with longevity (YA and BK), but these results have very limited predictive power (very small R²).
  • Looking at participation patterns over a period of time, as opposed to just the first post, yields more predictive power (as might be expected). For instance, when looking at asking patterns over the first 30 days of participation, the authors found a positive association between the number of questions asked and the longevity of the user (on NK, this variable accounts for most of the predictive power for longevity). Average question length is also positively correlated with longevity, as is the number of replies received. Answering patterns over the first 30 days are also predictive. Both the number of questions answered and the number of times an answer was marked "best" are positively correlated with longevity, though the latter correlation is weak. Answer length is also weakly correlated with longevity.
  • All three sites experienced a decline in survival rate from the first year to the second year. It's not clear whether this is due to a difference between early adopters and those who join later or whether new users tend to become less committed to the site after their first year.
  • Survival patterns differ between question categories, specifically in terms of the distinction between "conversational" and "informational" questions. "Conversational" categories, such as "entertainment," have higher sustained survival rates than do "informational" categories (e.g., "medicine," "games"). One notable exception to this trend was found on BK, where the "computer/internet" category had a significantly higher survival rate than all the other categories. The authors conducted human coding of a small sample of questions on both BK and YA in the "entertainment" and "computer/internet" categories, and found that more social conversations occurred on YA in general, whereas the conversations on BK were narrower and more to the point. They interpret this finding as a "consequence of the complicated interactions among (1) information needs and (2) cultural differences," speculating that BK may be one of the few sources of information about computers and online resources available to internet users in China and that the Western users of YA may be "more willing to express their opinions and feelings." However, this may not be the best explanation. Site norms, for instance, may encourage a certain kind of interaction among users independent of regional culture.
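The first-30-day predictors listed in the findings above can be made concrete with a small sketch that derives them from a user's posts. The `Post` record layout, its field names, and the `first_window_features` function are hypothetical and chosen only for illustration; the paper's actual feature definitions and regression setup may differ.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List


@dataclass
class Post:
    # Hypothetical record layout; all field names are assumptions.
    time: datetime
    kind: str           # "question" or "answer"
    length: int         # characters in the post body
    replies: int = 0    # replies received (meaningful for questions)
    best: bool = False  # chosen as "best answer" (meaningful for answers)


def _mean(values: List[int]) -> float:
    return sum(values) / len(values) if values else 0.0


def first_window_features(posts: List[Post], window_days: int = 30) -> Dict[str, float]:
    """Summarize a user's activity in the first `window_days` after their first post."""
    first = min(p.time for p in posts)
    cutoff = first + timedelta(days=window_days)
    early = [p for p in posts if p.time < cutoff]
    questions = [p for p in early if p.kind == "question"]
    answers = [p for p in early if p.kind == "answer"]
    return {
        "questions_asked": len(questions),
        "avg_question_length": _mean([q.length for q in questions]),
        "replies_received": sum(q.replies for q in questions),
        "answers_given": len(answers),
        "best_answers": sum(1 for a in answers if a.best),
        "avg_answer_length": _mean([a.length for a in answers]),
    }
```

A feature vector like this, paired with each user's lifespan, is the kind of input a regression on longevity would take.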

Related papers

Lampe & Johnston (2005) looked at new user behavior on Slashdot, using a combination of comment data and survey responses to explore the mechanisms that affect new user contributions.

Joyce & Kraut (2006) examined USENET postings by new users, exploring factors correlated with both the likelihood of receiving a reply as well as the likelihood of making a second post.

Harper, Moy & Konstan (2009) defined and attempted to distinguish between conversational and informational questions.