Comparative Study of CQA: Anderson et al. and Liu et al.

From Cohen Courses

Papers

  1. Discovering Value from Community Activity on Focused Question Answering Sites: A Case Study of Stack Overflow (Anderson et al.)
  2. Predicting web searcher satisfaction with existing community-based answers (Liu et al.)

Central Idea, Problems Addressed

Anderson et al. present an extensive study of the community dynamics of a Community-based Question Answering (CQA) site and address two prediction problems in that setting. The first is to predict the long-term value of a question. The second is to predict whether a question has been sufficiently answered.

Liu et al. propose a solution to a novel problem: predicting whether an external web searcher will be satisfied by the answers archived on a CQA site.

The two papers address different problems. Anderson et al. study the dynamics of a CQA site and make predictions about the questions within it. Liu et al. measure how useful the answers on a CQA site are to users outside it. The two can be read as complementary: since much work has focused on activity within a CQA site, Liu et al. instead measure the value of its content to the outside world of web searchers.

Methodology

Both papers build a logistic regression model over an extensive list of features.

Anderson et al. propose solutions to two problems: predicting the long-term value of a question, and predicting whether a question has been sufficiently answered. Some features are common to both problems, while others are specific to one of them.

Liu et al. divide the task into three subtasks: query clarity, query-answer match, and answer satisfaction. They define features for each subtask, and two forms of logistic regression: Direct and Composite. In Direct Logistic Regression, the features of all three subtasks are combined to train a single model. In Composite Logistic Regression, a model for each subtask is trained separately, and these models are then combined into one.
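The difference between the Direct and Composite setups can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from either paper: the toy rows, the use of one feature per subtask, the binary labels, and in particular the choice to combine the subtask models by feeding their predictions into a second-stage logistic regression. The helper names (train_logreg, predict) are hypothetical.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict(w, x):
    # w[0] is the bias; w[1:] are the feature weights.
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))

def train_logreg(X, y, lr=0.5, epochs=2000):
    # Plain batch gradient descent on the mean log loss.
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = predict(w, xi) - yi
            grad[0] += err
            for j, xij in enumerate(xi):
                grad[j + 1] += err * xij
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad)]
    return w

# Hypothetical toy data (all values made up). Columns:
# clarity_feat, match_feat, answer_feat, clarity_ok, match_ok, answer_ok, satisfied
ROWS = [
    (0.9, 0.8, 0.9, 1, 1, 1, 1),
    (0.8, 0.9, 0.7, 1, 1, 1, 1),
    (0.2, 0.9, 0.8, 0, 1, 1, 0),
    (0.9, 0.1, 0.9, 1, 0, 1, 0),
    (0.8, 0.8, 0.2, 1, 1, 0, 0),
    (0.1, 0.2, 0.3, 0, 0, 0, 0),
    (0.7, 0.9, 0.9, 1, 1, 1, 1),
    (0.3, 0.2, 0.8, 0, 0, 1, 0),
]

clarity_X = [[r[0]] for r in ROWS]
match_X = [[r[1]] for r in ROWS]
answer_X = [[r[2]] for r in ROWS]
satisfied = [r[6] for r in ROWS]

# Direct: concatenate all subtask features and train a single model.
direct_X = [[r[0], r[1], r[2]] for r in ROWS]
direct_w = train_logreg(direct_X, satisfied)
direct_scores = [predict(direct_w, x) for x in direct_X]

# Composite: train one model per subtask on that subtask's own label...
sub_models = [
    train_logreg(clarity_X, [r[3] for r in ROWS]),
    train_logreg(match_X, [r[4] for r in ROWS]),
    train_logreg(answer_X, [r[5] for r in ROWS]),
]
# ...then feed the three subtask predictions into a second-stage model
# (this stacking step is an assumption about how the models are combined).
stage2_X = [
    [predict(w, x) for w, x in zip(sub_models, (c, m, a))]
    for c, m, a in zip(clarity_X, match_X, answer_X)
]
composite_w = train_logreg(stage2_X, satisfied)
composite_scores = [predict(composite_w, x) for x in stage2_X]

print([round(s, 2) for s in direct_scores])
print([round(s, 2) for s in composite_scores])
```

One practical consequence of the composite design, under these assumptions, is that it needs a label for every subtask, whereas the direct model needs only the final satisfaction label.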

Dataset

Anderson et al. use Stack Overflow data for their study. Liu et al. use a click dataset of Google searches that lead to Yahoo! Answers.

Evaluation

The two papers use different evaluation metrics. Anderson et al. report accuracy and area under the ROC curve (AUC), whereas Liu et al. use correlation and RMSE. Anderson et al. state that their results are close to the ground truth; Liu et al. state that their methodology solves a novel problem and achieves a high correlation with the answers of the human judges.
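For reference, the four metrics named above can be computed as in the following sketch. It uses the standard textbook definitions (Pearson correlation is assumed for "correlation"), and the labels, scores, and ratings are hypothetical values chosen only to exercise the functions; none of them come from either paper.

```python
import math

def accuracy(y_true, y_score, threshold=0.5):
    # Fraction of examples whose thresholded score matches the binary label.
    hits = sum((s >= threshold) == bool(t) for t, s in zip(y_true, y_score))
    return hits / len(y_true)

def auc(y_true, y_score):
    # Probability that a random positive is scored above a random negative;
    # ties count as half a win.
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def pearson(x, y):
    # Pearson correlation coefficient between two equal-length sequences.
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def rmse(y_true, y_pred):
    # Root mean squared error.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true))

# Classification-style evaluation (accuracy, AUC) on made-up binary labels.
labels = [1, 0, 1, 0]
scores = [0.9, 0.4, 0.6, 0.7]
print(accuracy(labels, scores), auc(labels, scores))

# Regression-style evaluation (correlation, RMSE) on made-up graded ratings.
human = [1.0, 2.0, 3.0]
predicted = [1.5, 2.0, 2.5]
print(pearson(human, predicted), rmse(human, predicted))
```

Accuracy and AUC suit a binary outcome, while correlation and RMSE compare predicted values against graded judgments, which is one reason results from the two papers are not directly comparable.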

Other Questions

  1. How much time did you spend reading the (new, non-wikified) paper you summarized? 2.5 hours.
  2. How much time did you spend reading the old wikified paper? 45 mins.
  3. How much time did you spend reading the summary of the old paper? 1 hour.
  4. How much time did you spend reading background material? Since the problem is very close to my project problem, I have spent a lot of time reading material about CQA in general.
  5. Was there a study plan for the old paper? Yes
    1. If so, did you read any of the items suggested by the study plan, and how much time did you spend reading them? Yes, I read the terms mentioned in the study plan. It took me 30 mins.
  6. Give us any additional feedback you might have about this assignment.

It was a different kind of exercise to first write my own summary of the second paper, then read the summary of the first paper, and then go through the original first paper to fill in the missing points. The original paper, being a case study of Stack Overflow, was quite long and detailed. The summary gave me a fairly good gist of the paper's content.