Difference between revisions of "L. Ku, Y. Liang, and H. Chen. Opinion extraction, summarization and tracking in news and blog corpora. In Proceedings of AAAI-2006"
Revision as of 23:21, 25 October 2012
Citation
Lun-Wei Ku, Yu-Ting Liang, Hsin-Hsi Chen: Opinion Extraction, Summarization and Tracking in News and Blog Corpora. AAAI Spring Symposium: Computational Approaches to Analyzing Weblogs 2006: 100-107
Online Version
Summary
Abstract
Humans like to express their opinions and are eager to know others’ opinions. Automatically mining and organizing opinions from heterogeneous information sources are very useful for individuals, organizations and even governments. Opinion extraction, opinion summarization and opinion tracking are three important techniques for understanding opinions. Opinion extraction mines opinions at word, sentence and document levels from articles. Opinion summarization summarizes opinions of articles by telling sentiment polarities, degree and the correlated events. In this paper, both news and web blog articles are investigated. TREC, NTCIR and articles collected from web blogs serve as the information sources for opinion extraction. Documents related to the issue of animal cloning are selected as the experimental materials. Algorithms for opinion extraction at word, sentence and document level are proposed. The issue of relevant sentence selection is discussed, and then topical and opinionated information are summarized. Opinion summarizations are visualized by representative sentences. Text-based summaries in different languages, and from different sources, are compared. Finally, an opinionated curve showing supportive and nonsupportive degree along the timeline is illustrated by an opinion tracking system.
Data
- TREC 2003 (Soboroff and Harman, 2003). 50 document sets from the 2003 TREC novelty corpus; each set contains 25 documents, and all documents in the same set are relevant.
- NTCIR (Chen and Chen, 2001). The test collection consists of 50 topics, 6 of which are opinionated. A total of 192 documents relevant to these six topics are used as training data in the paper. Documents on an additional topic, “animal cloning”, from NTCIR 3 are selected from the CIRB011 and CIRB020 document collections and used for testing.
- Blogs. Blogs are a newly emerging community for expressing opinions. To investigate the opinions expressed in blogs, the authors retrieve documents from blog portals with the query “animal cloning”. There are 20 documents in total.
Task
Opinion Extraction, Opinion Summarization, and Opinion Tracking. Details will be explained in the following sub-sections.
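To make the three tasks concrete, here is a toy sketch of how they relate to each other. It is not the algorithm from the paper: the lexicon `LEXICON`, the scoring rule (summing word polarities), and the function names `sentence_score`, `document_score`, `summarize` and `track` are all assumptions made for illustration only.

```python
from collections import defaultdict

# Hypothetical toy polarity lexicon (an assumption for this sketch;
# the sentiment resources actually used in the paper are not shown here).
LEXICON = {"support": 1, "benefit": 1, "oppose": -1, "danger": -1}


def sentence_score(sentence):
    """Opinion extraction at sentence level: sum the polarities of known words."""
    words = sentence.lower().replace(".", " ").split()
    return sum(LEXICON.get(word, 0) for word in words)


def document_score(sentences):
    """Opinion extraction at document level: sum the sentence scores."""
    return sum(sentence_score(s) for s in sentences)


def summarize(sentences):
    """Opinion summarization: pick the most strongly opinionated sentence."""
    return max(sentences, key=lambda s: abs(sentence_score(s)))


def track(dated_docs):
    """Opinion tracking: total document score per date, in date order."""
    totals = defaultdict(int)
    for date, sentences in dated_docs:
        totals[date] += document_score(sentences)
    return sorted(totals.items())


# Made-up example documents about the "animal cloning" topic.
docs = [
    ("2002-01", ["Scientists support cloning for medical benefit.",
                 "The lab is in Scotland."]),
    ("2002-02", ["Many groups oppose animal cloning."]),
]
timeline = track(docs)
representative = summarize(docs[0][1])
```

In this sketch, `timeline` plays the role of the opinionated curve over time described in the abstract, and `representative` plays the role of a representative sentence in a summary.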