http://curtis.ml.cmu.edu/w/courses/api.php?action=feedcontributions&user=Ysim&feedformat=atomCohen Courses - User contributions [en]2024-03-29T09:31:01ZUser contributionsMediaWiki 1.33.1http://curtis.ml.cmu.edu/w/courses/index.php?title=Comparison_Das_et_al_WSDM_2011_and_Zhao_et_al_AAAI_2007&diff=15218Comparison Das et al WSDM 2011 and Zhao et al AAAI 20072012-11-06T03:49:10Z<p>Ysim: </p>
<hr />
<div>This is a comparison of two related papers in [[event detection]] and [[temporal information extraction]].<br />
<br />
== Papers ==<br />
<br />
The papers are<br />
* Anish Das Sarma, A. Jain, C. Yu. [[Das_Sarma_et._al.,_Dynamic_Relationship_and_Event_Discovery,_WSDM_2011|Dynamic relationship and event discovery]]. In Proceedings of the fourth ACM international conference on Web search and data mining, 2011. [http://web.eecs.umich.edu/~congy/work/wsdm11.pdf]<br />
* Qiankun Zhao, Prasenjit Mitra, and Bi Chen. [[Zhao_et_al,_AAAI_07|Temporal and information flow based event detection from social text streams]]. In Proceedings of the 22nd national conference on Artificial intelligence - Volume 2, pages 1501–1506. AAAI Press, 2007. [http://www.purdue.edu/discoverypark/vaccine/assets/pdfs/publications/pdf/Temporal%20and%20Information%20Flow%20Based.pdf]<br />
<br />
== Comparative analysis of both papers ==<br />
<br />
At a high level, both papers aim to discover events from large temporal information sources.<br />
Both leverage user-generated content: Das et al use Wikipedia as their dataset, while Zhao et al use the [[UsesDataset::Enron email corpus]] and [[UsesDataset::Dailykos blogs]].<br />
<br />
Das et al's task is first to discover pairs of entities that co-burst within the same time period (one week). Two entities co-burst when both are mentioned significantly more often in that period than in other periods.<br />
The next step is to discover the relationships between such entities.<br />
These relationships form the foundation for an event, defined as an n-ary relationship between entities that are bursty in the same time period.<br />
Likewise, Zhao et al's task is to discover events by exploiting the temporal burstiness of entities and text, together with the "social" aspect: an event is something that "social actors" talk about more than usual.<br />
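The notion of co-bursting described above can be illustrated with a minimal sketch. It assumes weekly mention counts per entity and flags a week as bursty when its count lies more than a chosen number of standard deviations above that entity's mean. The z-score test, the threshold of 2.0, and the function names are illustrative assumptions, not the burst detector used in either paper.

```python
from statistics import mean, pstdev


def bursty_weeks(weekly_counts, z_threshold=2.0):
    """Return the set of week indices whose mention count is unusually high.

    Assumption: "significantly more than other periods" is approximated
    by a z-score above z_threshold (a hypothetical choice).
    """
    mu = mean(weekly_counts)
    sigma = pstdev(weekly_counts)
    if sigma == 0:
        # A flat series has no bursts.
        return set()
    return {
        week
        for week, count in enumerate(weekly_counts)
        if (count - mu) / sigma > z_threshold
    }


def co_bursting_weeks(counts_a, counts_b, z_threshold=2.0):
    """Weeks in which both entities are bursty at the same time."""
    return bursty_weeks(counts_a, z_threshold) & bursty_weeks(counts_b, z_threshold)
```

For example, two entities whose mentions both spike in week 4 would yield `{4}` from `co_bursting_weeks`, while entities that spike in different weeks would yield an empty set.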
<br />
Method-wise, both papers frame the problem of identifying relationships in terms of graphs.<br />
In Das et al, vertices are entities, and edge weights describe how much the time periods in which two entities are bursty overlap. Two entities that are mentioned more than usual at the same times therefore have a stronger edge between them.<br />
In Zhao et al, vertices are social actors. Unlike the entities in Das et al, social actors are not directly involved in an event; they are simply actors who converse (through text) about the event taking place. Edges between social actors are thus weighted by how intensely each pair of social actors communicates during the time period.<br />
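The two graph constructions can be contrasted with a small sketch. It assumes each entity comes with a set of bursty period indices and each message is a (sender, recipient) pair from one time window. Using Jaccard overlap for the entity graph and raw message counts for the actor graph is an illustrative assumption; neither is claimed to be the exact weighting used in the papers, and all names here are hypothetical.

```python
from collections import Counter
from itertools import combinations


def jaccard(a, b):
    """Overlap of two sets of bursty periods, between 0.0 and 1.0."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def co_burst_graph(burst_periods):
    """Entity graph in the spirit of Das et al.

    burst_periods maps an entity name to the set of periods in which
    it is bursty. Returns {(entity_u, entity_v): weight} for every pair
    with a non-zero overlap.
    """
    edges = {}
    for u, v in combinations(sorted(burst_periods), 2):
        weight = jaccard(burst_periods[u], burst_periods[v])
        if weight > 0:
            edges[(u, v)] = weight
    return edges


def communication_graph(messages):
    """Social-actor graph in the spirit of Zhao et al.

    messages is an iterable of (sender, recipient) pairs observed in one
    time window. Edge weight is the number of messages exchanged between
    the two actors, in either direction.
    """
    counts = Counter()
    for sender, recipient in messages:
        if sender != recipient:
            counts[tuple(sorted((sender, recipient)))] += 1
    return dict(counts)
```

In the first graph the vertices are the things an event is about; in the second they are the people discussing it, which is the key modeling difference between the two approaches.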
<br />
In Das et al's approach, events are thus assumed to involve two or more public entities, while Zhao et al's events are tied more to the topical nature of the ongoing discussions.<br />
The advantage of Das et al's approach is that events are easily interpretable, especially within the context of public news (entertainment news, political news, etc.), which is often about specific public figures or organizations. However, it cannot capture abstract events that have no specific associated entities, such as a natural disaster.<br />
Zhao et al's approach, on the other hand, can identify such abstract events; however, the topics of its events may not be easily identifiable.<br />
<br />
Both papers use algorithms from time series modeling and graph clustering to solve their respective problems.<br />
<br />
== Related papers ==<br />
* [[RelatedPaper::Lin_et_al_KDD_2011|A Statistical Model for Popular Events Tracking in Social Communities. Lin et al, KDD 2011]]<br />
* [[RelatedPaper::Popescu and Pennacchiotti, CIKM 10|Detecting controversial events from Twitter. Popescu and Pennacchiotti, CIKM 10]]<br />
* [[RelatedPaper::Yang et al, SIGIR 98|A study on retrospective and online event detection. Yang et al, SIGIR 98]]<br />
* [[RelatedPaper::Automatic_Detection_and_Classification_of_Social_Events|Automatic Detection and Classification of Social Events]]<br />
* [[RelatedPaper::Q. Zhao, P. Mitra, and B. Chen. Temporal and information flow based event detection from social text streams. In AAAI, 2007]]<br />
* [[RelatedPaper::Banko_2007_Open_Information_Extraction_from_the_Web]]<br />
* [[RelatedPaper::Chambers, N. and Jurafsky, D. Template-based information extraction without the templates, ACL 2011]].<br />
<br />
== Questions ==<br />
# How much time did you spend reading the (new, non-wikified) paper you summarized? ''About 35 minutes.''<br />
# How much time did you spend reading the old wikified paper? ''About 35 minutes.''<br />
# How much time did you spend reading the summary of the old paper? ''About 15 minutes.''<br />
# How much time did you spend reading background material? ''About 30 minutes.''<br />
# Was there a study plan for the old paper? ''There wasn't an explicit study plan, but the article did provide a good background of the related papers that would be useful.''<br />
## If so, did you read any of the items suggested by the study plan, and how much time did you spend reading them? ''Yes. I did a quick read of the [[Chambers, N. and Jurafsky, D. Template-based information extraction without the templates, ACL 2011]] paper. It took me about 10 minutes.''<br />
# Give us any additional feedback you might have about this assignment. ''The paper pairing was well chosen (at least for the papers I read). Doing a comparative analysis of two papers enabled me to think more deeply about the different approaches to the same or a similar problem and to identify the pros, cons, and assumptions of each.''</div>Ysimhttp://curtis.ml.cmu.edu/w/courses/index.php?title=Comparison_Das_et_al_WSDM_2011_and_Zhao_et_al_AAAI_2007&diff=15216Comparison Das et al WSDM 2011 and Zhao et al AAAI 20072012-11-06T03:48:07Z<p>Ysim: </p>
<hr />
<div>This is a comparison of two related papers in [[event detection]] and [[temporal information extraction]].<br />
<br />
== Papers ==<br />
<br />
The papers are<br />
* Anish Das Sarma, A. Jain, C. Yu. [[Das_Sarma_et._al.,_Dynamic_Relationship_and_Event_Discovery,_WSDM_2011|Dynamic relationship and event discovery]]. In Proceedings of the fourth ACM international conference on Web search and data mining, 2011. [http://web.eecs.umich.edu/~congy/work/wsdm11.pdf]<br />
* Qiankun Zhao, Prasenjit Mitra, and Bi Chen. [[Zhao_et_al,_AAAI_07|Temporal and information flow based event detection from social text streams]]. In Proceedings of the 22nd national conference on Artificial intelligence - Volume 2, pages 1501–1506. AAAI Press, 2007. [http://www.purdue.edu/discoverypark/vaccine/assets/pdfs/publications/pdf/Temporal%20and%20Information%20Flow%20Based.pdf]<br />
<br />
== Comparative analysis of both papers ==<br />
<br />
On a high level, both papers are interested in discovering events from large amount temporal information sources.<br />
Both of them leverage on user generated content, with Das et al using Wikipedia as their dataset, while Zhao et al used the [[UsesDataset::Enron email corpus]] and [[UsesDataset::Dailykos blogs]].<br />
<br />
In Das et al, their task was to first discover pairs of entities that were co-bursting in the same time period (of a week). Co-bursting means that both entities are mentioned significantly more than during other time periods.<br />
After which, the next step is to discover the relationships between such entities. <br />
This forms the foundation for an event, an n-ary relationship between entities that are bursty at the same time period.<br />
Likewise, Zhao et al's task is to discover events, exploiting the temporal burstiness property of entities and text, and also the "social" aspect, where an event is being talked about more than usual by "social actors".<br />
<br />
Method-wise, both papers framed the problem of identifying relationships in the context of graphs.<br />
In Das et al, vertices are entities and edges describe how much overlap two entities have in the time periods that they are bursty. So two entities who were mentioned more at the same time would have stronger edges between them.<br />
In Zhao et al, vertices are social actors. Social actors are not entities that are directly involved in an event (much unlike Das et al), they are just actors that converse (through text) about the event that is taking place. Edges between social actors are thus weighted by how intense pairs social actors communicate during the time period.<br />
<br />
In Das et al's approach, events are thus assumed to involve two or more public entities, while Zhao et al's events are tied more to the topics of the ongoing discussions.<br />
The advantage of Das et al's approach is that events are easily interpretable, especially for public news (entertainment news, political news, etc.), which is often about specific public figures or organizations. However, it cannot capture abstract events that have no specific associated entities, such as a natural disaster.<br />
Zhao et al's approach, on the other hand, can identify such abstract events, but the topics of its events may not be easily identifiable.<br />
<br />
Both papers use algorithms from time series modeling and graph clustering to solve their respective problems.<br />
<br />
== Related papers ==<br />
* [[RelatedPaper::Lin_et_al_KDD_2011|A Statistical Model for Popular Events Tracking in Social Communities. Lin et al, KDD 2011]]<br />
* [[RelatedPaper::Popescu and Pennacchiotti, CIKM 10|Detecting controversial events from Twitter. Popescu and Pennacchiotti, CIKM 10]]<br />
* [[RelatedPaper::Yang et al, SIGIR 98|A study on retrospective and online event detection. Yang et al, SIGIR 98]]<br />
* [[RelatedPaper::Automatic_Detection_and_Classification_of_Social_Events|Automatic Detection and Classification of Social Events]]<br />
* [[RelatedPaper::Q. Zhao, P. Mitra, and B. Chen. Temporal and information flow based event detection from social text streams. In AAAI, 2007]]<br />
* [[RelatedPaper::Banko_2007_Open_Information_Extraction_from_the_Web]]<br />
* [[RelatedPaper::Chambers, N. and Jurafsky, D. Template-based information extraction without the templates, ACL 2011]].<br />
<br />
== Questions ==<br />
# How much time did you spend reading the (new, non-wikified) paper you summarized?<br />
# How much time did you spend reading the old wikified paper?<br />
# How much time did you spend reading the summary of the old paper?<br />
# How much time did you spend reading background material?<br />
# Was there a study plan for the old paper?<br />
# If so, did you read any of the items suggested by the study plan, and how much time did you spend reading them?<br />
# Give us any additional feedback you might have about this assignment.</div>Ysimhttp://curtis.ml.cmu.edu/w/courses/index.php?title=Q._Zhao,_P._Mitra,_and_B._Chen._Temporal_and_information_flow_based_event_detection_from_social_text_streams._In_AAAI,_2007&diff=15206Q. Zhao, P. Mitra, and B. Chen. Temporal and information flow based event detection from social text streams. In AAAI, 20072012-11-06T03:41:34Z<p>Ysim: </p>
<hr />
<div>This [[Category::paper]] is relevant to [[AddressesProblem::Controversial_events_detection|detecting controversial events]] and [[AddressesProblem::Event detection]].<br />
<br />
= Temporal and information flow based event detection from social text streams=<br />
<br />
== Citation ==<br />
<br />
Qiankun Zhao, Prasenjit Mitra, and Bi Chen. Temporal and information flow based event detection from social text streams. In Proceedings of the 22nd national conference on Artificial intelligence - Volume 2, pages 1501–1506. AAAI Press, 2007.<br />
<br />
== Online version ==<br />
<br />
[http://www.purdue.edu/discoverypark/vaccine/assets/pdfs/publications/pdf/Temporal%20and%20Information%20Flow%20Based.pdf Temporal and information flow based event detection from social text streams]<br />
<br />
== Summary ==<br />
<br />
The authors present a method for detecting events from social text streams that exploits not only the textual content but also the temporal and social dimensions of the data.<br />
Social text streams are represented as multigraphs in which each node denotes an "actor" and each edge represents the information flow between two actors.<br />
First, the authors perform content-based [[UsesMethod::clustering]] using a vector space model (tf-idf weights, cosine similarity, and so on) and a graph-cut-based clustering algorithm.<br />
This clustering segments the data into topics.<br />
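The vector space step can be sketched as follows. This is a minimal illustration of tf-idf weighting and cosine similarity only; the tokenization, the idf formula, and the toy documents are assumptions, and the graph-cut clustering that the authors apply on top of these similarities is not shown.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse tf-idf vector (dict term -> weight) per document.

    Assumption: whitespace tokenization and idf = log(N / df).
    """
    n = len(docs)
    tokenized = [doc.lower().split() for doc in docs]
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({t: tf[t] * math.log(n / df[t]) for t in tf})
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical message texts.
docs = ["energy trading contract", "energy trading deal", "election campaign poll"]
vecs = tfidf_vectors(docs)
print(cosine(vecs[0], vecs[1]))  # shares terms, so similarity is positive
print(cosine(vecs[0], vecs[2]))  # no shared terms
```

A pairwise similarity matrix built this way is the kind of input a graph-cut clustering algorithm would partition into topics.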
<br />
For a given topic, they measure the "intensity" over time using a sliding time window and segment the resulting series into intervals using an adaptive time series model.<br />
With the temporal segmentation, each topic is represented as a sequence of social network graphs over time.<br />
The weight of an edge between two actors in such a graph denotes their communication intensity, so one can measure the "information flow" between actors for a given topic over time.<br />
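The sliding-window intensity measurement can be sketched as a simple count of messages per window. The function name, the integer timestamps, and the choice of counting messages as the intensity are assumptions for illustration; the adaptive time series segmentation that follows in the paper is not reproduced here.

```python
def sliding_intensity(timestamps, window, step):
    """Count messages in each window [start, start + window), slid by step.

    Assumption: intensity is the raw message count within the window.
    Returns a list of (window_start, count) pairs.
    """
    if not timestamps:
        return []
    ts = sorted(timestamps)
    first, last = ts[0], ts[-1]
    series = []
    start = first
    while start <= last:
        count = sum(1 for t in ts if start <= t < start + window)
        series.append((start, count))
        start += step
    return series


# Hypothetical message timestamps (for example, in days) for one topic.
timestamps = [0, 1, 2, 10, 11, 12, 13]
print(sliding_intensity(timestamps, window=5, step=5))
```

Choosing <code>step</code> smaller than <code>window</code> gives overlapping windows and a smoother intensity curve.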
<br />
Using the content, temporal, and information flow data above, they extract events by extracting text segments subject to constraints on this information. For instance, the segments of an event should come from the same time interval, be about the same topic, and be exchanged mainly within a certain subgroup of social actors.<br />
<br />
== Evaluation ==<br />
<br />
They used the [[UsesDataset::Enron email corpus]] and [[UsesDataset::Dailykos blogs]] [http://www.dailykos.com/]. Thirty events were manually labeled as ground truth in the dataset by looking for correspondence with real-world news.<br />
<br />
Performance is measured using the precision, recall, and F-score of how well their model recovers these events. <br />
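For reference, the three metrics can be computed as in the sketch below. It assumes that detected and ground-truth events can be compared as exact identifiers; how the paper actually matches a detected event to a labeled one is not described in this summary, and the event names are hypothetical.

```python
def precision_recall_f1(predicted, gold):
    """Compute precision, recall, and F1 over two collections of event ids.

    Assumption: an event counts as recovered only on an exact id match.
    """
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical detected events and ground-truth events.
p, r, f = precision_recall_f1({"e1", "e2", "e3", "e4"}, {"e1", "e2", "e5"})
print(p, r, f)
```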
<br />
== Discussion ==<br />
They found that taking temporal and social dimensions into account can increase the F-score significantly. Their approach of integrating these diverse features in a step-wise manner was also found to perform better than simply including them as features in a standard machine learning framework.<br />
<br />
== Related papers ==<br />
There has been a lot of work on event detection.<br />
* [[RelatedPaper::Lin_et_al_KDD_2011|A Statistical Model for Popular Events Tracking in Social Communities. Lin et al, KDD 2011]] This paper presents a method to observe and track popular events or topics that evolve over time in communities.<br />
* [[RelatedPaper::Popescu and Pennacchiotti, CIKM 10|Detecting controversial events from Twitter. Popescu and Pennacchiotti, CIKM 10]] This paper addresses the task of identifying controversial events using Twitter as a starting point.<br />
* [[RelatedPaper::Yang et al, SIGIR 98|A study on retrospective and online event detection. Yang et al, SIGIR 98]] This paper addresses the problems of detecting events in news stories.<br />
* [[RelatedPaper::Automatic_Detection_and_Classification_of_Social_Events|Automatic Detection and Classification of Social Events]] This paper aims at detecting and classifying social events using tree kernels.<br />
* [[RelatedPaper::Das_Sarma_et._al.,_Dynamic_Relationship_and_Event_Discovery,_WSDM_2011|Dynamic Relationship and Event Discovery]] This paper aims to detect events by discovering "co-bursting" entities.<br />
<br />
== Study plan ==<br />
* Article: Adaptive time series model [http://www.siam.org/proceedings/datamining/2007/dm07_059Lemire.pdf]<br />
* Graph cut based clustering [http://www.cs.berkeley.edu/~malik/papers/SM-ncut.pdf]</div>Ysimhttp://curtis.ml.cmu.edu/w/courses/index.php?title=Q._Zhao,_P._Mitra,_and_B._Chen._Temporal_and_information_flow_based_event_detection_from_social_text_streams._In_AAAI,_2007&diff=15205Q. Zhao, P. Mitra, and B. Chen. Temporal and information flow based event detection from social text streams. In AAAI, 20072012-11-06T03:41:22Z<p>Ysim: </p>
<hr />
<div>This [[Category::paper]] is relevant to [[AddressesProblem::Controversial_events_detection|detecting controversial events]] and [[AddressesProblem::Event detection]].<br />
<br />
= Temporal and information flow based event detection from social text streams=<br />
<br />
== Citation ==<br />
<br />
Qiankun Zhao, Prasenjit Mitra, and Bi Chen. Temporal and information flow based event detection from social text streams. In Proceedings of the 22nd national conference on Artificial intelligence - Volume 2, pages 1501–1506. AAAI Press, 2007.<br />
<br />
== Online version ==<br />
<br />
[http://www.purdue.edu/discoverypark/vaccine/assets/pdfs/publications/pdf/Temporal%20and%20Information%20Flow%20Based.pdf Temporal and information flow based event detection from social text streams]<br />
<br />
== Summary ==<br />
<br />
The authors present a method for detecting events from social text streams that exploits not only the textual content but also the temporal and social dimensions of the data.<br />
Social text streams are represented as multigraphs in which each node denotes an "actor" and each edge represents the information flow between two actors.<br />
First, the authors perform content-based [[UsesMethod::clustering]] using a vector space model (tf-idf weights, cosine similarity, and so on) and a graph-cut-based clustering algorithm.<br />
This clustering segments the data into topics.<br />
<br />
For a given topic, they measure the "intensity" over time using a sliding time window and segment the resulting series into intervals using an adaptive time series model.<br />
With the temporal segmentation, each topic is represented as a sequence of social network graphs over time.<br />
The weight of an edge between two actors in such a graph denotes their communication intensity, so one can measure the "information flow" between actors for a given topic over time.<br />
<br />
Using the content, temporal, and information flow data above, they extract events by extracting text segments subject to constraints on this information. For instance, the segments of an event should come from the same time interval, be about the same topic, and be exchanged mainly within a certain subgroup of social actors.<br />
<br />
== Evaluation ==<br />
<br />
They used the [[UsesDataset::Enron email corpus]] and [[UsesDataset::Dailykos blogs]] [http://www.dailykos.com/]. Thirty events were manually labeled as ground truth in the dataset by looking for correspondence with real-world news.<br />
<br />
Performance is measured using the precision, recall, and F-score of how well their model recovers these events. <br />
<br />
== Discussion ==<br />
They found that taking temporal and social dimensions into account can increase the F-score significantly. Their approach of integrating these diverse features in a step-wise manner was also found to perform better than simply including them as features in a standard machine learning framework.<br />
<br />
== Related papers ==<br />
There has been a lot of work on event detection.<br />
* [[RelatedPaper::Lin_et_al_KDD_2011|A Statistical Model for Popular Events Tracking in Social Communities. Lin et al, KDD 2011]] This paper presents a method to observe and track popular events or topics that evolve over time in communities.<br />
* [[RelatedPaper::Popescu and Pennacchiotti, CIKM 10|Detecting controversial events from Twitter. Popescu and Pennacchiotti, CIKM 10]] This paper addresses the task of identifying controversial events using Twitter as a starting point.<br />
* [[RelatedPaper::Yang et al, SIGIR 98|A study on retrospective and online event detection. Yang et al, SIGIR 98]] This paper addresses the problems of detecting events in news stories.<br />
* [[RelatedPaper::Automatic_Detection_and_Classification_of_Social_Events|Automatic Detection and Classification of Social Events]] This paper aims at detecting and classifying social events using tree kernels.<br />
* [[RelatedPaper::Das_Sarma_et._al.,_Dynamic_Relationship_and_Event_Discovery,_WSDM_2011|Dynamic Relationship and Event Discovery]] This paper aims to detect events by discovering "co-bursting" entities.<br />
<br />
== Study plan ==<br />
* Article: Adaptive time series model [http://www.siam.org/proceedings/datamining/2007/dm07_059Lemire.pdf]<br />
* Graph cut based clustering [http://www.cs.berkeley.edu/~malik/papers/SM-ncut.pdf]</div>Ysimhttp://curtis.ml.cmu.edu/w/courses/index.php?title=Comparison_Das_et_al_WSDM_2011_and_Zhao_et_al_AAAI_2007&diff=15203Comparison Das et al WSDM 2011 and Zhao et al AAAI 20072012-11-06T03:40:27Z<p>Ysim: </p>
<hr />
<div>This is a comparison of two related papers in [[event detection]] and [[temporal information extraction]].<br />
<br />
== Papers ==<br />
<br />
The papers are<br />
* Anish Das Sarma, A. Jain, C. Yu. [[Das_Sarma_et._al.,_Dynamic_Relationship_and_Event_Discovery,_WSDM_2011|Dynamic relationship and event discovery]]. In Proceedings of the fourth ACM international conference on Web search and data mining, 2011. [http://web.eecs.umich.edu/~congy/work/wsdm11.pdf]<br />
* Qiankun Zhao, Prasenjit Mitra, and Bi Chen. [[Zhao_et_al,_AAAI_07|Temporal and information flow based event detection from social text streams]]. In Proceedings of the 22nd national conference on Artificial intelligence - Volume 2, pages 1501–1506. AAAI Press, 2007. [http://www.purdue.edu/discoverypark/vaccine/assets/pdfs/publications/pdf/Temporal%20and%20Information%20Flow%20Based.pdf]<br />
<br />
== Comparative analysis of both papers ==<br />
<br />
At a high level, both papers aim to discover events from large temporal information sources.<br />
Both leverage user-generated content: Das et al use Wikipedia as their dataset, while Zhao et al use the [[UsesDataset::Enron email corpus]] and [[UsesDataset::Dailykos blogs]].<br />
<br />
In Das et al, the task is first to discover pairs of entities that co-burst in the same time period (a week). Co-bursting means that both entities are mentioned significantly more often in that period than in other time periods.<br />
The next step is to discover the relationships between such entities. <br />
These relationships form the foundation for an event, defined as an n-ary relationship between entities that are bursty in the same time period.<br />
Likewise, Zhao et al's task is to discover events by exploiting the temporal burstiness of entities and text, together with the "social" aspect: an event is talked about more than usual by "social actors".<br />
<br />
Method-wise, both papers frame the problem of identifying relationships in terms of graphs.<br />
In Das et al, vertices are entities, and edges describe how much the time periods in which two entities are bursty overlap. Two entities that were mentioned more often at the same time therefore have a stronger edge between them.<br />
In Zhao et al, vertices are social actors. Unlike the entities in Das et al, social actors are not directly involved in an event; they are simply actors that converse (through text) about the event taking place. Edges between social actors are thus weighted by how intensely each pair of social actors communicates during the time period.<br />
<br />
In Das et al's approach, events are thus assumed to involve two or more public entities, while Zhao et al's events are tied more to the topics of the ongoing discussions.<br />
The advantage of Das et al's approach is that events are easily interpretable, especially for public news (entertainment news, political news, etc.), which is often about specific public figures or organizations. However, it cannot capture abstract events that have no specific associated entities, such as a natural disaster.<br />
Zhao et al's approach, on the other hand, can identify such abstract events, but the topics of its events may not be easily identifiable.<br />
<br />
Both papers use algorithms from time series modeling and graph clustering to solve their respective problems.<br />
<br />
== Related papers ==<br />
* [[RelatedPaper::Lin_et_al_KDD_2011|A Statistical Model for Popular Events Tracking in Social Communities. Lin et al, KDD 2011]]<br />
* [[RelatedPaper::Popescu and Pennacchiotti, CIKM 10|Detecting controversial events from Twitter. Popescu and Pennacchiotti, CIKM 10]]<br />
* [[RelatedPaper::Yang et al, SIGIR 98|A study on retrospective and online event detection. Yang et al, SIGIR 98]]<br />
* [[RelatedPaper::Automatic_Detection_and_Classification_of_Social_Events|Automatic Detection and Classification of Social Events]]<br />
* [[RelatedPaper::Q. Zhao, P. Mitra, and B. Chen. Temporal and information flow based event detection from social text streams. In AAAI, 2007]]<br />
* [[RelatedPaper::Banko_2007_Open_Information_Extraction_from_the_Web]]<br />
* [[RelatedPaper::Chambers, N. and Jurafsky, D. Template-based information extraction without the templates, ACL 2011]].</div>Ysimhttp://curtis.ml.cmu.edu/w/courses/index.php?title=Comparison_Das_et_al_WSDM_2011_and_Zhao_et_al_AAAI_2007&diff=15202Comparison Das et al WSDM 2011 and Zhao et al AAAI 20072012-11-06T03:40:12Z<p>Ysim: </p>
<hr />
<div>This is a comparison of two related papers in [[event detection]] and [[temporal information extraction]].<br />
<br />
== Papers ==<br />
<br />
The papers are<br />
* Anish Das Sarma, A. Jain, C. Yu. [[Das_Sarma_et._al.,_Dynamic_Relationship_and_Event_Discovery,_WSDM_2011|Dynamic relationship and event discovery]]. In Proceedings of the fourth ACM international conference on Web search and data mining, 2011. [http://web.eecs.umich.edu/~congy/work/wsdm11.pdf]<br />
* Qiankun Zhao, Prasenjit Mitra, and Bi Chen. [[Zhao_et_al,_AAAI_07|Temporal and information flow based event detection from social text streams]]. In Proceedings of the 22nd national conference on Artificial intelligence - Volume 2, pages 1501–1506. AAAI Press, 2007. [http://www.purdue.edu/discoverypark/vaccine/assets/pdfs/publications/pdf/Temporal%20and%20Information%20Flow%20Based.pdf]<br />
<br />
== Comparative analysis of both papers ==<br />
<br />
At a high level, both papers aim to discover events from large temporal information sources.<br />
Both leverage user-generated content: Das et al use Wikipedia as their dataset, while Zhao et al use the [[UsesDataset::Enron email corpus]] and [[UsesDataset::Dailykos blogs]].<br />
<br />
In Das et al, the task is first to discover pairs of entities that co-burst in the same time period (a week). Co-bursting means that both entities are mentioned significantly more often in that period than in other time periods.<br />
The next step is to discover the relationships between such entities. <br />
These relationships form the foundation for an event, defined as an n-ary relationship between entities that are bursty in the same time period.<br />
Likewise, Zhao et al's task is to discover events by exploiting the temporal burstiness of entities and text, together with the "social" aspect: an event is talked about more than usual by "social actors".<br />
<br />
Method-wise, both papers frame the problem of identifying relationships in terms of graphs.<br />
In Das et al, vertices are entities, and edges describe how much the time periods in which two entities are bursty overlap. Two entities that were mentioned more often at the same time therefore have a stronger edge between them.<br />
In Zhao et al, vertices are social actors. Unlike the entities in Das et al, social actors are not directly involved in an event; they are simply actors that converse (through text) about the event taking place. Edges between social actors are thus weighted by how intensely each pair of social actors communicates during the time period.<br />
<br />
In Das et al's approach, events are thus assumed to involve two or more public entities, while Zhao et al's events are tied more to the topics of the ongoing discussions.<br />
The advantage of Das et al's approach is that events are easily interpretable, especially for public news (entertainment news, political news, etc.), which is often about specific public figures or organizations. However, it cannot capture abstract events that have no specific associated entities, such as a natural disaster.<br />
Zhao et al's approach, on the other hand, can identify such abstract events, but the topics of its events may not be easily identifiable.<br />
<br />
Both papers use algorithms from time series modeling and graph clustering to solve their respective problems.<br />
<br />
== Related papers ==<br />
* [[RelatedPaper::Lin_et_al_KDD_2011|A Statistical Model for Popular Events Tracking in Social Communities. Lin et al, KDD 2011]]<br />
* [[RelatedPaper::Popescu and Pennacchiotti, CIKM 10|Detecting controversial events from Twitter. Popescu and Pennacchiotti, CIKM 10]]<br />
* [[RelatedPaper::Yang et al, SIGIR 98|A study on retrospective and online event detection. Yang et al, SIGIR 98]]<br />
* [[RelatedPaper::Automatic_Detection_and_Classification_of_Social_Events|Automatic Detection and Classification of Social Events]]<br />
* [[RelatedPaper::Q. Zhao, P. Mitra, and B. Chen. Temporal and information flow based event detection from social text streams. In AAAI, 2007]]<br />
* [[RelatedPaper::Pereira et.al. Distributional Clustering Of English Words, ACL 1993|distributional clustering]]<br />
* [[RelatedPaper::Banko_2007_Open_Information_Extraction_from_the_Web]]<br />
* [[RelatedPaper::Chambers, N. and Jurafsky, D. Template-based information extraction without the templates, ACL 2011]].</div>Ysimhttp://curtis.ml.cmu.edu/w/courses/index.php?title=ToWikify&diff=15199ToWikify2012-11-06T03:36:04Z<p>Ysim: </p>
<hr />
<div>{| class="wikitable sortable" border="1" cellpadding="4" cellspacing="0"<br />
|-<br />
! Paper !! Related Paper !! Student Andrew ID !! Link to Comparison<br />
|-<br />
| [[Yang_et_al_Modeling_Information_Diffusion_in_Implicit_Networks]] || [[Inferring the Diffusion and Evolution of Topics in Social Communities]] [http://www.cs.uiuc.edu/homes/hanj/pdf/snakdd11_clin.pdf] || Bliu1 || [[Compare_Yang_et_al_Modeling_Information_Diffusion_in_Implicit_Networks_and_Inferring_the_Diffusion_and_Evolution_of_Topics_in_Social_Communities]]<br />
|-<br />
| [[Zheleva_ACM_2009]] || [[Geographic routing in social networks]] [http://www.pnas.org/content/102/33/11623] || ||<br />
|-<br />
| [[Y._Borghol_et_al._Performance_Evaluation_68_2011]] || [[The Untold Story of the Clones: Content-agnostic Factors that Impact YouTube Video Popularity]] [http://www.ida.liu.se/~nikca/papers/kdd12.pdf] || tinghuiz || [[Compare Y. Borghol et al. 2011 and The Untold Story of the Clones: Content-agnostic Factors that Impact YouTube Video Popularity]]<br />
|-<br />
| [[Zheleva_and_Getoor,_WWW2009]] || [[Preserving the privacy of sensitive relationships in graph data. PinKDD, 2007]] [http://www.springerlink.com/content/n1404m0668452854/] || zsheikh ||<br />
|-<br />
| [[Vladimir_Ouzienko,_Prediction_of_Attributes_and_Links_in_Temporal_Social_Networks]] || [[Introduction to stochastic actor-based models for network dynamics]] [http://www.sciencedirect.com/science/article/pii/S0378873309000069] || ||<br />
|-<br />
| [[Miller_et_al_ICWSM_2011]] || [[Can predicate-argument structures be used for contextual opinion retrieval from blogs?]] [http://rd.springer.com/article/10.1007/s11280-012-0170-8] || ||<br />
|-<br />
| [[Ritter_et_al,_EMNLP_2011._Named_Entity_Recognition_in_Tweets:_An_Experimental_Study]] || [[Event discovery in social media feeds]] [http://people.csail.mit.edu/regina/my_papers/twitter_acl2011.pdf] || ||<br />
|-<br />
| [[Ritter_et_al_NAACL_2010._Unsupervised_Modeling_of_Twitter_Conversations]] || [[Catching the drift: Probabilistic content models, with applications to generation and summarization]] [http://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&cad=rja&ved=0CDAQFjAA&url=http%3A%2F%2Facl.ldc.upenn.edu%2Fhlt-naacl2004%2Fmain%2Fpdf%2F167_Paper.pdf&ei=z_iHUMj6MavI0AHbm4HYDg&usg=AFQjCNFvkmshrGjFbst0izxL_4fR6chdiA&sig2=KBi5EDzmBxrd3sTzm5Qyhg] || ||<br />
|-<br />
| [[Modeling_Contagion_Through_Facebook_News_Feed]] || [[Cascading Behavior in Large Blog Graphs]] [http://cs.stanford.edu/~jure/pubs/blogs-sdm07.pdf] ||thoang ||[[Compare Modeling_Contagion_Through_Facebook_News_Feed and Cascading Behavior in Large Blog Graphs]]<br />
|-<br />
| [[Yeh_et_al_WikiWalk_Random_walks_on_Wikipedia_for_Semantic_Relatedness]] || [[Personalizing PageRank for Word Sense Disambiguation]] [http://www.aclweb.org/anthology/E/E09/E09-1005.pdf] || nnori ||<br />
|-<br />
| [[Yano_et_al_NAACL_2009]] || [[Link-PLSA-LDA: A new unsupervised model for topics and influence of blogs ]] [http://www-poleia.lip6.fr/~gallinar/Enseignement/2009-Papiers-ARI/icwsm2008-nalapati.pdf] || kwmurray ||<br />
|-<br />
| [[Ramage_et_al_ICWSM_2010]] || [[ Is it Really About Me? Message Content in Social Awareness Streams]] [http://dl.acm.org/citation.cfm?id=1718953] || yuchenz || [[Compare_Ramage_Naaman]]<br />
|-<br />
| [[Rodriguez_et_al_Oct_2011]] || [[ The origin of bursts and heavy tails in human dynamics]] [http://nd.edu/~networks/HumanDynamics_20Oct05/HumanDynamics_Nature207,435(2005).pdf] || dzheng ||<br />
|-<br />
| [[Measuring_User_Influence_in_Twitter:_The_Million_Follower_Fallacy]] || [[Influentials, Networks, and Public Opinion Formation]] [ftp://intranet.dei.polimi.it/outgoing/Carlo.Piccardi/VarieDsc/Wa07.pdf] || Lujiang ||<br />
|-<br />
| [[OConnor_et._al.,_ICWSM_2010]] || [[Widespread Worry and the Stock Market]] [http://social.cs.uiuc.edu/people/gilbert/pub/icwsm10.worry.gilbert.pdf] || Gmontane || [[Comparison: O'Connor et al. ICWSM 2010 & Widespread Worry and Stock Market]]<br />
|-<br />
| [[Link_propagation:_A_fast_semi-supervised_learning_algorithm_for_link_prediction]] || [[Fast and scalable algorithms for semi-supervised link prediction on static and dynamic graphs]] [http://www.springerlink.com/content/g622186787k4258r/] || epapalex || [[Compare Link Propagation Papers]]<br />
|-<br />
| [[Mrinmaya_et._al._WWW%2712]] || [[The Author-Recipient-Topic Model for Topic and Role Discovery in Social Networks: Experiments with Enron and Academic Email]] [http://people.cs.umass.edu/~mccallum/papers/art04tr.pdf] || Norii ||<br />
|-<br />
| [[Yano_et_al_ICWSM_2010._What’s_Worthy_of_Comment%3F_Content_and_Comment_Volume_in_Political_Blogs]] || [[Mixed membership models of scientific publication]] [http://www.cs.cmu.edu/~lafferty/pub/efl.pdf] || ymiao ||<br />
|-<br />
| [[Rosen-Zvi_et_al,_The_Author-Topic_Model_for_Authors_and_Documents]] || [[The Missing Link - A Probabilistic Model of Document Content and Hypertext Connectivity]] [http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.33.6843] || rgkulkar || http://malt.ml.cmu.edu/mw/index.php/Comparison_Rosen-Zvi_el_al_and_cohn_et_al<br />
|-<br />
| [[Reviewing_social_media_use_by_clinicians]] || [[Integrating the hospital library with patient care, teaching and research: model and Web 2.0 tools to create a social and collaborative community of clinical research in a hospital setting.]] [http://www.ncbi.nlm.nih.gov/pubmed/20712716?dopt=Abstract] || ||<br />
|-<br />
| [[Agarwal_et_al,_ICWSM_2009#Related_Works_and_Papers]] || [[Latent Friend Mining from Blog Data, ICDM 2006]] [http://dl.acm.org/citation.cfm?id=1193350] || zeyuz || [[Compare_latentfriend_familiarstranger]]<br />
|-<br />
| [[Hassan_et_al,_ICWSM_2009]] || [[Document representation and query expansion models for blog recommendation]] [http://www.cs.cmu.edu/~jaime/ArguelloICWSM08.pdf] || sushantk ||<br />
|-<br />
| [[E.A._Leicht,_Structure_of_Time_Evo_citation_networks_2007]] || [[Detecting Topic Evolution in Scientific Literature: How Can Citations Help?]] [http://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&cad=rja&ved=0CCEQFjAA&url=http%3A%2F%2Fclgiles.ist.psu.edu%2Fpubs%2FCIKM2009-topic-evolution-citations.pdf&ei=YXCHUOnjDOXH0AHj74HoDQ&usg=AFQjCNHzvuuex1dNYKsuFzbxPIR3y45V5A&sig2=fP-6FsO1Pq4ewLToESR42g] || ziy ||<br />
|-<br />
| [[Birke%26Sarkar,FigLanguages07]] || [[A Clustering Approach for the Nearly Unsupervised Recognition of Nonliteral Language, EACL-2006]] [http://acl.ldc.upenn.edu/E/E06/E06-1042.pdf] || tinghaoh ||<br />
|-<br />
| [[A_Discriminative_Latent_Variable_Model_for_SMT]] || [[An End-to-End Discriminative Approach to Machine Translation]] [http://www.seas.upenn.edu/~taskar/pubs/acl06.pdf] ||lingwang || [[Comparative Study of Discriminative Models in SMT]]<br />
|-<br />
| [[Davidov_et_al_COLING_10]] || [[Structured Models for Fine-to-Coarse Sentiment Analysis]] [http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.116.5334] ||ydalal || [[Comparative Study : Sentiment Analysis using Automated pattern based appraoch VS Single structured model ]]<br />
|-<br />
| [[Anderson_et_al_KDD2012]] || [[Predicting web searcher satisfaction with existing community-based answers]] [http://www.cs.cmu.edu/~dpelleg/download/sigir311-liu.pdf] ||anikag ||<br />
|-<br />
| [[Leskovec_et_al.,_WWW_2010]] || [[Statistical properties of community structure in large social and information networks. In WWW ’08]] [http://cs-www.cs.yale.edu/homes/mmahoney/pubs/Communities_WWW.pdf] || zhua || [[Compare Leskovec et al. WWW 10 and Leskovec et al. WWW 08]]<br />
|-<br />
| [[Accurate_Unlexicalized_Parsing]] || [[Learning Accurate, Compact, and Interpretable Tree Annotation, S. Petrov, L. Barrett, R. Thibaux, D. Klein, ACL 2006]] [http://acl.ldc.upenn.edu/P/P06/P06-1055.pdf] || ||<br />
|-<br />
| [[Esuli_and_Sebastiani_LREC_2006]] || [[Determining term subjectivity and term orientation for opinion mining.]] [http://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&cad=rja&ved=0CCEQFjAA&url=http%3A%2F%2Facl.ldc.upenn.edu%2Feacl2006%2Fmain%2Fpapers%2F13_1_esulisebastiani_192.pdf&ei=v3KHUOfBMObs0gHT5IHIBg&usg=AFQjCNGk9-BW40FOkzPKLtVyb8a7Dv4XbQ&sig2=81oi1NYAxXR3ZHeoHnJMJA] || ytsvetko || [[Compare Esuli and Sebastiani LREC 2006 vs. Esuli and Sebastiani EACL 2006]]<br />
|-<br />
| [[Gilbert_et_al.,_ICWSM_2010]] || [[A Sentiment Detection Engine for Internet Stock Message Boards]] [http://aclweb.org/anthology-new/U/U09/U09-1012.pdf] || nloghman || [[Comparison: Widespread Worry and the Stock Market versus Sentiment Detection Engine for Internet Stock Message Boards]] <br />
|-<br />
| [[Akcora_et_al,_SOMA_2010]] || [[L. Ku, Y. Liang, and H. Chen. Opinion extraction, summarization and tracking in news and blog corpora. In Proceedings of AAAI-2006]] [http://www.aaai.org/Papers/Symposia/Spring/2006/SS-06-03/SS06-03-020.pdf] || zhouyu||[[Compare_Ku_Akcora#Two_Papers]]<br />
|-<br />
| [[Chambers_and_Jurafsky,_Unsupervised_Learning_of_Narrative_Event_Chains,_ACL_2008]] || [[Chklovski and Pantel (2004) Verbocean:Mining the web for fine-grained semantic verb relations]] [http://acl.ldc.upenn.edu/acl2004/emnlp/pdf/Chklovski.pdf] || mmahavee ||<br />
|-<br />
| [[BinLu_et_al._ACL2011]] || [[Learning Multilingual Subjective Language via Cross-Lingual Projections]] [http://www.cse.unt.edu/~rada/papers/mihalcea.acl07.pdf] || lingpenk ||[[Compare_BinLu_Rada_Two_Papers]]<br />
|-<br />
| [[Domain-Assisted_Product_Aspect_Hierarchy_Generation:_Towards_Hierarchical_Organization_of_Unstructured_Consumer_Reviews]] || [[Learning object models from semistructured Web documents]] [http://ieeexplore.ieee.org/xpl/login.jsp?tp=&arnumber=1583583&url=http%3A%2F%2Fieeexplore.ieee.org%2Fxpls%2Fabs_all.jsp%3Farnumber%3D1583583] || ||<br />
|-<br />
| [[A_Latent_Variable_Model_for_Geographic_Lexical_Variation]] || [[Q. Mei, C. Liu, H. Su, and C. X Zhai. 2006. A probabilistic approach to spatiotemporal theme pattern mining on weblogs. In Proceedings of WWW]] [http://dl.acm.org/citation.cfm?id=1135857] || lanzhzh||<br />
|-<br />
| [[Collier_et_al._Journal_of_Biomedical_Semantics_2011]] || [[Modeling Spread of Disease from Social Interaction]] [http://www.cs.rochester.edu/u/kautz/papers/Sadilek-Kautz-Silenzio_Modeling-Spread-of-Disease-from-Social-Interactions_ICWSM-12.pdf] ||rajarshd ||<br />
|-<br />
| [[Capturing_Global_Mood_Levels_using_Blog_Posts]] || [[Modeling Public Mood and Emotion: Twitter Sentiment and Socio-Economic Phenomena]] [http://www.aaai.org/ocs/index.php/ICWSM/ICWSM11/paper/viewFile/2826/3237] || yubink ||<br />
|-<br />
| [[Andreevskaia_et_al.,_ICWSM_2007]] || [[M. Hurst and K. Nigam. Retrieving topical sentiments from online document collection.]] [http://suraj.lums.edu.pk/~cs631s05/Papers/retrieving%20topical%20sentiments%20from%20online%20document%20collection.pdf] || srawat ||<br />
|-<br />
| [[Das_Sarma_et._al.,_Dynamic_Relationship_and_Event_Discovery,_WSDM_2011]] || [[ Q. Zhao, P. Mitra, and B. Chen. Temporal and information flow based event detection from social text streams. In AAAI, 2007]] [http://www.purdue.edu/discoverypark/vaccine/assets/pdfs/publications/pdf/Temporal%20and%20Information%20Flow%20Based.pdf] || ysim || [[Comparison_Das_et_al_WSDM_2011_and_Zhao_et_al_AAAI_2007|Paper comparison]]<br />
|}</div>Ysimhttp://curtis.ml.cmu.edu/w/courses/index.php?title=ToWikify&diff=15197ToWikify2012-11-06T03:35:37Z<p>Ysim: </p>
<hr />
<div>{| class="wikitable sortable" border="1" cellpadding="4" cellspacing="0"<br />
|-<br />
! Paper !! Related Paper !! Student Andrew ID !! Link to Comparison<br />
|-<br />
| [[Yang_et_al_Modeling_Information_Diffusion_in_Implicit_Networks]] || [[Inferring the Diffusion and Evolution of Topics in Social Communities]] [http://www.cs.uiuc.edu/homes/hanj/pdf/snakdd11_clin.pdf] || Bliu1 || [[Compare_Yang_et_al_Modeling_Information_Diffusion_in_Implicit_Networks_and_Inferring_the_Diffusion_and_Evolution_of_Topics_in_Social_Communities]]<br />
|-<br />
| [[Zheleva_ACM_2009]] || [[Geographic routing in social networks]] [http://www.pnas.org/content/102/33/11623] || ||<br />
|-<br />
| [[Y._Borghol_et_al._Performance_Evaluation_68_2011]] || [[The Untold Story of the Clones: Content-agnostic Factors that Impact YouTube Video Popularity]] [http://www.ida.liu.se/~nikca/papers/kdd12.pdf] || tinghuiz || [[Compare Y. Borghol et al. 2011 and The Untold Story of the Clones: Content-agnostic Factors that Impact YouTube Video Popularity]]<br />
|-<br />
| [[Zheleva_and_Getoor,_WWW2009]] || [[Preserving the privacy of sensitive relationships in graph data. PinKDD, 2007]] [http://www.springerlink.com/content/n1404m0668452854/] || zsheikh ||<br />
|-<br />
| [[Vladimir_Ouzienko,_Prediction_of_Attributes_and_Links_in_Temporal_Social_Networks]] || [[Introduction to stochastic actor-based models for network dynamics]] [http://www.sciencedirect.com/science/article/pii/S0378873309000069] || ||<br />
|-<br />
| [[Miller_et_al_ICWSM_2011]] || [[Can predicate-argument structures be used for contextual opinion retrieval from blogs?]] [http://rd.springer.com/article/10.1007/s11280-012-0170-8] || ||<br />
|-<br />
| [[Ritter_et_al,_EMNLP_2011._Named_Entity_Recognition_in_Tweets:_An_Experimental_Study]] || [[Event discovery in social media feeds]] [http://people.csail.mit.edu/regina/my_papers/twitter_acl2011.pdf] || ||<br />
|-<br />
| [[Ritter_et_al_NAACL_2010._Unsupervised_Modeling_of_Twitter_Conversations]] || [[Catching the drift: Probabilistic content models, with applications to generation and summarization]] [http://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&cad=rja&ved=0CDAQFjAA&url=http%3A%2F%2Facl.ldc.upenn.edu%2Fhlt-naacl2004%2Fmain%2Fpdf%2F167_Paper.pdf&ei=z_iHUMj6MavI0AHbm4HYDg&usg=AFQjCNFvkmshrGjFbst0izxL_4fR6chdiA&sig2=KBi5EDzmBxrd3sTzm5Qyhg] || ||<br />
|-<br />
| [[Modeling_Contagion_Through_Facebook_News_Feed]] || [[Cascading Behavior in Large Blog Graphs]] [http://cs.stanford.edu/~jure/pubs/blogs-sdm07.pdf] ||thoang ||[[Compare Modeling_Contagion_Through_Facebook_News_Feed and Cascading Behavior in Large Blog Graphs]]<br />
|-<br />
| [[Yeh_et_al_WikiWalk_Random_walks_on_Wikipedia_for_Semantic_Relatedness]] || [[Personalizing PageRank for Word Sense Disambiguation]] [http://www.aclweb.org/anthology/E/E09/E09-1005.pdf] || nnori ||<br />
|-<br />
| [[Yano_et_al_NAACL_2009]] || [[Link-PLSA-LDA: A new unsupervised model for topics and influence of blogs ]] [http://www-poleia.lip6.fr/~gallinar/Enseignement/2009-Papiers-ARI/icwsm2008-nalapati.pdf] || kwmurray ||<br />
|-<br />
| [[Ramage_et_al_ICWSM_2010]] || [[ Is it Really About Me? Message Content in Social Awareness Streams]] [http://dl.acm.org/citation.cfm?id=1718953] || yuchenz || [[Compare_Ramage_Naaman]]<br />
|-<br />
| [[Rodriguez_et_al_Oct_2011]] || [[ The origin of bursts and heavy tails in human dynamics]] [http://nd.edu/~networks/HumanDynamics_20Oct05/HumanDynamics_Nature207,435(2005).pdf] || dzheng ||<br />
|-<br />
| [[Measuring_User_Influence_in_Twitter:_The_Million_Follower_Fallacy]] || [[Influentials, Networks, and Public Opinion Formation]] [ftp://intranet.dei.polimi.it/outgoing/Carlo.Piccardi/VarieDsc/Wa07.pdf] || Lujiang ||<br />
|-<br />
| [[OConnor_et._al.,_ICWSM_2010]] || [[Widespread Worry and the Stock Market]] [http://social.cs.uiuc.edu/people/gilbert/pub/icwsm10.worry.gilbert.pdf] || Gmontane || [[Comparison: O'Connor et al. ICWSM 2010 & Widespread Worry and Stock Market]]<br />
|-<br />
| [[Link_propagation:_A_fast_semi-supervised_learning_algorithm_for_link_prediction]] || [[Fast and scalable algorithms for semi-supervised link prediction on static and dynamic graphs]] [http://www.springerlink.com/content/g622186787k4258r/] || epapalex || [[Compare Link Propagation Papers]]<br />
|-<br />
| [[Mrinmaya_et._al._WWW%2712]] || [[The Author-Recipient-Topic Model for Topic and Role Discovery in Social Networks: Experiments with Enron and Academic Email]] [http://people.cs.umass.edu/~mccallum/papers/art04tr.pdf] || Norii ||<br />
|-<br />
| [[Yano_et_al_ICWSM_2010._What’s_Worthy_of_Comment%3F_Content_and_Comment_Volume_in_Political_Blogs]] || [[Mixed membership models of scientific publication]] [http://www.cs.cmu.edu/~lafferty/pub/efl.pdf] || ymiao ||<br />
|-<br />
| [[Rosen-Zvi_et_al,_The_Author-Topic_Model_for_Authors_and_Documents]] || [[The Missing Link - A Probabilistic Model of Document Content and Hypertext Connectivity]] [http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.33.6843] || rgkulkar || http://malt.ml.cmu.edu/mw/index.php/Comparison_Rosen-Zvi_el_al_and_cohn_et_al<br />
|-<br />
| [[Reviewing_social_media_use_by_clinicians]] || [[Integrating the hospital library with patient care, teaching and research: model and Web 2.0 tools to create a social and collaborative community of clinical research in a hospital setting.]] [http://www.ncbi.nlm.nih.gov/pubmed/20712716?dopt=Abstract] || ||<br />
|-<br />
| [[Agarwal_et_al,_ICWSM_2009#Related_Works_and_Papers]] || [[Latent Friend Mining from Blog Data, ICDM 2006]] [http://dl.acm.org/citation.cfm?id=1193350] || zeyuz || [[Compare_latentfriend_familiarstranger]]<br />
|-<br />
| [[Hassan_et_al,_ICWSM_2009]] || [[Document representation and query expansion models for blog recommendation]] [http://www.cs.cmu.edu/~jaime/ArguelloICWSM08.pdf] || sushantk ||<br />
|-<br />
| [[E.A._Leicht,_Structure_of_Time_Evo_citation_networks_2007]] || [[Detecting Topic Evolution in Scientific Literature: How Can Citations Help?]] [http://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&cad=rja&ved=0CCEQFjAA&url=http%3A%2F%2Fclgiles.ist.psu.edu%2Fpubs%2FCIKM2009-topic-evolution-citations.pdf&ei=YXCHUOnjDOXH0AHj74HoDQ&usg=AFQjCNHzvuuex1dNYKsuFzbxPIR3y45V5A&sig2=fP-6FsO1Pq4ewLToESR42g] || ziy ||<br />
|-<br />
| [[Birke%26Sarkar,FigLanguages07]] || [[A Clustering Approach for the Nearly Unsupervised Recognition of Nonliteral Language, EACL-2006]] [http://acl.ldc.upenn.edu/E/E06/E06-1042.pdf] || tinghaoh ||<br />
|-<br />
| [[A_Discriminative_Latent_Variable_Model_for_SMT]] || [[An End-to-End Discriminative Approach to Machine Translation]] [http://www.seas.upenn.edu/~taskar/pubs/acl06.pdf] ||lingwang || [[Comparative Study of Discriminative Models in SMT]]<br />
|-<br />
| [[Davidov_et_al_COLING_10]] || [[Structured Models for Fine-to-Coarse Sentiment Analysis]] [http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.116.5334] ||ydalal || [[Comparative Study : Sentiment Analysis using Automated pattern based appraoch VS Single structured model ]]<br />
|-<br />
| [[Anderson_et_al_KDD2012]] || [[Predicting web searcher satisfaction with existing community-based answers]] [http://www.cs.cmu.edu/~dpelleg/download/sigir311-liu.pdf] ||anikag ||<br />
|-<br />
| [[Leskovec_et_al.,_WWW_2010]] || [[Statistical properties of community structure in large social and information networks. In WWW ’08]] [http://cs-www.cs.yale.edu/homes/mmahoney/pubs/Communities_WWW.pdf] || zhua || [[Compare Leskovec et al. WWW 10 and Leskovec et al. WWW 08]]<br />
|-<br />
| [[Accurate_Unlexicalized_Parsing]] || [[Learning Accurate, Compact, and Interpretable Tree Annotation, S. Petrov, L. Barrett, R. Thibaux, D. Klein, ACL 2006]] [http://acl.ldc.upenn.edu/P/P06/P06-1055.pdf] || ||<br />
|-<br />
| [[Esuli_and_Sebastiani_LREC_2006]] || [[Determining term subjectivity and term orientation for opinion mining.]] [http://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&cad=rja&ved=0CCEQFjAA&url=http%3A%2F%2Facl.ldc.upenn.edu%2Feacl2006%2Fmain%2Fpapers%2F13_1_esulisebastiani_192.pdf&ei=v3KHUOfBMObs0gHT5IHIBg&usg=AFQjCNGk9-BW40FOkzPKLtVyb8a7Dv4XbQ&sig2=81oi1NYAxXR3ZHeoHnJMJA] || ytsvetko || [[Compare Esuli and Sebastiani LREC 2006 vs. Esuli and Sebastiani EACL 2006]]<br />
|-<br />
| [[Gilbert_et_al.,_ICWSM_2010]] || [[A Sentiment Detection Engine for Internet Stock Message Boards]] [http://aclweb.org/anthology-new/U/U09/U09-1012.pdf] || nloghman || [[Comparison: Widespread Worry and the Stock Market versus Sentiment Detection Engine for Internet Stock Message Boards]] <br />
|-<br />
| [[Akcora_et_al,_SOMA_2010]] || [[L. Ku, Y. Liang, and H. Chen. Opinion extraction, summarization and tracking in news and blog corpora. In Proceedings of AAAI-2006]] [http://www.aaai.org/Papers/Symposia/Spring/2006/SS-06-03/SS06-03-020.pdf] || zhouyu||[[Compare_Ku_Akcora#Two_Papers]]<br />
|-<br />
| [[Chambers_and_Jurafsky,_Unsupervised_Learning_of_Narrative_Event_Chains,_ACL_2008]] || [[Chklovski and Pantel (2004) Verbocean:Mining the web for fine-grained semantic verb relations]] [http://acl.ldc.upenn.edu/acl2004/emnlp/pdf/Chklovski.pdf] || mmahavee ||<br />
|-<br />
| [[BinLu_et_al._ACL2011]] || [[Learning Multilingual Subjective Language via Cross-Lingual Projections]] [http://www.cse.unt.edu/~rada/papers/mihalcea.acl07.pdf] || lingpenk ||[[Compare_BinLu_Rada_Two_Papers]]<br />
|-<br />
| [[Domain-Assisted_Product_Aspect_Hierarchy_Generation:_Towards_Hierarchical_Organization_of_Unstructured_Consumer_Reviews]] || [[Learning object models from semistructured Web documents]] [http://ieeexplore.ieee.org/xpl/login.jsp?tp=&arnumber=1583583&url=http%3A%2F%2Fieeexplore.ieee.org%2Fxpls%2Fabs_all.jsp%3Farnumber%3D1583583] || ||<br />
|-<br />
| [[A_Latent_Variable_Model_for_Geographic_Lexical_Variation]] || [[Q. Mei, C. Liu, H. Su, and C. X Zhai. 2006. A probabilistic approach to spatiotemporal theme pattern mining on weblogs. In Proceedings of WWW]] [http://dl.acm.org/citation.cfm?id=1135857] || lanzhzh||<br />
|-<br />
| [[Collier_et_al._Journal_of_Biomedical_Semantics_2011]] || [[Modeling Spread of Disease from Social Interaction]] [http://www.cs.rochester.edu/u/kautz/papers/Sadilek-Kautz-Silenzio_Modeling-Spread-of-Disease-from-Social-Interactions_ICWSM-12.pdf] ||rajarshd ||<br />
|-<br />
| [[Capturing_Global_Mood_Levels_using_Blog_Posts]] || [[Modeling Public Mood and Emotion: Twitter Sentiment and Socio-Economic Phenomena]] [http://www.aaai.org/ocs/index.php/ICWSM/ICWSM11/paper/viewFile/2826/3237] || yubink ||<br />
|-<br />
| [[Andreevskaia_et_al.,_ICWSM_2007]] || [[M. Hurst and K. Nigam. Retrieving topical sentiments from online document collection.]] [http://suraj.lums.edu.pk/~cs631s05/Papers/retrieving%20topical%20sentiments%20from%20online%20document%20collection.pdf] || srawat ||<br />
|-<br />
| [[Das_Sarma_et._al.,_Dynamic_Relationship_and_Event_Discovery,_WSDM_2011]] || [[ Q. Zhao, P. Mitra, and B. Chen. Temporal and information flow based event detection from social text streams. In AAAI, 2007]] [http://www.purdue.edu/discoverypark/vaccine/assets/pdfs/publications/pdf/Temporal%20and%20Information%20Flow%20Based.pdf] || ysim || [[Comparison_Das_et_al_WSDM_2011_and_Zhao_et_al_AAAI_2007]]<br />
|}</div>Ysimhttp://curtis.ml.cmu.edu/w/courses/index.php?title=ToWikify&diff=15196ToWikify2012-11-06T03:35:21Z<p>Ysim: </p>
<hr />
<div>{| class="wikitable sortable" border="1" cellpadding="4" cellspacing="0"<br />
|-<br />
! Paper !! Related Paper !! Student Andrew ID !! Link to Comparison<br />
|-<br />
| [[Yang_et_al_Modeling_Information_Diffusion_in_Implicit_Networks]] || [[Inferring the Diffusion and Evolution of Topics in Social Communities]] [http://www.cs.uiuc.edu/homes/hanj/pdf/snakdd11_clin.pdf] || Bliu1 || [[Compare_Yang_et_al_Modeling_Information_Diffusion_in_Implicit_Networks_and_Inferring_the_Diffusion_and_Evolution_of_Topics_in_Social_Communities]]<br />
|-<br />
| [[Zheleva_ACM_2009]] || [[Geographic routing in social networks]] [http://www.pnas.org/content/102/33/11623] || ||<br />
|-<br />
| [[Y._Borghol_et_al._Performance_Evaluation_68_2011]] || [[The Untold Story of the Clones: Content-agnostic Factors that Impact YouTube Video Popularity]] [http://www.ida.liu.se/~nikca/papers/kdd12.pdf] || tinghuiz || [[Compare Y. Borghol et al. 2011 and The Untold Story of the Clones: Content-agnostic Factors that Impact YouTube Video Popularity]]<br />
|-<br />
| [[Zheleva_and_Getoor,_WWW2009]] || [[Preserving the privacy of sensitive relationships in graph data. PinKDD, 2007]] [http://www.springerlink.com/content/n1404m0668452854/] || zsheikh ||<br />
|-<br />
| [[Vladimir_Ouzienko,_Prediction_of_Attributes_and_Links_in_Temporal_Social_Networks]] || [[Introduction to stochastic actor-based models for network dynamics]] [http://www.sciencedirect.com/science/article/pii/S0378873309000069] || ||<br />
|-<br />
| [[Miller_et_al_ICWSM_2011]] || [[Can predicate-argument structures be used for contextual opinion retrieval from blogs?]] [http://rd.springer.com/article/10.1007/s11280-012-0170-8] || ||<br />
|-<br />
| [[Ritter_et_al,_EMNLP_2011._Named_Entity_Recognition_in_Tweets:_An_Experimental_Study]] || [[Event discovery in social media feeds]] [http://people.csail.mit.edu/regina/my_papers/twitter_acl2011.pdf] || ||<br />
|-<br />
| [[Ritter_et_al_NAACL_2010._Unsupervised_Modeling_of_Twitter_Conversations]] || [[Catching the drift: Probabilistic content models, with applications to generation and summarization]] [http://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&cad=rja&ved=0CDAQFjAA&url=http%3A%2F%2Facl.ldc.upenn.edu%2Fhlt-naacl2004%2Fmain%2Fpdf%2F167_Paper.pdf&ei=z_iHUMj6MavI0AHbm4HYDg&usg=AFQjCNFvkmshrGjFbst0izxL_4fR6chdiA&sig2=KBi5EDzmBxrd3sTzm5Qyhg] || ||<br />
|-<br />
| [[Modeling_Contagion_Through_Facebook_News_Feed]] || [[Cascading Behavior in Large Blog Graphs]] [http://cs.stanford.edu/~jure/pubs/blogs-sdm07.pdf] ||thoang ||[[Compare Modeling_Contagion_Through_Facebook_News_Feed and Cascading Behavior in Large Blog Graphs]]<br />
|-<br />
| [[Yeh_et_al_WikiWalk_Random_walks_on_Wikipedia_for_Semantic_Relatedness]] || [[Personalizing PageRank for Word Sense Disambiguation]] [http://www.aclweb.org/anthology/E/E09/E09-1005.pdf] || nnori ||<br />
|-<br />
| [[Yano_et_al_NAACL_2009]] || [[Link-PLSA-LDA: A new unsupervised model for topics and influence of blogs ]] [http://www-poleia.lip6.fr/~gallinar/Enseignement/2009-Papiers-ARI/icwsm2008-nalapati.pdf] || kwmurray ||<br />
|-<br />
| [[Ramage_et_al_ICWSM_2010]] || [[ Is it Really About Me? Message Content in Social Awareness Streams]] [http://dl.acm.org/citation.cfm?id=1718953] || yuchenz || [[Compare_Ramage_Naaman]]<br />
|-<br />
| [[Rodriguez_et_al_Oct_2011]] || [[ The origin of bursts and heavy tails in human dynamics]] [http://nd.edu/~networks/HumanDynamics_20Oct05/HumanDynamics_Nature207,435(2005).pdf] || dzheng ||<br />
|-<br />
| [[Measuring_User_Influence_in_Twitter:_The_Million_Follower_Fallacy]] || [[Influentials, Networks, and Public Opinion Formation]] [ftp://intranet.dei.polimi.it/outgoing/Carlo.Piccardi/VarieDsc/Wa07.pdf] || Lujiang ||<br />
|-<br />
| [[OConnor_et._al.,_ICWSM_2010]] || [[Widespread Worry and the Stock Market]] [http://social.cs.uiuc.edu/people/gilbert/pub/icwsm10.worry.gilbert.pdf] || Gmontane || [[Comparison: O'Connor et al. ICWSM 2010 & Widespread Worry and Stock Market]]<br />
|-<br />
| [[Link_propagation:_A_fast_semi-supervised_learning_algorithm_for_link_prediction]] || [[Fast and scalable algorithms for semi-supervised link prediction on static and dynamic graphs]] [http://www.springerlink.com/content/g622186787k4258r/] || epapalex || [[Compare Link Propagation Papers]]<br />
|-<br />
| [[Mrinmaya_et._al._WWW%2712]] || [[The Author-Recipient-Topic Model for Topic and Role Discovery in Social Networks: Experiments with Enron and Academic Email]] [http://people.cs.umass.edu/~mccallum/papers/art04tr.pdf] || Norii ||<br />
|-<br />
| [[Yano_et_al_ICWSM_2010._What’s_Worthy_of_Comment%3F_Content_and_Comment_Volume_in_Political_Blogs]] || [[Mixed membership models of scientific publication]] [http://www.cs.cmu.edu/~lafferty/pub/efl.pdf] || ymiao ||<br />
|-<br />
| [[Rosen-Zvi_et_al,_The_Author-Topic_Model_for_Authors_and_Documents]] || [[The Missing Link - A Probabilistic Model of Document Content and Hypertext Connectivity]] [http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.33.6843] || rgkulkar || http://malt.ml.cmu.edu/mw/index.php/Comparison_Rosen-Zvi_el_al_and_cohn_et_al<br />
|-<br />
| [[Reviewing_social_media_use_by_clinicians]] || [[Integrating the hospital library with patient care, teaching and research: model and Web 2.0 tools to create a social and collaborative community of clinical research in a hospital setting.]] [http://www.ncbi.nlm.nih.gov/pubmed/20712716?dopt=Abstract] || ||<br />
|-<br />
| [[Agarwal_et_al,_ICWSM_2009#Related_Works_and_Papers]] || [[Latent Friend Mining from Blog Data, ICDM 2006]] [http://dl.acm.org/citation.cfm?id=1193350] || zeyuz || [[Compare_latentfriend_familiarstranger]]<br />
|-<br />
| [[Hassan_et_al,_ICWSM_2009]] || [[Document representation and query expansion models for blog recommendation]] [http://www.cs.cmu.edu/~jaime/ArguelloICWSM08.pdf] || sushantk ||<br />
|-<br />
| [[E.A._Leicht,_Structure_of_Time_Evo_citation_networks_2007]] || [[Detecting Topic Evolution in Scientific Literature: How Can Citations Help?]] [http://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&cad=rja&ved=0CCEQFjAA&url=http%3A%2F%2Fclgiles.ist.psu.edu%2Fpubs%2FCIKM2009-topic-evolution-citations.pdf&ei=YXCHUOnjDOXH0AHj74HoDQ&usg=AFQjCNHzvuuex1dNYKsuFzbxPIR3y45V5A&sig2=fP-6FsO1Pq4ewLToESR42g] || ziy ||<br />
|-<br />
| [[Birke%26Sarkar,FigLanguages07]] || [[A Clustering Approach for the Nearly Unsupervised Recognition of Nonliteral Language, EACL-2006]] [http://acl.ldc.upenn.edu/E/E06/E06-1042.pdf] || tinghaoh ||<br />
|-<br />
| [[A_Discriminative_Latent_Variable_Model_for_SMT]] || [[An End-to-End Discriminative Approach to Machine Translation]] [http://www.seas.upenn.edu/~taskar/pubs/acl06.pdf] ||lingwang || [[Comparative Study of Discriminative Models in SMT]]<br />
|-<br />
| [[Davidov_et_al_COLING_10]] || [[Structured Models for Fine-to-Coarse Sentiment Analysis]] [http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.116.5334] ||ydalal || [[Comparative Study : Sentiment Analysis using Automated pattern based appraoch VS Single structured model ]]<br />
|-<br />
| [[Anderson_et_al_KDD2012]] || [[Predicting web searcher satisfaction with existing community-based answers]] [http://www.cs.cmu.edu/~dpelleg/download/sigir311-liu.pdf] ||anikag ||<br />
|-<br />
| [[Leskovec_et_al.,_WWW_2010]] || [[Statistical properties of community structure in large social and information networks. In WWW ’08]] [http://cs-www.cs.yale.edu/homes/mmahoney/pubs/Communities_WWW.pdf] || zhua || [[Compare Leskovec et al. WWW 10 and Leskovec et al. WWW 08]]<br />
|-<br />
| [[Accurate_Unlexicalized_Parsing]] || [[Learning Accurate, Compact, and Interpretable Tree Annotation, S. Petrov, L. Barrett, R. Thibaux, D. Klein, ACL 2006]] [http://acl.ldc.upenn.edu/P/P06/P06-1055.pdf] || ||<br />
|-<br />
| [[Esuli_and_Sebastiani_LREC_2006]] || [[Determining term subjectivity and term orientation for opinion mining.]] [http://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&cad=rja&ved=0CCEQFjAA&url=http%3A%2F%2Facl.ldc.upenn.edu%2Feacl2006%2Fmain%2Fpapers%2F13_1_esulisebastiani_192.pdf&ei=v3KHUOfBMObs0gHT5IHIBg&usg=AFQjCNGk9-BW40FOkzPKLtVyb8a7Dv4XbQ&sig2=81oi1NYAxXR3ZHeoHnJMJA] || ytsvetko || [[Compare Esuli and Sebastiani LREC 2006 vs. Esuli and Sebastiani EACL 2006]]<br />
|-<br />
| [[Gilbert_et_al.,_ICWSM_2010]] || [[A Sentiment Detection Engine for Internet Stock Message Boards]] [http://aclweb.org/anthology-new/U/U09/U09-1012.pdf] || nloghman || [[Comparison: Widespread Worry and the Stock Market versus Sentiment Detection Engine for Internet Stock Message Boards]] <br />
|-<br />
| [[Akcora_et_al,_SOMA_2010]] || [[L. Ku, Y. Liang, and H. Chen. Opinion extraction, summarization and tracking in news and blog corpora. In Proceedings of AAAI-2006]] [http://www.aaai.org/Papers/Symposia/Spring/2006/SS-06-03/SS06-03-020.pdf] || zhouyu||[[Compare_Ku_Akcora#Two_Papers]]<br />
|-<br />
| [[Chambers_and_Jurafsky,_Unsupervised_Learning_of_Narrative_Event_Chains,_ACL_2008]] || [[Chklovski and Pantel (2004) Verbocean:Mining the web for fine-grained semantic verb relations]] [http://acl.ldc.upenn.edu/acl2004/emnlp/pdf/Chklovski.pdf] || mmahavee ||<br />
|-<br />
| [[BinLu_et_al._ACL2011]] || [[Learning Multilingual Subjective Language via Cross-Lingual Projections]] [http://www.cse.unt.edu/~rada/papers/mihalcea.acl07.pdf] || lingpenk ||[[Compare_BinLu_Rada_Two_Papers]]<br />
|-<br />
| [[Domain-Assisted_Product_Aspect_Hierarchy_Generation:_Towards_Hierarchical_Organization_of_Unstructured_Consumer_Reviews]] || [[Learning object models from semistructured Web documents]] [http://ieeexplore.ieee.org/xpl/login.jsp?tp=&arnumber=1583583&url=http%3A%2F%2Fieeexplore.ieee.org%2Fxpls%2Fabs_all.jsp%3Farnumber%3D1583583] || ||<br />
|-<br />
| [[A_Latent_Variable_Model_for_Geographic_Lexical_Variation]] || [[Q. Mei, C. Liu, H. Su, and C. X Zhai. 2006. A probabilistic approach to spatiotemporal theme pattern mining on weblogs. In Proceedings of WWW]] [http://dl.acm.org/citation.cfm?id=1135857] || lanzhzh||<br />
|-<br />
| [[Collier_et_al._Journal_of_Biomedical_Semantics_2011]] || [[Modeling Spread of Disease from Social Interaction]] [http://www.cs.rochester.edu/u/kautz/papers/Sadilek-Kautz-Silenzio_Modeling-Spread-of-Disease-from-Social-Interactions_ICWSM-12.pdf] ||rajarshd ||<br />
|-<br />
| [[Capturing_Global_Mood_Levels_using_Blog_Posts]] || [[Modeling Public Mood and Emotion: Twitter Sentiment and Socio-Economic Phenomena]] [http://www.aaai.org/ocs/index.php/ICWSM/ICWSM11/paper/viewFile/2826/3237] || yubink ||<br />
|-<br />
| [[Andreevskaia_et_al.,_ICWSM_2007]] || [[M. Hurst and K. Nigam. Retrieving topical sentiments from online document collection.]] [http://suraj.lums.edu.pk/~cs631s05/Papers/retrieving%20topical%20sentiments%20from%20online%20document%20collection.pdf] || srawat ||<br />
|-<br />
| [[Das_Sarma_et._al.,_Dynamic_Relationship_and_Event_Discovery,_WSDM_2011]] || [[ Q. Zhao, P. Mitra, and B. Chen. Temporal and information flow based event detection from social text streams. In AAAI, 2007]] [http://www.purdue.edu/discoverypark/vaccine/assets/pdfs/publications/pdf/Temporal%20and%20Information%20Flow%20Based.pdf] || ysim || [[Comparison_Das_et_al_WSDM_2011_and_Zhao_et_al_AAAI_2007]]<br />
|}</div>Ysimhttp://curtis.ml.cmu.edu/w/courses/index.php?title=ToWikify&diff=15194ToWikify2012-11-06T03:34:59Z<p>Ysim: </p>
<hr />
<div>{| class="wikitable sortable" border="1" cellpadding="4" cellspacing="0"<br />
|-<br />
! Paper !! Related Paper !! Student Andrew ID !! Link to Comparison<br />
|-<br />
| [[Yang_et_al_Modeling_Information_Diffusion_in_Implicit_Networks]] || [[Inferring the Diffusion and Evolution of Topics in Social Communities]] [http://www.cs.uiuc.edu/homes/hanj/pdf/snakdd11_clin.pdf] || Bliu1 || [[Compare_Yang_et_al_Modeling_Information_Diffusion_in_Implicit_Networks_and_Inferring_the_Diffusion_and_Evolution_of_Topics_in_Social_Communities]]<br />
|-<br />
| [[Zheleva_ACM_2009]] || [[Geographic routing in social networks]] [http://www.pnas.org/content/102/33/11623] || ||<br />
|-<br />
| [[Y._Borghol_et_al._Performance_Evaluation_68_2011]] || [[The Untold Story of the Clones: Content-agnostic Factors that Impact YouTube Video Popularity]] [http://www.ida.liu.se/~nikca/papers/kdd12.pdf] || tinghuiz || [[Compare Y. Borghol et al. 2011 and The Untold Story of the Clones: Content-agnostic Factors that Impact YouTube Video Popularity]]<br />
|-<br />
| [[Zheleva_and_Getoor,_WWW2009]] || [[Preserving the privacy of sensitive relationships in graph data. PinKDD, 2007]] [http://www.springerlink.com/content/n1404m0668452854/] || zsheikh ||<br />
|-<br />
| [[Vladimir_Ouzienko,_Prediction_of_Attributes_and_Links_in_Temporal_Social_Networks]] || [[Introduction to stochastic actor-based models for network dynamics]] [http://www.sciencedirect.com/science/article/pii/S0378873309000069] || ||<br />
|-<br />
| [[Miller_et_al_ICWSM_2011]] || [[Can predicate-argument structures be used for contextual opinion retrieval from blogs?]] [http://rd.springer.com/article/10.1007/s11280-012-0170-8] || ||<br />
|-<br />
| [[Ritter_et_al,_EMNLP_2011._Named_Entity_Recognition_in_Tweets:_An_Experimental_Study]] || [[Event discovery in social media feeds]] [http://people.csail.mit.edu/regina/my_papers/twitter_acl2011.pdf] || ||<br />
|-<br />
| [[Ritter_et_al_NAACL_2010._Unsupervised_Modeling_of_Twitter_Conversations]] || [[Catching the drift: Probabilistic content models, with applications to generation and summarization]] [http://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&cad=rja&ved=0CDAQFjAA&url=http%3A%2F%2Facl.ldc.upenn.edu%2Fhlt-naacl2004%2Fmain%2Fpdf%2F167_Paper.pdf&ei=z_iHUMj6MavI0AHbm4HYDg&usg=AFQjCNFvkmshrGjFbst0izxL_4fR6chdiA&sig2=KBi5EDzmBxrd3sTzm5Qyhg] || ||<br />
|-<br />
| [[Modeling_Contagion_Through_Facebook_News_Feed]] || [[Cascading Behavior in Large Blog Graphs]] [http://cs.stanford.edu/~jure/pubs/blogs-sdm07.pdf] ||thoang ||[[Compare Modeling_Contagion_Through_Facebook_News_Feed and Cascading Behavior in Large Blog Graphs]]<br />
|-<br />
| [[Yeh_et_al_WikiWalk_Random_walks_on_Wikipedia_for_Semantic_Relatedness]] || [[Personalizing PageRank for Word Sense Disambiguation]] [http://www.aclweb.org/anthology/E/E09/E09-1005.pdf] || nnori ||<br />
|-<br />
| [[Yano_et_al_NAACL_2009]] || [[Link-PLSA-LDA: A new unsupervised model for topics and influence of blogs ]] [http://www-poleia.lip6.fr/~gallinar/Enseignement/2009-Papiers-ARI/icwsm2008-nalapati.pdf] || kwmurray ||<br />
|-<br />
| [[Ramage_et_al_ICWSM_2010]] || [[ Is it Really About Me? Message Content in Social Awareness Streams]] [http://dl.acm.org/citation.cfm?id=1718953] || yuchenz || [[Compare_Ramage_Naaman]]<br />
|-<br />
| [[Rodriguez_et_al_Oct_2011]] || [[ The origin of bursts and heavy tails in human dynamics]] [http://nd.edu/~networks/HumanDynamics_20Oct05/HumanDynamics_Nature207,435(2005).pdf] || dzheng ||<br />
|-<br />
| [[Measuring_User_Influence_in_Twitter:_The_Million_Follower_Fallacy]] || [[Influentials, Networks, and Public Opinion Formation]] [ftp://intranet.dei.polimi.it/outgoing/Carlo.Piccardi/VarieDsc/Wa07.pdf] || Lujiang ||<br />
|-<br />
| [[OConnor_et._al.,_ICWSM_2010]] || [[Widespread Worry and the Stock Market]] [http://social.cs.uiuc.edu/people/gilbert/pub/icwsm10.worry.gilbert.pdf] || Gmontane || [[Comparison: O'Connor et al. ICWSM 2010 & Widespread Worry and Stock Market]]<br />
|-<br />
| [[Link_propagation:_A_fast_semi-supervised_learning_algorithm_for_link_prediction]] || [[Fast and scalable algorithms for semi-supervised link prediction on static and dynamic graphs]] [http://www.springerlink.com/content/g622186787k4258r/] || epapalex || [[Compare Link Propagation Papers]]<br />
|-<br />
| [[Mrinmaya_et._al._WWW%2712]] || [[The Author-Recipient-Topic Model for Topic and Role Discovery in Social Networks: Experiments with Enron and Academic Email]] [http://people.cs.umass.edu/~mccallum/papers/art04tr.pdf] || Norii ||<br />
|-<br />
| [[Yano_et_al_ICWSM_2010._What’s_Worthy_of_Comment%3F_Content_and_Comment_Volume_in_Political_Blogs]] || [[Mixed membership models of scientific publication]] [http://www.cs.cmu.edu/~lafferty/pub/efl.pdf] || ymiao ||<br />
|-<br />
| [[Rosen-Zvi_et_al,_The_Author-Topic_Model_for_Authors_and_Documents]] || [[The Missing Link - A Probabilistic Model of Document Content and Hypertext Connectivity]] [http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.33.6843] || rgkulkar || http://malt.ml.cmu.edu/mw/index.php/Comparison_Rosen-Zvi_el_al_and_cohn_et_al<br />
|-<br />
| [[Reviewing_social_media_use_by_clinicians]] || [[Integrating the hospital library with patient care, teaching and research: model and Web 2.0 tools to create a social and collaborative community of clinical research in a hospital setting.]] [http://www.ncbi.nlm.nih.gov/pubmed/20712716?dopt=Abstract] || ||<br />
|-<br />
| [[Agarwal_et_al,_ICWSM_2009#Related_Works_and_Papers]] || [[Latent Friend Mining from Blog Data, ICDM 2006]] [http://dl.acm.org/citation.cfm?id=1193350] || zeyuz || [[Compare_latentfriend_familiarstranger]]<br />
|-<br />
| [[Hassan_et_al,_ICWSM_2009]] || [[Document representation and query expansion models for blog recommendation]] [http://www.cs.cmu.edu/~jaime/ArguelloICWSM08.pdf] || sushantk ||<br />
|-<br />
| [[E.A._Leicht,_Structure_of_Time_Evo_citation_networks_2007]] || [[Detecting Topic Evolution in Scientific Literature: How Can Citations Help?]] [http://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&cad=rja&ved=0CCEQFjAA&url=http%3A%2F%2Fclgiles.ist.psu.edu%2Fpubs%2FCIKM2009-topic-evolution-citations.pdf&ei=YXCHUOnjDOXH0AHj74HoDQ&usg=AFQjCNHzvuuex1dNYKsuFzbxPIR3y45V5A&sig2=fP-6FsO1Pq4ewLToESR42g] || ziy ||<br />
|-<br />
| [[Birke%26Sarkar,FigLanguages07]] || [[A Clustering Approach for the Nearly Unsupervised Recognition of Nonliteral Language, EACL-2006]] [http://acl.ldc.upenn.edu/E/E06/E06-1042.pdf] || tinghaoh ||<br />
|-<br />
| [[A_Discriminative_Latent_Variable_Model_for_SMT]] || [[An End-to-End Discriminative Approach to Machine Translation]] [http://www.seas.upenn.edu/~taskar/pubs/acl06.pdf] ||lingwang || [[Comparative Study of Discriminative Models in SMT]]<br />
|-<br />
| [[Davidov_et_al_COLING_10]] || [[Structured Models for Fine-to-Coarse Sentiment Analysis]] [http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.116.5334] ||ydalal || [[Comparative Study : Sentiment Analysis using Automated pattern based appraoch VS Single structured model ]]<br />
|-<br />
| [[Anderson_et_al_KDD2012]] || [[Predicting web searcher satisfaction with existing community-based answers]] [http://www.cs.cmu.edu/~dpelleg/download/sigir311-liu.pdf] ||anikag ||<br />
|-<br />
| [[Leskovec_et_al.,_WWW_2010]] || [[Statistical properties of community structure in large social and information networks. In WWW ’08]] [http://cs-www.cs.yale.edu/homes/mmahoney/pubs/Communities_WWW.pdf] || zhua || [[Compare Leskovec et al. WWW 10 and Leskovec et al. WWW 08]]<br />
|-<br />
| [[Accurate_Unlexicalized_Parsing]] || [[Learning Accurate, Compact, and Interpretable Tree Annotation, S. Petrov, L. Barrett, R. Thibaux, D. Klein, ACL 2006]] [http://acl.ldc.upenn.edu/P/P06/P06-1055.pdf] || ||<br />
|-<br />
| [[Esuli_and_Sebastiani_LREC_2006]] || [[Determining term subjectivity and term orientation for opinion mining.]] [http://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&cad=rja&ved=0CCEQFjAA&url=http%3A%2F%2Facl.ldc.upenn.edu%2Feacl2006%2Fmain%2Fpapers%2F13_1_esulisebastiani_192.pdf&ei=v3KHUOfBMObs0gHT5IHIBg&usg=AFQjCNGk9-BW40FOkzPKLtVyb8a7Dv4XbQ&sig2=81oi1NYAxXR3ZHeoHnJMJA] || ytsvetko || [[Compare Esuli and Sebastiani LREC 2006 vs. Esuli and Sebastiani EACL 2006]]<br />
|-<br />
| [[Gilbert_et_al.,_ICWSM_2010]] || [[A Sentiment Detection Engine for Internet Stock Message Boards]] [http://aclweb.org/anthology-new/U/U09/U09-1012.pdf] || nloghman || [[Comparison: Widespread Worry and the Stock Market versus Sentiment Detection Engine for Internet Stock Message Boards]] <br />
|-<br />
| [[Akcora_et_al,_SOMA_2010]] || [[L. Ku, Y. Liang, and H. Chen. Opinion extraction, summarization and tracking in news and blog corpora. In Proceedings of AAAI-2006]] [http://www.aaai.org/Papers/Symposia/Spring/2006/SS-06-03/SS06-03-020.pdf] || zhouyu||[[Compare_Ku_Akcora#Two_Papers]]<br />
|-<br />
| [[Chambers_and_Jurafsky,_Unsupervised_Learning_of_Narrative_Event_Chains,_ACL_2008]] || [[Chklovski and Pantel (2004) Verbocean:Mining the web for fine-grained semantic verb relations]] [http://acl.ldc.upenn.edu/acl2004/emnlp/pdf/Chklovski.pdf] || mmahavee ||<br />
|-<br />
| [[BinLu_et_al._ACL2011]] || [[Learning Multilingual Subjective Language via Cross-Lingual Projections]] [http://www.cse.unt.edu/~rada/papers/mihalcea.acl07.pdf] || lingpenk ||[[Compare_BinLu_Rada_Two_Papers]]<br />
|-<br />
| [[Domain-Assisted_Product_Aspect_Hierarchy_Generation:_Towards_Hierarchical_Organization_of_Unstructured_Consumer_Reviews]] || [[Learning object models from semistructured Web documents]] [http://ieeexplore.ieee.org/xpl/login.jsp?tp=&arnumber=1583583&url=http%3A%2F%2Fieeexplore.ieee.org%2Fxpls%2Fabs_all.jsp%3Farnumber%3D1583583] || ||<br />
|-<br />
| [[A_Latent_Variable_Model_for_Geographic_Lexical_Variation]] || [[Q. Mei, C. Liu, H. Su, and C. X Zhai. 2006. A probabilistic approach to spatiotemporal theme pattern mining on weblogs. In Proceedings of WWW]] [http://dl.acm.org/citation.cfm?id=1135857] || lanzhzh||<br />
|-<br />
| [[Collier_et_al._Journal_of_Biomedical_Semantics_2011]] || [[Modeling Spread of Disease from Social Interaction]] [http://www.cs.rochester.edu/u/kautz/papers/Sadilek-Kautz-Silenzio_Modeling-Spread-of-Disease-from-Social-Interactions_ICWSM-12.pdf] ||rajarshd ||<br />
|-<br />
| [[Capturing_Global_Mood_Levels_using_Blog_Posts]] || [[Modeling Public Mood and Emotion: Twitter Sentiment and Socio-Economic Phenomena]] [http://www.aaai.org/ocs/index.php/ICWSM/ICWSM11/paper/viewFile/2826/3237] || yubink ||<br />
|-<br />
| [[Andreevskaia_et_al.,_ICWSM_2007]] || [[M. Hurst and K. Nigam. Retrieving topical sentiments from online document collection.]] [http://suraj.lums.edu.pk/~cs631s05/Papers/retrieving%20topical%20sentiments%20from%20online%20document%20collection.pdf] || srawat ||<br />
|-<br />
| [[Das_Sarma_et._al.,_Dynamic_Relationship_and_Event_Discovery,_WSDM_2011]] || [[ Q. Zhao, P. Mitra, and B. Chen. Temporal and information flow based event detection from social text streams. In AAAI, 2007]] [http://www.purdue.edu/discoverypark/vaccine/assets/pdfs/publications/pdf/Temporal%20and%20Information%20Flow%20Based.pdf] || ysim || [[Comparison_Das_et_al_WSDM_2011_and_Zhao_et_al_AAAI_2007]]<br />
|}</div>Ysimhttp://curtis.ml.cmu.edu/w/courses/index.php?title=Comparison_Das_et_al_WSDM_2011_and_Zhao_et_al_AAAI_2007&diff=15192Comparison Das et al WSDM 2011 and Zhao et al AAAI 20072012-11-06T03:34:25Z<p>Ysim: </p>
<hr />
<div>This is a comparison of two related papers in [[event detection]] and [[temporal information extraction]].<br />
<br />
== Papers ==<br />
<br />
The papers are<br />
* Anish Das Sarma, A. Jain, C. Yu. [[Das_Sarma_et._al.,_Dynamic_Relationship_and_Event_Discovery,_WSDM_2011|Dynamic relationship and event discovery]]. In Proceedings of the fourth ACM international conference on Web search and data mining, 2011. [http://web.eecs.umich.edu/~congy/work/wsdm11.pdf]<br />
* Qiankun Zhao, Prasenjit Mitra, and Bi Chen. [[Zhao_et_al,_AAAI_07|Temporal and information flow based event detection from social text streams]]. In Proceedings of the 22nd national conference on Artificial intelligence - Volume 2, pages 1501–1506. AAAI Press, 2007. [http://www.purdue.edu/discoverypark/vaccine/assets/pdfs/publications/pdf/Temporal%20and%20Information%20Flow%20Based.pdf]<br />
<br />
== Comparative analysis of both papers ==<br />
<br />
At a high level, both papers aim to discover events from large amounts of temporal information.<br />
Both leverage user-generated content: Das et al use Wikipedia as their dataset, while Zhao et al use the [[UsesDataset::Enron email corpus]] and [[UsesDataset::Dailykos blogs]].<br />
<br />
In Das et al, the task is first to discover pairs of entities that co-burst in the same time period (a week). Two entities co-burst when both are mentioned significantly more often in that period than in other periods.<br />
The next step is to discover the relationships between such entities. <br />
These relationships form the foundation for an event, defined as an n-ary relationship between entities that are bursty in the same time period.<br />
Likewise, Zhao et al's task is to discover events by exploiting the temporal burstiness of entities and text, together with a "social" aspect: an event is talked about more than usual by "social actors".<br />
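<br />
The notion of co-bursting can be illustrated with a minimal sketch. It assumes weekly mention counts per entity and a simple z-score threshold as the burst test; the threshold and the function names are illustrative assumptions, not the burst detector used by Das et al.<br />

```python
from statistics import mean, pstdev


def bursty_weeks(counts, z=2.0):
    """Return the week indices whose count lies more than z standard
    deviations above the mean (an assumed, simplified burst test)."""
    mu, sigma = mean(counts), pstdev(counts)
    if sigma == 0:
        return set()  # a flat series has no bursts
    return {i for i, c in enumerate(counts) if (c - mu) / sigma > z}


def co_bursting_weeks(counts_a, counts_b, z=2.0):
    """Weeks in which both entities are bursty at the same time."""
    return bursty_weeks(counts_a, z) & bursty_weeks(counts_b, z)


# Hypothetical weekly mention counts for two entities.
entity_a = [1, 1, 1, 1, 1, 1, 1, 20, 1, 1]
entity_b = [2, 2, 2, 2, 2, 2, 2, 30, 2, 2]
shared = co_bursting_weeks(entity_a, entity_b)
```

Pairs with a non-empty set of shared weeks would be the candidates passed on to the relationship discovery step.<br />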
<br />
Method-wise, both papers framed the problem of identifying relationships in the context of graphs.<br />
In Das et al, vertices are entities, and edges describe how much the bursty time periods of two entities overlap. Two entities that were mentioned more often at the same time therefore have a stronger edge between them.<br />
In Zhao et al, vertices are social actors. Unlike the entities in Das et al, social actors are not directly involved in an event; they are simply actors who converse (through text) about the event that is taking place. Edges between social actors are thus weighted by how intensely each pair of actors communicates during the time period.<br />
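<br />
As a rough illustration of the entity graph, the sketch below weights an edge by the Jaccard overlap between the sets of bursty periods of two entities. The Jaccard measure and all names here are assumptions chosen for simplicity; the paper's actual edge weighting may differ.<br />

```python
from itertools import combinations


def burst_overlap(periods_a, periods_b):
    """Jaccard overlap between two sets of bursty time periods."""
    union = periods_a | periods_b
    return len(periods_a & periods_b) / len(union) if union else 0.0


def build_entity_graph(bursts):
    """Map each entity pair with overlapping bursts to an edge weight.

    `bursts` maps an entity name to the set of periods in which it is bursty.
    """
    edges = {}
    for a, b in combinations(sorted(bursts), 2):
        weight = burst_overlap(bursts[a], bursts[b])
        if weight > 0:
            edges[(a, b)] = weight
    return edges


# Hypothetical bursty weeks for three entities.
example_bursts = {"A": {1, 2, 3}, "B": {2, 3, 4}, "C": {9}}
example_edges = build_entity_graph(example_bursts)
```

The analogous graph in Zhao et al would have social actors as vertices and a measure of communication intensity as the edge weight.<br />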
<br />
In Das et al's approach, events are thus assumed to be associated with two or more public entities, while Zhao et al's events are tied more to the topics of the ongoing discussions.<br />
The advantage of Das et al's approach is that events are easily interpretable, especially in the context of public news (entertainment news, political news, etc), which is often about specific public figures or organizations. However, it cannot capture abstract events that have no specific associated entity, such as a natural disaster.<br />
Zhao et al's approach, on the other hand, can identify such abstract events, but the topics of its events may not be easily identifiable.<br />
<br />
Both papers made use of algorithms from time series models and graph clustering to solve their respective problems.<br />
<br />
== Related papers ==<br />
There has been a lot of work on event detection.<br />
* [[RelatedPaper::Lin_et_al_KDD_2011|A Statistical Model for Popular Events Tracking in Social Communities. Lin et al, KDD 2011]] This paper presents a method to observe and track the popular events or topics that evolve over time in communities.<br />
* [[RelatedPaper::Popescu and Pennacchiotti, CIKM 10|Detecting controversial events from Twitter. Popescu and Pennacchiotti, CIKM 10]] This paper addresses the task of identifying controversial events using Twitter as a starting point.<br />
* [[RelatedPaper::Yang et al, SIGIR 98|A study on retrospective and online event detection. Yang et al, SIGIR 98]] This paper addresses the problems of detecting events in news stories.<br />
* [[RelatedPaper::Automatic_Detection_and_Classification_of_Social_Events|Automatic Detection and Classification of Social Events]] This paper aims at detecting and classifying social events using Tree kernels.<br />
<br />
== Study plan ==<br />
* Article: Adaptive time series model [http://www.siam.org/proceedings/datamining/2007/dm07_059Lemire.pdf]<br />
* Graph cut based clustering [http://www.cs.berkeley.edu/~malik/papers/SM-ncut.pdf]</div>Ysimhttp://curtis.ml.cmu.edu/w/courses/index.php?title=Comparison_Das_et_al_WSDM_2011_and_Zhao_et_al_AAAI_2007&diff=15178Comparison Das et al WSDM 2011 and Zhao et al AAAI 20072012-11-06T03:23:32Z<p>Ysim: </p>
<hr />
<div>This is a comparison of two related papers in [[event detection]] and [[temporal information extraction]].<br />
<br />
== Papers ==<br />
<br />
The papers are<br />
* Anish Das Sarma, A. Jain, C. Yu. [[Das_Sarma_et._al.,_Dynamic_Relationship_and_Event_Discovery,_WSDM_2011|Dynamic relationship and event discovery]]. In Proceedings of the fourth ACM international conference on Web search and data mining, 2011. [http://web.eecs.umich.edu/~congy/work/wsdm11.pdf]<br />
* Qiankun Zhao, Prasenjit Mitra, and Bi Chen. [[Zhao_et_al,_AAAI_07|Temporal and information flow based event detection from social text streams]]. In Proceedings of the 22nd national conference on Artificial intelligence - Volume 2, pages 1501–1506. AAAI Press, 2007. [http://www.purdue.edu/discoverypark/vaccine/assets/pdfs/publications/pdf/Temporal%20and%20Information%20Flow%20Based.pdf]<br />
<br />
== Comparative analysis of both papers ==<br />
<br />
At a high level, both papers aim to discover events from large amounts of temporal information.<br />
Both leverage user-generated content: Das et al use Wikipedia as their dataset, while Zhao et al use the [[UsesDataset::Enron email corpus]] and [[UsesDataset::Dailykos blogs]].<br />
<br />
In Das et al, the task is first to discover pairs of entities that co-burst in the same time period (a week). Two entities co-burst when both are mentioned significantly more often in that period than in other periods.<br />
The next step is to discover the relationships between such entities. <br />
These relationships form the foundation for an event, defined as an n-ary relationship between entities that are bursty in the same time period.<br />
<br />
Likewise, Zhao et al's task is to discover events by exploiting the temporal burstiness of entities and text, together with a "social" aspect: an event is talked about more than usual by "social actors".<br />
<br />
<br />
== Evaluation ==<br />
<br />
They used the [[UsesDataset::Enron email corpus]] and [[UsesDataset::Dailykos blogs]] [http://www.dailykos.com/]. 30 events were manually labeled as ground truth in the dataset by looking for correspondence with real-world news.<br />
<br />
Performance is measured by the precision, recall and F-score with which their model recovers these events. <br />
<br />
== Discussion ==<br />
They found that taking temporal and social dimensions into account can increase their f-score significantly. Their approach of integrating these diverse features together in a step-wise manner was also found to perform better than just including features in a standard machine learning framework.<br />
<br />
== Related papers ==<br />
There has been a lot of work on event detection.<br />
* [[RelatedPaper::Lin_et_al_KDD_2011|A Statistical Model for Popular Events Tracking in Social Communities. Lin et al, KDD 2011]] This paper presents a method to observe and track the popular events or topics that evolve over time in communities.<br />
* [[RelatedPaper::Popescu and Pennacchiotti, CIKM 10|Detecting controversial events from Twitter. Popescu and Pennacchiotti, CIKM 10]] This paper addresses the task of identifying controversial events using Twitter as a starting point.<br />
* [[RelatedPaper::Yang et al, SIGIR 98|A study on retrospective and online event detection. Yang et al, SIGIR 98]] This paper addresses the problems of detecting events in news stories.<br />
* [[RelatedPaper::Automatic_Detection_and_Classification_of_Social_Events|Automatic Detection and Classification of Social Events]] This paper aims at detecting and classifying social events using Tree kernels.<br />
<br />
== Study plan ==<br />
* Article: Adaptive time series model [http://www.siam.org/proceedings/datamining/2007/dm07_059Lemire.pdf]<br />
* Graph cut based clustering [http://www.cs.berkeley.edu/~malik/papers/SM-ncut.pdf]</div>Ysimhttp://curtis.ml.cmu.edu/w/courses/index.php?title=Comparison_Das_et_al_WSDM_2011_and_Zhao_et_al_AAAI_2007&diff=15164Comparison Das et al WSDM 2011 and Zhao et al AAAI 20072012-11-06T03:19:15Z<p>Ysim: </p>
<hr />
<div>This is a comparison of two related papers in [[event detection]] and [[temporal information extraction]].<br />
<br />
== Papers ==<br />
<br />
The papers are<br />
* Anish Das Sarma, A. Jain, C. Yu. [[Das_Sarma_et._al.,_Dynamic_Relationship_and_Event_Discovery,_WSDM_2011|Dynamic relationship and event discovery]]. In Proceedings of the fourth ACM international conference on Web search and data mining, 2011. [http://web.eecs.umich.edu/~congy/work/wsdm11.pdf]<br />
* Qiankun Zhao, Prasenjit Mitra, and Bi Chen. [[Zhao_et_al,_AAAI_07|Temporal and information flow based event detection from social text streams]]. In Proceedings of the 22nd national conference on Artificial intelligence - Volume 2, pages 1501–1506. AAAI Press, 2007. [http://www.purdue.edu/discoverypark/vaccine/assets/pdfs/publications/pdf/Temporal%20and%20Information%20Flow%20Based.pdf]<br />
<br />
== Comparative analysis of both papers ==<br />
<br />
At a high level, both papers aim to discover events from large amounts of temporal information.<br />
Both leverage user-generated content: Das et al use Wikipedia as their dataset, while Zhao et al use the [[UsesDataset::Enron email corpus]] and [[UsesDataset::Dailykos blogs]].<br />
<br />
== Evaluation ==<br />
<br />
They used the [[UsesDataset::Enron email corpus]] and [[UsesDataset::Dailykos blogs]] [http://www.dailykos.com/]. 30 events were manually labeled as ground truth in the dataset by looking for correspondence with real-world news.<br />
<br />
Performance is measured by the precision, recall and F-score with which their model recovers these events. <br />
<br />
== Discussion ==<br />
They found that taking temporal and social dimensions into account can increase their f-score significantly. Their approach of integrating these diverse features together in a step-wise manner was also found to perform better than just including features in a standard machine learning framework.<br />
<br />
== Related papers ==<br />
There has been a lot of work on event detection.<br />
* [[RelatedPaper::Lin_et_al_KDD_2011|A Statistical Model for Popular Events Tracking in Social Communities. Lin et al, KDD 2011]] This paper presents a method to observe and track the popular events or topics that evolve over time in communities.<br />
* [[RelatedPaper::Popescu and Pennacchiotti, CIKM 10|Detecting controversial events from Twitter. Popescu and Pennacchiotti, CIKM 10]] This paper addresses the task of identifying controversial events using Twitter as a starting point.<br />
* [[RelatedPaper::Yang et al, SIGIR 98|A study on retrospective and online event detection. Yang et al, SIGIR 98]] This paper addresses the problems of detecting events in news stories.<br />
* [[RelatedPaper::Automatic_Detection_and_Classification_of_Social_Events|Automatic Detection and Classification of Social Events]] This paper aims at detecting and classifying social events using Tree kernels.<br />
<br />
== Study plan ==<br />
* Article: Adaptive time series model [http://www.siam.org/proceedings/datamining/2007/dm07_059Lemire.pdf]<br />
* Graph cut based clustering [http://www.cs.berkeley.edu/~malik/papers/SM-ncut.pdf]</div>Ysimhttp://curtis.ml.cmu.edu/w/courses/index.php?title=Comparison_Das_et_al_WSDM_2011_and_Zhao_et_al_AAAI_2007&diff=15163Comparison Das et al WSDM 2011 and Zhao et al AAAI 20072012-11-06T03:18:55Z<p>Ysim: </p>
<hr />
<div>This is a comparison of two related papers in [[event detection]] and [[temporal information extraction]].<br />
<br />
== Papers ==<br />
<br />
The papers are<br />
* Anish Das Sarma, A. Jain, C. Yu. [[Das_Sarma_et._al.,_Dynamic_Relationship_and_Event_Discovery,_WSDM_2011|Dynamic relationship and event discovery]]. In Proceedings of the fourth ACM international conference on Web search and data mining, 2011. [http://web.eecs.umich.edu/~congy/work/wsdm11.pdf]<br />
* Qiankun Zhao, Prasenjit Mitra, and Bi Chen. [[Zhao_et_al,_AAAI_07|Temporal and information flow based event detection from social text streams]]. In Proceedings of the 22nd national conference on Artificial intelligence - Volume 2, pages 1501–1506. AAAI Press, 2007. [http://www.purdue.edu/discoverypark/vaccine/assets/pdfs/publications/pdf/Temporal%20and%20Information%20Flow%20Based.pdf]<br />
<br />
== Comparative analysis of both papers ==<br />
<br />
At a high level, both papers aim to discover events from large amounts of temporal information.<br />
Both leverage user-generated content: Das et al use Wikipedia as their dataset, while Zhao et al use the [[UsesDataset::Enron email corpus]] and [[UsesDataset::Dailykos political blogs]].<br />
<br />
== Evaluation ==<br />
<br />
They used the [[UsesDataset::Enron email corpus]] and [[UsesDataset::Dailykos blogs]] [http://www.dailykos.com/]. 30 events were manually labeled as ground truth in the dataset by looking for correspondence with real-world news.<br />
<br />
Performance is measured by the precision, recall and F-score with which their model recovers these events. <br />
<br />
== Discussion ==<br />
They found that taking temporal and social dimensions into account can increase their f-score significantly. Their approach of integrating these diverse features together in a step-wise manner was also found to perform better than just including features in a standard machine learning framework.<br />
<br />
== Related papers ==<br />
There has been a lot of work on event detection.<br />
* [[RelatedPaper::Lin_et_al_KDD_2011|A Statistical Model for Popular Events Tracking in Social Communities. Lin et al, KDD 2011]] This paper presents a method to observe and track the popular events or topics that evolve over time in communities.<br />
* [[RelatedPaper::Popescu and Pennacchiotti, CIKM 10|Detecting controversial events from Twitter. Popescu and Pennacchiotti, CIKM 10]] This paper addresses the task of identifying controversial events using Twitter as a starting point.<br />
* [[RelatedPaper::Yang et al, SIGIR 98|A study on retrospective and online event detection. Yang et al, SIGIR 98]] This paper addresses the problems of detecting events in news stories.<br />
* [[RelatedPaper::Automatic_Detection_and_Classification_of_Social_Events|Automatic Detection and Classification of Social Events]] This paper aims at detecting and classifying social events using Tree kernels.<br />
<br />
== Study plan ==<br />
* Article: Adaptive time series model [http://www.siam.org/proceedings/datamining/2007/dm07_059Lemire.pdf]<br />
* Graph cut based clustering [http://www.cs.berkeley.edu/~malik/papers/SM-ncut.pdf]</div>Ysimhttp://curtis.ml.cmu.edu/w/courses/index.php?title=Q._Zhao,_P._Mitra,_and_B._Chen._Temporal_and_information_flow_based_event_detection_from_social_text_streams._In_AAAI,_2007&diff=15157Q. Zhao, P. Mitra, and B. Chen. Temporal and information flow based event detection from social text streams. In AAAI, 20072012-11-06T03:15:52Z<p>Ysim: </p>
<hr />
<div>This [[Category::paper]] is relevant to [[AddressesProblem::Controversial_events_detection|detecting controversial events]] and [[AddressesProblem::Event detection]].<br />
<br />
= Temporal and information flow based event detection from social text streams=<br />
<br />
== Citation ==<br />
<br />
Qiankun Zhao, Prasenjit Mitra, and Bi Chen. Temporal and information flow based event detection from social text streams. In Proceedings of the 22nd national conference on Artificial intelligence - Volume 2, pages 1501–1506. AAAI Press, 2007.<br />
<br />
== Online version ==<br />
<br />
[http://www.purdue.edu/discoverypark/vaccine/assets/pdfs/publications/pdf/Temporal%20and%20Information%20Flow%20Based.pdf Temporal and information flow based event detection from social text streams]<br />
<br />
== Summary ==<br />
<br />
The authors present a method for detecting events from social text streams that exploits not only the textual content but also the temporal and social dimensions of the data.<br />
Social text streams are represented as multigraphs in which each node denotes an "actor" and each edge represents the information flow between two actors.<br />
First, the authors perform content-based [[UsesMethod::clustering]] using a vector space model (tf-idf weights with cosine similarity) and a graph cut based clustering algorithm.<br />
This clustering segments their data into topics.<br />
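<br />
The vector space step can be sketched as follows. This is a minimal illustration that assumes pre-tokenized documents and the plain tf × log(N/df) weighting; the exact weighting variant and the graph cut clustering that follows it are not reproduced here.<br />

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Turn tokenized documents into sparse tf-idf vectors (term -> weight)."""
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: tf[t] * math.log(n / df[t]) for t in tf})
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical tokenized messages.
docs = [
    ["energy", "trade", "price"],
    ["energy", "price", "market"],
    ["election", "vote", "senate"],
]
vecs = tfidf_vectors(docs)
```

The resulting pairwise similarities could serve as edge weights in a document similarity graph, which a graph cut algorithm then partitions into topics.<br />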
<br />
For a given topic, they measure the "intensities" over time using a sliding time window and segment them into intervals using an adaptive time series model.<br />
With the temporal segmentation, each topic is represented as a sequence of social network graphs over time.<br />
The weights of the edges between actors in these graphs denote their communication intensity, so one can measure the "information flow" between actors for a given topic over time.<br />
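<br />
The sliding-window intensity measurement can be sketched as below, assuming that intensity is simply the number of messages on a topic that fall inside each window. The function and parameter names are hypothetical, and the adaptive time series model used to segment the resulting curve into intervals is not implemented here.<br />

```python
def sliding_intensity(timestamps, window, step, start, end):
    """Count the messages in each window [t, t + window), for
    t = start, start + step, ... while the window fits before `end`."""
    intensities = []
    t = start
    while t + window <= end:
        intensities.append(sum(1 for ts in timestamps if t <= ts < t + window))
        t += step
    return intensities


# Hypothetical message timestamps (e.g. in days) for one topic.
topic_timestamps = [0, 1, 1, 2, 5, 5, 5, 6, 9]
curve = sliding_intensity(topic_timestamps, window=3, step=1, start=0, end=10)
```

Peaks in such a curve mark periods of heightened discussion, which is the signal a segmentation model would split into intervals.<br />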
<br />
Using the content, temporal and information flow data above, they extract events by selecting text segments subject to constraints on this information. For instance, the segments of an event should come from the same time interval, be about the same topic and be exchanged mainly within a certain subgroup of social actors.<br />
<br />
== Evaluation ==<br />
<br />
They used the [[UsesDataset::Enron email corpus]] and [[UsesDataset::Dailykos blogs]] [http://www.dailykos.com/]. 30 events were manually labeled as ground truth in the dataset by looking for correspondence with real-world news.<br />
<br />
Performance is measured by the precision, recall and F-score with which their model recovers these events. <br />
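<br />
For reference, the three measures can be computed as in the sketch below. It assumes that detected and ground-truth events can be matched by identifier, which is a simplification; how the authors matched detected events to the labeled ones is not described here.<br />

```python
def precision_recall_f1(detected, gold):
    """Precision, recall and F1 of a set of detected events against
    a set of ground-truth events, matched by identifier."""
    detected, gold = set(detected), set(gold)
    true_positives = len(detected & gold)
    precision = true_positives / len(detected) if detected else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical event identifiers.
p, r, f = precision_recall_f1({"e1", "e2", "e3", "e4"}, {"e1", "e2", "e5"})
```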
<br />
== Discussion ==<br />
They found that taking temporal and social dimensions into account can increase their f-score significantly. Their approach of integrating these diverse features together in a step-wise manner was also found to perform better than just including features in a standard machine learning framework.<br />
<br />
== Related papers ==<br />
There has been a lot of work on event detection.<br />
* [[RelatedPaper::Lin_et_al_KDD_2011|A Statistical Model for Popular Events Tracking in Social Communities. Lin et al, KDD 2011]] This paper presents a method to observe and track the popular events or topics that evolve over time in communities.<br />
* [[RelatedPaper::Popescu and Pennacchiotti, CIKM 10|Detecting controversial events from Twitter. Popescu and Pennacchiotti, CIKM 10]] This paper addresses the task of identifying controversial events using Twitter as a starting point.<br />
* [[RelatedPaper::Yang et al, SIGIR 98|A study on retrospective and online event detection. Yang et al, SIGIR 98]] This paper addresses the problems of detecting events in news stories.<br />
* [[RelatedPaper::Automatic_Detection_and_Classification_of_Social_Events|Automatic Detection and Classification of Social Events]] This paper aims at detecting and classifying social events using Tree kernels.<br />
<br />
== Study plan ==<br />
* Article: Adaptive time series model [http://www.siam.org/proceedings/datamining/2007/dm07_059Lemire.pdf]<br />
* Graph cut based clustering [http://www.cs.berkeley.edu/~malik/papers/SM-ncut.pdf]</div>Ysimhttp://curtis.ml.cmu.edu/w/courses/index.php?title=Comparison_Das_et_al_WSDM_2011_and_Zhao_et_al_AAAI_2007&diff=15154Comparison Das et al WSDM 2011 and Zhao et al AAAI 20072012-11-06T03:14:53Z<p>Ysim: </p>
<hr />
<div>This is a comparison of two related papers in [[event detection]] and [[temporal information extraction]].<br />
<br />
== Papers ==<br />
<br />
The papers are<br />
* Qiankun Zhao, Prasenjit Mitra, and Bi Chen. [[Zhao_et_al,_AAAI_07|Temporal and information flow based event detection from social text streams]]. In Proceedings of the 22nd national conference on Artificial intelligence - Volume 2, pages 1501–1506. AAAI Press, 2007. [http://www.purdue.edu/discoverypark/vaccine/assets/pdfs/publications/pdf/Temporal%20and%20Information%20Flow%20Based.pdf]<br />
* A. Das Sarma, A. Jain, C. Yu. [[Das_Sarma_et._al.,_Dynamic_Relationship_and_Event_Discovery,_WSDM_2011|Dynamic relationship and event discovery]]. In Proceedings of the fourth ACM international conference on Web search and data mining, 2011. [http://web.eecs.umich.edu/~congy/work/wsdm11.pdf]<br />
<br />
== Comparative analysis of both papers ==<br />
<br />
The authors present a method for detecting events from social text streams that exploits not only the textual content but also the temporal and social dimensions of the data.<br />
Social text streams are represented as multigraphs in which each node denotes an "actor" and each edge represents the information flow between two actors.<br />
First, the authors perform content-based [[UsesMethod::clustering]] using a vector space model (tf-idf weights with cosine similarity) and a graph cut based clustering algorithm.<br />
This clustering segments their data into topics.<br />
<br />
For a given topic, they measure the "intensities" over time using a sliding time window and segment them into intervals using an adaptive time series model.<br />
With the temporal segmentation, each topic is represented as a sequence of social network graphs over time.<br />
The weights of the edges between actors in these graphs denote their communication intensity, so one can measure the "information flow" between actors for a given topic over time.<br />
<br />
Using the content, temporal and information flow data above, they extract events by selecting text segments subject to constraints on this information. For instance, the segments of an event should come from the same time interval, be about the same topic and be exchanged mainly within a certain subgroup of social actors.<br />
<br />
== Evaluation ==<br />
<br />
They used the [[UsesDataset::Enron email corpus]] and [[UsesDataset::Dailykos blogs]] [http://www.dailykos.com/]. 30 events were manually labeled as ground truth in the dataset by looking for correspondence with real-world news.<br />
<br />
Performance is measured by the precision, recall and F-score with which their model recovers these events. <br />
<br />
== Discussion ==<br />
They found that taking temporal and social dimensions into account can increase their f-score significantly. Their approach of integrating these diverse features together in a step-wise manner was also found to perform better than just including features in a standard machine learning framework.<br />
<br />
== Related papers ==<br />
There has been a lot of work on event detection.<br />
* [[RelatedPaper::Lin_et_al_KDD_2011|A Statistical Model for Popular Events Tracking in Social Communities. Lin et al, KDD 2011]] This paper presents a method to observe and track the popular events or topics that evolve over time in communities.<br />
* [[RelatedPaper::Popescu and Pennacchiotti, CIKM 10|Detecting controversial events from Twitter. Popescu and Pennacchiotti, CIKM 10]] This paper addresses the task of identifying controversial events using Twitter as a starting point.<br />
* [[RelatedPaper::Yang et al, SIGIR 98|A study on retrospective and online event detection. Yang et al, SIGIR 98]] This paper addresses the problems of detecting events in news stories.<br />
* [[RelatedPaper::Automatic_Detection_and_Classification_of_Social_Events|Automatic Detection and Classification of Social Events]] This paper aims at detecting and classifying social events using Tree kernels.<br />
<br />
== Study plan ==<br />
* Article: Adaptive time series model [http://www.siam.org/proceedings/datamining/2007/dm07_059Lemire.pdf]<br />
* Graph cut based clustering [http://www.cs.berkeley.edu/~malik/papers/SM-ncut.pdf]</div>Ysimhttp://curtis.ml.cmu.edu/w/courses/index.php?title=Comparison_Das_et_al_WSDM_2011_and_Zhao_et_al_AAAI_2007&diff=15153Comparison Das et al WSDM 2011 and Zhao et al AAAI 20072012-11-06T03:14:33Z<p>Ysim: </p>
<hr />
<div>This is a comparison of two related papers in [[event detection]] and [[temporal information extraction]].<br />
The papers are<br />
* Qiankun Zhao, Prasenjit Mitra, and Bi Chen. [[Zhao_et_al,_AAAI_07|Temporal and information flow based event detection from social text streams]]. In Proceedings of the 22nd national conference on Artificial intelligence - Volume 2, pages 1501–1506. AAAI Press, 2007. [http://www.purdue.edu/discoverypark/vaccine/assets/pdfs/publications/pdf/Temporal%20and%20Information%20Flow%20Based.pdf]<br />
* A. Das Sarma, A. Jain, C. Yu. [[Das_Sarma_et._al.,_Dynamic_Relationship_and_Event_Discovery,_WSDM_2011|Dynamic relationship and event discovery]]. In Proceedings of the fourth ACM international conference on Web search and data mining, 2011. [http://web.eecs.umich.edu/~congy/work/wsdm11.pdf]<br />
<br />
== Comparative analysis of both papers ==<br />
<br />
The authors present a method for detecting events from social text streams that exploits not only the textual content but also the temporal and social dimensions of the data.<br />
Social text streams are represented as multigraphs in which each node denotes an "actor" and each edge represents the information flow between two actors.<br />
First, the authors perform content-based [[UsesMethod::clustering]] using a vector space model (tf-idf weights with cosine similarity) and a graph cut based clustering algorithm.<br />
This clustering segments their data into topics.<br />
<br />
For a given topic, they measure the "intensities" over time using a sliding time window and segment them into intervals using an adaptive time series model.<br />
With the temporal segmentation, each topic is represented as a sequence of social network graphs over time.<br />
The weights of the edges between actors in these graphs denote their communication intensity, so one can measure the "information flow" between actors for a given topic over time.<br />
<br />
Using the content, temporal and information flow data above, they extract events by selecting text segments subject to constraints on this information. For instance, the segments of an event should come from the same time interval, be about the same topic and be exchanged mainly within a certain subgroup of social actors.<br />
<br />
== Evaluation ==<br />
<br />
They used the [[UsesDataset::Enron email corpus]] and [[UsesDataset::Dailykos blogs]] [http://www.dailykos.com/]. 30 events were manually labeled as ground truth in the dataset by looking for correspondence with real-world news.<br />
<br />
Performance is measured by the precision, recall and F-score with which their model recovers these events. <br />
<br />
== Discussion ==<br />
They found that taking temporal and social dimensions into account can increase their f-score significantly. Their approach of integrating these diverse features together in a step-wise manner was also found to perform better than just including features in a standard machine learning framework.<br />
<br />
== Related papers ==<br />
There has been a lot of work on event detection.<br />
* [[RelatedPaper::Lin_et_al_KDD_2011|A Statistical Model for Popular Events Tracking in Social Communities. Lin et al, KDD 2011]] This paper presents a method to observe and track the popular events or topics that evolve over time in communities.<br />
* [[RelatedPaper::Popescu and Pennacchiotti, CIKM 10|Detecting controversial events from Twitter. Popescu and Pennacchiotti, CIKM 10]] This paper addresses the task of identifying controversial events using Twitter as a starting point.<br />
* [[RelatedPaper::Yang et al, SIGIR 98|A study on retrospective and online event detection. Yang et al, SIGIR 98]] This paper addresses the problems of detecting events in news stories.<br />
* [[RelatedPaper::Automatic_Detection_and_Classification_of_Social_Events|Automatic Detection and Classification of Social Events]] This paper aims at detecting and classifying social events using Tree kernels.<br />
<br />
== Study plan ==<br />
* Article: Adaptive time series model [http://www.siam.org/proceedings/datamining/2007/dm07_059Lemire.pdf]<br />
* Graph cut based clustering [http://www.cs.berkeley.edu/~malik/papers/SM-ncut.pdf]</div>Ysimhttp://curtis.ml.cmu.edu/w/courses/index.php?title=Comparison_Das_et_al_WSDM_2011_and_Zhao_et_al_AAAI_2007&diff=15146Comparison Das et al WSDM 2011 and Zhao et al AAAI 20072012-11-06T03:13:03Z<p>Ysim: </p>
<hr />
<div>This is a comparison of two related papers in [[event detection]] and [[temporal information extraction]].<br />
The papers are<br />
* Qiankun Zhao, Prasenjit Mitra, and Bi Chen. [[Zhao_et_al,_AAAI_07|Temporal and information flow based event detection from social text streams]]. In Proceedings of the 22nd national conference on Artificial intelligence - Volume 2, pages 1501–1506. AAAI Press, 2007. [http://www.purdue.edu/discoverypark/vaccine/assets/pdfs/publications/pdf/Temporal%20and%20Information%20Flow%20Based.pdf]<br />
<br />
<br />
<br />
== Citation ==<br />
<br />
<br />
== Online version ==<br />
<br />
<br />
<br />
== Summary ==<br />
<br />
The authors present a method for detecting events in social text streams that exploits not only the textual content but also the temporal and social dimensions of the data.<br />
Social text streams are represented as multigraphs in which each node denotes an "actor" and each edge represents the information flow between two actors.<br />
First, the authors perform content-based [[UsesMethod::clustering]] using a vector space model (tf-idf weights with cosine similarity) and a graph-cut-based clustering algorithm.<br />
This clustering segments their data into topics.<br />
<br />
For a given topic, they measure the "intensities" over time using a sliding time window and segment them into intervals using an adaptive time series model.<br />
With the temporal segmentation, each topic is represented as a sequence of social network graphs over time.<br />
The weight of an edge between two actors in this graph denotes their communication intensity, so one can measure the "information flow" between actors for a given topic over time.<br />
<br />
Using the content, temporal, and information flow data above, they extract events as text segments that satisfy constraints on this information. For instance, an event should fall within a single time interval, concern a single topic, and take place mainly among a certain subgroup of social actors.<br />
<br />
== Evaluation ==<br />
<br />
They used the [[UsesDataset::Enron email corpus]] and [[UsesDataset::Dailykos blogs]] [http://www.dailykos.com/]. 30 events were manually labeled as ground truth in the dataset by looking for correspondence with real-world news.<br />
<br />
Performance is measured by the precision, recall, and F-score with which the model recovers these events.<br />
<br />
== Discussion ==<br />
They found that taking temporal and social dimensions into account can increase their f-score significantly. Their approach of integrating these diverse features together in a step-wise manner was also found to perform better than just including features in a standard machine learning framework.<br />
<br />
== Related papers ==<br />
There has been a lot of work on event detection.<br />
* [[RelatedPaper::Lin_et_al_KDD_2011|A Statistical Model for Popular Events Tracking in Social Communities. Lin et al, KDD 2011]] This paper presents a method to observe and track popular events or topics that evolve over time in communities.<br />
* [[RelatedPaper::Popescu and Pennacchiotti, CIKM 10|Detecting controversial events from Twitter. Popescu and Pennacchiotti, CIKM 10]] This paper addresses the task of identifying controversial events using Twitter as a starting point.<br />
* [[RelatedPaper::Yang et al, SIGIR 98|A study on retrospective and online event detection. Yang et al, SIGIR 98]] This paper addresses the problems of detecting events in news stories.<br />
* [[RelatedPaper::Automatic_Detection_and_Classification_of_Social_Events|Automatic Detection and Classification of Social Events]] This paper aims at detecting and classifying social events using Tree kernels.<br />
<br />
== Study plan ==<br />
* Article: Adaptive time series model [http://www.siam.org/proceedings/datamining/2007/dm07_059Lemire.pdf]<br />
* Graph cut based clustering [http://www.cs.berkeley.edu/~malik/papers/SM-ncut.pdf]</div>Ysimhttp://curtis.ml.cmu.edu/w/courses/index.php?title=Comparison_Das_et_al_WSDM_2011_and_Zhao_et_al_AAAI_2007&diff=15145Comparison Das et al WSDM 2011 and Zhao et al AAAI 20072012-11-06T03:12:50Z<p>Ysim: </p>
<hr />
<div>This is a comparison of two related papers in [[event detection]] and [[temporal information extraction]].<br />
The papers are<br />
* Qiankun Zhao, Prasenjit Mitra, and Bi Chen. [[Zhao_et_al,_AAAI_07|Temporal and information flow based event detection from social text streams]]. In Proceedings of the 22nd national conference on Artificial intelligence - Volume 2, pages 1501–1506. AAAI Press, 2007. [http://www.purdue.edu/discoverypark/vaccine/assets/pdfs/publications/pdf/Temporal%20and%20Information%20Flow%20Based.pdf]<br />
<br />
<br />
<br />
== Citation ==<br />
<br />
<br />
== Online version ==<br />
<br />
<br />
<br />
== Summary ==<br />
<br />
The authors present a method for detecting events in social text streams that exploits not only the textual content but also the temporal and social dimensions of the data.<br />
Social text streams are represented as multigraphs in which each node denotes an "actor" and each edge represents the information flow between two actors.<br />
First, the authors perform content-based [[UsesMethod::clustering]] using a vector space model (tf-idf weights with cosine similarity) and a graph-cut-based clustering algorithm.<br />
This clustering segments their data into topics.<br />
<br />
For a given topic, they measure the "intensities" over time using a sliding time window and segment them into intervals using an adaptive time series model.<br />
With the temporal segmentation, each topic is represented as a sequence of social network graphs over time.<br />
The weight of an edge between two actors in this graph denotes their communication intensity, so one can measure the "information flow" between actors for a given topic over time.<br />
<br />
Using the content, temporal, and information flow data above, they extract events as text segments that satisfy constraints on this information. For instance, an event should fall within a single time interval, concern a single topic, and take place mainly among a certain subgroup of social actors.<br />
<br />
== Evaluation ==<br />
<br />
They used the [[UsesDataset::Enron email corpus]] and [[UsesDataset::Dailykos blogs]] [http://www.dailykos.com/]. 30 events were manually labeled as ground truth in the dataset by looking for correspondence with real-world news.<br />
<br />
Performance is measured by the precision, recall, and F-score with which the model recovers these events.<br />
<br />
== Discussion ==<br />
They found that taking temporal and social dimensions into account can increase their f-score significantly. Their approach of integrating these diverse features together in a step-wise manner was also found to perform better than just including features in a standard machine learning framework.<br />
<br />
== Related papers ==<br />
There has been a lot of work on event detection.<br />
* [[RelatedPaper::Lin_et_al_KDD_2011|A Statistical Model for Popular Events Tracking in Social Communities. Lin et al, KDD 2011]] This paper presents a method to observe and track popular events or topics that evolve over time in communities.<br />
* [[RelatedPaper::Popescu and Pennacchiotti, CIKM 10|Detecting controversial events from Twitter. Popescu and Pennacchiotti, CIKM 10]] This paper addresses the task of identifying controversial events using Twitter as a starting point.<br />
* [[RelatedPaper::Yang et al, SIGIR 98|A study on retrospective and online event detection. Yang et al, SIGIR 98]] This paper addresses the problems of detecting events in news stories.<br />
* [[RelatedPaper::Automatic_Detection_and_Classification_of_Social_Events|Automatic Detection and Classification of Social Events]] This paper aims at detecting and classifying social events using Tree kernels.<br />
<br />
== Study plan ==<br />
* Article: Adaptive time series model [http://www.siam.org/proceedings/datamining/2007/dm07_059Lemire.pdf]<br />
* Graph cut based clustering [http://www.cs.berkeley.edu/~malik/papers/SM-ncut.pdf]</div>Ysimhttp://curtis.ml.cmu.edu/w/courses/index.php?title=Comparison_Das_et_al_WSDM_2011_and_Zhao_et_al_AAAI_2007&diff=15144Comparison Das et al WSDM 2011 and Zhao et al AAAI 20072012-11-06T03:12:40Z<p>Ysim: </p>
<hr />
<div>This is a comparison of two related papers in [[event detection]] and [[temporal information extraction]].<br />
The papers are<br />
* Qiankun Zhao, Prasenjit Mitra, and Bi Chen. [[Zhao_et_al,_AAAI_07|Temporal and information flow based event detection from social text streams]]. In Proceedings of the 22nd national conference on Artificial intelligence - Volume 2, pages 1501–1506. AAAI Press, 2007. [http://www.purdue.edu/discoverypark/vaccine/assets/pdfs/publications/pdf/Temporal%20and%20Information%20Flow%20Based.pdf]<br />
<br />
<br />
<br />
== Citation ==<br />
<br />
<br />
== Online version ==<br />
<br />
<br />
<br />
== Summary ==<br />
<br />
The authors present a method for detecting events in social text streams that exploits not only the textual content but also the temporal and social dimensions of the data.<br />
Social text streams are represented as multigraphs in which each node denotes an "actor" and each edge represents the information flow between two actors.<br />
First, the authors perform content-based [[UsesMethod::clustering]] using a vector space model (tf-idf weights with cosine similarity) and a graph-cut-based clustering algorithm.<br />
This clustering segments their data into topics.<br />
<br />
For a given topic, they measure the "intensities" over time using a sliding time window and segment them into intervals using an adaptive time series model.<br />
With the temporal segmentation, each topic is represented as a sequence of social network graphs over time.<br />
The weight of an edge between two actors in this graph denotes their communication intensity, so one can measure the "information flow" between actors for a given topic over time.<br />
<br />
Using the content, temporal, and information flow data above, they extract events as text segments that satisfy constraints on this information. For instance, an event should fall within a single time interval, concern a single topic, and take place mainly among a certain subgroup of social actors.<br />
<br />
== Evaluation ==<br />
<br />
They used the [[UsesDataset::Enron email corpus]] and [[UsesDataset::Dailykos blogs]] [http://www.dailykos.com/]. 30 events were manually labeled as ground truth in the dataset by looking for correspondence with real-world news.<br />
<br />
Performance is measured by the precision, recall, and F-score with which the model recovers these events.<br />
<br />
== Discussion ==<br />
They found that taking temporal and social dimensions into account can increase their f-score significantly. Their approach of integrating these diverse features together in a step-wise manner was also found to perform better than just including features in a standard machine learning framework.<br />
<br />
== Related papers ==<br />
There has been a lot of work on event detection.<br />
* [[RelatedPaper::Lin_et_al_KDD_2011|A Statistical Model for Popular Events Tracking in Social Communities. Lin et al, KDD 2011]] This paper presents a method to observe and track popular events or topics that evolve over time in communities.<br />
* [[RelatedPaper::Popescu and Pennacchiotti, CIKM 10|Detecting controversial events from Twitter. Popescu and Pennacchiotti, CIKM 10]] This paper addresses the task of identifying controversial events using Twitter as a starting point.<br />
* [[RelatedPaper::Yang et al, SIGIR 98|A study on retrospective and online event detection. Yang et al, SIGIR 98]] This paper addresses the problems of detecting events in news stories.<br />
* [[RelatedPaper::Automatic_Detection_and_Classification_of_Social_Events|Automatic Detection and Classification of Social Events]] This paper aims at detecting and classifying social events using Tree kernels.<br />
<br />
== Study plan ==<br />
* Article: Adaptive time series model [http://www.siam.org/proceedings/datamining/2007/dm07_059Lemire.pdf]<br />
* Graph cut based clustering [http://www.cs.berkeley.edu/~malik/papers/SM-ncut.pdf]</div>Ysimhttp://curtis.ml.cmu.edu/w/courses/index.php?title=Comparison_Das_et_al_WSDM_2011_and_Zhao_et_al_AAAI_2007&diff=15137Comparison Das et al WSDM 2011 and Zhao et al AAAI 20072012-11-06T03:10:19Z<p>Ysim: Created page with 'Hello World'</p>
<hr />
<div>Hello World</div>Ysimhttp://curtis.ml.cmu.edu/w/courses/index.php?title=Zhao_et_al,_AAAI_07&diff=15101Zhao et al, AAAI 072012-11-06T02:43:04Z<p>Ysim: Redirected page to Q. Zhao, P. Mitra, and B. Chen. Temporal and information flow based event detection from social text streams. In AAAI, 2007</p>
<hr />
<div>#REDIRECT [[Q._Zhao,_P._Mitra,_and_B._Chen._Temporal_and_information_flow_based_event_detection_from_social_text_streams._In_AAAI,_2007]]</div>Ysimhttp://curtis.ml.cmu.edu/w/courses/index.php?title=Q._Zhao,_P._Mitra,_and_B._Chen._Temporal_and_information_flow_based_event_detection_from_social_text_streams._In_AAAI,_2007&diff=15100Q. Zhao, P. Mitra, and B. Chen. Temporal and information flow based event detection from social text streams. In AAAI, 20072012-11-06T02:42:26Z<p>Ysim: Created page with 'This Category::paper is relevant to detecting controversial events. = Temporal and information flow based event detectio…'</p>
<hr />
<div>This [[Category::paper]] is relevant to [[AddressesProblem::Controversial_events_detection|detecting controversial events]].<br />
<br />
= Temporal and information flow based event detection from social text streams =<br />
<br />
== Citation ==<br />
<br />
Qiankun Zhao, Prasenjit Mitra, and Bi Chen. Temporal and information flow based event detection from social text streams. In Proceedings of the 22nd national conference on Artificial intelligence - Volume 2, pages 1501–1506. AAAI Press, 2007.<br />
<br />
== Online version ==<br />
<br />
[http://www.purdue.edu/discoverypark/vaccine/assets/pdfs/publications/pdf/Temporal%20and%20Information%20Flow%20Based.pdf Temporal and information flow based event detection from social text streams]<br />
<br />
== Summary ==<br />
<br />
The authors present a method for detecting events in social text streams that exploits not only the textual content but also the temporal and social dimensions of the data.<br />
Social text streams are represented as multigraphs in which each node denotes an "actor" and each edge represents the information flow between two actors.<br />
First, the authors perform content-based [[UsesMethod::clustering]] using a vector space model (tf-idf weights with cosine similarity) and a graph-cut-based clustering algorithm.<br />
This clustering segments their data into topics.<br />
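
The content-similarity step can be illustrated with a minimal sketch. It only builds the pairwise tf-idf cosine similarity matrix that a graph-cut clustering algorithm could take as input; it does not implement the graph cut itself. The function names, the raw-count tf weighting, the logarithmic idf, and the toy documents are all assumptions made for illustration and are not taken from the paper.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return one sparse tf-idf vector (dict: term -> weight) per tokenized document.

    Assumed weighting: raw term count times log(N / document frequency).
    """
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: tf[t] * math.log(n / df[t]) for t in tf})
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

# Hypothetical toy corpus: two messages on one topic, one on another.
docs = [
    "energy trading contract".split(),
    "energy contract price".split(),
    "election campaign poll".split(),
]
vecs = tfidf_vectors(docs)
# Pairwise similarity matrix: the edge weights of the document graph to be cut.
sim = [[cosine(a, b) for b in vecs] for a in vecs]
```

In this toy corpus the first two documents share terms and therefore have a positive similarity, while the third shares none with them, so a graph cut would be expected to separate it.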
<br />
For a given topic, they measure the "intensities" over time using a sliding time window and segment them into intervals using an adaptive time series model.<br />
With the temporal segmentation, each topic is represented as a sequence of social network graphs over time.<br />
The weight of an edge between two actors in this graph denotes their communication intensity, so one can measure the "information flow" between actors for a given topic over time.<br />
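
The temporal and social steps can be sketched roughly as follows, assuming that "intensity" is simply the number of messages in a window and that an edge weight is the number of messages from one actor to another within an interval. The adaptive time series model that chooses the interval boundaries is not implemented here; the function names, integer timestamps, and sample messages are hypothetical.

```python
from collections import Counter

def sliding_intensity(timestamps, window, step, start, end):
    """Count messages in each window [t, t + window), sliding t by `step`
    from `start` while the window still fits before `end`."""
    counts = []
    t = start
    while t + window <= end:
        counts.append(sum(1 for ts in timestamps if t <= ts < t + window))
        t += step
    return counts

def edge_weights(messages, lo, hi):
    """Count messages per (sender, recipient) pair with lo <= time < hi."""
    return Counter((s, r) for s, r, ts in messages if lo <= ts < hi)

# Hypothetical messages for one topic: (sender, recipient, day index).
messages = [
    ("alice", "bob", 1),
    ("alice", "bob", 2),
    ("bob", "carol", 2),
    ("alice", "bob", 7),
]
intensity = sliding_intensity(
    [ts for _, _, ts in messages], window=3, step=1, start=0, end=9
)
# Edge weights of the social graph for the interval covering days 0 to 2.
weights = edge_weights(messages, 0, 3)
```

Computing one such weighted graph per interval gives the sequence of social network graphs over time described above.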
<br />
Using the content, temporal, and information flow data above, they extract events as text segments that satisfy constraints on this information. For instance, an event should fall within a single time interval, concern a single topic, and take place mainly among a certain subgroup of social actors.<br />
<br />
== Evaluation ==<br />
<br />
They used the [[UsesDataset::Enron email corpus]] and [[UsesDataset::Dailykos blogs]] [http://www.dailykos.com/]. 30 events were manually labeled as ground truth in the dataset by looking for correspondence with real-world news.<br />
<br />
Performance is measured by the precision, recall, and F-score with which the model recovers these events.<br />
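
As a reminder of how these metrics are computed, here is a minimal sketch that scores a set of detected events against a labeled set. It assumes a detected event counts as correct only when its identifier exactly matches a ground-truth event, and it uses the balanced F-score (F1); the paper's actual matching criterion may differ, and the event identifiers are made up.

```python
def precision_recall_f1(predicted, gold):
    """Return (precision, recall, F1) for predicted vs. gold event identifiers."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical example: 4 detected events, 3 labeled events, 2 in common.
p, r, f = precision_recall_f1({"e1", "e2", "e3", "e4"}, {"e1", "e2", "e5"})
```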
<br />
== Discussion ==<br />
They found that taking temporal and social dimensions into account can increase their f-score significantly. Their approach of integrating these diverse features together in a step-wise manner was also found to perform better than just including features in a standard machine learning framework.<br />
<br />
== Related papers ==<br />
There has been a lot of work on event detection.<br />
* [[RelatedPaper::Lin_et_al_KDD_2011|A Statistical Model for Popular Events Tracking in Social Communities. Lin et al, KDD 2011]] This paper presents a method to observe and track popular events or topics that evolve over time in communities.<br />
* [[RelatedPaper::Popescu and Pennacchiotti, CIKM 10|Detecting controversial events from Twitter. Popescu and Pennacchiotti, CIKM 10]] This paper addresses the task of identifying controversial events using Twitter as a starting point.<br />
* [[RelatedPaper::Yang et al, SIGIR 98|A study on retrospective and online event detection. Yang et al, SIGIR 98]] This paper addresses the problems of detecting events in news stories.<br />
* [[RelatedPaper::Automatic_Detection_and_Classification_of_Social_Events|Automatic Detection and Classification of Social Events]] This paper aims at detecting and classifying social events using Tree kernels.<br />
<br />
== Study plan ==<br />
* Article: Adaptive time series model [http://www.siam.org/proceedings/datamining/2007/dm07_059Lemire.pdf]<br />
* Graph cut based clustering [http://www.cs.berkeley.edu/~malik/papers/SM-ncut.pdf]</div>Ysimhttp://curtis.ml.cmu.edu/w/courses/index.php?title=ToWikify&diff=14785ToWikify2012-10-26T21:20:42Z<p>Ysim: </p>
<hr />
<div>{| class="wikitable sortable" border="1" cellpadding="4" cellspacing="0"<br />
|-<br />
! Paper !! Related Paper !! Student Andrew ID !! Link to Comparison<br />
|-<br />
| [[Yang_et_al_Modeling_Information_Diffusion_in_Implicit_Networks]] || [[Inferring the Diffusion and Evolution of Topics in Social Communities]] [http://www.cs.uiuc.edu/homes/hanj/pdf/snakdd11_clin.pdf] || ||<br />
|-<br />
| [[Zheleva_ACM_2009]] || [[Geographic routing in social networks]] [http://www.pnas.org/content/102/33/11623] || ||<br />
|-<br />
| [[Y._Borghol_et_al._Performance_Evaluation_68_2011]] || [[The Untold Story of the Clones: Content-agnostic Factors that Impact YouTube Video Popularity]] [http://www.ida.liu.se/~nikca/papers/kdd12.pdf] || ||<br />
|-<br />
| [[Zheleva_and_Getoor,_WWW2009]] || [[Preserving the privacy of sensitive relationships in graph data. PinKDD, 2007]] [http://www.springerlink.com/content/n1404m0668452854/] || ||<br />
|-<br />
| [[Vladimir_Ouzienko,_Prediction_of_Attributes_and_Links_in_Temporal_Social_Networks]] || [[Introduction to stochastic actor-based models for network dynamics]] [http://www.sciencedirect.com/science/article/pii/S0378873309000069] || ||<br />
|-<br />
| [[Miller_et_al_ICWSM_2011]] || [[Can predicate-argument structures be used for contextual opinion retrieval from blogs?]] [http://rd.springer.com/article/10.1007/s11280-012-0170-8] || ||<br />
|-<br />
| [[Ritter_et_al,_EMNLP_2011._Named_Entity_Recognition_in_Tweets:_An_Experimental_Study]] || [[Event discovery in social media feeds]] [http://people.csail.mit.edu/regina/my_papers/twitter_acl2011.pdf] || ||<br />
|-<br />
| [[Ritter_et_al_NAACL_2010._Unsupervised_Modeling_of_Twitter_Conversations]] || [[Catching the drift: Probabilistic content models, with applications to generation and summarization]] [http://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&cad=rja&ved=0CDAQFjAA&url=http%3A%2F%2Facl.ldc.upenn.edu%2Fhlt-naacl2004%2Fmain%2Fpdf%2F167_Paper.pdf&ei=z_iHUMj6MavI0AHbm4HYDg&usg=AFQjCNFvkmshrGjFbst0izxL_4fR6chdiA&sig2=KBi5EDzmBxrd3sTzm5Qyhg] || ||<br />
|-<br />
| [[Modeling_Contagion_Through_Facebook_News_Feed]] || [[Cascading Behavior in Large Blog Graphs]] [http://cs.stanford.edu/~jure/pubs/blogs-sdm07.pdf] ||thoang ||<br />
|-<br />
| [[Yeh_et_al_WikiWalk_Random_walks_on_Wikipedia_for_Semantic_Relatedness]] || [[Personalizing PageRank for Word Sense Disambiguation]] [http://www.aclweb.org/anthology/E/E09/E09-1005.pdf] || nnori ||<br />
|-<br />
| [[Yano_et_al_NAACL_2009]] || [[Link-PLSA-LDA: A new unsupervised model for topics and influence of blogs ]] [http://www-poleia.lip6.fr/~gallinar/Enseignement/2009-Papiers-ARI/icwsm2008-nalapati.pdf] || ||<br />
|-<br />
| [[Ramage_et_al_ICWSM_2010]] || [[ Is it Really About Me? Message Content in Social Awareness Streams]] [http://dl.acm.org/citation.cfm?id=1718953] || ||<br />
|-<br />
| [[Rodriguez_et_al_Oct_2011]] || [[ The origin of bursts and heavy tails in human dynamics]] [http://nd.edu/~networks/HumanDynamics_20Oct05/HumanDynamics_Nature207,435(2005).pdf] || ||<br />
|-<br />
| [[Measuring_User_Influence_in_Twitter:_The_Million_Follower_Fallacy]] || [[Influentials, Networks, and Public Opinion Formation]] [ftp://intranet.dei.polimi.it/outgoing/Carlo.Piccardi/VarieDsc/Wa07.pdf] || ||<br />
|-<br />
| [[OConnor_et._al.,_ICWSM_2010]] || [[Widespread Worry and the Stock Market]] [http://social.cs.uiuc.edu/people/gilbert/pub/icwsm10.worry.gilbert.pdf] || Gmontane ||<br />
|-<br />
| [[Link_propagation:_A_fast_semi-supervised_learning_algorithm_for_link_prediction]] || [[Fast and scalable algorithms for semi-supervised link prediction on static and dynamic graphs]] [http://www.springerlink.com/content/g622186787k4258r/] || ||<br />
|-<br />
| [[Mrinmaya_et._al._WWW%2712]] || [[The Author-Recipient-Topic Model for Topic and Role Discovery in Social Networks: Experiments with Enron and Academic Email]] [http://people.cs.umass.edu/~mccallum/papers/art04tr.pdf] || ||<br />
|-<br />
| [[Yano_et_al_ICWSM_2010._What’s_Worthy_of_Comment%3F_Content_and_Comment_Volume_in_Political_Blogs]] || [[Mixed membership models of scientific publication]] [http://www.cs.cmu.edu/~lafferty/pub/efl.pdf] || ||<br />
|-<br />
| [[Rosen-Zvi_et_al,_The_Author-Topic_Model_for_Authors_and_Documents]] || [[The Missing Link - A Probabilistic Model of Document Content and Hypertext Connectivity]] [http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.33.6843] || ||<br />
|-<br />
| [[Reviewing_social_media_use_by_clinicians]] || [[Integrating the hospital library with patient care, teaching and research: model and Web 2.0 tools to create a social and collaborative community of clinical research in a hospital setting.]] [http://www.ncbi.nlm.nih.gov/pubmed/20712716?dopt=Abstract] || ||<br />
|-<br />
| [[Agarwal_et_al,_ICWSM_2009#Related_Works_and_Papers]] || [[Latent Friend Mining from Blog Data, ICDM 2006]] [http://dl.acm.org/citation.cfm?id=1193350] || ||<br />
|-<br />
| [[Hassan_et_al,_ICWSM_2009]] || [[Document representation and query expansion models for blog recommendation]] [http://www.cs.cmu.edu/~jaime/ArguelloICWSM08.pdf] || ||<br />
|-<br />
| [[E.A._Leicht,_Structure_of_Time_Evo_citation_networks_2007]] || [[Detecting Topic Evolution in Scientific Literature: How Can Citations Help?]] [http://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&cad=rja&ved=0CCEQFjAA&url=http%3A%2F%2Fclgiles.ist.psu.edu%2Fpubs%2FCIKM2009-topic-evolution-citations.pdf&ei=YXCHUOnjDOXH0AHj74HoDQ&usg=AFQjCNHzvuuex1dNYKsuFzbxPIR3y45V5A&sig2=fP-6FsO1Pq4ewLToESR42g] || ziy ||<br />
|-<br />
| [[Birke%26Sarkar,FigLanguages07]] || [[A Clustering Approach for the Nearly Unsupervised Recognition of Nonliteral Language, EACL-2006]] [http://acl.ldc.upenn.edu/E/E06/E06-1042.pdf] || tinghaoh ||<br />
|-<br />
| [[A_Discriminative_Latent_Variable_Model_for_SMT]] || [[An End-to-End Discriminative Approach to Machine Translation]] [http://www.seas.upenn.edu/~taskar/pubs/acl06.pdf] || ||<br />
|-<br />
| [[Davidov_et_al_COLING_10]] || [[Structured Models for Fine-to-Coarse Sentiment Analysis]] [http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.116.5334] ||ydalal ||<br />
|-<br />
| [[Anderson_et_al_KDD2012]] || [[Predicting web searcher satisfaction with existing community-based answers]] [http://www.cs.cmu.edu/~dpelleg/download/sigir311-liu.pdf] || ||<br />
|-<br />
| [[Leskovec_et_al.,_WWW_2010]] || [[Statistical properties of community structure in large social and information networks. In WWW ’08]] [http://cs-www.cs.yale.edu/homes/mmahoney/pubs/Communities_WWW.pdf] || ||<br />
|-<br />
| [[Accurate_Unlexicalized_Parsing]] || [[Learning Accurate, Compact, and Interpretable Tree Annotation, S. Petrov, L. Barrett, R. Thibaux, D. Klein, ACL 2006]] [http://acl.ldc.upenn.edu/P/P06/P06-1055.pdf] || ||<br />
|-<br />
| [[Esuli_and_Sebastiani_LREC_2006]] || [[Determining term subjectivity and term orientation for opinion mining.]] [http://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&cad=rja&ved=0CCEQFjAA&url=http%3A%2F%2Facl.ldc.upenn.edu%2Feacl2006%2Fmain%2Fpapers%2F13_1_esulisebastiani_192.pdf&ei=v3KHUOfBMObs0gHT5IHIBg&usg=AFQjCNGk9-BW40FOkzPKLtVyb8a7Dv4XbQ&sig2=81oi1NYAxXR3ZHeoHnJMJA] || ytsvetko ||<br />
|-<br />
| [[Gilbert_et_al.,_ICWSM_2010]] || [[A Sentiment Detection Engine for Internet Stock Message Boards]] [http://aclweb.org/anthology-new/U/U09/U09-1012.pdf] || ||<br />
|-<br />
| [[Akcora_et_al,_SOMA_2010]] || [[L. Ku, Y. Liang, and H. Chen. Opinion extraction, summarization and tracking in news and blog corpora. In Proceedings of AAAI-2006]] [http://www.aaai.org/Papers/Symposia/Spring/2006/SS-06-03/SS06-03-020.pdf] || zhouyu||<br />
|-<br />
| [[Chambers_and_Jurafsky,_Unsupervised_Learning_of_Narrative_Event_Chains,_ACL_2008]] || [[Chklovski and Pantel (2004) Verbocean:Mining the web for fine-grained semantic verb relations]] [http://acl.ldc.upenn.edu/acl2004/emnlp/pdf/Chklovski.pdf] || ||<br />
|-<br />
| [[BinLu_et_al._ACL2011]] || [[Learning Multilingual Subjective Language via Cross-Lingual Projections]] [http://www.cse.unt.edu/~rada/papers/mihalcea.acl07.pdf] || lingpenk ||<br />
|-<br />
| [[Domain-Assisted_Product_Aspect_Hierarchy_Generation:_Towards_Hierarchical_Organization_of_Unstructured_Consumer_Reviews]] || [[Learning object models from semistructured Web documents]] [http://ieeexplore.ieee.org/xpl/login.jsp?tp=&arnumber=1583583&url=http%3A%2F%2Fieeexplore.ieee.org%2Fxpls%2Fabs_all.jsp%3Farnumber%3D1583583] || ||<br />
|-<br />
| [[A_Latent_Variable_Model_for_Geographic_Lexical_Variation]] || [[Q. Mei, C. Liu, H. Su, and C. X Zhai. 2006. A probabilistic approach to spatiotemporal theme pattern mining on weblogs. In Proceedings of WWW]] [http://dl.acm.org/citation.cfm?id=1135857] || ||<br />
|-<br />
| [[Collier_et_al._Journal_of_Biomedical_Semantics_2011]] || [[Modeling Spread of Disease from Social Interaction]] [http://www.cs.rochester.edu/u/kautz/papers/Sadilek-Kautz-Silenzio_Modeling-Spread-of-Disease-from-Social-Interactions_ICWSM-12.pdf] ||rajarshd ||<br />
|-<br />
| [[Capturing_Global_Mood_Levels_using_Blog_Posts]] || [[Modeling Public Mood and Emotion: Twitter Sentiment and Socio-Economic Phenomena]] [http://www.aaai.org/ocs/index.php/ICWSM/ICWSM11/paper/viewFile/2826/3237] || ||<br />
|-<br />
| [[Andreevskaia_et_al.,_ICWSM_2007]] || [[M. Hurst and K. Nigam. Retrieving topical sentiments from online document collection.]] [http://suraj.lums.edu.pk/~cs631s05/Papers/retrieving%20topical%20sentiments%20from%20online%20document%20collection.pdf] || ||<br />
|-<br />
| [[Das_Sarma_et._al.,_Dynamic_Relationship_and_Event_Discovery,_WSDM_2011]] || [[ Q. Zhao, P. Mitra, and B. Chen. Temporal and information flow based event detection from social text streams. In AAAI, 2007]] [http://www.purdue.edu/discoverypark/vaccine/assets/pdfs/publications/pdf/Temporal%20and%20Information%20Flow%20Based.pdf] || ysim ||<br />
|}</div>Ysimhttp://curtis.ml.cmu.edu/w/courses/index.php?title=ToWikify&diff=14784ToWikify2012-10-26T21:20:22Z<p>Ysim: </p>
<hr />
<div>{| class="wikitable sortable" border="1" cellpadding="4" cellspacing="0"<br />
|-<br />
! Paper !! Related Paper !! Student Andrew ID !! Link to Comparison<br />
|-<br />
| [[Yang_et_al_Modeling_Information_Diffusion_in_Implicit_Networks]] || [[Inferring the Diffusion and Evolution of Topics in Social Communities]] [http://www.cs.uiuc.edu/homes/hanj/pdf/snakdd11_clin.pdf] || ||<br />
|-<br />
| [[Zheleva_ACM_2009]] || [[Geographic routing in social networks]] [http://www.pnas.org/content/102/33/11623] || ||<br />
|-<br />
| [[Y._Borghol_et_al._Performance_Evaluation_68_2011]] || [[The Untold Story of the Clones: Content-agnostic Factors that Impact YouTube Video Popularity]] [http://www.ida.liu.se/~nikca/papers/kdd12.pdf] || ||<br />
|-<br />
| [[Zheleva_and_Getoor,_WWW2009]] || [[Preserving the privacy of sensitive relationships in graph data. PinKDD, 2007]] [http://www.springerlink.com/content/n1404m0668452854/] || ||<br />
|-<br />
| [[Vladimir_Ouzienko,_Prediction_of_Attributes_and_Links_in_Temporal_Social_Networks]] || [[Introduction to stochastic actor-based models for network dynamics]] [http://www.sciencedirect.com/science/article/pii/S0378873309000069] || ||<br />
|-<br />
| [[Miller_et_al_ICWSM_2011]] || [[Can predicate-argument structures be used for contextual opinion retrieval from blogs?]] [http://rd.springer.com/article/10.1007/s11280-012-0170-8] || ||<br />
|-<br />
| [[Ritter_et_al,_EMNLP_2011._Named_Entity_Recognition_in_Tweets:_An_Experimental_Study]] || [[Event discovery in social media feeds]] [http://people.csail.mit.edu/regina/my_papers/twitter_acl2011.pdf] || ||<br />
|-<br />
| [[Ritter_et_al_NAACL_2010._Unsupervised_Modeling_of_Twitter_Conversations]] || [[Catching the drift: Probabilistic content models, with applications to generation and summarization]] [http://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&cad=rja&ved=0CDAQFjAA&url=http%3A%2F%2Facl.ldc.upenn.edu%2Fhlt-naacl2004%2Fmain%2Fpdf%2F167_Paper.pdf&ei=z_iHUMj6MavI0AHbm4HYDg&usg=AFQjCNFvkmshrGjFbst0izxL_4fR6chdiA&sig2=KBi5EDzmBxrd3sTzm5Qyhg] || ||<br />
|-<br />
| [[Modeling_Contagion_Through_Facebook_News_Feed]] || [[Cascading Behavior in Large Blog Graphs]] [http://cs.stanford.edu/~jure/pubs/blogs-sdm07.pdf] ||thoang ||<br />
|-<br />
| [[Yeh_et_al_WikiWalk_Random_walks_on_Wikipedia_for_Semantic_Relatedness]] || [[Personalizing PageRank for Word Sense Disambiguation]] [http://www.aclweb.org/anthology/E/E09/E09-1005.pdf] || nnori ||<br />
|-<br />
| [[Yano_et_al_NAACL_2009]] || [[Link-PLSA-LDA: A new unsupervised model for topics and influence of blogs ]] [http://www-poleia.lip6.fr/~gallinar/Enseignement/2009-Papiers-ARI/icwsm2008-nalapati.pdf] || ||<br />
|-<br />
| [[Ramage_et_al_ICWSM_2010]] || [[ Is it Really About Me? Message Content in Social Awareness Streams]] [http://dl.acm.org/citation.cfm?id=1718953] || ||<br />
|-<br />
| [[Rodriguez_et_al_Oct_2011]] || [[ The origin of bursts and heavy tails in human dynamics]] [http://nd.edu/~networks/HumanDynamics_20Oct05/HumanDynamics_Nature207,435(2005).pdf] || ||<br />
|-<br />
| [[Measuring_User_Influence_in_Twitter:_The_Million_Follower_Fallacy]] || [[Influentials, Networks, and Public Opinion Formation]] [ftp://intranet.dei.polimi.it/outgoing/Carlo.Piccardi/VarieDsc/Wa07.pdf] || ||<br />
|-<br />
| [[OConnor_et._al.,_ICWSM_2010]] || [[Widespread Worry and the Stock Market]] [http://social.cs.uiuc.edu/people/gilbert/pub/icwsm10.worry.gilbert.pdf] || Gmontane ||<br />
|-<br />
| [[Link_propagation:_A_fast_semi-supervised_learning_algorithm_for_link_prediction]] || [[Fast and scalable algorithms for semi-supervised link prediction on static and dynamic graphs]] [http://www.springerlink.com/content/g622186787k4258r/] || ||<br />
|-<br />
| [[Mrinmaya_et._al._WWW%2712]] || [[The Author-Recipient-Topic Model for Topic and Role Discovery in Social Networks: Experiments with Enron and Academic Email]] [http://people.cs.umass.edu/~mccallum/papers/art04tr.pdf] || ||<br />
|-<br />
| [[Yano_et_al_ICWSM_2010._What’s_Worthy_of_Comment%3F_Content_and_Comment_Volume_in_Political_Blogs]] || [[Mixed membership models of scientific publication]] [http://www.cs.cmu.edu/~lafferty/pub/efl.pdf] || ||<br />
|-<br />
| [[Rosen-Zvi_et_al,_The_Author-Topic_Model_for_Authors_and_Documents]] || [[The Missing Link - A Probabilistic Model of Document Content and Hypertext Connectivity]] [http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.33.6843] || ||<br />
|-<br />
| [[Reviewing_social_media_use_by_clinicians]] || [[Integrating the hospital library with patient care, teaching and research: model and Web 2.0 tools to create a social and collaborative community of clinical research in a hospital setting.]] [http://www.ncbi.nlm.nih.gov/pubmed/20712716?dopt=Abstract] || ||<br />
|-<br />
| [[Agarwal_et_al,_ICWSM_2009#Related_Works_and_Papers]] || [[Latent Friend Mining from Blog Data, ICDM 2006]] [http://dl.acm.org/citation.cfm?id=1193350] || ||<br />
|-<br />
| [[Hassan_et_al,_ICWSM_2009]] || [[Document representation and query expansion models for blog recommendation]] [http://www.cs.cmu.edu/~jaime/ArguelloICWSM08.pdf] || ||<br />
|-<br />
| [[E.A._Leicht,_Structure_of_Time_Evo_citation_networks_2007]] || [[Detecting Topic Evolution in Scientific Literature: How Can Citations Help?]] [http://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&cad=rja&ved=0CCEQFjAA&url=http%3A%2F%2Fclgiles.ist.psu.edu%2Fpubs%2FCIKM2009-topic-evolution-citations.pdf&ei=YXCHUOnjDOXH0AHj74HoDQ&usg=AFQjCNHzvuuex1dNYKsuFzbxPIR3y45V5A&sig2=fP-6FsO1Pq4ewLToESR42g] || ziy ||<br />
|-<br />
| [[Birke%26Sarkar,FigLanguages07]] || [[A Clustering Approach for the Nearly Unsupervised Recognition of Nonliteral Language, EACL-2006]] [http://acl.ldc.upenn.edu/E/E06/E06-1042.pdf] || tinghaoh ||<br />
|-<br />
| [[A_Discriminative_Latent_Variable_Model_for_SMT]] || [[An End-to-End Discriminative Approach to Machine Translation]] [http://www.seas.upenn.edu/~taskar/pubs/acl06.pdf] || ||<br />
|-<br />
| [[Davidov_et_al_COLING_10]] || [[Structured Models for Fine-to-Coarse Sentiment Analysis]] [http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.116.5334] ||ydalal ||<br />
|-<br />
| [[Anderson_et_al_KDD2012]] || [[Predicting web searcher satisfaction with existing community-based answers]] [http://www.cs.cmu.edu/~dpelleg/download/sigir311-liu.pdf] || ||<br />
|-<br />
| [[Leskovec_et_al.,_WWW_2010]] || [[Statistical properties of community structure in large social and information networks. In WWW ’08]] [http://cs-www.cs.yale.edu/homes/mmahoney/pubs/Communities_WWW.pdf] || ||<br />
|-<br />
| [[Accurate_Unlexicalized_Parsing]] || [[Learning Accurate, Compact, and Interpretable Tree Annotation, S. Petrov, L. Barrett, R. Thibaux, D. Klein, ACL 2006]] [http://acl.ldc.upenn.edu/P/P06/P06-1055.pdf] || ||<br />
|-<br />
| [[Esuli_and_Sebastiani_LREC_2006]] || [[Determining term subjectivity and term orientation for opinion mining.]] [http://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&cad=rja&ved=0CCEQFjAA&url=http%3A%2F%2Facl.ldc.upenn.edu%2Feacl2006%2Fmain%2Fpapers%2F13_1_esulisebastiani_192.pdf&ei=v3KHUOfBMObs0gHT5IHIBg&usg=AFQjCNGk9-BW40FOkzPKLtVyb8a7Dv4XbQ&sig2=81oi1NYAxXR3ZHeoHnJMJA] || ytsvetko ||<br />
|-<br />
| [[Gilbert_et_al.,_ICWSM_2010]] || [[A Sentiment Detection Engine for Internet Stock Message Boards]] [http://aclweb.org/anthology-new/U/U09/U09-1012.pdf] || ||<br />
|-<br />
| [[Akcora_et_al,_SOMA_2010]] || [[L. Ku, Y. Liang, and H. Chen. Opinion extraction, summarization and tracking in news and blog corpora. In Proceedings of AAAI-2006]] [http://www.aaai.org/Papers/Symposia/Spring/2006/SS-06-03/SS06-03-020.pdf] || zhouyu||<br />
|-<br />
| [[Chambers_and_Jurafsky,_Unsupervised_Learning_of_Narrative_Event_Chains,_ACL_2008]] || [[Chklovski and Pantel (2004) Verbocean:Mining the web for fine-grained semantic verb relations]] [http://acl.ldc.upenn.edu/acl2004/emnlp/pdf/Chklovski.pdf] || ||<br />
|-<br />
| [[BinLu_et_al._ACL2011]] || [[Learning Multilingual Subjective Language via Cross-Lingual Projections]] [http://www.cse.unt.edu/~rada/papers/mihalcea.acl07.pdf] || lingpenk ||<br />
|-<br />
| [[Domain-Assisted_Product_Aspect_Hierarchy_Generation:_Towards_Hierarchical_Organization_of_Unstructured_Consumer_Reviews]] || [[Learning object models from semistructured Web documents]] [http://ieeexplore.ieee.org/xpl/login.jsp?tp=&arnumber=1583583&url=http%3A%2F%2Fieeexplore.ieee.org%2Fxpls%2Fabs_all.jsp%3Farnumber%3D1583583] || ||<br />
|-<br />
| [[A_Latent_Variable_Model_for_Geographic_Lexical_Variation]] || [[Q. Mei, C. Liu, H. Su, and C. X Zhai. 2006. A probabilistic approach to spatiotemporal theme pattern mining on weblogs. In Proceedings of WWW]] [http://dl.acm.org/citation.cfm?id=1135857] || ||<br />
|-<br />
| [[Collier_et_al._Journal_of_Biomedical_Semantics_2011]] || [[Modeling Spread of Disease from Social Interaction]] [http://www.cs.rochester.edu/u/kautz/papers/Sadilek-Kautz-Silenzio_Modeling-Spread-of-Disease-from-Social-Interactions_ICWSM-12.pdf] ||rajarshd ||<br />
|-<br />
| [[Capturing_Global_Mood_Levels_using_Blog_Posts]] || [[Modeling Public Mood and Emotion: Twitter Sentiment and Socio-Economic Phenomena]] [http://www.aaai.org/ocs/index.php/ICWSM/ICWSM11/paper/viewFile/2826/3237] || ||<br />
|-<br />
| [[Andreevskaia_et_al.,_ICWSM_2007]] || [[M. Hurst and K. Nigam. Retrieving topical sentiments from online document collection.]] [http://suraj.lums.edu.pk/~cs631s05/Papers/retrieving%20topical%20sentiments%20from%20online%20document%20collection.pdf] || ||<br />
|-<br />
| [[Das_Sarma_et._al.,_Dynamic_Relationship_and_Event_Discovery,_WSDM_2011]] || [[ Q. Zhao, P. Mitra, and B. Chen. Temporal and information flow based event detection from social text streams. In AAAI, 2007]] [http://www.purdue.edu/discoverypark/vaccine/assets/pdfs/publications/pdf/Temporal%20and%20Information%20Flow%20Based.pdf] || ysim ||<br />
|}</div>Ysimhttp://curtis.ml.cmu.edu/w/courses/index.php?title=Controversial_events_detection&diff=14685Controversial events detection2012-10-16T06:12:53Z<p>Ysim: /* Data and evaluation */</p>
<hr />
<div>== Comments ==<br />
<br />
This is a neat idea. The main difficulty I see here is formalizing the task precisely. What does it mean for an event to be controversial, exactly? Part of the problem is that it's not perfectly clear what an "event" is.<br />
<br />
One suggestion would be to look at a topic-modeling approach, eg [http://dl.acm.org/citation.cfm?id=1150450 topics over time], to find topics with a short temporal span in social-media data. You might be able to combine this with sentiment around those topics in two different communities - eg using something like my [http://www.cs.cmu.edu/~wcohen/postscript/icwsm-2012.pdf MCR-LDA model]. So one way to flesh out this idea would be to start with two topic models:<br />
<br />
* MCR-LDA, to measure 'controversy' - you might be able to get predictions from Ramnath on his blog data, if the code's not ready to distribute yet. I would not completely commit to using twitter data exclusively, btw. <br />
* TOT, to detect shortlived 'events' vs long-term topics.<br />
<br />
Then write some inference code to combine the predictions and pick out "controversial events". The next stage would be working out a joint model (which you might not chose to do for the project). It's not obvious how you'd evaluate all this, however...maybe do some user labeling of final predictions like "this topic corresponds to a controversial event."<br />
<br />
These are just ideas - you might try and flesh out some other concrete idea instead. Good luck! --[[User:Wcohen|Wcohen]] 14:33, 10 October 2012 (UTC)<br />
<br />
PS. There is also a one-person team working on similar topic, you all should talk - it's [[User:Yuchen Tian]] --[[User:Wcohen|Wcohen]] 18:40, 10 October 2012 (UTC)<br />
<br />
== Team members ==<br />
<br />
* [[User:Ysim|Yanchuan Sim]]<br />
* [[User:Zhouyu|Zhou Yu]]<br />
* [[User:Tinghuiz|Tinghui Zhou]]<br />
<br />
== Project idea ==<br />
<br />
In our project, we propose to jointly detect events and the controversy surrounding them in the context of social media.<br />
For example, Christmas Day is an event that receives the most attention around December 25th, while the presidential debates receive attention once every four years.<br />
Controversy-wise, Christmas Day is relatively one-sided, with most of the text that mentions it being relatively homogeneous.<br />
In contrast, the presidential debates will have obvious sides (supporting the different candidates).<br />
<br />
Our goal is not only to detect controversial events, but also to discover what the different sides are - both grouping the individuals associated with each faction and describing how each faction talks about the event differently.<br />
<br />
We propose to use a probabilistic graphical model to achieve our goals of learning these latent structures from the data without labeled training data.<br />
<br />
== Formalizing the task ==<br />
<br />
Event - In the context of social media, an event is a period of time during which there is a "surge" in the amount of interest (e.g. blog posts, tweets, comments) surrounding an occurrence.<br />
<br />
We call an event controversial if, given the text surrounding the event, the discussion is highly non-homogeneous (i.e. exhibits high entropy). The sides of a controversial event can be grouped into a small number of distinct ''factions''.<br />
<br />
Thus, in our task, given a collection of social media documents over time, we seek to jointly infer the events that have occurred, as well as the controversy associated with them.<br />
<br />
== A probabilistic model ==<br />
<br />
Here's a sketch of the model that we are considering for our task.<br />
It is a variant of a topic model, where each word is assumed to be jointly generated by an ''event'' and a ''faction''.<br />
It is also similar to the topics over time model, in that we also generate the timestamps associated with the tokens of each document.<br />
<br />
''A graphical plate diagram of our model will be up soon.''<br />
<br />
=== Notation ===<br />
<br />
<math>E</math> - fixed number of events<br />
<br />
<math>\theta_d</math> - multinomial distribution of events specific to document <math>d</math><br />
<br />
<math>\phi_{e_{di}}</math> - multinomial distribution of factions specific to event <math>e_{di}</math><br />
<br />
<math>\psi_{e_{di}}</math> - the beta distribution of time specific to event <math>e_{di}</math><br />
<br />
<math>w_{di}</math> - the <math>i</math>th token in document <math>d</math><br />
<br />
<math>t_{di}</math> - timestamp associated with the <math>i</math>th token in document <math>d</math><br />
<br />
<math>\eta^e, \eta^{e,f}, \eta^m</math> - SAGE vectors, which are log additive weights for each word in the vocabulary. We have one for each event, each combination of event and faction, and a background word distribution.<br />
<br />
=== Generative story ===<br />
<br />
# Draw <math>E</math> multinomials, <math>\phi_e</math> from a Dirichlet prior, one for each event <math>e</math>. ''This is the distribution over factions for each event that we have.''<br />
# For each document <math>d</math>, draw a multinomial <math>\theta_d</math> from a prior <math>\alpha</math> (this prior could be Dirichlet or logistic normal); then for each word <math>w_{di}</math> in the document <math>d</math>:<br />
## Draw an event <math>e_{di}</math> from multinomial <math>\theta_d</math>;<br />
## Draw a faction <math>f_{di}</math> from multinomial <math>\phi_{e_{di}}</math>;<br />
## Draw a word <math>w_{di}</math> from a SAGE language model <math>p(w_{di} \mid e_{di}, f_{di}, \mathbf{\eta} ) \propto \exp(\eta^{e_{di}}_w + \eta^{e_{di},f_{di}}_w + \eta^m_w)</math>;<br />
## Draw a timestamp <math>t_{di}</math> from Beta <math>\psi_{e_{di}}</math>.<br />
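<br />
The generative story above can be sketched in a few lines of NumPy. This is only an illustrative sketch, not our inference code: the function name <code>sample_corpus</code>, the corpus sizes, the symmetric Dirichlet priors and the Gaussian draws used to fill the SAGE vectors are all assumptions made for the example (SAGE itself places a sparsity-inducing prior on these vectors), and timestamps are assumed to be normalized to the unit interval.<br />

```python
import numpy as np


def sample_corpus(n_docs=3, doc_len=5, E=2, F=2, V=6, seed=0):
    """Sample (event, faction, word, timestamp) tuples following the generative story.

    All sizes and hyperparameters are hypothetical placeholders.
    """
    rng = np.random.default_rng(seed)

    # Step 1: one distribution over factions per event, drawn from a Dirichlet prior.
    phi = rng.dirichlet(np.ones(F), size=E)          # shape (E, F)

    # Beta parameters (a, b) over normalized time for each event (assumed values).
    psi = rng.uniform(1.0, 5.0, size=(E, 2))

    # SAGE vectors: background, per-event, per-(event, faction) log-additive weights.
    # Gaussian draws are a stand-in; they are not the sparse prior SAGE uses.
    eta_m = rng.normal(size=V)
    eta_e = rng.normal(scale=0.5, size=(E, V))
    eta_ef = rng.normal(scale=0.5, size=(E, F, V))

    corpus = []
    for _ in range(n_docs):
        # Step 2: document-specific distribution over events (Dirichlet case).
        theta = rng.dirichlet(np.ones(E))
        doc = []
        for _ in range(doc_len):
            e = rng.choice(E, p=theta)                # 2.1 draw an event
            f = rng.choice(F, p=phi[e])               # 2.2 draw a faction
            logits = eta_e[e] + eta_ef[e, f] + eta_m  # 2.3 SAGE word distribution
            p_w = np.exp(logits - logits.max())
            p_w /= p_w.sum()
            w = rng.choice(V, p=p_w)
            t = rng.beta(psi[e, 0], psi[e, 1])        # 2.4 draw a timestamp
            doc.append((int(e), int(f), int(w), float(t)))
        corpus.append(doc)
    return corpus


corpus = sample_corpus()
```

Each token carries its own event, faction and timestamp, which mirrors the per-token indices <math>e_{di}</math>, <math>f_{di}</math> and <math>t_{di}</math> in the notation.<br />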
<br />
=== SAGE language model ===<br />
<br />
To model the different effects of events and factions, we use a [[Sparse_Additive_Generative_Models_of_Text|sparse additive generative (SAGE)]] model.<br />
In contrast to the popular Dirichlet-multinomial for topic modeling, which directly models lexical probabilities associated with each (latent) topic, SAGE models the deviation in log frequencies from a background lexical distribution.<br />
Applying a sparsity inducing prior on the topic term vectors limits the number of terms whose frequencies diverge from the background lexical frequencies, thereby increasing robustness to limited training data.<br />
Also, in the case of our model, it eliminates the need for a switching variable to choose between event words and faction words.<br />
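<br />
As a rough illustration of the additive structure, the sketch below computes the word distribution from a background vector plus event and event-faction deviations. The function name <code>sage_word_probs</code> and the toy four-word vocabulary are assumptions made for the example, not part of any SAGE implementation.<br />

```python
import numpy as np


def sage_word_probs(eta_m, eta_e, eta_ef):
    """Return p(w) proportional to exp(eta_e + eta_ef + eta_m) over the vocabulary."""
    logits = (np.asarray(eta_m, dtype=float)
              + np.asarray(eta_e, dtype=float)
              + np.asarray(eta_ef, dtype=float))
    logits -= logits.max()  # subtract the max for numerical stability
    p = np.exp(logits)
    return p / p.sum()


# Toy vocabulary of four words; the background holds log frequencies.
background = np.log(np.array([0.5, 0.3, 0.1, 0.1]))
zero = np.zeros(4)

# Sparse deviations: the event boosts word 2, the faction boosts word 3.
event_dev = np.array([0.0, 0.0, 2.0, 0.0])
faction_dev = np.array([0.0, 0.0, 0.0, 1.5])

p_background = sage_word_probs(background, zero, zero)
p_event_faction = sage_word_probs(background, event_dev, faction_dev)
```

With all deviations at zero the model falls back to the background distribution, and because the event and faction effects simply add in log space, no switching variable is needed to decide which of them generated a word.<br />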
<br />
=== Logistic normal prior for events ===<br />
<br />
Using a logistic normal prior for events will allow us to incorporate features (such as Twitter hashtags, blog post titles, comment counts, etc) in a principled manner. Logistic normal priors have been used [http://www.cs.princeton.edu/~mimno/papers/sampledlgstnorm.pdf here] and [http://delivery.acm.org/10.1145/1630000/1620766/p74-cohen.pdf here].<br />
<br />
== Data and evaluation ==<br />
<br />
We intend to experiment with two different sets of data:<br />
# Set of tweets collected over 12 weekends (Sep-Dec 2011)<br />
# Posts and comments from political blogs (relating to the presidential elections) in the year 2012<br />
<br />
Over the 12 weekends from Sep-Dec, football games are played every Sunday evening.<br />
Football games present an obvious way for us to evaluate the performance of our model.<br />
Each of these games qualifies as an event with a known time of occurrence.<br />
Additionally, we know that there are at least two factions associated with each game (one set of fans for each team).<br />
One way of identifying factions would be to manually inspect the word vectors associated with the factions, identifying the teams that they support.<br />
Another option is to leverage the location metadata associated with each tweet.<br />
To match factions with fan bases, we will compute the mean location (expressed as latitude and longitude) of each faction as the weighted average of the locations of the words drawn from that faction, and then associate it with the geographically closest NFL market (in terms of great-circle distance).<br />
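<br />
The location-based matching could look roughly like the sketch below. The helper names, the three-market table and its approximate city coordinates are illustrative assumptions; the plain weighted average of latitude and longitude is a simplification that is only reasonable over a region the size of the continental US.<br />

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def great_circle_km(a, b):
    """Haversine great-circle distance between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(h)))


def weighted_mean_location(points, weights):
    """Weighted average of (lat, lon) pairs; a flat-earth approximation."""
    total = sum(weights)
    lat = sum(w * p[0] for p, w in zip(points, weights)) / total
    lon = sum(w * p[1] for p, w in zip(points, weights)) / total
    return (lat, lon)


def nearest_market(point, markets):
    """Name of the market whose location is closest to the given point."""
    return min(markets, key=lambda name: great_circle_km(point, markets[name]))


# Hypothetical market table with approximate city coordinates.
markets = {
    "Pittsburgh": (40.44, -80.00),
    "Green Bay": (44.51, -88.02),
    "Dallas": (32.78, -96.80),
}

faction_location = weighted_mean_location([(40.0, -80.0), (42.0, -82.0)], [1.0, 1.0])
closest = nearest_market(faction_location, markets)
```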
<br />
Also, significant events that occurred during this period include the 9/11 anniversary, Halloween, Thanksgiving and Christmas.<br />
These events should have low entropy in the faction distribution of words within a document, which will serve as a reference for evaluating our model in terms of its ability to identify factions.<br />
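<br />
One simple way to make the entropy comparison concrete is sketched below. The faction proportions are made-up numbers used only to illustrate the contrast between a one-sided event and a two-sided one; they are not model output.<br />

```python
import math


def entropy(p):
    """Shannon entropy (in nats) of a discrete distribution; zero entries contribute nothing."""
    return -sum(x * math.log(x) for x in p if x > 0)


# Hypothetical faction distributions for two events.
one_sided = [0.95, 0.05]   # e.g. a holiday, where nearly all words come from one faction
two_sided = [0.5, 0.5]     # e.g. a football game with two sets of fans

h_one_sided = entropy(one_sided)
h_two_sided = entropy(two_sided)
```

Under this measure a perfectly one-sided event has entropy zero, and an event split evenly across <math>F</math> factions reaches the maximum of <math>\log F</math>.<br />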
<br />
Blog posts provide substantially more content per document.<br />
Since this is an election year, we hope to use data scraped from political blogs to qualitatively evaluate our model's ability to pick up key election-year events (like debates, primaries, conventions, Todd Akin-like controversial remarks, etc).<br />
Also, politics is one of the most contentious subjects, with much discussion and debate, from which we hope our model will be able to learn the factions.<br />
<br />
== Related work ==<br />
<br />
* [[RelatedPaper::Yang et al, SIGIR 98|A study on retrospective and online event detection. Yang et al, SIGIR 98]] This paper addresses the problems of detecting events in news stories. They used clustering with a vector space model to group temporally close events together.<br />
<br />
* [[RelatedPaper::Zhao et al, AAAI 07|Temporal and information flow based event detection from social text streams. Zhao et al, AAAI 07]] The authors propose a method for detecting events from social text streams by exploiting not just the textual content, but also the temporal and social dimensions of their data.<br />
<br />
* [[RelatedPaper::Automatic_Detection_and_Classification_of_Social_Events|Automatic Detection and Classification of Social Events. Agarwal and Rambow, ACL 10]] This is one of the few works we found relating to controversial events in social media. The authors aim at detecting and classifying social events using tree kernels.<br />
<br />
*[[RelatedPaper::Rodriguez et al. KDD 2010|Gomez Rodriguez, M., J. Leskovec, and A. Krause. 2010. Inferring networks of diffusion and influence. In Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining, 1019–1028]]. This paper addresses the problem of inferring underlying networks in the diffusion process of social networks, which is related to the faction discovery problem we study in this project.<br />
<br />
*[[RelatedPaper::Cosley et al 2010|Cosley, D., D. Huttenlocher, J. Kleinberg, X. Lan, and S. Suri. 2010. Sequential Influence Models in Social Networks, In Proc. 4th International Conference on Weblogs and Social Media]]. In this paper the authors study the temporal dynamics of information diffusion in social networks. The results found could give us some insights into the design of our model.<br />
<br />
* [[RelatedPaper::Castillo_2011|Information credibility on twitter. Castillo et al, WWW 11]] The authors discover general features of Twitter content that relate to credibility assessment.<br />
<br />
* [[RelatedPaper::Guralnik_99|Event Detection from Time Series Data. Guralnik et al, KDD 99]] The authors develop a general approach to change-point detection that generalizes across a wide range of applications.<br />
<br />
* [[RelatedPaper:: Allan_1988|On-Line New Event Detection and Tracking. Allan et al, SIGIR 98]] Their approach to detection uses a single-pass clustering algorithm and a novel thresholding model that incorporates the properties of events as a major component.<br />
<br />
== Related materials ==<br />
{{#ask: [[AddressesProblem::Controversial events detection]]<br />
| ?Category<br />
| ?UsesDataset<br />
| ?UsesMethod<br />
}}</div>Ysimhttp://curtis.ml.cmu.edu/w/courses/index.php?title=Controversial_events_detection&diff=14684Controversial events detection2012-10-16T06:07:12Z<p>Ysim: /* Data */</p>
<hr />
<div>== Comments ==<br />
<br />
This is a neat idea. The main difficulty I see here is formalizing the task precisely. What does it mean for an event to be controversial, exactly? Part of the problem is that it's not perfectly clear what an "event" is.<br />
<br />
One suggestion would be to look at a topic-modeling approach, eg [http://dl.acm.org/citation.cfm?id=1150450 topics over time], to find topics with a short temporal span in social-media data. You might be able to combine this with sentiment around those topics in two different communities - eg using something like my [http://www.cs.cmu.edu/~wcohen/postscript/icwsm-2012.pdf MCR-LDA model]. So one way to flesh out this idea would be to start with two topic models:<br />
<br />
* MCR-LDA, to measure 'controversy' - you might be able to get predictions from Ramnath on his blog data, if the code's not ready to distribute yet. I would not completely commit to using twitter data exclusively, btw. <br />
* TOT, to detect shortlived 'events' vs long-term topics.<br />
<br />
Then write some inference code to combine the predictions and pick out "controversial events". The next stage would be working out a joint model (which you might not chose to do for the project). It's not obvious how you'd evaluate all this, however...maybe do some user labeling of final predictions like "this topic corresponds to a controversial event."<br />
<br />
These are just ideas - you might try and flesh out some other concrete idea instead. Good luck! --[[User:Wcohen|Wcohen]] 14:33, 10 October 2012 (UTC)<br />
<br />
PS. There is also a one-person team working on similar topic, you all should talk - it's [[User:Yuchen Tian]] --[[User:Wcohen|Wcohen]] 18:40, 10 October 2012 (UTC)<br />
<br />
== Team members ==<br />
<br />
* [[User:Ysim|Yanchuan Sim]]<br />
* [[User:Zhouyu|Zhou Yu]]<br />
* [[User:Tinghuiz|Tinghui Zhou]]<br />
<br />
== Project idea ==<br />
<br />
In our project, we propose to jointly detect events and the controversy surrounding them in the context of social media.<br />
For example, Christmas Day is an event that receives the most attention around December 25th, while the presidential debates receive attention once every four years.<br />
Controversy-wise, Christmas Day is relatively one-sided, with most of the text that mentions it being relatively homogeneous.<br />
In contrast, the presidential debates will have obvious sides (supporting the different candidates).<br />
<br />
Our goal is not only to detect controversial events, but also to discover what the different sides are - both grouping the individuals associated with each faction and describing how each faction talks about the event differently.<br />
<br />
We propose to use a probabilistic graphical model to achieve our goals of learning these latent structures from the data without labeled training data.<br />
<br />
== Formalizing the task ==<br />
<br />
Event - In the context of social media, an event is a period of time during which there is a "surge" in the amount of interest (e.g. blog posts, tweets, comments) surrounding an occurrence.<br />
<br />
We call an event controversial if, given the text surrounding the event, the discussion is highly non-homogeneous (i.e. exhibits high entropy). The sides of a controversial event can be grouped into a small number of distinct ''factions''.<br />
<br />
Thus, in our task, given a collection of social media documents over time, we seek to jointly infer the events that have occurred, as well as the controversy associated with them.<br />
<br />
== A probabilistic model ==<br />
<br />
Here's a sketch of the model that we are considering for our task.<br />
It is a variant of a topic model, where each word is assumed to be jointly generated by an ''event'' and a ''faction''.<br />
It is also similar to the topics over time model, in that we also generate the timestamps associated with the tokens of each document.<br />
<br />
''A graphical plate diagram of our model will be up soon.''<br />
<br />
=== Notation ===<br />
<br />
<math>E</math> - fixed number of events<br />
<br />
<math>\theta_d</math> - multinomial distribution of events specific to document <math>d</math><br />
<br />
<math>\phi_{e_{di}}</math> - multinomial distribution of factions specific to event <math>e_{di}</math><br />
<br />
<math>\psi_{e_{di}}</math> - the beta distribution of time specific to event <math>e_{di}</math><br />
<br />
<math>w_{di}</math> - the <math>i</math>th token in document <math>d</math><br />
<br />
<math>t_{di}</math> - timestamp associated with the <math>i</math>th token in document <math>d</math><br />
<br />
<math>\eta^e, \eta^{e,f}, \eta^m</math> - SAGE vectors, which are log additive weights for each word in the vocabulary. We have one for each event, each combination of event and faction, and a background word distribution.<br />
<br />
=== Generative story ===<br />
<br />
# Draw <math>E</math> multinomials, <math>\phi_e</math> from a Dirichlet prior, one for each event <math>e</math>. ''This is the distribution over factions for each event that we have.''<br />
# For each document <math>d</math>, draw a multinomial <math>\theta_d</math> from a prior <math>\alpha</math> (this prior could be Dirichlet or logistic normal); then for each word <math>w_{di}</math> in the document <math>d</math>:<br />
## Draw an event <math>e_{di}</math> from multinomial <math>\theta_d</math>;<br />
## Draw a faction <math>f_{di}</math> from multinomial <math>\phi_{e_{di}}</math>;<br />
## Draw a word <math>w_{di}</math> from a SAGE language model <math>p(w_{di} \mid e_{di}, f_{di}, \mathbf{\eta} ) \propto \exp(\eta^{e_{di}}_w + \eta^{e_{di},f_{di}}_w + \eta^m_w)</math>;<br />
## Draw a timestamp <math>t_{di}</math> from Beta <math>\psi_{e_{di}}</math>.<br />
<br />
=== SAGE language model ===<br />
<br />
To model the different effects of events and factions, we use a [[Sparse_Additive_Generative_Models_of_Text|sparse additive generative (SAGE)]] model.<br />
In contrast to the popular Dirichlet-multinomial for topic modeling, which directly models lexical probabilities associated with each (latent) topic, SAGE models the deviation in log frequencies from a background lexical distribution.<br />
Applying a sparsity inducing prior on the topic term vectors limits the number of terms whose frequencies diverge from the background lexical frequencies, thereby increasing robustness to limited training data.<br />
Also, in the case of our model, it eliminates the need for a switching variable to choose between event words and faction words.<br />
<br />
=== Logistic normal prior for events ===<br />
<br />
Using a logistic normal prior for events will allow us to incorporate features (such as Twitter hashtags, blog post titles, comment counts, etc) in a principled manner. Logistic normal priors have been used [http://www.cs.princeton.edu/~mimno/papers/sampledlgstnorm.pdf here] and [http://delivery.acm.org/10.1145/1630000/1620766/p74-cohen.pdf here].<br />
<br />
== Data and evaluation ==<br />
<br />
We intend to experiment with two different sets of data:<br />
# Set of tweets collected over 12 weekends (Sep-Dec 2011)<br />
# Posts and comments from political blogs (relating to the presidential elections) in the year 2012<br />
<br />
Over the 12 weekends from Sep-Dec, football games are played every Sunday evening.<br />
Football games present an obvious way for us to evaluate the performance of our model.<br />
Each of these games qualifies as an event with a known time of occurrence.<br />
Additionally, we know that there are at least two factions associated with each game (one set of fans for each team).<br />
One way of identifying factions would be to manually inspect the word vectors associated with the factions, identifying the teams that they support.<br />
Another option is to leverage the location metadata associated with each tweet.<br />
To match factions with fan bases, we will compute the mean location (expressed as latitude and longitude) of each faction as the weighted average of the locations of the words drawn from that faction, and then associate it with the geographically closest NFL market (in terms of great-circle distance).<br />
Also, significant events that occurred during this period include the 9/11 anniversary, Halloween, Thanksgiving and Christmas.<br />
<br />
Some possibly controversial events that have occurred this year are the Republican primaries, the Grammy Awards, weekly football games during the NFL season, etc.<br />
In addition to the textual content, the timestamps, locations (partially observed) and identities (of the user posting a tweet) could be useful features for our model.<br />
<br />
== Related work ==<br />
<br />
* [[RelatedPaper::Yang et al, SIGIR 98|A study on retrospective and online event detection. Yang et al, SIGIR 98]] This paper addresses the problems of detecting events in news stories. They used clustering with a vector space model to group temporally close events together.<br />
<br />
* [[RelatedPaper::Zhao et al, AAAI 07|Temporal and information flow based event detection from social text streams. Zhao et al, AAAI 07]] The authors propose a method for detecting events from social text streams by exploiting not just the textual content, but also the temporal and social dimensions of their data.<br />
<br />
* [[RelatedPaper::Automatic_Detection_and_Classification_of_Social_Events|Automatic Detection and Classification of Social Events. Agarwal and Rambow, ACL 10]] This is one of the few works we found relating to controversial events in social media. The authors aim at detecting and classifying social events using tree kernels.<br />
<br />
*[[RelatedPaper::Rodriguez et al. KDD 2010|Gomez Rodriguez, M., J. Leskovec, and A. Krause. 2010. Inferring networks of diffusion and influence. In Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining, 1019–1028]]. This paper addresses the problem of inferring underlying networks in the diffusion process of social networks, which is related to the faction discovery problem we study in this project.<br />
<br />
*[[RelatedPaper::Cosley et al 2010|Cosley, D., D. Huttenlocher, J. Kleinberg, X. Lan, and S. Suri. 2010. Sequential Influence Models in Social Networks, In Proc. 4th International Conference on Weblogs and Social Media]]. In this paper the authors study the temporal dynamics of information diffusion in social networks. The results found could give us some insights into the design of our model.<br />
<br />
* [[RelatedPaper::Castillo_2011|Information credibility on twitter. Castillo et al, WWW 11]] The authors discover general features of Twitter content that relate to credibility assessment.<br />
<br />
* [[RelatedPaper::Guralnik_99|Event Detection from Time Series Data. Guralnik et al, KDD 99]] The authors develop a general approach to change-point detection that generalizes across a wide range of applications.<br />
<br />
* [[RelatedPaper:: Allan_1988|On-Line New Event Detection and Tracking. Allan et al, SIGIR 98]] Their approach to detection uses a single-pass clustering algorithm and a novel thresholding model that incorporates the properties of events as a major component.<br />
<br />
== Related materials ==<br />
{{#ask: [[AddressesProblem::Controversial events detection]]<br />
| ?Category<br />
| ?UsesDataset<br />
| ?UsesMethod<br />
}}</div>Ysimhttp://curtis.ml.cmu.edu/w/courses/index.php?title=Controversial_events_detection&diff=14683Controversial events detection2012-10-16T05:54:00Z<p>Ysim: /* Logistic normal prior for events */</p>
<hr />
<div>== Comments ==<br />
<br />
This is a neat idea. The main difficulty I see here is formalizing the task precisely. What does it mean for an event to be controversial, exactly? Part of the problem is that it's not perfectly clear what an "event" is.<br />
<br />
One suggestion would be to look at a topic-modeling approach, eg [http://dl.acm.org/citation.cfm?id=1150450 topics over time], to find topics with a short temporal span in social-media data. You might be able to combine this with sentiment around those topics in two different communities - eg using something like my [http://www.cs.cmu.edu/~wcohen/postscript/icwsm-2012.pdf MCR-LDA model]. So one way to flesh out this idea would be to start with two topic models:<br />
<br />
* MCR-LDA, to measure 'controversy' - you might be able to get predictions from Ramnath on his blog data, if the code's not ready to distribute yet. I would not completely commit to using twitter data exclusively, btw. <br />
* TOT, to detect shortlived 'events' vs long-term topics.<br />
<br />
Then write some inference code to combine the predictions and pick out "controversial events". The next stage would be working out a joint model (which you might not chose to do for the project). It's not obvious how you'd evaluate all this, however...maybe do some user labeling of final predictions like "this topic corresponds to a controversial event."<br />
<br />
These are just ideas - you might try and flesh out some other concrete idea instead. Good luck! --[[User:Wcohen|Wcohen]] 14:33, 10 October 2012 (UTC)<br />
<br />
PS. There is also a one-person team working on similar topic, you all should talk - it's [[User:Yuchen Tian]] --[[User:Wcohen|Wcohen]] 18:40, 10 October 2012 (UTC)<br />
<br />
== Team members ==<br />
<br />
* [[User:Ysim|Yanchuan Sim]]<br />
* [[User:Zhouyu|Zhou Yu]]<br />
* [[User:Tinghuiz|Tinghui Zhou]]<br />
<br />
== Project idea ==<br />
<br />
In our project, we propose to jointly detect events and the controversy surrounding them in the context of social media.<br />
For example, Christmas Day is an event that receives the most attention around December 25th, while the presidential debates receive attention once every four years.<br />
Controversy-wise, Christmas Day is relatively one-sided, with most of the text that mentions it being relatively homogeneous.<br />
In contrast, the presidential debates will have obvious sides (supporting the different candidates).<br />
<br />
Our goal is not only to detect controversial events, but also to discover what the different sides are - both grouping the individuals associated with each faction and describing how each faction talks about the event differently.<br />
<br />
We propose to use a probabilistic graphical model to achieve our goals of learning these latent structures from the data without labeled training data.<br />
<br />
== Formalizing the task ==<br />
<br />
Event - In the context of social media, an event is an occurrence marked by a period of time in which there is a "surge" in the amount of interest (e.g. blog posts, tweets, comments, etc.) surrounding it.<br />
<br />
We call an event controversial if, given the text surrounding it, the nature of the discussion is highly non-homogeneous (or exhibits high entropy). The sides of a controversial event can be grouped into a small number of distinct ''factions''.<br />
<br />
Thus, in our task, given a collection of social media documents over time, we seek to jointly infer the events that have occurred, as well as the controversy associated with them.<br />
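<br />
The "high entropy" criterion could be made concrete in several ways. The following is a minimal sketch, assuming that controversy is scored as the Shannon entropy of an event's distribution over factions; the function name and the toy distributions are hypothetical and are not part of the proposal itself.<br />

```python
import math


def faction_entropy(faction_probs):
    """Shannon entropy (in nats) of a distribution over factions.

    Assumption: faction_probs is a list of non-negative numbers summing to 1.
    """
    return -sum(p * math.log(p) for p in faction_probs if p > 0)


# Hypothetical faction distributions for two events.
one_sided = [0.97, 0.03]   # e.g. a holiday most people talk about alike
two_sided = [0.50, 0.50]   # e.g. a debate with two evenly matched camps

low_score = faction_entropy(one_sided)
high_score = faction_entropy(two_sided)
```

Under this assumed score, an event whose text is spread evenly across factions gets the maximum value (log of the number of factions), and an event dominated by a single faction gets a value close to zero.<br />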
<br />
== A probabilistic model ==<br />
<br />
Here's a sketch of the model that we are considering for our task.<br />
It is a variant of a topic model in which each word is assumed to be jointly generated by an ''event'' and a ''faction''.<br />
It is also similar to the topics over time model, in that we also generate the timestamps for each document.<br />
<br />
''A graphical plate diagram of our model will be up soon.''<br />
<br />
=== Notation ===<br />
<br />
<math>E</math> - fixed number of events<br />
<br />
<math>\theta_d</math> - multinomial distribution of events specific to document <math>d</math><br />
<br />
<math>\phi_{e_{di}}</math> - multinomial distribution of factions specific to event <math>e_{di}</math><br />
<br />
<math>\psi_{e_{di}}</math> - the beta distribution of time specific to event <math>e_{di}</math><br />
<br />
<math>w_{di}</math> - the <math>i</math>th token in document <math>d</math><br />
<br />
<math>t_{di}</math> - timestamp associated with the <math>i</math>th token in document <math>d</math><br />
<br />
<math>\eta^e, \eta^{e,f}, \eta^m</math> - SAGE vectors, which are log additive weights for each word in the vocabulary. We have one for each event, each combination of event and faction, and a background word distribution.<br />
<br />
=== Generative story ===<br />
<br />
# Draw <math>E</math> multinomials, <math>\phi_e</math> from a Dirichlet prior, one for each event <math>e</math>. ''This is the distribution over factions for each event that we have.''<br />
# For each document <math>d</math>, draw a multinomial <math>\theta_d</math> from a prior <math>\alpha</math> (this prior could be Dirichlet or logistic normal); then for each word <math>w_{di}</math> in the document <math>d</math>:<br />
## Draw an event <math>e_{di}</math> from multinomial <math>\theta_d</math>;<br />
## Draw a faction <math>f_{di}</math> from multinomial <math>\phi_{e_{di}}</math>;<br />
## Draw a word <math>w_{di}</math> from a SAGE language model <math>p(w_{di} \mid e_{di}, f_{di}, \mathbf{\eta} ) \propto \exp(\eta^{e_{di}}_w + \eta^{e_{di},f_{di}}_w + \eta^m_w)</math>;<br />
## Draw a timestamp <math>t_{di}</math> from Beta <math>\psi_{e_{di}}</math>.<br />
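<br />
The generative story above can be sketched as a forward sampler. This is only an illustration under stated assumptions: it uses the Dirichlet variant of the prior on <math>\theta_d</math>, treats timestamps as normalized to the interval [0, 1] so that a Beta distribution applies, and fills the SAGE vectors with arbitrary toy values. All function names, sizes and parameter values are hypothetical.<br />

```python
import math
import random


def dirichlet(alpha, rng):
    """Draw from a Dirichlet by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [x / total for x in draws]


def softmax(scores):
    """Normalize exp(scores), shifting by the max for numerical stability."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    z = sum(exps)
    return [x / z for x in exps]


def generate_document(n_words, phi, psi, eta_e, eta_ef, eta_m, alpha, rng):
    """Sample one document as a list of (event, faction, word id, timestamp)."""
    theta = dirichlet(alpha, rng)              # theta_d (Dirichlet variant)
    vocab = range(len(eta_m))
    tokens = []
    for _ in range(n_words):
        e = rng.choices(range(len(theta)), weights=theta)[0]    # event e_di
        f = rng.choices(range(len(phi[e])), weights=phi[e])[0]  # faction f_di
        # SAGE: add event, event-faction and background log weights.
        probs = softmax([eta_e[e][w] + eta_ef[e][f][w] + eta_m[w] for w in vocab])
        w = rng.choices(vocab, weights=probs)[0]                # word w_di
        t = rng.betavariate(*psi[e])                            # timestamp t_di
        tokens.append((e, f, w, t))
    return tokens


rng = random.Random(0)
E, F, V = 2, 2, 5                              # toy sizes (assumed)
phi = [dirichlet([1.0] * F, rng) for _ in range(E)]
psi = [(2.0, 8.0), (8.0, 2.0)]                 # Beta parameters per event
eta_e = [[rng.gauss(0, 1) for _ in range(V)] for _ in range(E)]
eta_ef = [[[rng.gauss(0, 1) for _ in range(V)] for _ in range(F)] for _ in range(E)]
eta_m = [0.0] * V                              # flat background for the toy case
doc = generate_document(20, phi, psi, eta_e, eta_ef, eta_m, [0.5] * E, rng)
```

Inference would run in the opposite direction, recovering the latent events and factions from observed words and timestamps; the sampler is only meant to make the dependencies between the variables explicit.<br />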
<br />
=== SAGE language model ===<br />
<br />
To model the different effects of events and factions, we use a [[Sparse_Additive_Generative_Models_of_Text|sparse additive generative (SAGE)]] model.<br />
In contrast to the popular Dirichlet-multinomial for topic modeling, which directly models lexical probabilities associated with each (latent) topic, SAGE models the deviation in log frequencies from a background lexical distribution.<br />
Applying a sparsity inducing prior on the topic term vectors limits the number of terms whose frequencies diverge from the background lexical frequencies, thereby increasing robustness to limited training data.<br />
Also, in the case of our model, it eliminates the need for a switching variable to choose between event words and faction words.<br />
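<br />
To illustrate the additive structure, here is a small sketch of how a word distribution could be assembled from a background log-frequency vector plus sparse deviation vectors. The three-word vocabulary, the numbers and the function name are hypothetical; the sketch only shows the log-additive combination and does not implement SAGE's sparsity-inducing prior or its estimation procedure.<br />

```python
import math


def sage_word_probs(background_logfreq, *deviations):
    """Combine background log frequencies with additive deviation vectors.

    Each deviation vector has one entry per vocabulary word; most entries
    are expected to be zero under a sparsity-inducing prior.
    """
    scores = [m + sum(d[i] for d in deviations)
              for i, m in enumerate(background_logfreq)]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    z = sum(exps)
    return [x / z for x in exps]


background = [math.log(p) for p in (0.5, 0.3, 0.2)]
event_dev = [0.0, 0.0, 1.5]      # the event boosts word 2 only
faction_dev = [0.0, 0.8, 0.0]    # the faction boosts word 1 only

base = sage_word_probs(background)
both = sage_word_probs(background, event_dev, faction_dev)
```

With no deviations the background distribution is recovered exactly, and because the event and faction effects are simply summed in log space, no per-word switch is needed to decide which of the two generated a word.<br />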
<br />
=== Logistic normal prior for events ===<br />
<br />
Using a logistic normal prior for events will allow us to incorporate features (such as Twitter hashtags, blog post titles, comment counts, etc.) in a principled manner. Logistic normal priors have been used [http://www.cs.princeton.edu/~mimno/papers/sampledlgstnorm.pdf here] and [http://delivery.acm.org/10.1145/1630000/1620766/p74-cohen.pdf here].<br />
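<br />
One possible way to let features enter the prior is sketched below. It assumes that the Gaussian mean for each event is a linear function of document-level features and that the event proportions are obtained by a softmax; this parameterization, the feature names and the weights are all assumptions made for illustration and are not taken from the papers linked above.<br />

```python
import math
import random


def logistic_normal_draw(features, weights, sigma, rng):
    """Draw theta_d = softmax(eta), with eta_k ~ Normal(features . weights[k], sigma^2)."""
    eta = [rng.gauss(sum(x * w for x, w in zip(features, w_k)), sigma)
           for w_k in weights]
    top = max(eta)
    exps = [math.exp(v - top) for v in eta]
    z = sum(exps)
    return [v / z for v in exps]


rng = random.Random(1)
# Hypothetical document features: has hashtag A, has hashtag B, comment count.
features = [1.0, 0.0, 3.0]
# One hypothetical weight vector per event.
weights = [[2.0, 0.0, 0.1], [0.0, 2.0, 0.0]]
theta = logistic_normal_draw(features, weights, 0.5, rng)
```

In this sketch a document carrying hashtag A is pushed, on average, towards the first event, while the Gaussian noise still allows individual documents to deviate.<br />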
<br />
== Data ==<br />
<br />
Our main data source will be Twitter, and as a start we intend to use tweets over a three-month period in 2012 (the exact date range to be decided).<br />
Some possibly controversial events that have occurred this year are the Republican primaries, the Grammy Awards, weekly football games during the NFL season, etc.<br />
In addition to the textual content, the timestamps, locations (partially observed) and identities (of the user posting a tweet) could be useful features for our model.<br />
<br />
== Related work ==<br />
<br />
* [[RelatedPaper::Yang et al, SIGIR 98|A study on retrospective and online event detection. Yang et al, SIGIR 98]] This paper addresses the problems of detecting events in news stories. They used clustering with a vector space model to group temporally close events together.<br />
<br />
* [[RelatedPaper::Zhao et al, AAAI 07|Temporal and information flow based event detection from social text streams. Zhao et al, AAAI 07]] The authors propose a method for detecting events from social text streams that exploits not just the textual content but also the temporal and social dimensions of their data.<br />
<br />
* [[RelatedPaper::Automatic_Detection_and_Classification_of_Social_Events|Automatic Detection and Classification of Social Events. Agarwal and Rambow, ACL 10]] This is one of the few works we found relating to controversial events in social media. The authors aim at detecting and classifying social events using tree kernels.<br />
<br />
*[[RelatedPaper::Rodriguez et al. KDD 2010|Gomez Rodriguez, M., J. Leskovec, and A. Krause. 2010. Inferring networks of diffusion and influence. In Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining, 1019–1028]]. This paper addresses the problem of inferring underlying networks in the diffusion process of social networks, which is related to the faction discovery problem we study in this project.<br />
<br />
*[[RelatedPaper::Cosley et al 2010|Cosley, D., D. Huttenlocher, J. Kleinberg, X. Lan, and S. Suri. 2010. Sequential Influence Models in Social Networks, In Proc. 4th International Conference on Weblogs and Social Media]]. In this paper the authors study the temporal dynamics of information diffusion in social networks. Their results could give us some insights into the design of our model.<br />
<br />
* [[RelatedPaper::Castillo_2011|Information credibility on twitter. Castillo et al, WWW 11]] This paper discovers general features on Twitter that are relevant to credibility assessment.<br />
<br />
* [[RelatedPaper::Guralnik_99|Event Detection from Time Series Data. Guralnik et al, KDD 99]] This paper develops a general approach to change-point detection that generalizes across a wide range of applications.<br />
<br />
* [[RelatedPaper:: Allan_1988|On-Line New Event Detection and Tracking. Allan et al, SIGIR 98]] The authors' approach to detection uses a single-pass clustering algorithm and a novel thresholding model that incorporates the properties of events as a major component.<br />
<br />
== Related materials ==<br />
{{#ask: [[AddressesProblem::Controversial events detection]]<br />
| ?Category<br />
| ?UsesDataset<br />
| ?UsesMethod<br />
}}</div>Ysimhttp://curtis.ml.cmu.edu/w/courses/index.php?title=Controversial_events_detection&diff=14682Controversial events detection2012-10-16T05:52:58Z<p>Ysim: /* Logistic normal prior for events */</p>
<hr />
<div>== Comments ==<br />
<br />
This is a neat idea. The main difficulty I see here is formalizing the task precisely. What does it mean for an event to be controversial, exactly? Part of the problem is that it's not perfectly clear what an "event" is.<br />
<br />
One suggestion would be to look at a topic-modeling approach, eg [http://dl.acm.org/citation.cfm?id=1150450 topics over time], to find topics with a short temporal span in social-media data. You might be able to combine this with sentiment around those topics in two different communities - eg using something like my [http://www.cs.cmu.edu/~wcohen/postscript/icwsm-2012.pdf MCR-LDA model]. So one way to flesh out this idea would be to start with two topic models:<br />
<br />
* MCR-LDA, to measure 'controversy' - you might be able to get predictions from Ramnath on his blog data, if the code's not ready to distribute yet. I would not completely commit to using twitter data exclusively, btw. <br />
* TOT, to detect shortlived 'events' vs long-term topics.<br />
<br />
Then write some inference code to combine the predictions and pick out "controversial events". The next stage would be working out a joint model (which you might not choose to do for the project). It's not obvious how you'd evaluate all this, however...maybe do some user labeling of final predictions like "this topic corresponds to a controversial event."<br />
<br />
These are just ideas - you might try and flesh out some other concrete idea instead. Good luck! --[[User:Wcohen|Wcohen]] 14:33, 10 October 2012 (UTC)<br />
<br />
PS. There is also a one-person team working on similar topic, you all should talk - it's [[User:Yuchen Tian]] --[[User:Wcohen|Wcohen]] 18:40, 10 October 2012 (UTC)<br />
<br />
== Team members ==<br />
<br />
* [[User:Ysim|Yanchuan Sim]]<br />
* [[User:Zhouyu|Zhou Yu]]<br />
* [[User:Tinghuiz|Tinghui Zhou]]<br />
<br />
== Project idea ==<br />
<br />
In our project, we propose to jointly detect events and the controversy surrounding them in the context of social media.<br />
For example, Christmas Day is an event that receives the most attention around December 25th, while the Presidential debates receive theirs once every four years.<br />
Controversy-wise, Christmas Day is relatively one-sided, with most of the text mentioning it being fairly homogeneous.<br />
In contrast, the Presidential debates event will have obvious sides (supporting the different candidates).<br />
<br />
Our goal is not only to detect controversial events, but also to discover what the different sides are - both grouping the individuals associated with each faction and describing how each faction talks about the event differently.<br />
<br />
We propose to use a probabilistic graphical model to achieve our goals of learning these latent structures from the data without labeled training data.<br />
<br />
== Formalizing the task ==<br />
<br />
Event - In the context of social media, an event is an occurrence marked by a period of time in which there is a "surge" in the amount of interest (e.g. blog posts, tweets, comments, etc.) surrounding it.<br />
<br />
We call an event controversial if, given the text surrounding it, the nature of the discussion is highly non-homogeneous (or exhibits high entropy). The sides of a controversial event can be grouped into a small number of distinct ''factions''.<br />
<br />
Thus, in our task, given a collection of social media documents over time, we seek to jointly infer the events that have occurred, as well as the controversy associated with them.<br />
<br />
== A probabilistic model ==<br />
<br />
Here's a sketch of the model that we are considering for our task.<br />
It is a variant of a topic model in which each word is assumed to be jointly generated by an ''event'' and a ''faction''.<br />
It is also similar to the topics over time model, in that we also generate the timestamps for each document.<br />
<br />
''A graphical plate diagram of our model will be up soon.''<br />
<br />
=== Notation ===<br />
<br />
<math>E</math> - fixed number of events<br />
<br />
<math>\theta_d</math> - multinomial distribution of events specific to document <math>d</math><br />
<br />
<math>\phi_{e_{di}}</math> - multinomial distribution of factions specific to event <math>e_{di}</math><br />
<br />
<math>\psi_{e_{di}}</math> - the beta distribution of time specific to event <math>e_{di}</math><br />
<br />
<math>w_{di}</math> - the <math>i</math>th token in document <math>d</math><br />
<br />
<math>t_{di}</math> - timestamp associated with the <math>i</math>th token in document <math>d</math><br />
<br />
<math>\eta^e, \eta^{e,f}, \eta^m</math> - SAGE vectors, which are log additive weights for each word in the vocabulary. We have one for each event, each combination of event and faction, and a background word distribution.<br />
<br />
=== Generative story ===<br />
<br />
# Draw <math>E</math> multinomials, <math>\phi_e</math> from a Dirichlet prior, one for each event <math>e</math>. ''This is the distribution over factions for each event that we have.''<br />
# For each document <math>d</math>, draw a multinomial <math>\theta_d</math> from a prior <math>\alpha</math> (this prior could be Dirichlet or logistic normal); then for each word <math>w_{di}</math> in the document <math>d</math>:<br />
## Draw an event <math>e_{di}</math> from multinomial <math>\theta_d</math>;<br />
## Draw a faction <math>f_{di}</math> from multinomial <math>\phi_{e_{di}}</math>;<br />
## Draw a word <math>w_{di}</math> from a SAGE language model <math>p(w_{di} \mid e_{di}, f_{di}, \mathbf{\eta} ) \propto \exp(\eta^{e_{di}}_w + \eta^{e_{di},f_{di}}_w + \eta^m_w)</math>;<br />
## Draw a timestamp <math>t_{di}</math> from Beta <math>\psi_{e_{di}}</math>.<br />
<br />
=== SAGE language model ===<br />
<br />
To model the different effects of events and factions, we use a [[Sparse_Additive_Generative_Models_of_Text|sparse additive generative (SAGE)]] model.<br />
In contrast to the popular Dirichlet-multinomial for topic modeling, which directly models lexical probabilities associated with each (latent) topic, SAGE models the deviation in log frequencies from a background lexical distribution.<br />
Applying a sparsity inducing prior on the topic term vectors limits the number of terms whose frequencies diverge from the background lexical frequencies, thereby increasing robustness to limited training data.<br />
Also, in the case of our model, it eliminates the need for a switching variable to choose between event words and faction words.<br />
<br />
=== Logistic normal prior for events ===<br />
<br />
Using a logistic normal prior for events will allow us to incorporate features (such as Twitter hashtags, blog post titles, comment counts, etc.) in a principled manner. Logistic normal priors have been used [http://www.cs.princeton.edu/~mimno/papers/sampledlgstnorm.pdf here].<br />
<br />
== Data ==<br />
<br />
Our main data source will be Twitter, and as a start we intend to use tweets over a three-month period in 2012 (the exact date range to be decided).<br />
Some possibly controversial events that have occurred this year are the Republican primaries, the Grammy Awards, weekly football games during the NFL season, etc.<br />
In addition to the textual content, the timestamps, locations (partially observed) and identities (of the user posting a tweet) could be useful features for our model.<br />
<br />
== Related work ==<br />
<br />
* [[RelatedPaper::Yang et al, SIGIR 98|A study on retrospective and online event detection. Yang et al, SIGIR 98]] This paper addresses the problems of detecting events in news stories. They used clustering with a vector space model to group temporally close events together.<br />
<br />
* [[RelatedPaper::Zhao et al, AAAI 07|Temporal and information flow based event detection from social text streams. Zhao et al, AAAI 07]] The authors propose a method for detecting events from social text streams that exploits not just the textual content but also the temporal and social dimensions of their data.<br />
<br />
* [[RelatedPaper::Automatic_Detection_and_Classification_of_Social_Events|Automatic Detection and Classification of Social Events. Agarwal and Rambow, ACL 10]] This is one of the few works we found relating to controversial events in social media. The authors aim at detecting and classifying social events using tree kernels.<br />
<br />
*[[RelatedPaper::Rodriguez et al. KDD 2010|Gomez Rodriguez, M., J. Leskovec, and A. Krause. 2010. Inferring networks of diffusion and influence. In Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining, 1019–1028]]. This paper addresses the problem of inferring underlying networks in the diffusion process of social networks, which is related to the faction discovery problem we study in this project.<br />
<br />
*[[RelatedPaper::Cosley et al 2010|Cosley, D., D. Huttenlocher, J. Kleinberg, X. Lan, and S. Suri. 2010. Sequential Influence Models in Social Networks, In Proc. 4th International Conference on Weblogs and Social Media]]. In this paper the authors study the temporal dynamics of information diffusion in social networks. Their results could give us some insights into the design of our model.<br />
<br />
* [[RelatedPaper::Castillo_2011|Information credibility on twitter. Castillo et al, WWW 11]] This paper discovers general features on Twitter that are relevant to credibility assessment.<br />
<br />
* [[RelatedPaper::Guralnik_99|Event Detection from Time Series Data. Guralnik et al, KDD 99]] This paper develops a general approach to change-point detection that generalizes across a wide range of applications.<br />
<br />
* [[RelatedPaper:: Allan_1988|On-Line New Event Detection and Tracking. Allan et al, SIGIR 98]] The authors' approach to detection uses a single-pass clustering algorithm and a novel thresholding model that incorporates the properties of events as a major component.<br />
<br />
== Related materials ==<br />
{{#ask: [[AddressesProblem::Controversial events detection]]<br />
| ?Category<br />
| ?UsesDataset<br />
| ?UsesMethod<br />
}}</div>Ysimhttp://curtis.ml.cmu.edu/w/courses/index.php?title=Controversial_events_detection&diff=14681Controversial events detection2012-10-16T05:52:48Z<p>Ysim: /* Logistic normal prior for events */</p>
<hr />
<div>== Comments ==<br />
<br />
This is a neat idea. The main difficulty I see here is formalizing the task precisely. What does it mean for an event to be controversial, exactly? Part of the problem is that it's not perfectly clear what an "event" is.<br />
<br />
One suggestion would be to look at a topic-modeling approach, eg [http://dl.acm.org/citation.cfm?id=1150450 topics over time], to find topics with a short temporal span in social-media data. You might be able to combine this with sentiment around those topics in two different communities - eg using something like my [http://www.cs.cmu.edu/~wcohen/postscript/icwsm-2012.pdf MCR-LDA model]. So one way to flesh out this idea would be to start with two topic models:<br />
<br />
* MCR-LDA, to measure 'controversy' - you might be able to get predictions from Ramnath on his blog data, if the code's not ready to distribute yet. I would not completely commit to using twitter data exclusively, btw. <br />
* TOT, to detect shortlived 'events' vs long-term topics.<br />
<br />
Then write some inference code to combine the predictions and pick out "controversial events". The next stage would be working out a joint model (which you might not choose to do for the project). It's not obvious how you'd evaluate all this, however...maybe do some user labeling of final predictions like "this topic corresponds to a controversial event."<br />
<br />
These are just ideas - you might try and flesh out some other concrete idea instead. Good luck! --[[User:Wcohen|Wcohen]] 14:33, 10 October 2012 (UTC)<br />
<br />
PS. There is also a one-person team working on similar topic, you all should talk - it's [[User:Yuchen Tian]] --[[User:Wcohen|Wcohen]] 18:40, 10 October 2012 (UTC)<br />
<br />
== Team members ==<br />
<br />
* [[User:Ysim|Yanchuan Sim]]<br />
* [[User:Zhouyu|Zhou Yu]]<br />
* [[User:Tinghuiz|Tinghui Zhou]]<br />
<br />
== Project idea ==<br />
<br />
In our project, we propose to jointly detect events and the controversy surrounding them in the context of social media.<br />
For example, Christmas Day is an event that receives the most attention around December 25th, while the Presidential debates receive theirs once every four years.<br />
Controversy-wise, Christmas Day is relatively one-sided, with most of the text mentioning it being fairly homogeneous.<br />
In contrast, the Presidential debates event will have obvious sides (supporting the different candidates).<br />
<br />
Our goal is not only to detect controversial events, but also to discover what the different sides are - both grouping the individuals associated with each faction and describing how each faction talks about the event differently.<br />
<br />
We propose to use a probabilistic graphical model to achieve our goals of learning these latent structures from the data without labeled training data.<br />
<br />
== Formalizing the task ==<br />
<br />
Event - In the context of social media, an event is an occurrence marked by a period of time in which there is a "surge" in the amount of interest (e.g. blog posts, tweets, comments, etc.) surrounding it.<br />
<br />
We call an event controversial if, given the text surrounding it, the nature of the discussion is highly non-homogeneous (or exhibits high entropy). The sides of a controversial event can be grouped into a small number of distinct ''factions''.<br />
<br />
Thus, in our task, given a collection of social media documents over time, we seek to jointly infer the events that have occurred, as well as the controversy associated with them.<br />
<br />
== A probabilistic model ==<br />
<br />
Here's a sketch of the model that we are considering for our task.<br />
It is a variant of a topic model in which each word is assumed to be jointly generated by an ''event'' and a ''faction''.<br />
It is also similar to the topics over time model, in that we also generate the timestamps for each document.<br />
<br />
''A graphical plate diagram of our model will be up soon.''<br />
<br />
=== Notation ===<br />
<br />
<math>E</math> - fixed number of events<br />
<br />
<math>\theta_d</math> - multinomial distribution of events specific to document <math>d</math><br />
<br />
<math>\phi_{e_{di}}</math> - multinomial distribution of factions specific to event <math>e_{di}</math><br />
<br />
<math>\psi_{e_{di}}</math> - the beta distribution of time specific to event <math>e_{di}</math><br />
<br />
<math>w_{di}</math> - the <math>i</math>th token in document <math>d</math><br />
<br />
<math>t_{di}</math> - timestamp associated with the <math>i</math>th token in document <math>d</math><br />
<br />
<math>\eta^e, \eta^{e,f}, \eta^m</math> - SAGE vectors, which are log additive weights for each word in the vocabulary. We have one for each event, each combination of event and faction, and a background word distribution.<br />
<br />
=== Generative story ===<br />
<br />
# Draw <math>E</math> multinomials, <math>\phi_e</math> from a Dirichlet prior, one for each event <math>e</math>. ''This is the distribution over factions for each event that we have.''<br />
# For each document <math>d</math>, draw a multinomial <math>\theta_d</math> from a prior <math>\alpha</math> (this prior could be Dirichlet or logistic normal); then for each word <math>w_{di}</math> in the document <math>d</math>:<br />
## Draw an event <math>e_{di}</math> from multinomial <math>\theta_d</math>;<br />
## Draw a faction <math>f_{di}</math> from multinomial <math>\phi_{e_{di}}</math>;<br />
## Draw a word <math>w_{di}</math> from a SAGE language model <math>p(w_{di} \mid e_{di}, f_{di}, \mathbf{\eta} ) \propto \exp(\eta^{e_{di}}_w + \eta^{e_{di},f_{di}}_w + \eta^m_w)</math>;<br />
## Draw a timestamp <math>t_{di}</math> from Beta <math>\psi_{e_{di}}</math>.<br />
<br />
=== SAGE language model ===<br />
<br />
To model the different effects of events and factions, we use a [[Sparse_Additive_Generative_Models_of_Text|sparse additive generative (SAGE)]] model.<br />
In contrast to the popular Dirichlet-multinomial for topic modeling, which directly models lexical probabilities associated with each (latent) topic, SAGE models the deviation in log frequencies from a background lexical distribution.<br />
Applying a sparsity inducing prior on the topic term vectors limits the number of terms whose frequencies diverge from the background lexical frequencies, thereby increasing robustness to limited training data.<br />
Also, in the case of our model, it eliminates the need for a switching variable to choose between event words and faction words.<br />
<br />
=== Logistic normal prior for events ===<br />
<br />
Using a logistic normal prior for events will allow us to incorporate features (such as Twitter hashtags, blog post titles, comment counts, etc.) in a principled manner. Logistic normal priors have been used [http://www.cs.princeton.edu/~mimno/papers/sampledlgstnorm.pdf here].<br />
<br />
== Data ==<br />
<br />
Our main data source will be Twitter, and as a start we intend to use tweets over a three-month period in 2012 (the exact date range to be decided).<br />
Some possibly controversial events that have occurred this year are the Republican primaries, the Grammy Awards, weekly football games during the NFL season, etc.<br />
In addition to the textual content, the timestamps, locations (partially observed) and identities (of the user posting a tweet) could be useful features for our model.<br />
<br />
== Related work ==<br />
<br />
* [[RelatedPaper::Yang et al, SIGIR 98|A study on retrospective and online event detection. Yang et al, SIGIR 98]] This paper addresses the problems of detecting events in news stories. They used clustering with a vector space model to group temporally close events together.<br />
<br />
* [[RelatedPaper::Zhao et al, AAAI 07|Temporal and information flow based event detection from social text streams. Zhao et al, AAAI 07]] The authors propose a method for detecting events from social text streams that exploits not just the textual content but also the temporal and social dimensions of their data.<br />
<br />
* [[RelatedPaper::Automatic_Detection_and_Classification_of_Social_Events|Automatic Detection and Classification of Social Events. Agarwal and Rambow, ACL 10]] This is one of the few works we found relating to controversial events in social media. The authors aim at detecting and classifying social events using tree kernels.<br />
<br />
*[[RelatedPaper::Rodriguez et al. KDD 2010|Gomez Rodriguez, M., J. Leskovec, and A. Krause. 2010. Inferring networks of diffusion and influence. In Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining, 1019–1028]]. This paper addresses the problem of inferring underlying networks in the diffusion process of social networks, which is related to the faction discovery problem we study in this project.<br />
<br />
*[[RelatedPaper::Cosley et al 2010|Cosley, D., D. Huttenlocher, J. Kleinberg, X. Lan, and S. Suri. 2010. Sequential Influence Models in Social Networks, In Proc. 4th International Conference on Weblogs and Social Media]]. In this paper the authors study the temporal dynamics of information diffusion in social networks. Their results could give us some insights into the design of our model.<br />
<br />
* [[RelatedPaper::Castillo_2011|Information credibility on twitter. Castillo et al, WWW 11]] This paper discovers general features on Twitter that are relevant to credibility assessment.<br />
<br />
* [[RelatedPaper::Guralnik_99|Event Detection from Time Series Data. Guralnik et al, KDD 99]] This paper develops a general approach to change-point detection that generalizes across a wide range of applications.<br />
<br />
* [[RelatedPaper:: Allan_1988|On-Line New Event Detection and Tracking. Allan et al, SIGIR 98]] The authors' approach to detection uses a single-pass clustering algorithm and a novel thresholding model that incorporates the properties of events as a major component.<br />
<br />
== Related materials ==<br />
{{#ask: [[AddressesProblem::Controversial events detection]]<br />
| ?Category<br />
| ?UsesDataset<br />
| ?UsesMethod<br />
}}</div>Ysimhttp://curtis.ml.cmu.edu/w/courses/index.php?title=Controversial_events_detection&diff=14680Controversial events detection2012-10-16T05:52:23Z<p>Ysim: /* A probabilistic model */</p>
<hr />
<div>== Comments ==<br />
<br />
This is a neat idea. The main difficulty I see here is formalizing the task precisely. What does it mean for an event to be controversial, exactly? Part of the problem is that it's not perfectly clear what an "event" is.<br />
<br />
One suggestion would be to look at a topic-modeling approach, eg [http://dl.acm.org/citation.cfm?id=1150450 topics over time], to find topics with a short temporal span in social-media data. You might be able to combine this with sentiment around those topics in two different communities - eg using something like my [http://www.cs.cmu.edu/~wcohen/postscript/icwsm-2012.pdf MCR-LDA model]. So one way to flesh out this idea would be to start with two topic models:<br />
<br />
* MCR-LDA, to measure 'controversy' - you might be able to get predictions from Ramnath on his blog data, if the code's not ready to distribute yet. I would not completely commit to using twitter data exclusively, btw. <br />
* TOT, to detect shortlived 'events' vs long-term topics.<br />
<br />
Then write some inference code to combine the predictions and pick out "controversial events". The next stage would be working out a joint model (which you might not choose to do for the project). It's not obvious how you'd evaluate all this, however...maybe do some user labeling of final predictions like "this topic corresponds to a controversial event."<br />
<br />
These are just ideas - you might try and flesh out some other concrete idea instead. Good luck! --[[User:Wcohen|Wcohen]] 14:33, 10 October 2012 (UTC)<br />
<br />
PS. There is also a one-person team working on similar topic, you all should talk - it's [[User:Yuchen Tian]] --[[User:Wcohen|Wcohen]] 18:40, 10 October 2012 (UTC)<br />
<br />
== Team members ==<br />
<br />
* [[User:Ysim|Yanchuan Sim]]<br />
* [[User:Zhouyu|Zhou Yu]]<br />
* [[User:Tinghuiz|Tinghui Zhou]]<br />
<br />
== Project idea ==<br />
<br />
In our project, we propose to jointly detect events and the controversy surrounding them in the context of social media.<br />
For example, Christmas Day is an event that receives the most attention around December 25th, while the Presidential debates receive theirs once every four years.<br />
Controversy-wise, Christmas Day is relatively one-sided, with most of the text mentioning it being fairly homogeneous.<br />
In contrast, the Presidential debates event will have obvious sides (supporting the different candidates).<br />
<br />
Our goal is not only to detect controversial events, but also to discover what the different sides are - both grouping the individuals associated with each faction and describing how each faction talks about the event differently.<br />
<br />
We propose to use a probabilistic graphical model to achieve our goals of learning these latent structures from the data without labeled training data.<br />
<br />
== Formalizing the task ==<br />
<br />
Event - In the context of social media, an event is a period of time during which there is a "surge" in the amount of interest (e.g. blog posts, tweets, comments) surrounding an occurrence.<br />
<br />
We call an event controversial if, given the text surrounding the event, the nature of the discussion is highly non-homogeneous (or exhibits high entropy). The sides of such an event can be grouped into a small number of distinct ''factions''.<br />
<br />
Thus, in our task, given a collection of social media documents over time, we seek to jointly infer the events that have occurred, as well as the controversy associated with them.<br />
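<br />
One way to make "high entropy" concrete is the Shannon entropy of the faction proportions within an event. The sketch below is only an illustration under that assumption: the function name <code>faction_entropy</code> and the toy faction labels are hypothetical and not part of the proposal.<br />
<br />
```python
import math
from collections import Counter

def faction_entropy(faction_labels):
    """Shannon entropy (in bits) of the empirical faction distribution."""
    counts = Counter(faction_labels)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Hypothetical faction assignments for documents about two events.
christmas = ["neutral"] * 9 + ["other"]                 # nearly one-sided
debate = ["candidate_a"] * 5 + ["candidate_b"] * 5      # evenly split

print(faction_entropy(christmas))
print(faction_entropy(debate))
```
<br />
Under this measure an evenly split event scores higher than a nearly one-sided one, which matches the intuition in the Christmas and debate examples.<br />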
<br />
== A probabilistic model ==<br />
<br />
Here's a sketch of the model that we are considering for our task.<br />
It is a variant of a topic model, where each word is assumed to be jointly generated by an ''event'' and a ''faction''.<br />
It is also similar to the topics over time model in that we generate the timestamps for each document.<br />
<br />
''A graphical plate diagram of our model will be up soon.''<br />
<br />
=== Notation ===<br />
<br />
<math>E</math> - fixed number of events<br />
<br />
<math>\theta_d</math> - multinomial distribution of events specific to document <math>d</math><br />
<br />
<math>\phi_{e_{di}}</math> - multinomial distribution of factions specific to event <math>e_{di}</math><br />
<br />
<math>\psi_{e_{di}}</math> - the beta distribution of time specific to event <math>e_{di}</math><br />
<br />
<math>w_{di}</math> - the <math>i</math>th token in document <math>d</math><br />
<br />
<math>t_{di}</math> - timestamp associated with the <math>i</math>th token in document <math>d</math><br />
<br />
<math>\eta^e, \eta^{e,f}, \eta^m</math> - SAGE vectors, which are additive log weights for each word in the vocabulary. There is one vector <math>\eta^e</math> for each event, one vector <math>\eta^{e,f}</math> for each combination of event and faction, and a single background vector <math>\eta^m</math>.<br />
<br />
=== Generative story ===<br />
<br />
# Draw <math>E</math> multinomials, <math>\phi_e</math> from a Dirichlet prior, one for each event <math>e</math>. ''This is the distribution over factions for each event that we have.''<br />
# For each document <math>d</math>, draw a multinomial <math>\theta_d</math> from a prior <math>\alpha</math> (this prior could be Dirichlet or logistic normal); then for each word <math>w_{di}</math> in the document <math>d</math>:<br />
## Draw an event <math>e_{di}</math> from multinomial <math>\theta_d</math>;<br />
## Draw a faction <math>f_{di}</math> from multinomial <math>\phi_{e_{di}}</math>;<br />
## Draw a word <math>w_{di}</math> from a SAGE language model <math>p(w_{di} \mid e_{di}, f_{di}, \mathbf{\eta} ) \propto \exp(\eta^{e_{di}}_w + \eta^{e_{di},f_{di}}_w + \eta^m_w)</math>;<br />
## Draw a timestamp <math>t_{di}</math> from Beta <math>\psi_{e_{di}}</math>.<br />
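<br />
The generative story above can be sketched as a small sampler. This is a minimal illustration, not our inference code: it assumes the Dirichlet variant of the prior on <math>\theta_d</math>, timestamps normalized to the interval from 0 to 1, and made-up toy parameters; all function and variable names are hypothetical.<br />
<br />
```python
import math
import random

def sample_dirichlet(alpha, rng):
    """Draw a probability vector from Dirichlet(alpha) via normalized gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [x / total for x in draws]

def sample_index(probs, rng):
    """Draw an index from a discrete distribution."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

def sage_word_probs(eta_event, eta_event_faction, eta_background):
    """Normalize exp(eta^e + eta^{e,f} + eta^m) over the vocabulary."""
    scores = [a + b + m for a, b, m in zip(eta_event, eta_event_faction, eta_background)]
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]

def generate_document(n_words, alpha, phi, psi, eta_e, eta_ef, eta_m, rng):
    """Follow step 2 of the generative story for one document."""
    theta = sample_dirichlet(alpha, rng)  # assumption: Dirichlet prior on theta_d
    tokens = []
    for _ in range(n_words):
        e = sample_index(theta, rng)      # event for this token
        f = sample_index(phi[e], rng)     # faction given the event
        w = sample_index(sage_word_probs(eta_e[e], eta_ef[e][f], eta_m), rng)
        t = rng.betavariate(*psi[e])      # timestamp in [0, 1]
        tokens.append((e, f, w, t))
    return tokens

# Toy, made-up parameters: 2 events, 2 factions, 4 word types.
rng = random.Random(0)
E, F, V = 2, 2, 4
phi = [sample_dirichlet([1.0] * F, rng) for _ in range(E)]  # step 1
psi = [(2.0, 8.0), (8.0, 2.0)]                               # Beta (a, b) per event
eta_m = [0.0] * V
eta_e = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
eta_ef = [[[0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]],
          [[0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]]
doc = generate_document(5, [0.5] * E, phi, psi, eta_e, eta_ef, eta_m, rng)
print(doc)
```
<br />
Each token comes out as an (event, faction, word, timestamp) tuple; in the real task only the word and timestamp would be observed, and the event and faction would be inferred.<br />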
<br />
=== SAGE language model ===<br />
<br />
To model the different effects of events and factions, we use a [[Sparse_Additive_Generative_Models_of_Text|sparse additive generative (SAGE)]] model.<br />
In contrast to the popular Dirichlet-multinomial for topic modeling, which directly models lexical probabilities associated with each (latent) topic, SAGE models the deviation in log frequencies from a background lexical distribution.<br />
Applying a sparsity-inducing prior on the topic term vectors limits the number of terms whose frequencies diverge from the background lexical frequencies, thereby increasing robustness to limited training data.<br />
Also, in the case of our model, it eliminates the need for a switching variable to choose between event words and faction words.<br />
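<br />
Written out with its normalizer, the proportionality used in the generative story becomes the following (here <math>V</math> denotes the vocabulary size, a symbol introduced only for this illustration):<br />
<br />
<math>p(w \mid e, f, \mathbf{\eta}) = \frac{\exp(\eta^{e}_w + \eta^{e,f}_w + \eta^m_w)}{\sum_{v=1}^{V} \exp(\eta^{e}_v + \eta^{e,f}_v + \eta^m_v)}</math><br />
<br />
For any word where both <math>\eta^{e}_w</math> and <math>\eta^{e,f}_w</math> are zero, the numerator reduces to the background term <math>\exp(\eta^m_w)</math>, so the probability of such a word differs from the background only through the shared normalizer.<br />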
<br />
=== Logistic normal prior for events ===<br />
<br />
Using a logistic normal prior for events will allow us to incorporate features (such as Twitter hashtags, blog post titles, comment counts, etc.) in a principled manner. Logistic normal priors have been used in topic models before; see [http://www.cs.princeton.edu/~mimno/papers/sampledlgstnorm.pdf here].<br />
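<br />
As a rough sketch of how features might enter such a prior: draw a Gaussian vector whose mean depends on the document's features, then map it to the simplex with a softmax. The linear map from features to the mean, the shared variance, and all names below are assumptions made for illustration; they are not taken from the linked paper or fixed by this proposal.<br />
<br />
```python
import math
import random

def logistic_normal_theta(features, weights, sigma, rng):
    """Draw theta_d: Gaussian with feature-dependent mean, then softmax.

    features: list of K document features (hypothetical, e.g. hashtag indicators)
    weights:  E rows of K coefficients, one row per event (assumed linear map)
    sigma:    standard deviation shared by all events (assumed)
    """
    means = [sum(w * x for w, x in zip(row, features)) for row in weights]
    eta = [rng.gauss(m, sigma) for m in means]
    top = max(eta)  # subtract the max for numerical stability
    exps = [math.exp(v - top) for v in eta]
    total = sum(exps)
    return [v / total for v in exps]

rng = random.Random(0)
# Toy example: 2 events, 2 binary features; the first feature favors event 0.
weights = [[2.0, 0.0], [0.0, 0.0]]
theta = logistic_normal_theta([1.0, 0.0], weights, sigma=0.5, rng=rng)
print(theta)
```
<br />
Setting <code>sigma</code> to zero makes the draw deterministic, which is a convenient way to check how the features alone shift the event proportions.<br />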
<br />
== Data ==<br />
<br />
Our main data source will be Twitter, and as a start we intend to use tweets over a three-month period in 2012 (the exact date range to be decided).<br />
Some possibly controversial events that have occurred this year are the Republican primaries, the Grammy Awards, weekly football games during the NFL season, etc.<br />
In addition to the textual content, the timestamps, locations (partially observed) and identities (of the user posting a tweet) could be useful features for our model.<br />
<br />
== Related work ==<br />
<br />
* [[RelatedPaper::Yang et al, SIGIR 98|A study on retrospective and online event detection. Yang et al, SIGIR 98]] This paper addresses the problems of detecting events in news stories. They used clustering with a vector space model to group temporally close events together.<br />
<br />
* [[RelatedPaper::Zhao et al, AAAI 07|Temporal and information flow based event detection from social text streams. Zhao et al, AAAI 07]] The authors propose a method for detecting events from social text streams that exploits not just the textual content but also the temporal and social dimensions of their data.<br />
<br />
* [[RelatedPaper::Automatic_Detection_and_Classification_of_Social_Events|Automatic Detection and Classification of Social Events. Agarwal and Rambow, ACL 10]] This is one of the few works we found relating to controversial events in social media. The authors aim to detect and classify social events using tree kernels.<br />
<br />
*[[RelatedPaper::Rodriguez et al. KDD 2010|Gomez Rodriguez, M., J. Leskovec, and A. Krause. 2010. Inferring networks of diffusion and influence. In Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining, 1019–1028]]. This paper addresses the problem of inferring underlying networks in the diffusion process of social networks, which is related to the faction discovery problem we study in this project.<br />
<br />
*[[RelatedPaper::Cosley et al 2010|Cosley, D., D. Huttenlocher, J. Kleinberg, X. Lan, and S. Suri. 2010. Sequential Influence Models in Social Networks. In Proc. 4th International Conference on Weblogs and Social Media]]. In this paper the authors study the temporal dynamics of information diffusion in social networks. Their results could give us some insights into the design of our model.<br />
<br />
* [[RelatedPaper::Castillo_2011|Information credibility on twitter. Castillo et al, WWW 11]] The authors identify general features on Twitter for credibility assessment.<br />
<br />
* [[RelatedPaper::Guralnik_99|Event Detection from Time Series Data. Guralnik et al, KDD 99]] The authors develop a general approach to change-point detection that generalizes across a wide range of applications.<br />
<br />
* [[RelatedPaper:: Allan_1988|On-Line New Event Detection and Tracking. Allan et al, SIGIR 98]] The authors' approach to detection uses a single-pass clustering algorithm and a novel thresholding model that incorporates the properties of events as a major component.<br />
<br />
== Related materials ==<br />
{{#ask: [[AddressesProblem::Controversial events detection]]<br />
| ?Category<br />
| ?UsesDataset<br />
| ?UsesMethod<br />
}}</div>Ysimhttp://curtis.ml.cmu.edu/w/courses/index.php?title=Controversial_events_detection&diff=14678Controversial events detection2012-10-16T05:47:04Z<p>Ysim: /* A probabilistic model */</p>
<hr />
<div>== Comments ==<br />
<br />
This is a neat idea. The main difficulty I see here is formalizing the task precisely. What does it mean for an event to be controversial, exactly? Part of the problem is that it's not perfectly clear what an "event" is.<br />
<br />
One suggestion would be to look at a topic-modeling approach, eg [http://dl.acm.org/citation.cfm?id=1150450 topics over time], to find topics with a short temporal span in social-media data. You might be able to combine this with sentiment around those topics in two different communities - eg using something like my [http://www.cs.cmu.edu/~wcohen/postscript/icwsm-2012.pdf MCR-LDA model]. So one way to flesh out this idea would be to start with two topic models:<br />
<br />
* MCR-LDA, to measure 'controversy' - you might be able to get predictions from Ramnath on his blog data, if the code's not ready to distribute yet. I would not completely commit to using twitter data exclusively, btw. <br />
* TOT, to detect shortlived 'events' vs long-term topics.<br />
<br />
Then write some inference code to combine the predictions and pick out "controversial events". The next stage would be working out a joint model (which you might not choose to do for the project). It's not obvious how you'd evaluate all this, however...maybe do some user labeling of final predictions like "this topic corresponds to a controversial event."<br />
<br />
These are just ideas - you might try and flesh out some other concrete idea instead. Good luck! --[[User:Wcohen|Wcohen]] 14:33, 10 October 2012 (UTC)<br />
<br />
PS. There is also a one-person team working on similar topic, you all should talk - it's [[User:Yuchen Tian]] --[[User:Wcohen|Wcohen]] 18:40, 10 October 2012 (UTC)<br />
<br />
== Team members ==<br />
<br />
* [[User:Ysim|Yanchuan Sim]]<br />
* [[User:Zhouyu|Zhou Yu]]<br />
* [[User:Tinghuiz|Tinghui Zhou]]<br />
<br />
== Project idea ==<br />
<br />
In our project, we propose to jointly detect events and the controversy surrounding them in the context of social media.<br />
For example, Christmas Day is an event that receives the most attention around December 25th, while the Presidential debates receive the most attention once every four years.<br />
Controversy-wise, Christmas Day is relatively one-sided: most of the text mentioning it is homogeneous.<br />
In contrast, the Presidential debates will have obvious sides (supporting the different candidates).<br />
<br />
Our goal is not only to detect controversial events, but also to discover what the different sides are - both grouping the individuals associated with each faction and describing how each faction talks about the event differently.<br />
<br />
We propose to use a probabilistic graphical model to achieve our goals of learning these latent structures from the data without labeled training data.<br />
<br />
== Formalizing the task ==<br />
<br />
Event - In the context of social media, an event is a period of time during which there is a "surge" in the amount of interest (e.g. blog posts, tweets, comments) surrounding an occurrence.<br />
<br />
We call an event controversial if, given the text surrounding the event, the nature of the discussion is highly non-homogeneous (or exhibits high entropy). The sides of such an event can be grouped into a small number of distinct ''factions''.<br />
<br />
Thus, in our task, given a collection of social media documents over time, we seek to jointly infer the events that have occurred, as well as the controversy associated with them.<br />
<br />
== A probabilistic model ==<br />
<br />
Here's a sketch of the model that we are considering for our task.<br />
It is a variant of a topic model, where each word is assumed to be jointly generated by an ''event'' and a ''faction''.<br />
It is also similar to the topics over time model in that we generate the timestamps for each document.<br />
<br />
''A graphical plate diagram of our model will be up soon.''<br />
<br />
=== Notation ===<br />
<br />
<math>E</math> - fixed number of events<br />
<br />
<math>\theta_d</math> - multinomial distribution of events specific to document <math>d</math><br />
<br />
<math>\phi_{e_{di}}</math> - multinomial distribution of factions specific to event <math>e_{di}</math><br />
<br />
<math>\psi_{e_{di}}</math> - the beta distribution of time specific to event <math>e_{di}</math><br />
<br />
<math>w_{di}</math> - the <math>i</math>th token in document <math>d</math><br />
<br />
<math>t_{di}</math> - timestamp associated with the <math>i</math>th token in document <math>d</math><br />
<br />
<math>\eta^e, \eta^{e,f}, \eta^m</math> - SAGE vectors, which are additive log weights for each word in the vocabulary. There is one vector <math>\eta^e</math> for each event, one vector <math>\eta^{e,f}</math> for each combination of event and faction, and a single background vector <math>\eta^m</math>.<br />
<br />
=== Generative story ===<br />
<br />
# Draw <math>E</math> multinomials, <math>\phi_e</math> from a Dirichlet prior, one for each event <math>e</math>. ''This is the distribution over factions for each event that we have.''<br />
# For each document <math>d</math>, draw a multinomial <math>\theta_d</math> from a prior <math>\alpha</math> (this prior could be Dirichlet or logistic normal); then for each word <math>w_{di}</math> in the document <math>d</math>:<br />
## Draw an event <math>e_{di}</math> from multinomial <math>\theta_d</math>;<br />
## Draw a faction <math>f_{di}</math> from multinomial <math>\phi_{e_{di}}</math>;<br />
## Draw a word <math>w_{di}</math> from a SAGE language model <math>p(w_{di} \mid e_{di}, f_{di}, \mathbf{\eta} ) \propto \exp(\eta^{e_{di}}_w + \eta^{e_{di},f_{di}}_w + \eta^m_w)</math>;<br />
## Draw a timestamp <math>t_{di}</math> from Beta <math>\psi_{e_{di}}</math>.<br />
<br />
=== SAGE language model ===<br />
<br />
To model the different effects of events and factions, we use a [[Sparse_Additive_Generative_Models_of_Text|sparse additive generative (SAGE)]] model.<br />
In contrast to the popular Dirichlet-multinomial for topic modeling, which directly models lexical probabilities associated with each (latent) topic, SAGE models the deviation in log frequencies from a background lexical distribution.<br />
Applying a sparsity-inducing prior on the topic term vectors limits the number of terms whose frequencies diverge from the background lexical frequencies, thereby increasing robustness to limited training data.<br />
Also, in the case of our model, it eliminates the need for a switching variable to choose between event words and faction words.<br />
<br />
== Data ==<br />
<br />
Our main data source will be Twitter, and as a start we intend to use tweets over a three-month period in 2012 (the exact date range to be decided).<br />
Some possibly controversial events that have occurred this year are the Republican primaries, the Grammy Awards, weekly football games during the NFL season, etc.<br />
In addition to the textual content, the timestamps, locations (partially observed) and identities (of the user posting a tweet) could be useful features for our model.<br />
<br />
== Related work ==<br />
<br />
* [[RelatedPaper::Yang et al, SIGIR 98|A study on retrospective and online event detection. Yang et al, SIGIR 98]] This paper addresses the problems of detecting events in news stories. They used clustering with a vector space model to group temporally close events together.<br />
<br />
* [[RelatedPaper::Zhao et al, AAAI 07|Temporal and information flow based event detection from social text streams. Zhao et al, AAAI 07]] The authors propose a method for detecting events from social text streams that exploits not just the textual content but also the temporal and social dimensions of their data.<br />
<br />
* [[RelatedPaper::Automatic_Detection_and_Classification_of_Social_Events|Automatic Detection and Classification of Social Events. Agarwal and Rambow, ACL 10]] This is one of the few works we found relating to controversial events in social media. The authors aim to detect and classify social events using tree kernels.<br />
<br />
*[[RelatedPaper::Rodriguez et al. KDD 2010|Gomez Rodriguez, M., J. Leskovec, and A. Krause. 2010. Inferring networks of diffusion and influence. In Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining, 1019–1028]]. This paper addresses the problem of inferring underlying networks in the diffusion process of social networks, which is related to the faction discovery problem we study in this project.<br />
<br />
*[[RelatedPaper::Cosley et al 2010|Cosley, D., D. Huttenlocher, J. Kleinberg, X. Lan, and S. Suri. 2010. Sequential Influence Models in Social Networks. In Proc. 4th International Conference on Weblogs and Social Media]]. In this paper the authors study the temporal dynamics of information diffusion in social networks. Their results could give us some insights into the design of our model.<br />
<br />
* [[RelatedPaper::Castillo_2011|Information credibility on twitter. Castillo et al, WWW 11]] The authors identify general features on Twitter for credibility assessment.<br />
<br />
* [[RelatedPaper::Guralnik_99|Event Detection from Time Series Data. Guralnik et al, KDD 99]] The authors develop a general approach to change-point detection that generalizes across a wide range of applications.<br />
<br />
* [[RelatedPaper:: Allan_1988|On-Line New Event Detection and Tracking. Allan et al, SIGIR 98]] The authors' approach to detection uses a single-pass clustering algorithm and a novel thresholding model that incorporates the properties of events as a major component.<br />
<br />
== Related materials ==<br />
{{#ask: [[AddressesProblem::Controversial events detection]]<br />
| ?Category<br />
| ?UsesDataset<br />
| ?UsesMethod<br />
}}</div>Ysimhttp://curtis.ml.cmu.edu/w/courses/index.php?title=Controversial_events_detection&diff=14677Controversial events detection2012-10-16T05:43:53Z<p>Ysim: /* A probabilistic model */</p>
<hr />
<div>== Comments ==<br />
<br />
This is a neat idea. The main difficulty I see here is formalizing the task precisely. What does it mean for an event to be controversial, exactly? Part of the problem is that it's not perfectly clear what an "event" is.<br />
<br />
One suggestion would be to look at a topic-modeling approach, eg [http://dl.acm.org/citation.cfm?id=1150450 topics over time], to find topics with a short temporal span in social-media data. You might be able to combine this with sentiment around those topics in two different communities - eg using something like my [http://www.cs.cmu.edu/~wcohen/postscript/icwsm-2012.pdf MCR-LDA model]. So one way to flesh out this idea would be to start with two topic models:<br />
<br />
* MCR-LDA, to measure 'controversy' - you might be able to get predictions from Ramnath on his blog data, if the code's not ready to distribute yet. I would not completely commit to using twitter data exclusively, btw. <br />
* TOT, to detect shortlived 'events' vs long-term topics.<br />
<br />
Then write some inference code to combine the predictions and pick out "controversial events". The next stage would be working out a joint model (which you might not choose to do for the project). It's not obvious how you'd evaluate all this, however...maybe do some user labeling of final predictions like "this topic corresponds to a controversial event."<br />
<br />
These are just ideas - you might try and flesh out some other concrete idea instead. Good luck! --[[User:Wcohen|Wcohen]] 14:33, 10 October 2012 (UTC)<br />
<br />
PS. There is also a one-person team working on similar topic, you all should talk - it's [[User:Yuchen Tian]] --[[User:Wcohen|Wcohen]] 18:40, 10 October 2012 (UTC)<br />
<br />
== Team members ==<br />
<br />
* [[User:Ysim|Yanchuan Sim]]<br />
* [[User:Zhouyu|Zhou Yu]]<br />
* [[User:Tinghuiz|Tinghui Zhou]]<br />
<br />
== Project idea ==<br />
<br />
In our project, we propose to jointly detect events and the controversy surrounding them in the context of social media.<br />
For example, Christmas Day is an event that receives the most attention around December 25th, while the Presidential debates receive the most attention once every four years.<br />
Controversy-wise, Christmas Day is relatively one-sided: most of the text mentioning it is homogeneous.<br />
In contrast, the Presidential debates will have obvious sides (supporting the different candidates).<br />
<br />
Our goal is not only to detect controversial events, but also to discover what the different sides are - both grouping the individuals associated with each faction and describing how each faction talks about the event differently.<br />
<br />
We propose to use a probabilistic graphical model to achieve our goals of learning these latent structures from the data without labeled training data.<br />
<br />
== Formalizing the task ==<br />
<br />
Event - In the context of social media, an event is a period of time during which there is a "surge" in the amount of interest (e.g. blog posts, tweets, comments) surrounding an occurrence.<br />
<br />
We call an event controversial if, given the text surrounding the event, the nature of the discussion is highly non-homogeneous (or exhibits high entropy). The sides of such an event can be grouped into a small number of distinct ''factions''.<br />
<br />
Thus, in our task, given a collection of social media documents over time, we seek to jointly infer the events that have occurred, as well as the controversy associated with them.<br />
<br />
== A probabilistic model ==<br />
<br />
Here's a sketch of the model that we are considering for our task.<br />
It is a variant of a topic model, where each word is assumed to be jointly generated by an ''event'' and a ''faction''.<br />
It is also similar to the topics over time model in that we generate the timestamps for each document.<br />
<br />
=== Notation ===<br />
<br />
<math>E</math> - fixed number of events<br />
<br />
<math>\theta_d</math> - multinomial distribution of events specific to document <math>d</math><br />
<br />
<math>\phi_{e_{di}}</math> - multinomial distribution of factions specific to event <math>e_{di}</math><br />
<br />
<math>\psi_{e_{di}}</math> - the beta distribution of time specific to event <math>e_{di}</math><br />
<br />
<math>w_{di}</math> - the <math>i</math>th token in document <math>d</math><br />
<br />
<math>t_{di}</math> - timestamp associated with the <math>i</math>th token in document <math>d</math><br />
<br />
<math>\eta^e, \eta^{e,f}, \eta^m</math> - SAGE vectors, which are additive log weights for each word in the vocabulary. There is one vector <math>\eta^e</math> for each event, one vector <math>\eta^{e,f}</math> for each combination of event and faction, and a single background vector <math>\eta^m</math>.<br />
<br />
=== Generative story ===<br />
<br />
# Draw <math>E</math> multinomials, <math>\phi_e</math> from a Dirichlet prior, one for each event <math>e</math>. ''This is the distribution over factions for each event that we have.''<br />
# For each document <math>d</math>, draw a multinomial <math>\theta_d</math> from a prior <math>\alpha</math> (this prior could be Dirichlet or logistic normal); then for each word <math>w_{di}</math> in the document <math>d</math>:<br />
## Draw an event <math>e_{di}</math> from multinomial <math>\theta_d</math>;<br />
## Draw a faction <math>f_{di}</math> from multinomial <math>\phi_{e_{di}}</math>;<br />
## Draw a word <math>w_{di}</math> from a SAGE language model <math>p(w_{di} \mid e_{di}, f_{di}, \mathbf{\eta} ) \propto \exp(\eta^{e_{di}}_w + \eta^{e_{di},f_{di}}_w + \eta^m_w)</math>;<br />
## Draw a timestamp <math>t_{di}</math> from Beta <math>\psi_{e_{di}}</math>.<br />
<br />
=== SAGE language model ===<br />
<br />
To model the different effects of events and factions, we use a [[Sparse_Additive_Generative_Models_of_Text|sparse additive generative (SAGE)]] model.<br />
In contrast to the popular Dirichlet-multinomial for topic modeling, which directly models lexical probabilities associated with each (latent) topic, SAGE models the deviation in log frequencies from a background lexical distribution.<br />
Applying a sparsity-inducing prior on the topic term vectors limits the number of terms whose frequencies diverge from the background lexical frequencies, thereby increasing robustness to limited training data.<br />
Also, in the case of our model, it eliminates the need for a switching variable to choose between event words and faction words.<br />
<br />
== Data ==<br />
<br />
Our main data source will be Twitter, and as a start we intend to use tweets over a three-month period in 2012 (the exact date range to be decided).<br />
Some possibly controversial events that have occurred this year are the Republican primaries, the Grammy Awards, weekly football games during the NFL season, etc.<br />
In addition to the textual content, the timestamps, locations (partially observed) and identities (of the user posting a tweet) could be useful features for our model.<br />
<br />
== Related work ==<br />
<br />
* [[RelatedPaper::Yang et al, SIGIR 98|A study on retrospective and online event detection. Yang et al, SIGIR 98]] This paper addresses the problems of detecting events in news stories. They used clustering with a vector space model to group temporally close events together.<br />
<br />
* [[RelatedPaper::Zhao et al, AAAI 07|Temporal and information flow based event detection from social text streams. Zhao et al, AAAI 07]] The authors propose a method for detecting events from social text streams that exploits not just the textual content but also the temporal and social dimensions of their data.<br />
<br />
* [[RelatedPaper::Automatic_Detection_and_Classification_of_Social_Events|Automatic Detection and Classification of Social Events. Agarwal and Rambow, ACL 10]] This is one of the few works we found relating to controversial events in social media. The authors aim to detect and classify social events using tree kernels.<br />
<br />
*[[RelatedPaper::Rodriguez et al. KDD 2010|Gomez Rodriguez, M., J. Leskovec, and A. Krause. 2010. Inferring networks of diffusion and influence. In Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining, 1019–1028]]. This paper addresses the problem of inferring underlying networks in the diffusion process of social networks, which is related to the faction discovery problem we study in this project.<br />
<br />
*[[RelatedPaper::Cosley et al 2010|Cosley, D., D. Huttenlocher, J. Kleinberg, X. Lan, and S. Suri. 2010. Sequential Influence Models in Social Networks. In Proc. 4th International Conference on Weblogs and Social Media]]. In this paper the authors study the temporal dynamics of information diffusion in social networks. Their results could give us some insights into the design of our model.<br />
<br />
* [[RelatedPaper::Castillo_2011|Information credibility on twitter. Castillo et al, WWW 11]] The authors identify general features on Twitter for credibility assessment.<br />
<br />
* [[RelatedPaper::Guralnik_99|Event Detection from Time Series Data. Guralnik et al, KDD 99]] The authors develop a general approach to change-point detection that generalizes across a wide range of applications.<br />
<br />
* [[RelatedPaper:: Allan_1988|On-Line New Event Detection and Tracking. Allan et al, SIGIR 98]] The authors' approach to detection uses a single-pass clustering algorithm and a novel thresholding model that incorporates the properties of events as a major component.<br />
<br />
== Related materials ==<br />
{{#ask: [[AddressesProblem::Controversial events detection]]<br />
| ?Category<br />
| ?UsesDataset<br />
| ?UsesMethod<br />
}}</div>Ysimhttp://curtis.ml.cmu.edu/w/courses/index.php?title=Controversial_events_detection&diff=14676Controversial events detection2012-10-16T05:43:04Z<p>Ysim: /* A probabilistic model */</p>
<hr />
<div>== Comments ==<br />
<br />
This is a neat idea. The main difficulty I see here is formalizing the task precisely. What does it mean for an event to be controversial, exactly? Part of the problem is that it's not perfectly clear what an "event" is.<br />
<br />
One suggestion would be to look at a topic-modeling approach, eg [http://dl.acm.org/citation.cfm?id=1150450 topics over time], to find topics with a short temporal span in social-media data. You might be able to combine this with sentiment around those topics in two different communities - eg using something like my [http://www.cs.cmu.edu/~wcohen/postscript/icwsm-2012.pdf MCR-LDA model]. So one way to flesh out this idea would be to start with two topic models:<br />
<br />
* MCR-LDA, to measure 'controversy' - you might be able to get predictions from Ramnath on his blog data, if the code's not ready to distribute yet. I would not completely commit to using twitter data exclusively, btw. <br />
* TOT, to detect shortlived 'events' vs long-term topics.<br />
<br />
Then write some inference code to combine the predictions and pick out "controversial events". The next stage would be working out a joint model (which you might not choose to do for the project). It's not obvious how you'd evaluate all this, however...maybe do some user labeling of final predictions like "this topic corresponds to a controversial event."<br />
<br />
These are just ideas - you might try and flesh out some other concrete idea instead. Good luck! --[[User:Wcohen|Wcohen]] 14:33, 10 October 2012 (UTC)<br />
<br />
PS. There is also a one-person team working on similar topic, you all should talk - it's [[User:Yuchen Tian]] --[[User:Wcohen|Wcohen]] 18:40, 10 October 2012 (UTC)<br />
<br />
== Team members ==<br />
<br />
* [[User:Ysim|Yanchuan Sim]]<br />
* [[User:Zhouyu|Zhou Yu]]<br />
* [[User:Tinghuiz|Tinghui Zhou]]<br />
<br />
== Project idea ==<br />
<br />
In our project, we propose to jointly detect events and the controversy surrounding them in the context of social media.<br />
For example, Christmas Day is an event that receives the most attention around December 25th, while the Presidential debates receive the most attention once every four years.<br />
Controversy-wise, Christmas Day is relatively one-sided: most of the text mentioning it is homogeneous.<br />
In contrast, the Presidential debates will have obvious sides (supporting the different candidates).<br />
<br />
Our goal is not only to detect controversial events, but also to discover what the different sides are - both grouping the individuals associated with each faction and describing how each faction talks about the event differently.<br />
<br />
We propose to use a probabilistic graphical model to achieve our goals of learning these latent structures from the data without labeled training data.<br />
<br />
== Formalizing the task ==<br />
<br />
Event - In the context of social media, an event is a period of time during which there is a "surge" in the amount of interest (e.g. blog posts, tweets, comments) surrounding an occurrence.<br />
<br />
We call an event controversial if, given the text surrounding the event, the nature of the discussion is highly non-homogeneous (or exhibits high entropy). The sides of such an event can be grouped into a small number of distinct ''factions''.<br />
<br />
Thus, in our task, given a collection of social media documents over time, we seek to jointly infer the events that have occurred, as well as the controversy associated with them.<br />
<br />
== A probabilistic model ==<br />
<br />
Here's a sketch of the model that we are considering for our task.<br />
It is a variant of a topic model, where each word is assumed to be jointly generated by an ''event'' and a ''faction''.<br />
It is also similar to the topics over time model in that we generate the timestamps for each document.<br />
<br />
=== Notation ===<br />
<br />
<math>E</math> - fixed number of events<br />
<br />
<math>\theta_d</math> - multinomial distribution of events specific to document <math>d</math><br />
<br />
<math>\phi_{e_{di}}</math> - multinomial distribution of factions specific to event <math>e_{di}</math><br />
<br />
<math>\psi_{e_{di}}</math> - the beta distribution of time specific to event <math>e_{di}</math><br />
<br />
<math>w_{di}</math> - the <math>i</math>th token in document <math>d</math><br />
<br />
<math>t_{di}</math> - timestamp associated with the <math>i</math>th token in document <math>d</math><br />
<br />
<math>\eta^e, \eta^{e,f}, \eta^m</math> - SAGE vectors, which are additive log weights for each word in the vocabulary. There is one vector <math>\eta^e</math> for each event, one vector <math>\eta^{e,f}</math> for each combination of event and faction, and a single background vector <math>\eta^m</math>.<br />
<br />
=== Generative story ===<br />
<br />
# Draw <math>E</math> multinomials, <math>\phi_e</math> from a Dirichlet prior, one for each event <math>e</math>. ''This is the distribution over factions for each event that we have.''<br />
# For each document <math>d</math>, draw a multinomial <math>\theta_d</math> from a prior <math>\alpha</math> (this prior could be Dirichlet or logistic normal); then for each word <math>w_{di}</math> in the document <math>d</math>:<br />
## Draw an event <math>e_{di}</math> from multinomial <math>\theta_d</math>;<br />
## Draw a faction <math>f_{di}</math> from multinomial <math>\phi_{e_{di}}</math>;<br />
## Draw a word <math>w_{di}</math> from a SAGE language model <math>p(w_{di} \mid e_{di}, f_{di}, \mathbf{\eta} ) \propto \exp(\eta^{e_{di}}_w + \eta^{e_{di},f_{di}}_w + \eta^m_w)</math>;<br />
## Draw a timestamp <math>t_{di}</math> from Beta <math>\psi_{e_{di}}</math>.<br />
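<br />
The generative story above can be sketched as a small forward sampler. This is only an illustration under stated assumptions, not the inference procedure: the prior on <math>\theta_d</math> is taken to be Dirichlet, each <math>\psi_e</math> is represented as a pair of Beta shape parameters, words are vocabulary indices, and all function names and toy values are hypothetical.<br />

```python
import math
import random


def sample_dirichlet(alpha, rng):
    """Draw from a Dirichlet by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [x / total for x in draws]


def sage_word_probs(eta_e, eta_ef, eta_m):
    """Normalize exp(eta^e_w + eta^{e,f}_w + eta^m_w) over the vocabulary."""
    scores = [a + b + m for a, b, m in zip(eta_e, eta_ef, eta_m)]
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    z = sum(weights)
    return [w / z for w in weights]


def generate_document(n_words, alpha, phi, psi, eta_e, eta_ef, eta_m, rng):
    """Step 2 of the story: draw theta_d, then (event, faction, word, time) per token."""
    theta = sample_dirichlet(alpha, rng)
    tokens = []
    for _ in range(n_words):
        e = rng.choices(range(len(theta)), weights=theta)[0]      # event
        f = rng.choices(range(len(phi[e])), weights=phi[e])[0]    # faction
        probs = sage_word_probs(eta_e[e], eta_ef[e][f], eta_m)
        w = rng.choices(range(len(probs)), weights=probs)[0]      # word index
        a, b = psi[e]
        t = rng.betavariate(a, b)                                 # timestamp in [0, 1]
        tokens.append((e, f, w, t))
    return tokens


# Toy setup (all values hypothetical): 2 events, 2 factions, 4-word vocabulary.
rng = random.Random(0)
phi = [sample_dirichlet([1.0, 1.0], rng) for _ in range(2)]  # step 1 of the story
psi = [(2.0, 8.0), (8.0, 2.0)]  # event 0 peaks early, event 1 peaks late
eta_m = [0.0, 0.0, 0.0, 0.0]    # uniform background
eta_e = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
eta_ef = [[[0.0, 0.0, 2.0, 0.0], [0.0, 0.0, 0.0, 2.0]],
          [[0.0, 0.0, 2.0, 0.0], [0.0, 0.0, 0.0, 2.0]]]
doc = generate_document(20, [0.5, 0.5], phi, psi, eta_e, eta_ef, eta_m, rng)
```

Timestamps here are assumed to be rescaled to the unit interval, since a Beta distribution only has support on [0, 1].<br />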
<br />
=== SAGE language model ===<br />
<br />
To model the different effects of events and factions, we use a sparse additive generative (SAGE) model.<br />
In contrast to the popular Dirichlet-multinomial for topic modeling, which directly models lexical probabilities associated with each (latent) topic, SAGE models the deviation in log frequencies from a background lexical distribution.<br />
Applying a sparsity-inducing prior on the topic term vectors limits the number of terms whose frequencies diverge from the background lexical frequencies, thereby increasing robustness to limited training data.<br />
Also, in the case of our model, it eliminates the need for a switching variable to choose between event words and faction words.<br />
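<br />
Written out with its normalizer, the word distribution used in the generative story is as follows (a sketch; the vocabulary symbol <math>V</math> is introduced here for illustration and is not part of the notation above):<br />
<br />
<math>p(w \mid e, f, \mathbf{\eta}) = \frac{\exp(\eta^{e}_w + \eta^{e,f}_w + \eta^m_w)}{\sum_{w' \in V} \exp(\eta^{e}_{w'} + \eta^{e,f}_{w'} + \eta^m_{w'})}</math><br />
<br />
If every entry of <math>\eta^{e}</math> and <math>\eta^{e,f}</math> is zero, this reduces to the background distribution <math>p(w) \propto \exp(\eta^m_w)</math>. A word's probability departs from the background only where one of the deviation vectors is non-zero for that word, so the event and faction effects simply add in log space and no per-token switch between them is required.<br />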
<br />
== Data ==<br />
<br />
Our main data source will be Twitter, and as a start we intend to use tweets over a three-month period in 2012 (the exact date range to be decided). <br />
Some possibly controversial events that have occurred this year are the Republican primaries, the Grammy Awards, and weekly football games during the NFL season.<br />
In addition to the textual content, the timestamps, locations (partially observed) and identities (of the user posting a tweet) could be useful features for our model.<br />
<br />
== Related work ==<br />
<br />
* [[RelatedPaper::Yang et al, SIGIR 98|A study on retrospective and online event detection. Yang et al, SIGIR 98]] This paper addresses the problem of detecting events in news stories. The authors use clustering with a vector space model to group temporally close events together.<br />
<br />
* [[RelatedPaper::Zhao et al, AAAI 07|Temporal and information flow based event detection from social text streams. Zhao et al, AAAI 07]] The authors propose a method for detecting events from social text streams that exploits not just the textual content but also the temporal and social dimensions of their data.<br />
<br />
* [[RelatedPaper::Automatic_Detection_and_Classification_of_Social_Events|Automatic Detection and Classification of Social Events. Agarwal and Rambow, ACL 10]] This is one of the few works we found relating to controversial events in social media. The authors aim to detect and classify social events using tree kernels.<br />
<br />
*[[RelatedPaper::Rodriguez et al. KDD 2010|Gomez Rodriguez, M., J. Leskovec, and A. Krause. 2010. Inferring networks of diffusion and influence. In Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining, 1019–1028]]. This paper addresses the problem of inferring underlying networks in the diffusion process of social networks, which is related to the faction discovery problem we study in this project.<br />
<br />
*[[RelatedPaper::Cosley et al 2010|Cosley, D., D. Huttenlocher, J. Kleinberg, X. Lan, and S. Suri. 2010. Sequential Influence Models in Social Networks, In Proc. 4th International Conference on Weblogs and Social Media]]. In this paper the authors study the temporal dynamics of information diffusion in social networks. The results found could give us some insights into the design of our model.<br />
<br />
* [[RelatedPaper::Castillo_2011|Information credibility on twitter. Castillo et al, WWW 11]] This paper identifies general features of Twitter content that are useful for credibility assessment.<br />
<br />
* [[RelatedPaper::Guralnik_99|Event Detection from Time Series Data. Guralnik et al, KDD 99]] This paper develops a general approach to change-point detection that generalizes across a wide range of applications.<br />
<br />
* [[RelatedPaper:: Allan_1988|On-Line New Event Detection and Tracking. Allan et al, SIGIR 98]] The approach to detection uses a single-pass clustering algorithm and a novel thresholding model that incorporates the properties of events as a major component.<br />
<br />
== Related materials ==<br />
{{#ask: [[AddressesProblem::Controversial events detection]]<br />
| ?Category<br />
| ?UsesDataset<br />
| ?UsesMethod<br />
}}</div>Ysimhttp://curtis.ml.cmu.edu/w/courses/index.php?title=Controversial_events_detection&diff=14675Controversial events detection2012-10-16T05:40:51Z<p>Ysim: /* A probabilistic model */</p>
<hr />
<div>== Comments ==<br />
<br />
This is a neat idea. The main difficulty I see here is formalizing the task precisely. What does it mean for an event to be controversial, exactly? Part of the problem is that it's not perfectly clear what an "event" is.<br />
<br />
One suggestion would be to look at a topic-modeling approach, eg [http://dl.acm.org/citation.cfm?id=1150450 topics over time], to find topics with a short temporal span in social-media data. You might be able to combine this with sentiment around those topics in two different communities - eg using something like my [http://www.cs.cmu.edu/~wcohen/postscript/icwsm-2012.pdf MCR-LDA model]. So one way to flesh out this idea would be to start with two topic models:<br />
<br />
* MCR-LDA, to measure 'controversy' - you might be able to get predictions from Ramnath on his blog data, if the code's not ready to distribute yet. I would not completely commit to using twitter data exclusively, btw. <br />
* TOT, to detect shortlived 'events' vs long-term topics.<br />
<br />
Then write some inference code to combine the predictions and pick out "controversial events". The next stage would be working out a joint model (which you might not choose to do for the project). It's not obvious how you'd evaluate all this, however... maybe do some user labeling of final predictions like "this topic corresponds to a controversial event."<br />
<br />
These are just ideas - you might try and flesh out some other concrete idea instead. Good luck! --[[User:Wcohen|Wcohen]] 14:33, 10 October 2012 (UTC)<br />
<br />
PS. There is also a one-person team working on a similar topic, you all should talk - it's [[User:Yuchen Tian]] --[[User:Wcohen|Wcohen]] 18:40, 10 October 2012 (UTC)<br />
<br />
== Team members ==<br />
<br />
* [[User:Ysim|Yanchuan Sim]]<br />
* [[User:Zhouyu|Zhou Yu]]<br />
* [[User:Tinghuiz|Tinghui Zhou]]<br />
<br />
== Project idea ==<br />
<br />
In our project, we propose to jointly detect events and the controversy surrounding them in the context of social media.<br />
For example, Christmas Day is an event that receives the most attention around December 25th, while the Presidential debates receive attention once every four years.<br />
Controversy-wise, Christmas Day is relatively one-sided, with most of the text mentioning it being relatively homogeneous.<br />
In contrast, the Presidential debates event will have obvious sides (supporting the different candidates).<br />
<br />
Our goal is not only to detect controversial events, but also to discover what the different sides are - both grouping the individuals associated with each faction and describing how each faction talks about the event differently.<br />
<br />
We propose to use a probabilistic graphical model to achieve our goals of learning these latent structures from the data without labeled training data.<br />
<br />
== Formalizing the task ==<br />
<br />
Event - In the context of social media, an event is a period of time during which there is a "surge" in the amount of interest (e.g. blog posts, tweets, comments) surrounding an occurrence.<br />
<br />
We call an event controversial if, given the text surrounding it, the nature of the discussion is highly non-homogeneous (i.e. exhibits high entropy). The sides of such an event can be grouped into a small number of distinct ''factions''.<br />
<br />
Thus, in our task, given a collection of social media documents over time, we seek to jointly infer the events that have occurred, as well as the controversy associated with them.<br />
<br />
== A probabilistic model ==<br />
<br />
Here's a sketch of the model that we are considering for our task.<br />
It is a variant of a topic model in which each word is assumed to be jointly generated by an ''event'' and a ''faction''.<br />
It is also similar to the topics over time model, in that we also generate the timestamps associated with each document.<br />
<br />
=== Notation ===<br />
<br />
<math>E</math> - fixed number of events<br />
<br />
<math>\theta_d</math> - multinomial distribution of events specific to document <math>d</math><br />
<br />
<math>\phi_{e_{di}}</math> - multinomial distribution of factions specific to event <math>e_{di}</math><br />
<br />
<math>\psi_{e_{di}}</math> - the beta distribution of time specific to event <math>e_{di}</math><br />
<br />
<math>w_{di}</math> - the <math>i</math>th token in document <math>d</math><br />
<br />
<math>t_{di}</math> - timestamp associated with the <math>i</math>th token in document <math>d</math><br />
<br />
<math>\eta^e, \eta^{e,f}, \eta^m</math> - SAGE vectors, which are log additive weights for each word in the vocabulary. We have one for each event, each combination of event and faction, and a background word distribution.<br />
<br />
=== Generative story ===<br />
<br />
# Draw <math>E</math> multinomials, <math>\phi_e</math> from a Dirichlet prior, one for each event <math>e</math>. ''This is the distribution over factions for each event that we have.''<br />
# For each document <math>d</math>, draw a multinomial <math>\theta_d</math> from a prior <math>\alpha</math> (this prior could be Dirichlet or logistic normal); then for each word <math>w_{di}</math> in the document <math>d</math>:<br />
## Draw an event <math>e_{di}</math> from multinomial <math>\theta_d</math>;<br />
## Draw a faction <math>f_{di}</math> from multinomial <math>\phi_{e_{di}}</math>;<br />
## Draw a word <math>w_{di}</math> from a SAGE language model <math>p(w_{di} \mid e_{di}, f_{di}, \mathbf{\eta} ) \propto \exp(\eta^{e_{di}}_w + \eta^{e_{di},f_{di}}_w + \eta^m_w)</math>;<br />
## Draw a timestamp <math>t_{di}</math> from Beta <math>\psi_{e_{di}}</math>.<br />
<br />
== Data ==<br />
<br />
Our main data source will be Twitter, and as a start we intend to use tweets over a three-month period in 2012 (the exact date range to be decided). <br />
Some possibly controversial events that have occurred this year are the Republican primaries, the Grammy Awards, and weekly football games during the NFL season.<br />
In addition to the textual content, the timestamps, locations (partially observed) and identities (of the user posting a tweet) could be useful features for our model.<br />
<br />
== Related work ==<br />
<br />
* [[RelatedPaper::Yang et al, SIGIR 98|A study on retrospective and online event detection. Yang et al, SIGIR 98]] This paper addresses the problem of detecting events in news stories. The authors use clustering with a vector space model to group temporally close events together.<br />
<br />
* [[RelatedPaper::Zhao et al, AAAI 07|Temporal and information flow based event detection from social text streams. Zhao et al, AAAI 07]] The authors propose a method for detecting events from social text streams that exploits not just the textual content but also the temporal and social dimensions of their data.<br />
<br />
* [[RelatedPaper::Automatic_Detection_and_Classification_of_Social_Events|Automatic Detection and Classification of Social Events. Agarwal and Rambow, ACL 10]] This is one of the few works we found relating to controversial events in social media. The authors aim to detect and classify social events using tree kernels.<br />
<br />
*[[RelatedPaper::Rodriguez et al. KDD 2010|Gomez Rodriguez, M., J. Leskovec, and A. Krause. 2010. Inferring networks of diffusion and influence. In Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining, 1019–1028]]. This paper addresses the problem of inferring underlying networks in the diffusion process of social networks, which is related to the faction discovery problem we study in this project.<br />
<br />
*[[RelatedPaper::Cosley et al 2010|Cosley, D., D. Huttenlocher, J. Kleinberg, X. Lan, and S. Suri. 2010. Sequential Influence Models in Social Networks, In Proc. 4th International Conference on Weblogs and Social Media]]. In this paper the authors study the temporal dynamics of information diffusion in social networks. The results found could give us some insights into the design of our model.<br />
<br />
* [[RelatedPaper::Castillo_2011|Information credibility on twitter. Castillo et al, WWW 11]] This paper identifies general features of Twitter content that are useful for credibility assessment.<br />
<br />
* [[RelatedPaper::Guralnik_99|Event Detection from Time Series Data. Guralnik et al, KDD 99]] This paper develops a general approach to change-point detection that generalizes across a wide range of applications.<br />
<br />
* [[RelatedPaper:: Allan_1988|On-Line New Event Detection and Tracking. Allan et al, SIGIR 98]] The approach to detection uses a single-pass clustering algorithm and a novel thresholding model that incorporates the properties of events as a major component.<br />
<br />
== Related materials ==<br />
{{#ask: [[AddressesProblem::Controversial events detection]]<br />
| ?Category<br />
| ?UsesDataset<br />
| ?UsesMethod<br />
}}</div>Ysimhttp://curtis.ml.cmu.edu/w/courses/index.php?title=Controversial_events_detection&diff=14674Controversial events detection2012-10-16T05:39:01Z<p>Ysim: /* A probabilistic model */</p>
<hr />
<div>== Comments ==<br />
<br />
This is a neat idea. The main difficulty I see here is formalizing the task precisely. What does it mean for an event to be controversial, exactly? Part of the problem is that it's not perfectly clear what an "event" is.<br />
<br />
One suggestion would be to look at a topic-modeling approach, eg [http://dl.acm.org/citation.cfm?id=1150450 topics over time], to find topics with a short temporal span in social-media data. You might be able to combine this with sentiment around those topics in two different communities - eg using something like my [http://www.cs.cmu.edu/~wcohen/postscript/icwsm-2012.pdf MCR-LDA model]. So one way to flesh out this idea would be to start with two topic models:<br />
<br />
* MCR-LDA, to measure 'controversy' - you might be able to get predictions from Ramnath on his blog data, if the code's not ready to distribute yet. I would not completely commit to using twitter data exclusively, btw. <br />
* TOT, to detect shortlived 'events' vs long-term topics.<br />
<br />
Then write some inference code to combine the predictions and pick out "controversial events". The next stage would be working out a joint model (which you might not choose to do for the project). It's not obvious how you'd evaluate all this, however... maybe do some user labeling of final predictions like "this topic corresponds to a controversial event."<br />
<br />
These are just ideas - you might try and flesh out some other concrete idea instead. Good luck! --[[User:Wcohen|Wcohen]] 14:33, 10 October 2012 (UTC)<br />
<br />
PS. There is also a one-person team working on a similar topic, you all should talk - it's [[User:Yuchen Tian]] --[[User:Wcohen|Wcohen]] 18:40, 10 October 2012 (UTC)<br />
<br />
== Team members ==<br />
<br />
* [[User:Ysim|Yanchuan Sim]]<br />
* [[User:Zhouyu|Zhou Yu]]<br />
* [[User:Tinghuiz|Tinghui Zhou]]<br />
<br />
== Project idea ==<br />
<br />
In our project, we propose to jointly detect events and the controversy surrounding them in the context of social media.<br />
For example, Christmas Day is an event that receives the most attention around December 25th, while the Presidential debates receive attention once every four years.<br />
Controversy-wise, Christmas Day is relatively one-sided, with most of the text mentioning it being relatively homogeneous.<br />
In contrast, the Presidential debates event will have obvious sides (supporting the different candidates).<br />
<br />
Our goal is not only to detect controversial events, but also to discover what the different sides are - both grouping the individuals associated with each faction and describing how each faction talks about the event differently.<br />
<br />
We propose to use a probabilistic graphical model to achieve our goals of learning these latent structures from the data without labeled training data.<br />
<br />
== Formalizing the task ==<br />
<br />
Event - In the context of social media, an event is a period of time during which there is a "surge" in the amount of interest (e.g. blog posts, tweets, comments) surrounding an occurrence.<br />
<br />
We call an event controversial if, given the text surrounding it, the nature of the discussion is highly non-homogeneous (i.e. exhibits high entropy). The sides of such an event can be grouped into a small number of distinct ''factions''.<br />
<br />
Thus, in our task, given a collection of social media documents over time, we seek to jointly infer the events that have occurred, as well as the controversy associated with them.<br />
<br />
== A probabilistic model ==<br />
<br />
Here's a sketch of the model that we are considering for our task.<br />
It is a variant of a topic model in which each word is assumed to be jointly generated by an ''event'' and a ''faction''.<br />
It is also similar to the topics over time model, in that we also generate the timestamps associated with each document.<br />
<br />
=== Notation ===<br />
<br />
<math>E</math> - fixed number of events<br />
<br />
<math>\theta_d</math> - multinomial distribution of events specific to document <math>d</math><br />
<br />
<math>\phi_{e_{di}}</math> - multinomial distribution of factions specific to event <math>e_{di}</math><br />
<br />
<math>\psi_{e_{di}}</math> - the beta distribution of time specific to event <math>e_{di}</math><br />
<br />
<math>w_{di}</math> - the <math>i</math>th token in document <math>d</math><br />
<br />
<math>t_{di}</math> - timestamp associated with the <math>i</math>th token in document <math>d</math><br />
<br />
=== Generative story ===<br />
<br />
# Draw <math>E</math> multinomials, <math>\phi_e</math> from a Dirichlet prior, one for each event <math>e</math>. ''This is the distribution over factions for each event that we have.''<br />
# For each document <math>d</math>, draw a multinomial <math>\theta_d</math> from a prior <math>\alpha</math> (this prior could be Dirichlet or logistic normal); then for each word <math>w_{di}</math> in the document <math>d</math>:<br />
## Draw an event <math>e_{di}</math> from multinomial <math>\theta_d</math>;<br />
## Draw a faction <math>f_{di}</math> from multinomial <math>\phi_{e_{di}}</math>;<br />
## Draw a word <math>w_{di}</math> from a SAGE language model <math>p(w_{di} \mid e_{di}, f_{di}, \mathbf{\eta} ) \propto \exp(\eta^{e_{di}}_w + \eta^{f_{di}}_w + \eta^m_w)</math>;<br />
## Draw a timestamp <math>t_{di}</math> from Beta <math>\psi_{e_{di}}</math>.<br />
<br />
== Data ==<br />
<br />
Our main data source will be Twitter, and as a start we intend to use tweets over a three-month period in 2012 (the exact date range to be decided). <br />
Some possibly controversial events that have occurred this year are the Republican primaries, the Grammy Awards, and weekly football games during the NFL season.<br />
In addition to the textual content, the timestamps, locations (partially observed) and identities (of the user posting a tweet) could be useful features for our model.<br />
<br />
== Related work ==<br />
<br />
* [[RelatedPaper::Yang et al, SIGIR 98|A study on retrospective and online event detection. Yang et al, SIGIR 98]] This paper addresses the problem of detecting events in news stories. The authors use clustering with a vector space model to group temporally close events together.<br />
<br />
* [[RelatedPaper::Zhao et al, AAAI 07|Temporal and information flow based event detection from social text streams. Zhao et al, AAAI 07]] The authors propose a method for detecting events from social text streams that exploits not just the textual content but also the temporal and social dimensions of their data.<br />
<br />
* [[RelatedPaper::Automatic_Detection_and_Classification_of_Social_Events|Automatic Detection and Classification of Social Events. Agarwal and Rambow, ACL 10]] This is one of the few works we found relating to controversial events in social media. The authors aim to detect and classify social events using tree kernels.<br />
<br />
*[[RelatedPaper::Rodriguez et al. KDD 2010|Gomez Rodriguez, M., J. Leskovec, and A. Krause. 2010. Inferring networks of diffusion and influence. In Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining, 1019–1028]]. This paper addresses the problem of inferring underlying networks in the diffusion process of social networks, which is related to the faction discovery problem we study in this project.<br />
<br />
*[[RelatedPaper::Cosley et al 2010|Cosley, D., D. Huttenlocher, J. Kleinberg, X. Lan, and S. Suri. 2010. Sequential Influence Models in Social Networks, In Proc. 4th International Conference on Weblogs and Social Media]]. In this paper the authors study the temporal dynamics of information diffusion in social networks. The results found could give us some insights into the design of our model.<br />
<br />
* [[RelatedPaper::Castillo_2011|Information credibility on twitter. Castillo et al, WWW 11]] This paper identifies general features of Twitter content that are useful for credibility assessment.<br />
<br />
* [[RelatedPaper::Guralnik_99|Event Detection from Time Series Data. Guralnik et al, KDD 99]] This paper develops a general approach to change-point detection that generalizes across a wide range of applications.<br />
<br />
* [[RelatedPaper:: Allan_1988|On-Line New Event Detection and Tracking. Allan et al, SIGIR 98]] The approach to detection uses a single-pass clustering algorithm and a novel thresholding model that incorporates the properties of events as a major component.<br />
<br />
== Related materials ==<br />
{{#ask: [[AddressesProblem::Controversial events detection]]<br />
| ?Category<br />
| ?UsesDataset<br />
| ?UsesMethod<br />
}}</div>Ysimhttp://curtis.ml.cmu.edu/w/courses/index.php?title=Controversial_events_detection&diff=14673Controversial events detection2012-10-16T05:38:36Z<p>Ysim: /* A probabilistic model */</p>
<hr />
<div>== Comments ==<br />
<br />
This is a neat idea. The main difficulty I see here is formalizing the task precisely. What does it mean for an event to be controversial, exactly? Part of the problem is that it's not perfectly clear what an "event" is.<br />
<br />
One suggestion would be to look at a topic-modeling approach, eg [http://dl.acm.org/citation.cfm?id=1150450 topics over time], to find topics with a short temporal span in social-media data. You might be able to combine this with sentiment around those topics in two different communities - eg using something like my [http://www.cs.cmu.edu/~wcohen/postscript/icwsm-2012.pdf MCR-LDA model]. So one way to flesh out this idea would be to start with two topic models:<br />
<br />
* MCR-LDA, to measure 'controversy' - you might be able to get predictions from Ramnath on his blog data, if the code's not ready to distribute yet. I would not completely commit to using twitter data exclusively, btw. <br />
* TOT, to detect shortlived 'events' vs long-term topics.<br />
<br />
Then write some inference code to combine the predictions and pick out "controversial events". The next stage would be working out a joint model (which you might not choose to do for the project). It's not obvious how you'd evaluate all this, however... maybe do some user labeling of final predictions like "this topic corresponds to a controversial event."<br />
<br />
These are just ideas - you might try and flesh out some other concrete idea instead. Good luck! --[[User:Wcohen|Wcohen]] 14:33, 10 October 2012 (UTC)<br />
<br />
PS. There is also a one-person team working on a similar topic, you all should talk - it's [[User:Yuchen Tian]] --[[User:Wcohen|Wcohen]] 18:40, 10 October 2012 (UTC)<br />
<br />
== Team members ==<br />
<br />
* [[User:Ysim|Yanchuan Sim]]<br />
* [[User:Zhouyu|Zhou Yu]]<br />
* [[User:Tinghuiz|Tinghui Zhou]]<br />
<br />
== Project idea ==<br />
<br />
In our project, we propose to jointly detect events and the controversy surrounding them in the context of social media.<br />
For example, Christmas Day is an event that receives the most attention around December 25th, while the Presidential debates receive attention once every four years.<br />
Controversy-wise, Christmas Day is relatively one-sided, with most of the text mentioning it being relatively homogeneous.<br />
In contrast, the Presidential debates event will have obvious sides (supporting the different candidates).<br />
<br />
Our goal is not only to detect controversial events, but also to discover what the different sides are - both grouping the individuals associated with each faction and describing how each faction talks about the event differently.<br />
<br />
We propose to use a probabilistic graphical model to achieve our goals of learning these latent structures from the data without labeled training data.<br />
<br />
== Formalizing the task ==<br />
<br />
Event - In the context of social media, an event is a period of time during which there is a "surge" in the amount of interest (e.g. blog posts, tweets, comments) surrounding an occurrence.<br />
<br />
We call an event controversial if, given the text surrounding it, the nature of the discussion is highly non-homogeneous (i.e. exhibits high entropy). The sides of such an event can be grouped into a small number of distinct ''factions''.<br />
<br />
Thus, in our task, given a collection of social media documents over time, we seek to jointly infer the events that have occurred, as well as the controversy associated with them.<br />
<br />
== A probabilistic model ==<br />
<br />
Here's a sketch of the model that we are considering for our task.<br />
It is a variant of a topic model in which each word is assumed to be jointly generated by an ''event'' and a ''faction''.<br />
It is also similar to the topics over time model, in that we also generate the timestamps associated with each document.<br />
<br />
=== Notation ===<br />
<br />
<math>E</math> - fixed number of events<br />
<br />
<math>\theta_d</math> - multinomial distribution of events specific to document <math>d</math><br />
<br />
<math>\phi_{e_{di}}</math> - multinomial distribution of factions specific to event <math>e_{di}</math><br />
<br />
<math>\psi_{e_{di}}</math> - the beta distribution of time specific to event <math>e_{di}</math><br />
<br />
<math>w_{di}</math> - the <math>i</math>th token in document <math>d</math><br />
<br />
<math>t_{di}</math> - timestamp associated with the <math>i</math>th token in document <math>d</math><br />
<br />
=== Generative story ===<br />
<br />
# Draw <math>E</math> multinomials, <math>\phi_e</math> from a Dirichlet prior, one for each event <math>e</math>. ''This is the distribution over factions for each event that we have.''<br />
# For each document <math>d</math>, draw a multinomial <math>\theta_d</math> from a prior <math>\alpha</math> (this prior could be Dirichlet or logistic normal); then for each word <math>w_{di}</math> in the document <math>d</math>:<br />
## Draw an event <math>e_{di}</math> from multinomial <math>\theta_d</math>;<br />
## Draw a faction <math>f_{di}</math> from multinomial <math>\phi_{e_{di}}</math>;<br />
## Draw a word <math>w_{di}</math> from a SAGE language model <math>p(w_{di} \mid e_{di}, f_{di}, \mathbf{\eta} ) \propto \exp(\eta^{e_{di}}_w + \eta^{f_{di}}_w + \eta^m_w)</math>;<br />
## Draw a timestamp <math>t_{di}</math> from Beta <math>\psi_{e_{di}}</math>.<br />
<br />
== Data ==<br />
<br />
Our main data source will be Twitter, and as a start we intend to use tweets over a three-month period in 2012 (the exact date range to be decided). <br />
Some possibly controversial events that have occurred this year are the Republican primaries, the Grammy Awards, and weekly football games during the NFL season.<br />
In addition to the textual content, the timestamps, locations (partially observed) and identities (of the user posting a tweet) could be useful features for our model.<br />
<br />
== Related work ==<br />
<br />
* [[RelatedPaper::Yang et al, SIGIR 98|A study on retrospective and online event detection. Yang et al, SIGIR 98]] This paper addresses the problem of detecting events in news stories. The authors use clustering with a vector space model to group temporally close events together.<br />
<br />
* [[RelatedPaper::Zhao et al, AAAI 07|Temporal and information flow based event detection from social text streams. Zhao et al, AAAI 07]] The authors propose a method for detecting events from social text streams that exploits not just the textual content but also the temporal and social dimensions of their data.<br />
<br />
* [[RelatedPaper::Automatic_Detection_and_Classification_of_Social_Events|Automatic Detection and Classification of Social Events. Agarwal and Rambow, ACL 10]] This is one of the few works we found relating to controversial events in social media. The authors aim to detect and classify social events using tree kernels.<br />
<br />
*[[RelatedPaper::Rodriguez et al. KDD 2010|Gomez Rodriguez, M., J. Leskovec, and A. Krause. 2010. Inferring networks of diffusion and influence. In Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining, 1019–1028]]. This paper addresses the problem of inferring underlying networks in the diffusion process of social networks, which is related to the faction discovery problem we study in this project.<br />
<br />
*[[RelatedPaper::Cosley et al 2010|Cosley, D., D. Huttenlocher, J. Kleinberg, X. Lan, and S. Suri. 2010. Sequential Influence Models in Social Networks, In Proc. 4th International Conference on Weblogs and Social Media]]. In this paper the authors study the temporal dynamics of information diffusion in social networks. The results found could give us some insights into the design of our model.<br />
<br />
* [[RelatedPaper::Castillo_2011|Information credibility on twitter. Castillo et al, WWW 11]] This paper identifies general features of Twitter content that are useful for credibility assessment.<br />
<br />
* [[RelatedPaper::Guralnik_99|Event Detection from Time Series Data. Guralnik et al, KDD 99]] This paper develops a general approach to change-point detection that generalizes across a wide range of applications.<br />
<br />
* [[RelatedPaper:: Allan_1988|On-Line New Event Detection and Tracking. Allan et al, SIGIR 98]] The approach to detection uses a single-pass clustering algorithm and a novel thresholding model that incorporates the properties of events as a major component.<br />
<br />
== Related materials ==<br />
{{#ask: [[AddressesProblem::Controversial events detection]]<br />
| ?Category<br />
| ?UsesDataset<br />
| ?UsesMethod<br />
}}</div>Ysimhttp://curtis.ml.cmu.edu/w/courses/index.php?title=Controversial_events_detection&diff=14672Controversial events detection2012-10-16T05:37:40Z<p>Ysim: /* A probabilistic model */</p>
<hr />
<div>== Comments ==<br />
<br />
This is a neat idea. The main difficulty I see here is formalizing the task precisely. What does it mean for an event to be controversial, exactly? Part of the problem is that it's not perfectly clear what an "event" is.<br />
<br />
One suggestion would be to look at a topic-modeling approach, eg [http://dl.acm.org/citation.cfm?id=1150450 topics over time], to find topics with a short temporal span in social-media data. You might be able to combine this with sentiment around those topics in two different communities - eg using something like my [http://www.cs.cmu.edu/~wcohen/postscript/icwsm-2012.pdf MCR-LDA model]. So one way to flesh out this idea would be to start with two topic models:<br />
<br />
* MCR-LDA, to measure 'controversy' - you might be able to get predictions from Ramnath on his blog data, if the code's not ready to distribute yet. I would not completely commit to using twitter data exclusively, btw. <br />
* TOT, to detect shortlived 'events' vs long-term topics.<br />
<br />
Then write some inference code to combine the predictions and pick out "controversial events". The next stage would be working out a joint model (which you might not chose to do for the project). It's not obvious how you'd evaluate all this, however...maybe do some user labeling of final predictions like "this topic corresponds to a controversial event."<br />
<br />
These are just ideas - you might try and flesh out some other concrete idea instead. Good luck! --[[User:Wcohen|Wcohen]] 14:33, 10 October 2012 (UTC)<br />
<br />
PS. There is also a one-person team working on similar topic, you all should talk - it's [[User:Yuchen Tian]] --[[User:Wcohen|Wcohen]] 18:40, 10 October 2012 (UTC)<br />
<br />
== Team members ==<br />
<br />
* [[User:Ysim|Yanchuan Sim]]<br />
* [[User:Zhouyu|Zhou Yu]]<br />
* [[User:Tinghuiz|Tinghui Zhou]]<br />
<br />
== Project idea ==<br />
<br />
In our project, we propose to jointly detect events and the controversy surrounding them in the context of social media.<br />
For example, Christmas Day is an event that receives the most attention around December 25th, while the Presidential debates receive theirs once every four years.<br />
Controversy-wise, Christmas Day is relatively one-sided, with most of the text mentioning it being relatively homogeneous.<br />
In contrast, the Presidential debates event will have obvious sides (supporting the different candidates).<br />
<br />
Our goal is not only to detect controversial events, but also to discover what the different sides are - both grouping the individuals associated with each faction and describing how each faction talks about the event differently.<br />
<br />
We propose to use a probabilistic graphical model to achieve our goals of learning these latent structures from the data without labeled training data.<br />
<br />
== Formalizing the task ==<br />
<br />
Event - In the context of social media, an event is a period of time during which there is a "surge" in the amount of interest (e.g. blog posts, tweets, comments) surrounding an occurrence.<br />
<br />
We call an event controversial if, given the text surrounding the event, the discussions are highly non-homogeneous (i.e. they exhibit high entropy). The sides of such an event can be grouped into a small number of distinct ''factions''.<br />
<br />
Thus, in our task, given a collection of social media documents over time, we seek to jointly infer the events that have occurred, as well as the controversy associated with each of them.<br />
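<br />
One way to make the "high entropy" criterion concrete is sketched below. It assumes that each document about an event has already been assigned to a faction, and it scores the event by the Shannon entropy of the resulting faction proportions. The function name <code>faction_entropy</code> and the use of raw counts are illustrative assumptions, not part of the proposal itself.<br />
<br />
```python
import math


def faction_entropy(faction_counts):
    """Shannon entropy (in nats) of the faction proportions for one event.

    faction_counts is a hypothetical input: the number of documents
    attributed to each faction. Zero means one faction dominates
    completely; larger values mean the discussion is more evenly split.
    """
    total = sum(faction_counts)
    if total == 0:
        return 0.0
    # Skip empty factions, since 0 * log(0) is taken to be 0.
    probs = [count / total for count in faction_counts if count > 0]
    return -sum(p * math.log(p) for p in probs)


# A one-sided event versus an evenly split one.
one_sided = faction_entropy([98, 2])
evenly_split = faction_entropy([50, 50])
```
<br />
Under this sketch, an evenly split event scores higher than a one-sided one, which matches the intuition behind the Christmas Day and Presidential debates examples.<br />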
<br />
== A probabilistic model ==<br />
<br />
Here's a sketch of the model that we are considering for our task.<br />
It is a variant of a topic model, where each word is assumed to be jointly generated by an ''event'' and a ''faction''.<br />
It is also similar to the Topics over Time (TOT) model, in that the timestamps of the documents are generated as well.<br />
<br />
=== Notation ===<br />
<br />
<math>E</math> - fixed number of events<br />
<br />
<math>\theta_d</math> - multinomial distribution of events specific to document <math>d</math><br />
<br />
<math>\phi_{e_{di}}</math> - multinomial distribution of factions specific to event <math>e_{di}</math><br />
<br />
<math>\psi_{e_{di}}</math> - the beta distribution of time specific to event <math>e_{di}</math><br />
<br />
<math>w_{di}</math> - the <math>i</math>th token in document <math>d</math><br />
<br />
<math>t_{di}</math> - timestamp associated with the <math>i</math>th token in document <math>d</math><br />
<br />
=== Generative story ===<br />
<br />
# Draw <math>E</math> multinomials, <math>\phi_e</math> from a Dirichlet prior, one for each event <math>e</math>. ''This is the distribution over factions for each event that we have.''<br />
# For each document <math>d</math>, draw a multinomial <math>\theta_d</math> from a prior <math>\alpha</math> (this prior could be Dirichlet or logistic normal); then for each word <math>w_{di}</math> in the document <math>d</math>:<br />
## Draw an event <math>e_{di}</math> from multinomial <math>\theta_d</math>;<br />
## Draw a faction <math>f_{di}</math> from multinomial <math>\phi_{e_{di}}</math>;<br />
## Draw a word <math>w_{di}</math> from a SAGE language model <math>\propto \exp(\eta^{e_{di}}_w + \eta^{f_{di}}_w + \eta^m_w)</math>;<br />
## Draw a timestamp <math>t_{di}</math> from Beta <math>\psi_{e_{di}}</math>.<br />
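<br />
The generative story above can be simulated with a short script. This is only a sketch under stated assumptions: the priors are symmetric Dirichlets with concentration 1, the <math>\eta</math> vectors and the Beta parameters <math>\psi_e</math> are drawn at random rather than learned, <math>\eta^m</math> is treated simply as a vector shared by all tokens (the page does not define it), and all function names are hypothetical.<br />
<br />
```python
import math
import random


def sample_dirichlet(rng, concentration, size):
    """Draw from a symmetric Dirichlet by normalising Gamma draws."""
    draws = [rng.gammavariate(concentration, 1.0) for _ in range(size)]
    total = sum(draws)
    return [x / total for x in draws]


def sage_word_probs(eta_event, eta_faction, eta_shared):
    """Word distribution proportional to exp(eta^e_w + eta^f_w + eta^m_w)."""
    scores = [e + f + m for e, f, m in zip(eta_event, eta_faction, eta_shared)]
    peak = max(scores)  # subtract the maximum for numerical stability
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


def generate_corpus(num_docs, doc_len, num_events, num_factions, vocab_size, seed=0):
    rng = random.Random(seed)

    # Assumed stand-ins for parameters that would normally be learned.
    eta_m = [rng.gauss(0, 1) for _ in range(vocab_size)]
    eta_e = [[rng.gauss(0, 1) for _ in range(vocab_size)] for _ in range(num_events)]
    eta_f = [[rng.gauss(0, 1) for _ in range(vocab_size)] for _ in range(num_factions)]
    psi = [(rng.uniform(1, 5), rng.uniform(1, 5)) for _ in range(num_events)]

    # Step 1: one distribution over factions per event.
    phi = [sample_dirichlet(rng, 1.0, num_factions) for _ in range(num_events)]

    corpus = []
    for _ in range(num_docs):
        # Step 2: per-document distribution over events.
        theta = sample_dirichlet(rng, 1.0, num_events)
        tokens = []
        for _ in range(doc_len):
            e = rng.choices(range(num_events), weights=theta)[0]
            f = rng.choices(range(num_factions), weights=phi[e])[0]
            word_probs = sage_word_probs(eta_e[e], eta_f[f], eta_m)
            w = rng.choices(range(vocab_size), weights=word_probs)[0]
            t = rng.betavariate(*psi[e])  # timestamp normalised to [0, 1]
            tokens.append((e, f, w, t))
        corpus.append(tokens)
    return corpus
```
<br />
Each token is returned as an (event, faction, word id, timestamp) tuple. Inference would run in the opposite direction, recovering the event and faction assignments from the observed words and timestamps.<br />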
<br />
== Data ==<br />
<br />
Our main data source will be Twitter, and as a start we intend to use tweets over a three-month period in 2012 (the exact date range is to be decided). <br />
Some possibly controversial events that have occurred this year are the Republican primaries, the Grammy Awards, and weekly football games during the NFL season.<br />
In addition to the textual content, the timestamps, locations (partially observed) and identities (of the user posting a tweet) could be useful features for our model.<br />
<br />
== Related work ==<br />
<br />
* [[RelatedPaper::Yang et al, SIGIR 98|A study on retrospective and online event detection. Yang et al, SIGIR 98]] This paper addresses the problems of detecting events in news stories. They used clustering with a vector space model to group temporally close events together.<br />
<br />
* [[RelatedPaper::Zhao et al, AAAI 07|Temporal and information flow based event detection from social text streams. Zhao et al, AAAI 07]] The authors propose a method for detecting events from social text streams that exploits not only the textual content but also the temporal and social dimensions of the data.<br />
<br />
* [[RelatedPaper::Automatic_Detection_and_Classification_of_Social_Events|Automatic Detection and Classification of Social Events. Agarwal and Rambow, ACL 10]] This is one of the few works we found relating to controversial events in social media. The authors aim to detect and classify social events using tree kernels.<br />
<br />
*[[RelatedPaper::Rodriguez et al. KDD 2010|Gomez Rodriguez, M., J. Leskovec, and A. Krause. 2010. Inferring networks of diffusion and influence. In Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining, 1019–1028]]. This paper addresses the problem of inferring underlying networks in the diffusion process of social networks, which is related to the faction discovery problem we study in this project.<br />
<br />
*[[RelatedPaper::Cosley et al 2010|Cosley, D., D. Huttenlocher, J. Kleinberg, X. Lan, and S. Suri. 2010. Sequential Influence Models in Social Networks. In Proc. 4th International Conference on Weblogs and Social Media]]. In this paper, the authors study the temporal dynamics of information diffusion in social networks. Their results could give us some insight into the design of our model.<br />
<br />
* [[RelatedPaper::Castillo_2011|Information credibility on twitter. Castillo et al, WWW 11]] This paper discovers general features of Twitter content that relate to credibility assessment.<br />
<br />
* [[RelatedPaper::Guralnik_99|Event Detection from Time Series Data. Guralnik et al, KDD 99]] This paper develops a general approach to change-point detection that generalizes across a wide range of applications.<br />
<br />
* [[RelatedPaper:: Allan_1988|On-Line New Event Detection and Tracking. Allan et al, SIGIR 98]] The detection approach uses a single-pass clustering algorithm and a novel thresholding model that incorporates the properties of events as a major component.<br />
<br />
== Related materials ==<br />
{{#ask: [[AddressesProblem::Controversial events detection]]<br />
| ?Category<br />
| ?UsesDataset<br />
| ?UsesMethod<br />
}}</div>Ysimhttp://curtis.ml.cmu.edu/w/courses/index.php?title=Controversial_events_detection&diff=14671Controversial events detection2012-10-16T05:37:04Z<p>Ysim: /* A probabilistic model */</p>
<hr />
<div>== Comments ==<br />
<br />
This is a neat idea. The main difficulty I see here is formalizing the task precisely. What does it mean for an event to be controversial, exactly? Part of the problem is that it's not perfectly clear what an "event" is.<br />
<br />
One suggestion would be to look at a topic-modeling approach, eg [http://dl.acm.org/citation.cfm?id=1150450 topics over time], to find topics with a short temporal span in social-media data. You might be able to combine this with sentiment around those topics in two different communities - eg using something like my [http://www.cs.cmu.edu/~wcohen/postscript/icwsm-2012.pdf MCR-LDA model]. So one way to flesh out this idea would be to start with two topic models:<br />
<br />
* MCR-LDA, to measure 'controversy' - you might be able to get predictions from Ramnath on his blog data, if the code's not ready to distribute yet. I would not completely commit to using twitter data exclusively, btw. <br />
* TOT, to detect shortlived 'events' vs long-term topics.<br />
<br />
Then write some inference code to combine the predictions and pick out "controversial events". The next stage would be working out a joint model (which you might not chose to do for the project). It's not obvious how you'd evaluate all this, however...maybe do some user labeling of final predictions like "this topic corresponds to a controversial event."<br />
<br />
These are just ideas - you might try and flesh out some other concrete idea instead. Good luck! --[[User:Wcohen|Wcohen]] 14:33, 10 October 2012 (UTC)<br />
<br />
PS. There is also a one-person team working on similar topic, you all should talk - it's [[User:Yuchen Tian]] --[[User:Wcohen|Wcohen]] 18:40, 10 October 2012 (UTC)<br />
<br />
== Team members ==<br />
<br />
* [[User:Ysim|Yanchuan Sim]]<br />
* [[User:Zhouyu|Zhou Yu]]<br />
* [[User:Tinghuiz|Tinghui Zhou]]<br />
<br />
== Project idea ==<br />
<br />
In our project, we propose to jointly detect events and the controversy surrounding them in the context of social media.<br />
For example, Christmas Day is an event that receives the most attention around December 25th, while the Presidential debates receive theirs once every four years.<br />
Controversy-wise, Christmas Day is relatively one-sided, with most of the text mentioning it being relatively homogeneous.<br />
In contrast, the Presidential debates event will have obvious sides (supporting the different candidates).<br />
<br />
Our goal is not only to detect controversial events, but also to discover what the different sides are - both grouping the individuals associated with each faction and describing how each faction talks about the event differently.<br />
<br />
We propose to use a probabilistic graphical model to achieve our goals of learning these latent structures from the data without labeled training data.<br />
<br />
== Formalizing the task ==<br />
<br />
Event - In the context of social media, an event is a period of time during which there is a "surge" in the amount of interest (e.g. blog posts, tweets, comments) surrounding an occurrence.<br />
<br />
We call an event controversial if, given the text surrounding the event, the discussions are highly non-homogeneous (i.e. they exhibit high entropy). The sides of such an event can be grouped into a small number of distinct ''factions''.<br />
<br />
Thus, in our task, given a collection of social media documents over time, we seek to jointly infer the events that have occurred, as well as the controversy associated with each of them.<br />
<br />
== A probabilistic model ==<br />
<br />
Here's a sketch of the model that we are considering for our task.<br />
It is a variant of a topic model, where each word is assumed to be jointly generated by an ''event'' and a ''faction''.<br />
It is also similar to the Topics over Time (TOT) model, in that the timestamps of the documents are generated as well.<br />
<br />
A generative story is as follows:<br />
<br />
# Draw <math>E</math> multinomials, <math>\phi_e</math> from a Dirichlet prior, one for each event <math>e</math>. ''This is the distribution over factions for each event that we have.''<br />
# For each document <math>d</math>, draw a multinomial <math>\theta_d</math> from a prior <math>\alpha</math> (this prior could be Dirichlet or logistic normal); then for each word <math>w_{di}</math> in the document <math>d</math>:<br />
## Draw an event <math>e_{di}</math> from multinomial <math>\theta_d</math>;<br />
## Draw a faction <math>f_{di}</math> from multinomial <math>\phi_{e_{di}}</math>;<br />
## Draw a word <math>w_{di}</math> from a SAGE language model <math>\propto \exp(\eta^{e_{di}}_w + \eta^{f_{di}}_w + \eta^m_w)</math>;<br />
## Draw a timestamp <math>t_{di}</math> from Beta <math>\psi_{e_{di}}</math>.<br />
<br />
<math>E</math> - fixed number of events<br />
<br />
<math>\theta_d</math> - multinomial distribution of events specific to document <math>d</math><br />
<br />
<math>\phi_{e_{di}}</math> - multinomial distribution of factions specific to event <math>e_{di}</math><br />
<br />
<math>\psi_{e_{di}}</math> - the beta distribution of time specific to event <math>e_{di}</math><br />
<br />
<math>w_{di}</math> - the <math>i</math>th token in document <math>d</math><br />
<br />
<math>t_{di}</math> - timestamp associated with the <math>i</math>th token in document <math>d</math><br />
<br />
== Data ==<br />
<br />
Our main data source will be Twitter, and as a start we intend to use tweets over a three-month period in 2012 (the exact date range is to be decided). <br />
Some possibly controversial events that have occurred this year are the Republican primaries, the Grammy Awards, and weekly football games during the NFL season.<br />
In addition to the textual content, the timestamps, locations (partially observed) and identities (of the user posting a tweet) could be useful features for our model.<br />
<br />
== Related work ==<br />
<br />
* [[RelatedPaper::Yang et al, SIGIR 98|A study on retrospective and online event detection. Yang et al, SIGIR 98]] This paper addresses the problems of detecting events in news stories. They used clustering with a vector space model to group temporally close events together.<br />
<br />
* [[RelatedPaper::Zhao et al, AAAI 07|Temporal and information flow based event detection from social text streams. Zhao et al, AAAI 07]] The authors propose a method for detecting events from social text streams that exploits not only the textual content but also the temporal and social dimensions of the data.<br />
<br />
* [[RelatedPaper::Automatic_Detection_and_Classification_of_Social_Events|Automatic Detection and Classification of Social Events. Agarwal and Rambow, ACL 10]] This is one of the few works we found relating to controversial events in social media. The authors aim to detect and classify social events using tree kernels.<br />
<br />
*[[RelatedPaper::Rodriguez et al. KDD 2010|Gomez Rodriguez, M., J. Leskovec, and A. Krause. 2010. Inferring networks of diffusion and influence. In Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining, 1019–1028]]. This paper addresses the problem of inferring underlying networks in the diffusion process of social networks, which is related to the faction discovery problem we study in this project.<br />
<br />
*[[RelatedPaper::Cosley et al 2010|Cosley, D., D. Huttenlocher, J. Kleinberg, X. Lan, and S. Suri. 2010. Sequential Influence Models in Social Networks. In Proc. 4th International Conference on Weblogs and Social Media]]. In this paper, the authors study the temporal dynamics of information diffusion in social networks. Their results could give us some insight into the design of our model.<br />
<br />
* [[RelatedPaper::Castillo_2011|Information credibility on twitter. Castillo et al, WWW 11]] This paper discovers general features of Twitter content that relate to credibility assessment.<br />
<br />
* [[RelatedPaper::Guralnik_99|Event Detection from Time Series Data. Guralnik et al, KDD 99]] This paper develops a general approach to change-point detection that generalizes across a wide range of applications.<br />
<br />
* [[RelatedPaper:: Allan_1988|On-Line New Event Detection and Tracking. Allan et al, SIGIR 98]] The detection approach uses a single-pass clustering algorithm and a novel thresholding model that incorporates the properties of events as a major component.<br />
<br />
== Related materials ==<br />
{{#ask: [[AddressesProblem::Controversial events detection]]<br />
| ?Category<br />
| ?UsesDataset<br />
| ?UsesMethod<br />
}}</div>Ysimhttp://curtis.ml.cmu.edu/w/courses/index.php?title=Controversial_events_detection&diff=14669Controversial events detection2012-10-16T05:32:35Z<p>Ysim: /* A probabilistic model */</p>
<hr />
<div>== Comments ==<br />
<br />
This is a neat idea. The main difficulty I see here is formalizing the task precisely. What does it mean for an event to be controversial, exactly? Part of the problem is that it's not perfectly clear what an "event" is.<br />
<br />
One suggestion would be to look at a topic-modeling approach, eg [http://dl.acm.org/citation.cfm?id=1150450 topics over time], to find topics with a short temporal span in social-media data. You might be able to combine this with sentiment around those topics in two different communities - eg using something like my [http://www.cs.cmu.edu/~wcohen/postscript/icwsm-2012.pdf MCR-LDA model]. So one way to flesh out this idea would be to start with two topic models:<br />
<br />
* MCR-LDA, to measure 'controversy' - you might be able to get predictions from Ramnath on his blog data, if the code's not ready to distribute yet. I would not completely commit to using twitter data exclusively, btw. <br />
* TOT, to detect shortlived 'events' vs long-term topics.<br />
<br />
Then write some inference code to combine the predictions and pick out "controversial events". The next stage would be working out a joint model (which you might not chose to do for the project). It's not obvious how you'd evaluate all this, however...maybe do some user labeling of final predictions like "this topic corresponds to a controversial event."<br />
<br />
These are just ideas - you might try and flesh out some other concrete idea instead. Good luck! --[[User:Wcohen|Wcohen]] 14:33, 10 October 2012 (UTC)<br />
<br />
PS. There is also a one-person team working on similar topic, you all should talk - it's [[User:Yuchen Tian]] --[[User:Wcohen|Wcohen]] 18:40, 10 October 2012 (UTC)<br />
<br />
== Team members ==<br />
<br />
* [[User:Ysim|Yanchuan Sim]]<br />
* [[User:Zhouyu|Zhou Yu]]<br />
* [[User:Tinghuiz|Tinghui Zhou]]<br />
<br />
== Project idea ==<br />
<br />
In our project, we propose to jointly detect events and the controversy surrounding them in the context of social media.<br />
For example, Christmas Day is an event that receives the most attention around December 25th, while the Presidential debates receive theirs once every four years.<br />
Controversy-wise, Christmas Day is relatively one-sided, with most of the text mentioning it being relatively homogeneous.<br />
In contrast, the Presidential debates event will have obvious sides (supporting the different candidates).<br />
<br />
Our goal is not only to detect controversial events, but also to discover what the different sides are - both grouping the individuals associated with each faction and describing how each faction talks about the event differently.<br />
<br />
We propose to use a probabilistic graphical model to achieve our goals of learning these latent structures from the data without labeled training data.<br />
<br />
== Formalizing the task ==<br />
<br />
Event - In the context of social media, an event is a period of time during which there is a "surge" in the amount of interest (e.g. blog posts, tweets, comments) surrounding an occurrence.<br />
<br />
We call an event controversial if, given the text surrounding the event, the discussions are highly non-homogeneous (i.e. they exhibit high entropy). The sides of such an event can be grouped into a small number of distinct ''factions''.<br />
<br />
Thus, in our task, given a collection of social media documents over time, we seek to jointly infer the events that have occurred, as well as the controversy associated with each of them.<br />
<br />
== A probabilistic model ==<br />
<br />
Here's a sketch of the model that we are considering for our task.<br />
It is a variant of a topic model, where each word is assumed to be jointly generated by an ''event'' and a ''faction''.<br />
It is also similar to the Topics over Time (TOT) model, in that the timestamps of the documents are generated as well.<br />
<br />
A generative story is as follows:<br />
<br />
# Draw <math>E</math> multinomials, <math>\phi_e</math> from a Dirichlet prior, one for each event <math>e</math>. ''This is the distribution over factions for each event that we have.''<br />
# For each document <math>d</math>, draw a multinomial <math>\theta_d</math> from a prior (Dirichlet or logistic normal); then for each word <math>w_{di}</math> in the document <math>d</math>:<br />
## Draw an event <math>e_{di}</math> from multinomial <math>\theta_d</math>;<br />
## Draw a faction <math>f_{di}</math> from multinomial <math>\phi_{e_{di}}</math>;<br />
## Draw a word <math>w_{di}</math> from a SAGE language model <math>\propto \exp(\eta^{e_{di}}_w + \eta^{f_{di}}_w + \eta^m_w)</math>;<br />
## Draw a timestamp <math>t_{di}</math> from Beta <math>\psi_{e_{di}}</math>.<br />
<br />
== Data ==<br />
<br />
Our main data source will be Twitter, and as a start we intend to use tweets over a three-month period in 2012 (the exact date range is to be decided). <br />
Some possibly controversial events that have occurred this year are the Republican primaries, the Grammy Awards, and weekly football games during the NFL season.<br />
In addition to the textual content, the timestamps, locations (partially observed) and identities (of the user posting a tweet) could be useful features for our model.<br />
<br />
== Related work ==<br />
<br />
* [[RelatedPaper::Yang et al, SIGIR 98|A study on retrospective and online event detection. Yang et al, SIGIR 98]] This paper addresses the problems of detecting events in news stories. They used clustering with a vector space model to group temporally close events together.<br />
<br />
* [[RelatedPaper::Zhao et al, AAAI 07|Temporal and information flow based event detection from social text streams. Zhao et al, AAAI 07]] The authors propose a method for detecting events from social text streams that exploits not only the textual content but also the temporal and social dimensions of the data.<br />
<br />
* [[RelatedPaper::Automatic_Detection_and_Classification_of_Social_Events|Automatic Detection and Classification of Social Events. Agarwal and Rambow, ACL 10]] This is one of the few works we found relating to controversial events in social media. The authors aim to detect and classify social events using tree kernels.<br />
<br />
*[[RelatedPaper::Rodriguez et al. KDD 2010|Gomez Rodriguez, M., J. Leskovec, and A. Krause. 2010. Inferring networks of diffusion and influence. In Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining, 1019–1028]]. This paper addresses the problem of inferring underlying networks in the diffusion process of social networks, which is related to the faction discovery problem we study in this project.<br />
<br />
*[[RelatedPaper::Cosley et al 2010|Cosley, D., D. Huttenlocher, J. Kleinberg, X. Lan, and S. Suri. 2010. Sequential Influence Models in Social Networks. In Proc. 4th International Conference on Weblogs and Social Media]]. In this paper, the authors study the temporal dynamics of information diffusion in social networks. Their results could give us some insight into the design of our model.<br />
<br />
* [[RelatedPaper::Castillo_2011|Information credibility on twitter. Castillo et al, WWW 11]] This paper discovers general features of Twitter content that relate to credibility assessment.<br />
<br />
* [[RelatedPaper::Guralnik_99|Event Detection from Time Series Data. Guralnik et al, KDD 99]] This paper develops a general approach to change-point detection that generalizes across a wide range of applications.<br />
<br />
* [[RelatedPaper:: Allan_1988|On-Line New Event Detection and Tracking. Allan et al, SIGIR 98]] The detection approach uses a single-pass clustering algorithm and a novel thresholding model that incorporates the properties of events as a major component.<br />
<br />
== Related materials ==<br />
{{#ask: [[AddressesProblem::Controversial events detection]]<br />
| ?Category<br />
| ?UsesDataset<br />
| ?UsesMethod<br />
}}</div>Ysimhttp://curtis.ml.cmu.edu/w/courses/index.php?title=Controversial_events_detection&diff=14668Controversial events detection2012-10-16T05:32:14Z<p>Ysim: /* Possible model */</p>
<hr />
<div>== Comments ==<br />
<br />
This is a neat idea. The main difficulty I see here is formalizing the task precisely. What does it mean for an event to be controversial, exactly? Part of the problem is that it's not perfectly clear what an "event" is.<br />
<br />
One suggestion would be to look at a topic-modeling approach, eg [http://dl.acm.org/citation.cfm?id=1150450 topics over time], to find topics with a short temporal span in social-media data. You might be able to combine this with sentiment around those topics in two different communities - eg using something like my [http://www.cs.cmu.edu/~wcohen/postscript/icwsm-2012.pdf MCR-LDA model]. So one way to flesh out this idea would be to start with two topic models:<br />
<br />
* MCR-LDA, to measure 'controversy' - you might be able to get predictions from Ramnath on his blog data, if the code's not ready to distribute yet. I would not completely commit to using twitter data exclusively, btw. <br />
* TOT, to detect shortlived 'events' vs long-term topics.<br />
<br />
Then write some inference code to combine the predictions and pick out "controversial events". The next stage would be working out a joint model (which you might not chose to do for the project). It's not obvious how you'd evaluate all this, however...maybe do some user labeling of final predictions like "this topic corresponds to a controversial event."<br />
<br />
These are just ideas - you might try and flesh out some other concrete idea instead. Good luck! --[[User:Wcohen|Wcohen]] 14:33, 10 October 2012 (UTC)<br />
<br />
PS. There is also a one-person team working on similar topic, you all should talk - it's [[User:Yuchen Tian]] --[[User:Wcohen|Wcohen]] 18:40, 10 October 2012 (UTC)<br />
<br />
== Team members ==<br />
<br />
* [[User:Ysim|Yanchuan Sim]]<br />
* [[User:Zhouyu|Zhou Yu]]<br />
* [[User:Tinghuiz|Tinghui Zhou]]<br />
<br />
== Project idea ==<br />
<br />
In our project, we propose to jointly detect events and the controversy surrounding them in the context of social media.<br />
For example, Christmas Day is an event that receives the most attention around December 25th, while the Presidential debates receive theirs once every four years.<br />
Controversy-wise, Christmas Day is relatively one-sided, with most of the text mentioning it being relatively homogeneous.<br />
In contrast, the Presidential debates event will have obvious sides (supporting the different candidates).<br />
<br />
Our goal is not only to detect controversial events, but also to discover what the different sides are - both grouping the individuals associated with each faction and describing how each faction talks about the event differently.<br />
<br />
We propose to use a probabilistic graphical model to achieve our goals of learning these latent structures from the data without labeled training data.<br />
<br />
== Formalizing the task ==<br />
<br />
Event - In the context of social media, an event is a period of time during which there is a "surge" in the amount of interest (e.g. blog posts, tweets, comments) surrounding an occurrence.<br />
<br />
We call an event controversial if, given the text surrounding the event, the discussions are highly non-homogeneous (i.e. they exhibit high entropy). The sides of such an event can be grouped into a small number of distinct ''factions''.<br />
<br />
Thus, in our task, given a collection of social media documents over time, we seek to jointly infer the events that have occurred, as well as the controversy associated with each of them.<br />
<br />
== A probabilistic model ==<br />
<br />
Here's a sketch of the model that we are considering for our task.<br />
It is a variant of a topic model, where each word is assumed to be jointly generated by an ''event'' and a ''faction''.<br />
It is also similar to the Topics over Time (TOT) model, in that the timestamps of the documents are generated as well.<br />
<br />
A generative story is as follows:<br />
<br />
# Draw <math>E</math> multinomials, <math>\phi_e</math> from a Dirichlet prior, one for each event <math>e</math>. ''This is the distribution over factions for each event that we have.''<br />
# For each document <math>d</math>, draw a multinomial <math>\theta_d</math> from a prior (Dirichlet or logistic normal); then for each word <math>w_{di}</math> in the document <math>d</math>:<br />
## Draw an event <math>e_{di}</math> from multinomial <math>\theta_d</math>;<br />
## Draw a faction <math>f_{di}</math> from multinomial <math>\phi_{e_{di}}</math>;<br />
## Draw a word <math>w_{di}</math> from a SAGE language model <math>\propto \exp(\eta^{e_{di}}_w + \eta^{f_{di}}_w + \eta^m_w)</math>;<br />
## Draw a timestamp <math>t_{di}</math> from Beta <math>\psi_{e_{di}}</math>.<br />
<br />
== Data ==<br />
<br />
Our main data source will be Twitter, and as a start we intend to use tweets over a three-month period in 2012 (the exact date range is to be decided). <br />
Some possibly controversial events that have occurred this year are the Republican primaries, the Grammy Awards, and weekly football games during the NFL season.<br />
In addition to the textual content, the timestamps, locations (partially observed) and identities (of the user posting a tweet) could be useful features for our model.<br />
<br />
== Related work ==<br />
<br />
* [[RelatedPaper::Yang et al, SIGIR 98|A study on retrospective and online event detection. Yang et al, SIGIR 98]] This paper addresses the problems of detecting events in news stories. They used clustering with a vector space model to group temporally close events together.<br />
<br />
* [[RelatedPaper::Zhao et al, AAAI 07|Temporal and information flow based event detection from social text streams. Zhao et al, AAAI 07]] The authors propose a method for detecting events from social text streams that exploits not only the textual content but also the temporal and social dimensions of the data.<br />
<br />
* [[RelatedPaper::Automatic_Detection_and_Classification_of_Social_Events|Automatic Detection and Classification of Social Events. Agarwal and Rambow, ACL 10]] This is one of the few works we found relating to controversial events in social media. The authors aim to detect and classify social events using tree kernels.<br />
<br />
*[[RelatedPaper::Rodriguez et al. KDD 2010|Gomez Rodriguez, M., J. Leskovec, and A. Krause. 2010. Inferring networks of diffusion and influence. In Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining, 1019–1028]]. This paper addresses the problem of inferring underlying networks in the diffusion process of social networks, which is related to the faction discovery problem we study in this project.<br />
<br />
*[[RelatedPaper::Cosley et al 2010|Cosley, D., D. Huttenlocher, J. Kleinberg, X. Lan, and S. Suri. 2010. Sequential Influence Models in Social Networks, In Proc. 4th International Conference on Weblogs and Social<br />
Media]]. In this paper the authors study the temporal dynamics of information diffusion in social networks. The results found could give us some insights into the design of our model.<br />
<br />
* [[RelatedPaper::Castillo_2011|Information credibility on twitter. Castillo et al, WWW 11]] Discover general features in twitter about credibility assessment.<br />
<br />
* [[RelatedPaper::Guralnik_99|Event Detection from Time Series Data. Guralnik et al, KDD 99]] Develop a general approach to change-point detection that generalize across wide range of application<br />
<br />
* [[RelatedPaper:: Allan_1988|On-Line New Event Detection and Tracking. Allan et al, SIGIR 98]] An approach to detection uses a single pass clustering algorithm and a novel thresholding model that incorporates the properties of events as a major component.<br />
<br />
== Related materials ==<br />
{{#ask: [[AddressesProblem::Controversial events detection]]<br />
| ?Category<br />
| ?UsesDataset<br />
| ?UsesMethod<br />
}}</div>Ysimhttp://curtis.ml.cmu.edu/w/courses/index.php?title=Controversial_events_detection&diff=14666Controversial events detection2012-10-16T05:16:25Z<p>Ysim: /* Possible model */</p>
<hr />
<div>== Comments ==<br />
<br />
This is a neat idea. The main difficulty I see here is formalizing the task precisely. What does it mean for an event to be controversial, exactly? Part of the problem is that it's not perfectly clear what an "event" is.<br />
<br />
One suggestion would be to look at a topic-modeling approach, eg [http://dl.acm.org/citation.cfm?id=1150450 topics over time], to find topics with a short temporal span in social-media data. You might be able to combine this with sentiment around those topics in two different communities - eg using something like my [http://www.cs.cmu.edu/~wcohen/postscript/icwsm-2012.pdf MCR-LDA model]. So one way to flesh out this idea would be to start with two topic models:<br />
<br />
* MCR-LDA, to measure 'controversy' - you might be able to get predictions from Ramnath on his blog data, if the code's not ready to distribute yet. I would not completely commit to using twitter data exclusively, btw. <br />
* TOT, to detect shortlived 'events' vs long-term topics.<br />
<br />
Then write some inference code to combine the predictions and pick out "controversial events". The next stage would be working out a joint model (which you might not chose to do for the project). It's not obvious how you'd evaluate all this, however...maybe do some user labeling of final predictions like "this topic corresponds to a controversial event."<br />
<br />
These are just ideas - you might try and flesh out some other concrete idea instead. Good luck! --[[User:Wcohen|Wcohen]] 14:33, 10 October 2012 (UTC)<br />
<br />
PS. There is also a one-person team working on similar topic, you all should talk - it's [[User:Yuchen Tian]] --[[User:Wcohen|Wcohen]] 18:40, 10 October 2012 (UTC)<br />
<br />
== Team members ==<br />
<br />
* [[User:Ysim|Yanchuan Sim]]<br />
* [[User:Zhouyu|Zhou Yu]]<br />
* [[User:Tinghuiz|Tinghui Zhou]]<br />
<br />
== Project idea ==<br />
<br />
In our project, we propose to jointly detect events and the controversy surrounding them in the context of social media.<br />
For example, Christmas day is an event that receives the most attention around December 25th, while the Presidential debates receive attention once every four years.<br />
Controversy-wise, Christmas day is relatively one-sided, with most of the text mentioning it being relatively homogeneous.<br />
In contrast, the Presidential debates event will have obvious sides (supporting the different candidates).<br />
<br />
Our goal is not only to detect controversial events, but also to discover what the different sides are - both grouping the individuals associated with each faction and describing how each faction talks about the event differently.<br />
<br />
We propose to use a probabilistic graphical model to achieve our goals of learning these latent structures from the data without labeled training data.<br />
<br />
== Formalizing the task ==<br />
<br />
Event - In the context of social media, an event is a period of time during which there is a "surge" in the amount of interest (e.g. blog posts, tweets, comments, etc.) surrounding an occurrence.<br />
<br />
We call an event controversial if, given the text surrounding the event, the nature of the discussion is highly non-homogeneous (or exhibits high entropy). The sides of such an event can be grouped into a small number of distinct ''factions''.<br />
<br />
Thus, in our task, given a collection of social media documents over time, we seek to jointly infer the events that have occurred, as well as the controversy associated with them.<br />
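
As a rough illustration of the "high entropy" criterion, the sketch below computes the Shannon entropy of the faction labels attached to the posts about one event. It assumes that each post has already been assigned a faction label, which the proposed model would have to infer; the function name <code>faction_entropy</code> and the choice of base-2 logarithms are our own assumptions, and no particular threshold for "controversial" is implied.

```python
import math
from collections import Counter


def faction_entropy(faction_labels):
    """Shannon entropy (in bits) of the empirical faction distribution."""
    counts = Counter(faction_labels)
    total = sum(counts.values())
    # Each faction contributes -p * log2(p), where p is its share of the posts.
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# A one-sided event: every post belongs to the same faction.
one_sided = ["celebrating"] * 10
# A two-sided event: posts are split evenly between two factions.
two_sided = ["candidate_a", "candidate_b"] * 5

print(faction_entropy(one_sided))
print(faction_entropy(two_sided))
```

Under this measure a homogeneous discussion scores zero, and the score grows as the posts spread more evenly over more factions.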
<br />
== Possible model ==<br />
<br />
Here's a sketch of a topic model that we are considering for our task.<br />
It is a variant of a topic model, where each word is assumed to be jointly generated by an ''event'' and ''faction''.<br />
It is also similar to the topics over time model, in that we generate the timestamps for each document.<br />
<br />
A generative story is as follows:<br />
<br />
# Draw <math>E</math> multinomials <math>\phi_e</math> from a prior (Dirichlet or logistic normal), one for each event <math>e</math>.<br />
<br />
== Data ==<br />
<br />
Our main data source will be Twitter; as a start, we intend to use tweets from a three-month period in 2012 (the exact date range is to be decided). <br />
Some possibly controversial events that have occurred this year are the Republican primaries, the Grammy Awards, and the weekly football games during the NFL season.<br />
In addition to the textual content, the timestamps, locations (partially observed) and identities (of the user posting a tweet) could be useful features for our model.<br />
<br />
== Related work ==<br />
<br />
* [[RelatedPaper::Yang et al, SIGIR 98|A study on retrospective and online event detection. Yang et al, SIGIR 98]] This paper addresses the problem of detecting events in news stories. The authors use clustering with a vector space model to group temporally close events together.<br />
<br />
* [[RelatedPaper::Zhao et al, AAAI 07|Temporal and information flow based event detection from social text streams. Zhao et al, AAAI 07]] The authors propose a method for detecting events from social text streams that exploits not only the textual content but also the temporal and social dimensions of the data.<br />
<br />
* [[RelatedPaper::Automatic_Detection_and_Classification_of_Social_Events|Automatic Detection and Classification of Social Events. Agarwal and Rambow, ACL 10]] This is one of the few works we found relating to controversial events in social media. The authors aim to detect and classify social events using tree kernels.<br />
<br />
*[[RelatedPaper::Rodriguez et al. KDD 2010|Gomez Rodriguez, M., J. Leskovec, and A. Krause. 2010. Inferring networks of diffusion and influence. In Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining, 1019–1028]]. This paper addresses the problem of inferring underlying networks in the diffusion process of social networks, which is related to the faction discovery problem we study in this project.<br />
<br />
*[[RelatedPaper::Cosley et al 2010|Cosley, D., D. Huttenlocher, J. Kleinberg, X. Lan, and S. Suri. 2010. Sequential Influence Models in Social Networks, In Proc. 4th International Conference on Weblogs and Social Media]]. In this paper the authors study the temporal dynamics of information diffusion in social networks. Their results could give us some insights into the design of our model.<br />
<br />
* [[RelatedPaper::Castillo_2011|Information credibility on twitter. Castillo et al, WWW 11]] The authors identify general features on Twitter that are useful for credibility assessment.<br />
<br />
* [[RelatedPaper::Guralnik_99|Event Detection from Time Series Data. Guralnik et al, KDD 99]] The authors develop a general approach to change-point detection that generalizes across a wide range of applications.<br />
<br />
* [[RelatedPaper:: Allan_1988|On-Line New Event Detection and Tracking. Allan et al, SIGIR 98]] The detection approach uses a single-pass clustering algorithm and a novel thresholding model that incorporates the properties of events as a major component.<br />
<br />
== Related materials ==<br />
{{#ask: [[AddressesProblem::Controversial events detection]]<br />
| ?Category<br />
| ?UsesDataset<br />
| ?UsesMethod<br />
}}</div>Ysimhttp://curtis.ml.cmu.edu/w/courses/index.php?title=Controversial_events_detection&diff=14665Controversial events detection2012-10-16T05:16:18Z<p>Ysim: /* Possible model */</p>
<hr />
<div>== Comments ==<br />
<br />
This is a neat idea. The main difficulty I see here is formalizing the task precisely. What does it mean for an event to be controversial, exactly? Part of the problem is that it's not perfectly clear what an "event" is.<br />
<br />
One suggestion would be to look at a topic-modeling approach, eg [http://dl.acm.org/citation.cfm?id=1150450 topics over time], to find topics with a short temporal span in social-media data. You might be able to combine this with sentiment around those topics in two different communities - eg using something like my [http://www.cs.cmu.edu/~wcohen/postscript/icwsm-2012.pdf MCR-LDA model]. So one way to flesh out this idea would be to start with two topic models:<br />
<br />
* MCR-LDA, to measure 'controversy' - you might be able to get predictions from Ramnath on his blog data, if the code's not ready to distribute yet. I would not completely commit to using twitter data exclusively, btw. <br />
* TOT, to detect shortlived 'events' vs long-term topics.<br />
<br />
Then write some inference code to combine the predictions and pick out "controversial events". The next stage would be working out a joint model (which you might not chose to do for the project). It's not obvious how you'd evaluate all this, however...maybe do some user labeling of final predictions like "this topic corresponds to a controversial event."<br />
<br />
These are just ideas - you might try and flesh out some other concrete idea instead. Good luck! --[[User:Wcohen|Wcohen]] 14:33, 10 October 2012 (UTC)<br />
<br />
PS. There is also a one-person team working on similar topic, you all should talk - it's [[User:Yuchen Tian]] --[[User:Wcohen|Wcohen]] 18:40, 10 October 2012 (UTC)<br />
<br />
== Team members ==<br />
<br />
* [[User:Ysim|Yanchuan Sim]]<br />
* [[User:Zhouyu|Zhou Yu]]<br />
* [[User:Tinghuiz|Tinghui Zhou]]<br />
<br />
== Project idea ==<br />
<br />
In our project, we propose to jointly detect events and the controversy surrounding them in the context of social media.<br />
For example, Christmas day is an event that receives the most attention around December 25th, while the Presidential debates receive attention once every four years.<br />
Controversy-wise, Christmas day is relatively one-sided, with most of the text mentioning it being relatively homogeneous.<br />
In contrast, the Presidential debates event will have obvious sides (supporting the different candidates).<br />
<br />
Our goal is not only to detect controversial events, but also to discover what the different sides are - both grouping the individuals associated with each faction and describing how each faction talks about the event differently.<br />
<br />
We propose to use a probabilistic graphical model to achieve our goals of learning these latent structures from the data without labeled training data.<br />
<br />
== Formalizing the task ==<br />
<br />
Event - In the context of social media, an event is a period of time during which there is a "surge" in the amount of interest (e.g. blog posts, tweets, comments, etc.) surrounding an occurrence.<br />
<br />
We call an event controversial if, given the text surrounding the event, the nature of the discussion is highly non-homogeneous (or exhibits high entropy). The sides of such an event can be grouped into a small number of distinct ''factions''.<br />
<br />
Thus, in our task, given a collection of social media documents over time, we seek to jointly infer the events that have occurred, as well as the controversy associated with them.<br />
<br />
== Possible model ==<br />
<br />
Here's a sketch of a topic model that we are considering for our task.<br />
It is a variant of a topic model, where each word is assumed to be jointly generated by an ''event'' and ''faction''.<br />
It is also similar to the topics over time model, in that we generate the timestamps for each document.<br />
<br />
A generative story is as follows:<br />
<br />
# Draw <math>E</math> multinomials <math>\phi_e</math> from a prior (Dirichlet or logistic normal), one for each event <math>e</math>.<br />
<br />
== Data ==<br />
<br />
Our main data source will be Twitter; as a start, we intend to use tweets from a three-month period in 2012 (the exact date range is to be decided). <br />
Some possibly controversial events that have occurred this year are the Republican primaries, the Grammy Awards, and the weekly football games during the NFL season.<br />
In addition to the textual content, the timestamps, locations (partially observed) and identities (of the user posting a tweet) could be useful features for our model.<br />
<br />
== Related work ==<br />
<br />
* [[RelatedPaper::Yang et al, SIGIR 98|A study on retrospective and online event detection. Yang et al, SIGIR 98]] This paper addresses the problem of detecting events in news stories. The authors use clustering with a vector space model to group temporally close events together.<br />
<br />
* [[RelatedPaper::Zhao et al, AAAI 07|Temporal and information flow based event detection from social text streams. Zhao et al, AAAI 07]] The authors propose a method for detecting events from social text streams that exploits not only the textual content but also the temporal and social dimensions of the data.<br />
<br />
* [[RelatedPaper::Automatic_Detection_and_Classification_of_Social_Events|Automatic Detection and Classification of Social Events. Agarwal and Rambow, ACL 10]] This is one of the few works we found relating to controversial events in social media. The authors aim to detect and classify social events using tree kernels.<br />
<br />
*[[RelatedPaper::Rodriguez et al. KDD 2010|Gomez Rodriguez, M., J. Leskovec, and A. Krause. 2010. Inferring networks of diffusion and influence. In Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining, 1019–1028]]. This paper addresses the problem of inferring underlying networks in the diffusion process of social networks, which is related to the faction discovery problem we study in this project.<br />
<br />
*[[RelatedPaper::Cosley et al 2010|Cosley, D., D. Huttenlocher, J. Kleinberg, X. Lan, and S. Suri. 2010. Sequential Influence Models in Social Networks, In Proc. 4th International Conference on Weblogs and Social Media]]. In this paper the authors study the temporal dynamics of information diffusion in social networks. Their results could give us some insights into the design of our model.<br />
<br />
* [[RelatedPaper::Castillo_2011|Information credibility on twitter. Castillo et al, WWW 11]] The authors identify general features on Twitter that are useful for credibility assessment.<br />
<br />
* [[RelatedPaper::Guralnik_99|Event Detection from Time Series Data. Guralnik et al, KDD 99]] The authors develop a general approach to change-point detection that generalizes across a wide range of applications.<br />
<br />
* [[RelatedPaper:: Allan_1988|On-Line New Event Detection and Tracking. Allan et al, SIGIR 98]] The detection approach uses a single-pass clustering algorithm and a novel thresholding model that incorporates the properties of events as a major component.<br />
<br />
== Related materials ==<br />
{{#ask: [[AddressesProblem::Controversial events detection]]<br />
| ?Category<br />
| ?UsesDataset<br />
| ?UsesMethod<br />
}}</div>Ysimhttp://curtis.ml.cmu.edu/w/courses/index.php?title=Controversial_events_detection&diff=14664Controversial events detection2012-10-16T05:16:07Z<p>Ysim: </p>
<hr />
<div>== Comments ==<br />
<br />
This is a neat idea. The main difficulty I see here is formalizing the task precisely. What does it mean for an event to be controversial, exactly? Part of the problem is that it's not perfectly clear what an "event" is.<br />
<br />
One suggestion would be to look at a topic-modeling approach, eg [http://dl.acm.org/citation.cfm?id=1150450 topics over time], to find topics with a short temporal span in social-media data. You might be able to combine this with sentiment around those topics in two different communities - eg using something like my [http://www.cs.cmu.edu/~wcohen/postscript/icwsm-2012.pdf MCR-LDA model]. So one way to flesh out this idea would be to start with two topic models:<br />
<br />
* MCR-LDA, to measure 'controversy' - you might be able to get predictions from Ramnath on his blog data, if the code's not ready to distribute yet. I would not completely commit to using twitter data exclusively, btw. <br />
* TOT, to detect shortlived 'events' vs long-term topics.<br />
<br />
Then write some inference code to combine the predictions and pick out "controversial events". The next stage would be working out a joint model (which you might not chose to do for the project). It's not obvious how you'd evaluate all this, however...maybe do some user labeling of final predictions like "this topic corresponds to a controversial event."<br />
<br />
These are just ideas - you might try and flesh out some other concrete idea instead. Good luck! --[[User:Wcohen|Wcohen]] 14:33, 10 October 2012 (UTC)<br />
<br />
PS. There is also a one-person team working on similar topic, you all should talk - it's [[User:Yuchen Tian]] --[[User:Wcohen|Wcohen]] 18:40, 10 October 2012 (UTC)<br />
<br />
== Team members ==<br />
<br />
* [[User:Ysim|Yanchuan Sim]]<br />
* [[User:Zhouyu|Zhou Yu]]<br />
* [[User:Tinghuiz|Tinghui Zhou]]<br />
<br />
== Project idea ==<br />
<br />
In our project, we propose to jointly detect events and the controversy surrounding them in the context of social media.<br />
For example, Christmas day is an event that receives the most attention around December 25th, while the Presidential debates receive attention once every four years.<br />
Controversy-wise, Christmas day is relatively one-sided, with most of the text mentioning it being relatively homogeneous.<br />
In contrast, the Presidential debates event will have obvious sides (supporting the different candidates).<br />
<br />
Our goal is not only to detect controversial events, but also to discover what the different sides are - both grouping the individuals associated with each faction and describing how each faction talks about the event differently.<br />
<br />
We propose to use a probabilistic graphical model to achieve our goals of learning these latent structures from the data without labeled training data.<br />
<br />
== Formalizing the task ==<br />
<br />
Event - In the context of social media, an event is a period of time during which there is a "surge" in the amount of interest (e.g. blog posts, tweets, comments, etc.) surrounding an occurrence.<br />
<br />
We call an event controversial if, given the text surrounding the event, the nature of the discussion is highly non-homogeneous (or exhibits high entropy). The sides of such an event can be grouped into a small number of distinct ''factions''.<br />
<br />
Thus, in our task, given a collection of social media documents over time, we seek to jointly infer the events that have occurred, as well as the controversy associated with them.<br />
<br />
== Possible model ==<br />
<br />
Here's a sketch of a topic model that we are considering for our task.<br />
It is a variant of a topic model, where each word is assumed to be jointly generated by an ''event'' and ''faction''.<br />
It is also similar to the topics over time model, in that we generate the timestamps for each document.<br />
<br />
A generative story is as follows:<br />
<br />
# Draw <math>E</math> multinomials <math>\phi_e</math> from a prior (Dirichlet or logistic normal), one for each event <math>e</math>.<br />
<br />
== Data ==<br />
<br />
Our main data source will be Twitter; as a start, we intend to use tweets from a three-month period in 2012 (the exact date range is to be decided). <br />
Some possibly controversial events that have occurred this year are the Republican primaries, the Grammy Awards, and the weekly football games during the NFL season.<br />
In addition to the textual content, the timestamps, locations (partially observed) and identities (of the user posting a tweet) could be useful features for our model.<br />
<br />
== Related work ==<br />
<br />
* [[RelatedPaper::Yang et al, SIGIR 98|A study on retrospective and online event detection. Yang et al, SIGIR 98]] This paper addresses the problem of detecting events in news stories. The authors use clustering with a vector space model to group temporally close events together.<br />
<br />
* [[RelatedPaper::Zhao et al, AAAI 07|Temporal and information flow based event detection from social text streams. Zhao et al, AAAI 07]] The authors propose a method for detecting events from social text streams that exploits not only the textual content but also the temporal and social dimensions of the data.<br />
<br />
* [[RelatedPaper::Automatic_Detection_and_Classification_of_Social_Events|Automatic Detection and Classification of Social Events. Agarwal and Rambow, ACL 10]] This is one of the few works we found relating to controversial events in social media. The authors aim to detect and classify social events using tree kernels.<br />
<br />
*[[RelatedPaper::Rodriguez et al. KDD 2010|Gomez Rodriguez, M., J. Leskovec, and A. Krause. 2010. Inferring networks of diffusion and influence. In Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining, 1019–1028]]. This paper addresses the problem of inferring underlying networks in the diffusion process of social networks, which is related to the faction discovery problem we study in this project.<br />
<br />
*[[RelatedPaper::Cosley et al 2010|Cosley, D., D. Huttenlocher, J. Kleinberg, X. Lan, and S. Suri. 2010. Sequential Influence Models in Social Networks, In Proc. 4th International Conference on Weblogs and Social Media]]. In this paper the authors study the temporal dynamics of information diffusion in social networks. Their results could give us some insights into the design of our model.<br />
<br />
* [[RelatedPaper::Castillo_2011|Information credibility on twitter. Castillo et al, WWW 11]] The authors identify general features on Twitter that are useful for credibility assessment.<br />
<br />
* [[RelatedPaper::Guralnik_99|Event Detection from Time Series Data. Guralnik et al, KDD 99]] The authors develop a general approach to change-point detection that generalizes across a wide range of applications.<br />
<br />
* [[RelatedPaper:: Allan_1988|On-Line New Event Detection and Tracking. Allan et al, SIGIR 98]] The detection approach uses a single-pass clustering algorithm and a novel thresholding model that incorporates the properties of events as a major component.<br />
<br />
== Related materials ==<br />
{{#ask: [[AddressesProblem::Controversial events detection]]<br />
| ?Category<br />
| ?UsesDataset<br />
| ?UsesMethod<br />
}}</div>Ysimhttp://curtis.ml.cmu.edu/w/courses/index.php?title=Controversial_events_detection&diff=14663Controversial events detection2012-10-16T05:15:55Z<p>Ysim: </p>
<hr />
<div>== Comments ==<br />
<br />
This is a neat idea. The main difficulty I see here is formalizing the task precisely. What does it mean for an event to be controversial, exactly? Part of the problem is that it's not perfectly clear what an "event" is.<br />
<br />
One suggestion would be to look at a topic-modeling approach, eg [http://dl.acm.org/citation.cfm?id=1150450 topics over time], to find topics with a short temporal span in social-media data. You might be able to combine this with sentiment around those topics in two different communities - eg using something like my [http://www.cs.cmu.edu/~wcohen/postscript/icwsm-2012.pdf MCR-LDA model]. So one way to flesh out this idea would be to start with two topic models:<br />
<br />
* MCR-LDA, to measure 'controversy' - you might be able to get predictions from Ramnath on his blog data, if the code's not ready to distribute yet. I would not completely commit to using twitter data exclusively, btw. <br />
* TOT, to detect shortlived 'events' vs long-term topics.<br />
<br />
Then write some inference code to combine the predictions and pick out "controversial events". The next stage would be working out a joint model (which you might not chose to do for the project). It's not obvious how you'd evaluate all this, however...maybe do some user labeling of final predictions like "this topic corresponds to a controversial event."<br />
<br />
These are just ideas - you might try and flesh out some other concrete idea instead. Good luck! --[[User:Wcohen|Wcohen]] 14:33, 10 October 2012 (UTC)<br />
<br />
PS. There is also a one-person team working on similar topic, you all should talk - it's [[User:Yuchen Tian]] --[[User:Wcohen|Wcohen]] 18:40, 10 October 2012 (UTC)<br />
<br />
== Team members ==<br />
<br />
* [[User:Ysim|Yanchuan Sim]]<br />
* [[User:Zhouyu|Zhou Yu]]<br />
* [[User:Tinghuiz|Tinghui Zhou]]<br />
<br />
== Project idea ==<br />
<br />
In our project, we propose to jointly detect events and the controversy surrounding them in the context of social media.<br />
For example, Christmas day is an event that receives the most attention around December 25th, while the Presidential debates receive attention once every four years.<br />
Controversy-wise, Christmas day is relatively one-sided, with most of the text mentioning it being relatively homogeneous.<br />
In contrast, the Presidential debates event will have obvious sides (supporting the different candidates).<br />
<br />
Our goal is not only to detect controversial events, but also to discover what the different sides are - both grouping the individuals associated with each faction and describing how each faction talks about the event differently.<br />
<br />
We propose to use a probabilistic graphical model to achieve our goals of learning these latent structures from the data without labeled training data.<br />
<br />
== Formalizing the task ==<br />
<br />
Event - In the context of social media, an event is a period of time during which there is a "surge" in the amount of interest (e.g. blog posts, tweets, comments, etc.) surrounding an occurrence.<br />
<br />
We call an event controversial if, given the text surrounding the event, the nature of the discussion is highly non-homogeneous (or exhibits high entropy). The sides of such an event can be grouped into a small number of distinct ''factions''.<br />
<br />
Thus, in our task, given a collection of social media documents over time, we seek to jointly infer the events that have occurred, as well as the controversy associated with them.<br />
<br />
== Possible model ==<br />
<br />
Here's a sketch of a topic model that we are considering for our task.<br />
It is a variant of a topic model, where each word is assumed to be jointly generated by an ''event'' and ''faction''.<br />
It is also similar to the topics over time model, in that we generate the timestamps for each document.<br />
<br />
A generative story is as follows:<br />
<br />
Draw <math>E</math> multinomials, <math>\phi_e</math> from a prior (Dirichlet or logistic normal prior), one for each event <math>e</math>.<br />
<br />
== Data ==<br />
<br />
Our main data source will be Twitter; as a start, we intend to use tweets from a three-month period in 2012 (the exact date range is to be decided). <br />
Some possibly controversial events that have occurred this year are the Republican primaries, the Grammy Awards, and the weekly football games during the NFL season.<br />
In addition to the textual content, the timestamps, locations (partially observed) and identities (of the user posting a tweet) could be useful features for our model.<br />
<br />
== Related work ==<br />
<br />
* [[RelatedPaper::Yang et al, SIGIR 98|A study on retrospective and online event detection. Yang et al, SIGIR 98]] This paper addresses the problem of detecting events in news stories. The authors use clustering with a vector space model to group temporally close events together.<br />
<br />
* [[RelatedPaper::Zhao et al, AAAI 07|Temporal and information flow based event detection from social text streams. Zhao et al, AAAI 07]] The authors propose a method for detecting events from social text streams that exploits not only the textual content but also the temporal and social dimensions of the data.<br />
<br />
* [[RelatedPaper::Automatic_Detection_and_Classification_of_Social_Events|Automatic Detection and Classification of Social Events. Agarwal and Rambow, ACL 10]] This is one of the few works we found relating to controversial events in social media. The authors aim to detect and classify social events using tree kernels.<br />
<br />
*[[RelatedPaper::Rodriguez et al. KDD 2010|Gomez Rodriguez, M., J. Leskovec, and A. Krause. 2010. Inferring networks of diffusion and influence. In Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining, 1019–1028]]. This paper addresses the problem of inferring underlying networks in the diffusion process of social networks, which is related to the faction discovery problem we study in this project.<br />
<br />
*[[RelatedPaper::Cosley et al 2010|Cosley, D., D. Huttenlocher, J. Kleinberg, X. Lan, and S. Suri. 2010. Sequential Influence Models in Social Networks, In Proc. 4th International Conference on Weblogs and Social Media]]. In this paper the authors study the temporal dynamics of information diffusion in social networks. Their results could give us some insights into the design of our model.<br />
<br />
* [[RelatedPaper::Castillo_2011|Information credibility on twitter. Castillo et al, WWW 11]] The authors identify general features on Twitter that are useful for credibility assessment.<br />
<br />
* [[RelatedPaper::Guralnik_99|Event Detection from Time Series Data. Guralnik et al, KDD 99]] The authors develop a general approach to change-point detection that generalizes across a wide range of applications.<br />
<br />
* [[RelatedPaper:: Allan_1988|On-Line New Event Detection and Tracking. Allan et al, SIGIR 98]] The detection approach uses a single-pass clustering algorithm and a novel thresholding model that incorporates the properties of events as a major component.<br />
<br />
== Related materials ==<br />
{{#ask: [[AddressesProblem::Controversial events detection]]<br />
| ?Category<br />
| ?UsesDataset<br />
| ?UsesMethod<br />
}}</div>Ysimhttp://curtis.ml.cmu.edu/w/courses/index.php?title=Controversial_events_detection&diff=14662Controversial events detection2012-10-16T05:10:37Z<p>Ysim: </p>
<hr />
<div>== Comments ==<br />
<br />
This is a neat idea. The main difficulty I see here is formalizing the task precisely. What does it mean for an event to be controversial, exactly? Part of the problem is that it's not perfectly clear what an "event" is.<br />
<br />
One suggestion would be to look at a topic-modeling approach, eg [http://dl.acm.org/citation.cfm?id=1150450 topics over time], to find topics with a short temporal span in social-media data. You might be able to combine this with sentiment around those topics in two different communities - eg using something like my [http://www.cs.cmu.edu/~wcohen/postscript/icwsm-2012.pdf MCR-LDA model]. So one way to flesh out this idea would be to start with two topic models:<br />
<br />
* MCR-LDA, to measure 'controversy' - you might be able to get predictions from Ramnath on his blog data, if the code's not ready to distribute yet. I would not completely commit to using twitter data exclusively, btw. <br />
* TOT, to detect shortlived 'events' vs long-term topics.<br />
<br />
Then write some inference code to combine the predictions and pick out "controversial events". The next stage would be working out a joint model (which you might not choose to do for the project). It's not obvious how you'd evaluate all this, however...maybe do some user labeling of final predictions like "this topic corresponds to a controversial event."<br />
<br />
These are just ideas - you might try and flesh out some other concrete idea instead. Good luck! --[[User:Wcohen|Wcohen]] 14:33, 10 October 2012 (UTC)<br />
<br />
PS. There is also a one-person team working on a similar topic, you all should talk - it's [[User:Yuchen Tian]] --[[User:Wcohen|Wcohen]] 18:40, 10 October 2012 (UTC)<br />
<br />
== Team members ==<br />
<br />
* [[User:Ysim|Yanchuan Sim]]<br />
* [[User:Zhouyu|Zhou Yu]]<br />
* [[User:Tinghuiz|Tinghui Zhou]]<br />
<br />
== Project idea ==<br />
<br />
In our project, we propose to jointly detect events and the controversy surrounding them in the context of social media.<br />
For example, Christmas day is an event that receives the most attention around December 25th, while the Presidential debates receive attention once every four years.<br />
Controversy-wise, Christmas day is relatively one-sided, with most of the text mentioning it being relatively homogeneous.<br />
In contrast, the Presidential debates will have obvious sides (supporting the different candidates).<br />
<br />
Our goal is not only to detect controversial events, but also to discover what the different sides are - both grouping the individuals associated with each faction and describing how each faction talks about the event differently.<br />
<br />
We propose to use a probabilistic graphical model to achieve our goals of learning these latent structures from the data without labeled training data.<br />
<br />
== Formalizing the task ==<br />
<br />
Event - In the context of social media, an event is a period of time during which there is a "surge" in the amount of interest (e.g. blog posts, tweets, comments, etc.) surrounding an occurrence.<br />
<br />
We call an event controversial if, given the text surrounding the event, the nature of the discussion is highly non-homogeneous (i.e. exhibits high entropy). The sides of a controversial event can be grouped into a small number of distinct ''factions''.<br />
<br />
Thus, in our task, given a collection of social media documents over time, we seek to jointly infer the events that have occurred, as well as the controversy associated with each of them.<br />
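<br />
As a rough illustration of the two definitions above (and not the model we propose), the sketch below flags time bins whose document volume surges and scores how evenly a set of documents is split across factions. The function names, the mean-plus-<code>k</code>-standard-deviations threshold, and the assumption that faction labels are already available are all hypothetical choices made for this example; in our actual task the factions are latent and would have to be inferred.<br />

```python
import math
from collections import Counter
from statistics import mean, pstdev


def find_surges(counts, k=2.0):
    """Return indices of time bins whose volume exceeds mean + k * std.

    `counts` is a list of document counts per time bin (e.g. per week).
    The threshold rule is an assumption made for illustration only.
    """
    threshold = mean(counts) + k * pstdev(counts)
    return [i for i, c in enumerate(counts) if c > threshold]


def normalized_entropy(labels):
    """Shannon entropy of the faction-label distribution, scaled to [0, 1].

    0 means every document falls in one faction (homogeneous discussion);
    1 means the documents are split evenly across the observed factions.
    """
    freq = Counter(labels)
    if len(freq) < 2:
        return 0.0
    total = sum(freq.values())
    h = -sum((n / total) * math.log(n / total) for n in freq.values())
    return h / math.log(len(freq))


# Toy data: weekly document counts and hypothetical faction labels.
weekly_counts = [10, 12, 11, 9, 60, 10, 11]
surge_bins = find_surges(weekly_counts)
even_split = normalized_entropy(["a", "a", "b", "b"])
one_sided = normalized_entropy(["a"] * 9 + ["b"])
```

Under these assumptions, a surge bin with a score near 1 would be a candidate controversial event, while a surge bin with a score near 0 would resemble the Christmas day example.<br />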
<br />
== Possible model ==<br />
<br />
Here's a sketch of a topic model that we are considering for our task. It is somewhat similar to the Topics over Time model, except that we have latent variables over both topics (which are tied to time) and what we call factions.<br />
<br />
== Data ==<br />
<br />
Our main data source will be Twitter, and as a start we intend to use tweets over a three-month period in 2012 (the exact date range is to be decided). <br />
Some possibly controversial events that have occurred this year are the Republican primaries, the Grammy awards, weekly football games during the NFL season, etc.<br />
In addition to the textual content, the timestamps, locations (partially observed) and identities (of the user posting a tweet) could be useful features for our model.<br />
<br />
== Related work ==<br />
<br />
* [[RelatedPaper::Yang et al, SIGIR 98|A study on retrospective and online event detection. Yang et al, SIGIR 98]] This paper addresses the problem of detecting events in news stories. The authors used clustering with a vector space model to group temporally close events together.<br />
<br />
* [[RelatedPaper::Zhao et al, AAAI 07|Temporal and information flow based event detection from social text streams. Zhao et al, AAAI 07]] The authors propose a method for detecting events from social text streams that exploits not just the textual content but also the temporal and social dimensions of their data.<br />
<br />
* [[RelatedPaper::Automatic_Detection_and_Classification_of_Social_Events|Automatic Detection and Classification of Social Events. Agarwal and Rambow, ACL 10]] This is one of the few works we found relating to controversial events in social media. The authors aim to detect and classify social events using tree kernels.<br />
<br />
*[[RelatedPaper::Rodriguez et al. KDD 2010|Gomez Rodriguez, M., J. Leskovec, and A. Krause. 2010. Inferring networks of diffusion and influence. In Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining, 1019–1028]]. This paper addresses the problem of inferring underlying networks in the diffusion process of social networks, which is related to the faction discovery problem we study in this project.<br />
<br />
*[[RelatedPaper::Cosley et al 2010|Cosley, D., D. Huttenlocher, J. Kleinberg, X. Lan, and S. Suri. 2010. Sequential Influence Models in Social Networks. In Proc. 4th International Conference on Weblogs and Social Media]]. In this paper the authors study the temporal dynamics of information diffusion in social networks. Their results could give us some insight into the design of our model.<br />
<br />
* [[RelatedPaper::Castillo_2011|Information credibility on twitter. Castillo et al, WWW 11]] Identifies general features on Twitter that are relevant to credibility assessment.<br />
<br />
* [[RelatedPaper::Guralnik_99|Event Detection from Time Series Data. Guralnik et al, KDD 99]] Develops a general approach to change-point detection that generalizes across a wide range of applications.<br />
<br />
* [[RelatedPaper:: Allan_1988|On-Line New Event Detection and Tracking. Allan et al, SIGIR 98]] Their approach to detection uses a single-pass clustering algorithm and a novel thresholding model that incorporates the properties of events as a major component.<br />
<br />
== Related materials ==<br />
{{#ask: [[AddressesProblem::Controversial events detection]]<br />
| ?Category<br />
| ?UsesDataset<br />
| ?UsesMethod<br />
}}</div>Ysim