Cohen Courses - User contributions [en] (curtis.ml.cmu.edu, MediaWiki 1.33.1)

Turney, ACL 2002 (2010-12-13, PastStudents)
<hr />
<div>== Citation ==<br />
<br />
Turney, P. D. 2002. Thumbs up or thumbs down? Semantic orientation applied to unsupervised classification of reviews. In Proceedings of the 40th annual meeting of the Association for Computational Linguistics, 417–424.<br />
<br />
== Online version ==<br />
<br />
[http://acl.ldc.upenn.edu/P/P02/P02-1053.pdf ACL anthology]<br />
<br />
== Summary ==<br />
<br />
This is an early and influential [[Category::paper]] presenting an unsupervised approach to [[AddressesProblem::review classification]]. The basic ideas are:<br />
<br />
* To use part-of-speech tag patterns to pick out phrases that are likely to be meaningful and unambiguous with respect to semantic orientation (e.g. the pattern ADJ NOUN might pick out "good service" or "delicious desserts"). <br />
<br />
* To use [[UsesMethod::pointwise mutual information]] (PMI) to score the similarity of each phrase in a review to the two reference words "excellent" and "poor", and to score each phrase's polarity as the difference between its PMI with "excellent" and its PMI with "poor". A large corpus was used here (the Web, via queries to a search engine).<br />
<br />
* To score the polarity of a review based on the total polarity of the phrases in it. <br />
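The first step, extracting candidate phrases with tag patterns, can be sketched as follows. This is a minimal illustration, not the paper's full implementation: it assumes Penn Treebank POS tags, implements only two of the paper's patterns (adjective + noun, and adverb + adjective not followed by a noun), and takes already-tagged input.

```python
# Sketch: extract candidate two-word phrases from a POS-tagged review
# using two of the tag patterns (assumed Penn Treebank tags):
#   JJ followed by NN/NNS                     e.g. "good service"
#   RB/RBR/RBS followed by JJ, next not NN    e.g. "truly impressive"

def extract_phrases(tagged):
    """tagged: list of (word, tag) pairs; returns two-word candidate phrases."""
    phrases = []
    for i in range(len(tagged) - 1):
        (w1, t1), (w2, t2) = tagged[i], tagged[i + 1]
        t3 = tagged[i + 2][1] if i + 2 < len(tagged) else ""
        if t1 == "JJ" and t2 in ("NN", "NNS"):
            phrases.append(f"{w1} {w2}")
        elif t1 in ("RB", "RBR", "RBS") and t2 == "JJ" and t3 not in ("NN", "NNS"):
            phrases.append(f"{w1} {w2}")
    return phrases

tagged = [("the", "DT"), ("staff", "NN"), ("offered", "VBD"),
          ("good", "JJ"), ("service", "NN"), ("and", "CC"),
          ("very", "RB"), ("delicious", "JJ"), ("desserts", "NNS")]
print(extract_phrases(tagged))
```

Note that "very delicious" is rejected (the following word is a noun), while "delicious desserts" is picked up by the adjective-noun pattern.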
<br />
== Brief description of the method ==<br />
The algorithm takes a written review as input. First it assigns a POS tag to each word in the review and uses tag patterns to identify two-word adjective and adverb phrases. The PMI-IR algorithm is then used to estimate the semantic orientation of each phrase. The pointwise mutual information (PMI) between two words <math> w_1 </math> and <math> w_2 </math> is defined as follows:<br />
<br />
<math><br />
PMI(w_1,w_2)=\log_2\left(\frac{p(w_1,w_2)}{p(w_1)\,p(w_2)}\right)<br />
</math><br />
<br />
where <math> p(w_1,w_2) </math> is the probability that <math> w_1 </math> and <math> w_2 </math> co-occur. The semantic orientation (SO) of a phrase is then defined as:<br />
<br />
<math><br />
SO(phrase)=PMI(phrase,'excellent')-PMI(phrase,'poor')<br />
</math><br />
<br />
Estimating these probabilities with search-engine hit counts, the definition becomes:<br />
<br />
<math><br />
SO(phrase)=\log_2\left(\frac{hits(phrase\ NEAR\ 'excellent')\,hits('poor')}{hits(phrase\ NEAR\ 'poor')\,hits('excellent')}\right)<br />
</math><br />
<br />
where the NEAR operator requires the two phrases to appear close to each other in the corpus. Using this formula, the average semantic orientation over all phrases in a review is computed. The paper shows that this average is usually positive for items tagged "recommended" by users and usually negative for items tagged "not recommended".<br />
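The hit-count formula above translates into a short function. This is a sketch with invented counts, not data from the paper; a small smoothing constant added to the NEAR counts (0.01, as in Turney's paper) avoids division by zero for phrases that never co-occur with one of the reference words.

```python
import math

# Sketch of the SO estimate, assuming hit counts have already been
# obtained from a search engine. The counts below are made up.

def semantic_orientation(hits_near_excellent, hits_near_poor,
                         hits_excellent, hits_poor, smoothing=0.01):
    # smoothing keeps the ratio finite when a NEAR count is zero
    return math.log2(
        ((hits_near_excellent + smoothing) * hits_poor) /
        ((hits_near_poor + smoothing) * hits_excellent)
    )

# A phrase seen mostly near "excellent" gets a positive score:
so = semantic_orientation(hits_near_excellent=120, hits_near_poor=10,
                          hits_excellent=2_000_000, hits_poor=1_500_000)
print(so)
```

A review's score is then simply the mean of this value over its extracted phrases, and the review is classified as "recommended" when the mean is positive.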
<br />
== Experimental Results ==<br />
<br />
This approach was fairly successful on a range of review-classification tasks: it achieved accuracies between 65% and 85% in predicting an author-assigned "recommended" flag for Epinions ratings for eight diverse products, ranging from cars to movies. Many later authors used key ideas from the paper, including: treating polarity prediction as a document-classification problem; classifying documents based on likely-to-be-informative phrases; and using unsupervised or semi-supervised learning methods.<br />
<br />
== Related papers ==<br />
<br />
The widely cited [[RelatedPaper::Pang et al EMNLP 2002]] paper was influenced by this paper, but considers supervised learning techniques. The choice of movie reviews as the domain was suggested by the (relatively) poor performance of Turney's method on movies.<br />
<br />
An interesting follow-up paper is [[RelatedPaper::Turney and Littman, TOIS 2003]], which focuses on evaluating the technique of using PMI to predict the [[semantic orientation of words]].</div>
Turney, 2002 (2010-12-02, PastStudents)
<hr />
<div>== Citation ==<br />
<br />
Turney, P., 2002, Thumbs up or Thumbs Down? Semantic Orientation Applied to Unsupervised Classification of Reviews, ACL'02<br />
<br />
== Online version ==<br />
<br />
[http://www.ldc.upenn.edu/acl/P/P02/P02-1053.pdf paper]<br />
<br />
== Summary ==<br />
<br />
This [[Category::paper]] presents a simple unsupervised learning algorithm for the [[AddressesProblem::Opinion mining]] problem. The system classifies reviews as recommended ("thumbs up") or not recommended ("thumbs down"). The idea is to measure the semantic orientation of the phrases in a review and assign the review to the appropriate class based on their average semantic orientation. The semantic orientation of a phrase is measured as the pointwise mutual information between the phrase and the word "excellent" minus the pointwise mutual information between the phrase and the word "poor". <br />
<br />
== Description of the method ==<br />
The algorithm takes a written review as input. First it assigns a POS tag to each word in the review and uses tag patterns to identify two-word adjective and adverb phrases. The PMI-IR algorithm is then used to estimate the semantic orientation of each phrase. The pointwise mutual information (PMI) between two words <math> w_1 </math> and <math> w_2 </math> is defined as follows:<br />
<br />
<math><br />
PMI(w_1,w_2)=\log_2\left(\frac{p(w_1,w_2)}{p(w_1)\,p(w_2)}\right)<br />
</math><br />
<br />
where <math> p(w_1,w_2) </math> is the probability that <math> w_1 </math> and <math> w_2 </math> co-occur. The semantic orientation (SO) of a phrase is then defined as:<br />
<br />
<math><br />
SO(phrase)=PMI(phrase,'excellent')-PMI(phrase,'poor')<br />
</math><br />
<br />
Estimating these probabilities with search-engine hit counts, the definition becomes:<br />
<br />
<math><br />
SO(phrase)=\log_2\left(\frac{hits(phrase\ NEAR\ 'excellent')\,hits('poor')}{hits(phrase\ NEAR\ 'poor')\,hits('excellent')}\right)<br />
</math><br />
<br />
where the NEAR operator requires the two phrases to appear close to each other in the corpus. Using this formula, the average semantic orientation over all phrases in a review is computed. The paper shows that this average is usually positive for items tagged "recommended" by users and usually negative for items tagged "not recommended".<br />
<br />
== Evaluation Results ==<br />
To evaluate the technique, 410 reviews were chosen from Epinions. A classifier that always guesses the majority class achieves 59% accuracy, while the PMI-IR technique achieves 75% accuracy.</div>

Opinion mining (2010-12-02, PastStudents)
<hr />
<div>== Summary ==<br />
<br />
Opinion mining is a [[category::problem]] in the field of information extraction that aims to automatically extract opinion expressions from product reviews. A further goal of opinion mining techniques is to determine the opinion direction (polarity) of a review. <br />
<br />
== Common Approaches ==<br />
<br />
Generally there are two approaches to opinion mining: (1) document-level and (2) feature-level opinion mining. <br />
<br />
* Document level<br />
** [[relatedPaper::Turney,2002]] presented an approach to calculating opinion orientation using the Web as a corpus. The input review is classified based on the average semantic orientation of all the phrases in the review, measured with the PMI-IR technique.<br />
<br />
** [[Turney and Littman, 2003]] extended the work of [[Turney,2002]], using cosine distance in latent semantic analysis as the distance measure.<br />
<br />
** [[Dave et al.,2003]] introduced a novel approach to classifying reviews on Amazon.com using normalized term frequencies of unigrams, bigrams, and trigrams. <br />
<br />
<br />
* Feature level<br />
** [[Zhuang et al., 2006]] introduced a novel technique to classify movie reviews by extracting high frequency feature keywords.<br />
<br />
** [[Liu, 2004]] uses a statistical rule-based approach to extract high-frequency feature words.</div>

Jin et al, 2009 (2010-12-02, PastStudents)
<hr />
<div>== Citation ==<br />
<br />
Jin, W., Ho, H.,Srihari, R., 2009, OpinionMiner: A Novel Machine Learning System for Web Opinion Mining and Extraction, KDD'09<br />
<br />
== Online version ==<br />
<br />
[http://portal.acm.org/citation.cfm?id=1557148 OpinionMiner]<br />
<br />
== Summary ==<br />
<br />
This [[Category::paper]] introduces a system that mines customer reviews of a product and extracts product features from the reviews. The system returns opinion expressions extracted from the product reviews as well as their opinion direction. [[AddressesProblem:: Opinion mining]] has been studied widely in the machine learning and information extraction communities, and most approaches have used statistical or rule-based learning to extract opinion expressions. In this work, Jin et al. introduce a new technique that uses lexicalized HMMs for opinion mining. <br />
<br />
== System Architecture ==<br />
The architecture of their system is as follows: <br />
<br />
- Pre-processing: The system first crawls web pages from the Web, cleans the crawled HTML files, and segments the sentences. The technique used to extract reviews of a product from the input web pages is not described in the paper; the remaining components assume the reviews have already been extracted and are given to the system.<br />
<br />
- Entity types and tag sets: They define four entity types for each product review: components (e.g. the physical parts of a camera), functions (e.g. zoom in a camera), features (e.g. color), and opinions (e.g. ideas and thoughts). For each of these types they define a set of tags used in the annotation process. <br />
<br />
- Lexicalized HMMs: Given a product review as input, the goal of the lexicalized HMM is to assign an appropriate tag type to each part of the review. For classification, they maximize the conditional probability <math> P(T|W,S) </math>, where <math> T </math> is the sequence of tags to be assigned to the parts of the review, <math> W </math> is the sequence of words in the review, and <math> S </math> is the POS tag of each word. They use maximum likelihood estimation (MLE) to learn the parameters of the system. <br />
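As a rough illustration of the MLE step (for a plain HMM, not the paper's lexicalized variant), the parameters are just relative frequencies of transitions and emissions counted from tagged training sequences; the tiny tag set below is a hypothetical stand-in for the paper's tag sets.

```python
from collections import Counter

# Sketch: MLE estimation of HMM transition and emission probabilities
# from tagged training sentences (toy tags, not the paper's tag sets).

def mle_estimate(tagged_sentences):
    trans, emit, tag_count = Counter(), Counter(), Counter()
    for sent in tagged_sentences:
        prev = "<s>"
        for word, tag in sent:
            trans[(prev, tag)] += 1   # count tag-to-tag transitions
            emit[(tag, word)] += 1    # count word emissions per tag
            tag_count[tag] += 1
            prev = tag
    # Relative frequencies: P(tag | prev) and P(word | tag)
    p_trans = {k: v / sum(c for (p, _), c in trans.items() if p == k[0])
               for k, v in trans.items()}
    p_emit = {k: v / tag_count[k[0]] for k, v in emit.items()}
    return p_trans, p_emit

data = [[("good", "OPINION"), ("zoom", "FEATURE")],
        [("good", "OPINION"), ("color", "FEATURE")]]
p_trans, p_emit = mle_estimate(data)
print(p_emit[("OPINION", "good")])
```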
<br />
- Information propagation: The goal of this part is to reduce the amount of training data the system requires. Suppose the training data contains a sentence such as "Good picture quality" as part of a review, with the word "good" tagged as "<opinion_pos_exp>". The system creates new training data by looking up a dictionary and substituting "good" with its synonyms; for example, the new sentence "great picture quality" can be added as a new training example. This idea is applied to all the words in the training data to increase the number of examples.<br />
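A minimal sketch of the substitution idea; the synonym dictionary here is an invented stand-in for the lexicon the paper would consult, and tags are assumed to carry over unchanged to the substituted word.

```python
# Sketch: generate extra training sentences by swapping a tagged opinion
# word for each of its synonyms (toy dictionary, for illustration only).

SYNONYMS = {"good": ["great", "nice"], "bad": ["poor", "awful"]}

def propagate(sentence, opinion_word):
    """sentence: list of tokens; returns new sentences with the opinion
    word replaced by each of its synonyms."""
    new_sentences = []
    for syn in SYNONYMS.get(opinion_word, []):
        new_sentences.append([syn if tok == opinion_word else tok
                              for tok in sentence])
    return new_sentences

print(propagate(["good", "picture", "quality"], "good"))
```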
<br />
- Bootstrapping: The main contribution of this paper is the bootstrapping component. The idea is to partition the training set into two disjoint sets and train an HMM on each. Then, for each instance of the (human-unannotated) test data, if the two HMMs assign the same class and the confidence value is above a threshold T, the new instance is added to the training set. This idea can significantly decrease the amount of time a human must spend annotating training data.<br />
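The agreement-and-confidence filter can be sketched as follows; the two classifiers here are hypothetical stand-ins for the paper's pair of HMMs (anything that returns a label with a confidence score works the same way).

```python
# Sketch: promote unlabeled examples to the training set only when two
# independently trained classifiers agree and both are confident.

def bootstrap(unlabeled, clf_a, clf_b, threshold):
    promoted = []
    for x in unlabeled:
        label_a, conf_a = clf_a(x)
        label_b, conf_b = clf_b(x)
        if label_a == label_b and min(conf_a, conf_b) >= threshold:
            promoted.append((x, label_a))  # added to the training data
    return promoted

# Toy classifiers returning (label, confidence):
clf_a = lambda x: ("pos", 0.9) if "good" in x else ("neg", 0.6)
clf_b = lambda x: ("pos", 0.8) if "good" in x else ("pos", 0.55)

print(bootstrap(["good zoom", "blurry shots"], clf_a, clf_b, 0.7))
```

"good zoom" is promoted (both say "pos" with confidence at least 0.7); "blurry shots" is rejected because the two classifiers disagree.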
<br />
== Evaluation Results == <br />
They tested their system on reviews of different cameras collected from Amazon.com, manually annotating the reviews of 6 cameras for use as training data. The system was tested using 4-fold cross-validation, with the system developed by [[Turney,2002]] as the baseline for comparison. The results show that their system improves the accuracy of mining opinion expressions by a factor of 2 compared to the baseline.</div>
<hr />
<div>== Summary ==<br />
<br />
Opinion mining is a [[category::problem]] in the field of information extraction that which aims to automatically extract opinion expressions from product reviews. Also one of the goal of the opinion mining techniques is to determine the opinion direction of a review. <br />
<br />
== Common Approaches ==<br />
<br />
Generally there are two approaches for opinion mining: 1- document level and 2- feature level opinion mining. <br />
<br />
* Document level<br />
** [[relatedPaper::Turney,2002]] presented an approach to calculate the opinion orientation using the Web as a corpus. The input review is classified based on the average semantic orientation of the phrases in the review. They have used PMI-IR technique to measure the semantic orientation of each phrase in the review.<br />
<br />
** [[Turney and Littman, 2003]] expanded [[Turney,2002]] work using cosine distance in latent semantic analysis as the distance measure.<br />
<br />
** [[Dave et al.,2003]] introduced a novel approach to classify reviews in Amazon.com using normalized term frequency on uni-gram, bi-gram and tri-gram. <br />
<br />
<br />
* Feature level<br />
** [[Zhuang et al., 2006]] introduced a novel technique to classify movie reviews by extracting high frequency feature keywords.<br />
<br />
** [[Liu, 2004]] uses a statistical rule-based approach to extract high frequency feature words.</div>PastStudentshttp://curtis.ml.cmu.edu/w/courses/index.php?title=Opinion_mining&diff=3098Opinion mining2010-12-01T19:46:24Z<p>PastStudents: </p>
<hr />
<div>== Summary ==<br />
<br />
Opinion mining is a [[category::problem]] in the field of information extraction that which aims to automatically extract opinion expressions from product reviews. Also one of the goal of the opinion mining techniques is to determine the opinion direction of a review. <br />
<br />
== Common Approaches ==<br />
<br />
Generally there are two approaches for opinion mining: 1- document level and 2- feature level opinion mining. <br />
<br />
* Document level<br />
** [[relatedPaper::Turney,2002]] presented an approach to calculate the opinion orientation using the Web as a corpus. The input review is classified based on the average semantic orientation of the phrases in the review. They have used PMI-IR technique to measure the semantic orientation of each phrase in the review.<br />
<br />
** [[Turney and Littman, 2003]] expanded [[Turney,2002]] work using cosine distance in latent semantic analysis as the distance measure.<br />
<br />
** [[Dave et al.,2003]] introduced a novel approach to classify reviews in Amazon.com using normalized term frequency on uni-gram, bi-gram and tri-gram. <br />
<br />
* Feature level<br />
** <br />
<br />
Some common models for named entity recognition include the following:<br />
* '''Lexicons'''<br />
** Checks if a token is part of a predefined set<br />
* '''Classifying pre-segmented candidates'''<br />
** Manually select candidates, then use YFCL on a piece of text to deterimine what type of entity it is<br />
* '''Sliding Window'''<br />
** Try all reasonable token windows (different lengths and positions), train a [[UsesMethod::Naive Bayes]] classifier or YFCL, then extract text if Pr(class=+|prefix, contents, suffix) > some threshold<br />
* '''Token Tagging / Sequential'''<br />
** Classify tokens sequentially, with models like [[UsesMethod::Hidden Markov Models]], [[UsesMethod::Maximum Entropy Markov Models]], or [[Uses:Method::Conditional Random Fields]].<br />
<br />
== Example Systems ==<br />
* [http://nlp.stanford.edu/ner/index.shtml Stanford NER]<br />
* [http://cogcomp.cs.illinois.edu/page/software_view/4 Illinois Named Entity Tagger]<br />
<br />
== References / Links ==<br />
* BBN Named Entity Types - [http://www.ldc.upenn.edu/Catalog/docs/LDC2005T33/BBN-Types-Subtypes.html]<br />
* Satoshi Sekine's Extended Named Entity Hierarchy - [http://nlp.cs.nyu.edu/ene/]<br />
* Wikipedia page on Named entity recognition - [http://en.wikipedia.org/wiki/Named_entity_recognition]<br />
<br />
== Relevant Papers ==<br />
<br />
{{?relatedPaper<br />
}}</div>PastStudentshttp://curtis.ml.cmu.edu/w/courses/index.php?title=Opinion_mining&diff=3097Opinion mining2010-12-01T19:45:50Z<p>PastStudents: </p>
<hr />
<div>== Summary ==<br />
<br />
Opinion mining is a [[category::problem]] in the field of information extraction that which aims to automatically extract opinion expressions from product reviews. Also one of the goal of the opinion mining techniques is to determine the opinion direction of a review. <br />
<br />
== Common Approaches ==<br />
<br />
Generally there are two approaches for opinion mining: 1- document level and 2- feature level opinion mining. <br />
<br />
* Document level<br />
** [[relatedPaper::Turney,2002]] presented an approach to calculate the opinion orientation using the Web as a corpus. The input review is classified based on the average semantic orientation of the phrases in the review. They have used PMI-IR technique to measure the semantic orientation of each phrase in the review.<br />
<br />
** [[Turney and Littman, 2003]] expanded [[Turney,2002]] work using cosine distance in latent semantic analysis as the distance measure.<br />
<br />
** [[Dave et al.,2003]] introduced a novel approach to classify reviews in Amazon.com using normalized term frequency on uni-gram, bi-gram and tri-gram. <br />
<br />
* Feature level<br />
** <br />
<br />
Some common models for named entity recognition include the following:<br />
* '''Lexicons'''<br />
** Checks if a token is part of a predefined set<br />
* '''Classifying pre-segmented candidates'''<br />
** Manually select candidates, then use YFCL on a piece of text to deterimine what type of entity it is<br />
* '''Sliding Window'''<br />
** Try all reasonable token windows (different lengths and positions), train a [[UsesMethod::Naive Bayes]] classifier or YFCL, then extract text if Pr(class=+|prefix, contents, suffix) > some threshold<br />
* '''Token Tagging / Sequential'''<br />
** Classify tokens sequentially, with models like [[UsesMethod::Hidden Markov Models]], [[UsesMethod::Maximum Entropy Markov Models]], or [[Uses:Method::Conditional Random Fields]].<br />
<br />
== Example Systems ==<br />
* [http://nlp.stanford.edu/ner/index.shtml Stanford NER]<br />
* [http://cogcomp.cs.illinois.edu/page/software_view/4 Illinois Named Entity Tagger]<br />
<br />
== References / Links ==<br />
* BBN Named Entity Types - [http://www.ldc.upenn.edu/Catalog/docs/LDC2005T33/BBN-Types-Subtypes.html]<br />
* Satoshi Sekine's Extended Named Entity Hierarchy - [http://nlp.cs.nyu.edu/ene/]<br />
* Wikipedia page on Named entity recognition - [http://en.wikipedia.org/wiki/Named_entity_recognition]<br />
<br />
== Relevant Papers ==<br />
<br />
{{#ask: [[AddressesProblem::Named Entity Recognition]]<br />
| ?UsesMethod<br />
| ?UsesDataset<br />
}}</div>PastStudentshttp://curtis.ml.cmu.edu/w/courses/index.php?title=Opinion_mining&diff=3096Opinion mining2010-12-01T19:44:56Z<p>PastStudents: </p>
<hr />
<div>== Summary ==<br />
<br />
Opinion mining is a [[category::problem]] in the field of information extraction that which aims to automatically extract opinion expressions from product reviews. Also one of the goal of the opinion mining techniques is to determine the opinion direction of a review. <br />
<br />
== Common Approaches ==<br />
<br />
Generally there are two approaches for opinion mining: 1- document level and 2- feature level opinion mining. <br />
<br />
* Document level<br />
** [[Turney,2002]] [[category::paper]] presented an approach to calculate the opinion orientation using the Web as a corpus. The input review is classified based on the average semantic orientation of the phrases in the review. They have used PMI-IR technique to measure the semantic orientation of each phrase in the review.<br />
<br />
** [[Turney and Littman, 2003]] expanded [[Turney,2002]] work using cosine distance in latent semantic analysis as the distance measure.<br />
<br />
** [[Dave et al.,2003]] introduced a novel approach to classify reviews in Amazon.com using normalized term frequency on uni-gram, bi-gram and tri-gram. <br />
<br />
* Feature level<br />
** <br />
<br />
Some common models for named entity recognition include the following:<br />
* '''Lexicons'''<br />
** Checks if a token is part of a predefined set<br />
* '''Classifying pre-segmented candidates'''<br />
** Manually select candidates, then use YFCL on a piece of text to deterimine what type of entity it is<br />
* '''Sliding Window'''<br />
** Try all reasonable token windows (different lengths and positions), train a [[UsesMethod::Naive Bayes]] classifier or YFCL, then extract text if Pr(class=+|prefix, contents, suffix) > some threshold<br />
* '''Token Tagging / Sequential'''<br />
** Classify tokens sequentially, with models like [[UsesMethod::Hidden Markov Models]], [[UsesMethod::Maximum Entropy Markov Models]], or [[Uses:Method::Conditional Random Fields]].<br />
<br />
== Example Systems ==<br />
* [http://nlp.stanford.edu/ner/index.shtml Stanford NER]<br />
* [http://cogcomp.cs.illinois.edu/page/software_view/4 Illinois Named Entity Tagger]<br />
<br />
== References / Links ==<br />
* BBN Named Entity Types - [http://www.ldc.upenn.edu/Catalog/docs/LDC2005T33/BBN-Types-Subtypes.html]<br />
* Satoshi Sekine's Extended Named Entity Hierarchy - [http://nlp.cs.nyu.edu/ene/]<br />
* Wikipedia page on Named entity recognition - [http://en.wikipedia.org/wiki/Named_entity_recognition]<br />
<br />
== Relevant Papers ==<br />
<br />
{{#ask: [[AddressesProblem::Named Entity Recognition]]<br />
| ?UsesMethod<br />
| ?UsesDataset<br />
}}</div>PastStudentshttp://curtis.ml.cmu.edu/w/courses/index.php?title=Opinion_mining&diff=3095Opinion mining2010-12-01T19:43:26Z<p>PastStudents: </p>
<hr />
<div>== Summary ==<br />
<br />
Opinion mining is a [[category::problem]] in the field of information extraction that aims to automatically extract opinion expressions from product reviews. A further goal of opinion mining techniques is to determine the overall opinion orientation of a review. <br />
<br />
== Common Approaches ==<br />
<br />
Generally, there are two approaches to opinion mining: (1) document-level and (2) feature-level opinion mining. <br />
<br />
* Document level<br />
** [[Turney,2002]] presented an approach that calculates opinion orientation using the Web as a corpus. The input review is classified based on the average semantic orientation of the phrases in the review, where the semantic orientation of each phrase is measured with the PMI-IR technique.<br />
<br />
** [[Turney and Littman, 2003]] extended the work of [[Turney,2002]], using the cosine distance in latent semantic analysis as the similarity measure.<br />
<br />
** [[Dave et al.,2003]] introduced a novel approach to classifying Amazon.com reviews using normalized term frequencies of unigrams, bigrams, and trigrams. <br />
<br />
* Feature level<br />
** <br />
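The document-level scheme of [[Turney,2002]] can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation: <code>phrase_so</code> is a hypothetical mapping from extracted phrases to precomputed PMI-IR semantic-orientation scores, and the default label for a review with no scored phrases is an arbitrary choice.<br />

```python
def classify_review(phrases, phrase_so):
    """Label a review 'recommended' iff the average semantic
    orientation (SO) of its extracted phrases is positive."""
    scores = [phrase_so[p] for p in phrases if p in phrase_so]
    if not scores:
        return "not recommended"  # no evidence; arbitrary default
    average_so = sum(scores) / len(scores)
    return "recommended" if average_so > 0 else "not recommended"

# Hypothetical precomputed SO scores for a few phrases:
phrase_so = {"direct deposit": 1.3, "low fees": 0.3, "virtual monopoly": -2.0}
print(classify_review(["direct deposit", "low fees"], phrase_so))
```

A review whose phrases lean positive on average ("direct deposit", "low fees") is labeled "recommended"; one dominated by negative phrases is not.<br />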
<br />
Some common models for named entity recognition include the following:<br />
* '''Lexicons'''<br />
** Checks if a token is part of a predefined set<br />
* '''Classifying pre-segmented candidates'''<br />
** Manually select candidates, then use YFCL on a piece of text to determine what type of entity it is<br />
* '''Sliding Window'''<br />
** Try all reasonable token windows (different lengths and positions), train a [[UsesMethod::Naive Bayes]] classifier or YFCL, then extract text if Pr(class=+|prefix, contents, suffix) > some threshold<br />
* '''Token Tagging / Sequential'''<br />
** Classify tokens sequentially, with models like [[UsesMethod::Hidden Markov Models]], [[UsesMethod::Maximum Entropy Markov Models]], or [[UsesMethod::Conditional Random Fields]].<br />
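The sliding-window scheme above can be sketched as follows. This is a minimal illustration: the <code>score</code> function is a hypothetical stand-in for a trained [[UsesMethod::Naive Bayes]] classifier (or YFCL), replaced here by a toy lexicon lookup so the example is self-contained.<br />

```python
# Toy lexicon standing in for a trained classifier's knowledge.
PERSON_LEXICON = {("john", "smith"), ("mary",)}

def score(prefix, contents, suffix):
    """Stand-in for Pr(class=+ | prefix, contents, suffix)."""
    return 1.0 if tuple(w.lower() for w in contents) in PERSON_LEXICON else 0.0

def sliding_window_extract(tokens, max_len=3, threshold=0.5):
    """Try all token windows up to max_len tokens, at every position;
    keep the spans whose score exceeds the threshold."""
    spans = []
    for start in range(len(tokens)):
        for end in range(start + 1, min(start + max_len, len(tokens)) + 1):
            prefix, contents, suffix = tokens[:start], tokens[start:end], tokens[end:]
            if score(prefix, contents, suffix) > threshold:
                spans.append((start, end, " ".join(contents)))
    return spans

print(sliding_window_extract("Yesterday John Smith flew to Paris".split()))
```

Trying every window is quadratic in sentence length, which is one reason sequential token-tagging models are often preferred in practice.<br />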
<br />
== Example Systems ==<br />
* [http://nlp.stanford.edu/ner/index.shtml Stanford NER]<br />
* [http://cogcomp.cs.illinois.edu/page/software_view/4 Illinois Named Entity Tagger]<br />
<br />
== References / Links ==<br />
* BBN Named Entity Types - [http://www.ldc.upenn.edu/Catalog/docs/LDC2005T33/BBN-Types-Subtypes.html]<br />
* Satoshi Sekine's Extended Named Entity Hierarchy - [http://nlp.cs.nyu.edu/ene/]<br />
* Wikipedia page on Named entity recognition - [http://en.wikipedia.org/wiki/Named_entity_recognition]<br />
<br />
== Relevant Papers ==<br />
<br />
{{#ask: [[AddressesProblem::Named Entity Recognition]]<br />
| ?UsesMethod<br />
| ?UsesDataset<br />
}}</div>PastStudentshttp://curtis.ml.cmu.edu/w/courses/index.php?title=Turney,2002&diff=3092Turney,20022010-12-01T19:18:15Z<p>PastStudents: </p>
<hr />
<div>== Citation ==<br />
<br />
Turney, P., 2002, Thumbs up or Thumbs Down? Semantic Orientation Applied to Unsupervised Classification of Reviews, ACL'02<br />
<br />
== Online version ==<br />
<br />
[http://www.ldc.upenn.edu/acl/P/P02/P02-1053.pdf paper]<br />
<br />
== Summary ==<br />
<br />
This [[Category::paper]] presents a simple unsupervised learning algorithm for the [[AddressesProblem::Opinion mining]] problem. The system classifies reviews as recommended ("thumbs up") or not recommended ("thumbs down"). The idea is to measure the semantic orientation of the phrases in a review and assign the review to a class based on their average semantic orientation. The semantic orientation of a phrase is the mutual information between the phrase and the word "excellent" minus the mutual information between the phrase and the word "poor". <br />
<br />
== Description of the method ==<br />
The algorithm takes a written review as input. First, a POS tag is assigned to each word, and patterns of tags are used to identify adjective and adverb phrases in the review. The PMI-IR algorithm is then used to estimate the semantic orientation of each phrase. The Pointwise Mutual Information (PMI) between two words <math> w_1 </math> and <math> w_2 </math> is defined as follows:<br />
<br />
<math><br />
PMI(w_1,w_2)=\log_2\left(\frac{p(w_1\ \mathrm{and}\ w_2)}{p(w_1)\,p(w_2)}\right)<br />
</math><br />
<br />
where <math> p(w_1\ \mathrm{and}\ w_2) </math> is the probability that <math> w_1 </math> and <math> w_2 </math> co-occur. The semantic orientation (SO) of a phrase is then defined as follows:<br />
<br />
<math><br />
SO(phrase)=PMI(phrase,'excellent')-PMI(phrase,'poor')<br />
</math><br />
<br />
Estimating these probabilities from search-engine hit counts, the above definition can be rewritten as the following formula:<br />
<br />
<math><br />
SO(phrase)=\log_2\left(\frac{hits(phrase\ NEAR\ 'excellent')\ hits('poor')}{hits(phrase\ NEAR\ 'poor')\ hits('excellent')}\right)<br />
</math><br />
<br />
where the NEAR operator requires the two phrases to appear close to each other in the corpus. Using the above formula, the average semantic orientation of the phrases in a review can be computed. The authors show that the average semantic orientation is usually positive for reviews that users tag as "recommended" and usually negative for those tagged as "not recommended".<br />
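The SO computation can be sketched in Python. In the paper the hit counts come from search-engine queries with the NEAR operator; here <code>semantic_orientation</code> is a hypothetical helper that takes precomputed counts, and the smoothing constant 0.01, added to the co-occurrence counts to avoid division by zero, follows the paper.<br />

```python
import math

def semantic_orientation(hits_near_excellent, hits_near_poor,
                         hits_excellent, hits_poor, smoothing=0.01):
    """SO(phrase) from search-engine hit counts:
    log2( hits(phrase NEAR 'excellent') * hits('poor')
          / (hits(phrase NEAR 'poor') * hits('excellent')) ).
    The smoothing constant keeps the ratio finite when a count is zero."""
    numerator = (hits_near_excellent + smoothing) * hits_poor
    denominator = (hits_near_poor + smoothing) * hits_excellent
    return math.log2(numerator / denominator)

# Hypothetical counts for a phrase that co-occurs mostly with "excellent":
so = semantic_orientation(hits_near_excellent=2000, hits_near_poor=100,
                          hits_excellent=1000000, hits_poor=1000000)
print(so > 0)  # a positive SO marks the phrase as positive
```

Note that the hits('poor')/hits('excellent') factor cancels out any overall imbalance in how often the two reference words appear on the Web.<br />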
<br />
== Evaluation Results ==<br />
To evaluate the technique, 410 reviews from Epinions were used. A classifier that always guesses the majority class achieves 59% accuracy, while the PMI-IR technique achieves an average accuracy of 74%.</div>
<hr />
<div>== Citation ==<br />
<br />
Turney, P., 2002, Thumbs up or Thumbs Down? Semantic Orientation Applied to Unsupervised Classification of Reviews, ACL'02<br />
<br />
== Online version ==<br />
<br />
[[http://www.ldc.upenn.edu/acl/P/P02/P02-1053.pdf|paper]]<br />
<br />
== Summary ==<br />
<br />
This [[Category::paper]] presents a simple unsupervised learning algorithm for [[AddressesProblem:: Opinion mining]] problem. The system is able to classify the reviews as recommended ("thumbs up") or not-recommended ("thumbs down"). The idea is to measure the semantic orientation of phrases in a review and classify it to an appropriate class based on the average semantic orientation. The semantic orientation is measured by mutual information between the given phrase and word "excellent" minus the mutual information between the input phrase and the word "poor". <br />
<br />
== Description of the method ==<br />
The algorithm takes a written review as the input. First they assign a POS tag to each word in the document to identify adjective or adverb phrases in the input review. They have used PMI-IR algorithm to estimate the semantic orientation of a phrase. The Pointwise Mutual Information (PMI) between two words <math> w_1 </math> and <math> w_2 </math> is defined as follow:<br />
<br />
<math><br />
PMI(w_1,w_2)=log_2(p(w_1\ and\ w_2)/p(w_1)p(w_2))<br />
</math><br />
<br />
where <math> p(w_1,w_2) </math> is the probability that <math> w_1 </math> and <math> w_2 </math> co-occur. They have defined the semantic orientation of a phrase as follow:<br />
<br />
<math><br />
SO(phrase)=PMI(phrase,'excellent')-PMI(phrase,'poor')<br />
</math><br />
<br />
We can modify the above definition to obtain the following formula:<br />
<br />
<math><br />
SO(phrase)=log_2(\frac{hits(phrase\ NEAR\ 'excellent')hits('excellent')}{hits(phrase\ NEAR\ 'poor')hits('excellent')} )<br />
</math><br />
<br />
where operator NEAR means that the two phrases should be appeared close to each other in the corpus. Using the above formula we can calculate the average semantic orientation for a review. They have shown that average semantic orientation value for phrases in the items that are tagged as "recommanded" by uses are usually positive and those that are classified as "not recommanded" are usually negative.<br />
<br />
== Evaluation Results ==</div>PastStudentshttp://curtis.ml.cmu.edu/w/courses/index.php?title=Turney,2002&diff=3090Turney,20022010-12-01T19:01:42Z<p>PastStudents: </p>
<hr />
<div>== Citation ==<br />
<br />
Turney, P., 2002, Thumbs up or Thumbs Down? Semantic Orientation Applied to Unsupervised Classification of Reviews, ACL'02<br />
<br />
== Online version ==<br />
<br />
[[http://www.ldc.upenn.edu/acl/P/P02/P02-1053.pdf|paper]]<br />
<br />
== Summary ==<br />
<br />
This [[Category::paper]] presents a simple unsupervised learning algorithm for [[AddressesProblem:: Opinion mining]] problem. The system is able to classify the reviews as recommended ("thumbs up") or not-recommended ("thumbs down"). The idea is to measure the semantic orientation of phrases in a review and classify it to an appropriate class based on the average semantic orientation. The semantic orientation is measured by mutual information between the given phrase and word "excellent" minus the mutual information between the input phrase and the word "poor". <br />
<br />
== Description of the method ==<br />
The algorithm takes a written review as the input. First they assign a POS tag to each word in the document to identify adjective or adverb phrases in the input review. They have used PMI-IR algorithm to estimate the semantic orientation of a phrase. The Pointwise Mutual Information (PMI) between two words <math> w_1 </math> and <math> w_2 </math> is defined as follow:<br />
<br />
<math><br />
PMI(w_1,w_2)=log_2(p(w_1\ and\ w_2)/p(w_1)p(w_2))<br />
</math><br />
<br />
where <math> p(w_1,w_2) </math> is the probability that <math> w_1 </math> and <math> w_2 </math> co-occur. They have defined the semantic orientation of a phrase as follow:<br />
<br />
<math><br />
SO(phrase)=PMI(phrase,'excellent')-PMI(phrase,'poor')<br />
</math><br />
<br />
We can modify the above definition to obtain the following formula:<br />
<br />
<math><br />
SO(phrase)=log_2(\frac{hits(phrase\ NEAR\ 'excellent')hits('excellent')}{hits(phrase\ NEAR\ 'poor')hits('excellent')} )<br />
</math><br />
<br />
<br />
<br />
<br />
Then they estimate the semantic orientation of each phrase in the document. The last step<br />
<br />
== Evaluation Results == <br />
They have tested their system on reviews different cameras that are chosen from Amazon.com. They have manually annotated reviews of 6 cameras to use as the training data. The system is tested using 4-fold validation. They have used the system that is developed by [[Turney,2002]] as the baseline for comparisons. The results have shown that they can increase accuracy of the system by a factor of 2 comparing to the baseline system.</div>PastStudentshttp://curtis.ml.cmu.edu/w/courses/index.php?title=Turney,2002&diff=3089Turney,20022010-12-01T19:00:50Z<p>PastStudents: </p>
<hr />
<div>== Citation ==<br />
<br />
Turney, P., 2002, Thumbs up or Thumbs Down? Semantic Orientation Applied to Unsupervised Classification of Reviews, ACL'02<br />
<br />
== Online version ==<br />
<br />
[[http://www.ldc.upenn.edu/acl/P/P02/P02-1053.pdf|paper]]<br />
<br />
== Summary ==<br />
<br />
This [[Category::paper]] presents a simple unsupervised learning algorithm for [[AddressesProblem:: Opinion mining]] problem. The system is able to classify the reviews as recommended ("thumbs up") or not-recommended ("thumbs down"). The idea is to measure the semantic orientation of phrases in a review and classify it to an appropriate class based on the average semantic orientation. The semantic orientation is measured by mutual information between the given phrase and word "excellent" minus the mutual information between the input phrase and the word "poor". <br />
<br />
== Description of the method ==<br />
The algorithm takes a written review as the input. First they assign a POS tag to each word in the document to identify adjective or adverb phrases in the input review. They have used PMI-IR algorithm to estimate the semantic orientation of a phrase. The Pointwise Mutual Information (PMI) between two words <math> w_1 </math> and <math> w_2 </math> is defined as follow:<br />
<br />
<math><br />
PMI(w_1,w_2)=log_2(p(w_1\ and\ w_2)/p(w_1)p(w_2))<br />
</math><br />
<br />
where <math> p(w_1,w_2) </math> is the probability that <math> w_1 </math> and <math> w_2 </math> co-occur. They have defined the semantic orientation of a phrase as follow:<br />
<br />
<math><br />
SO(phrase)=PMI(phrase,'excellent')-PMI(phrase,'poor')<br />
</math><br />
<br />
We can modify the above definition to obtain the following formula:<br />
<br />
<math><br />
SO(phrase)=log_2(\frac{hits(phrase NEAR "excellent")hits("excellent")}{hits(phrase NEAR "poor")hits("excellent")} )<br />
</math><br />
<br />
<br />
<br />
<br />
Then they estimate the semantic orientation of each phrase in the document. The last step<br />
<br />
== Evaluation Results == <br />
They have tested their system on reviews different cameras that are chosen from Amazon.com. They have manually annotated reviews of 6 cameras to use as the training data. The system is tested using 4-fold validation. They have used the system that is developed by [[Turney,2002]] as the baseline for comparisons. The results have shown that they can increase accuracy of the system by a factor of 2 comparing to the baseline system.</div>PastStudentshttp://curtis.ml.cmu.edu/w/courses/index.php?title=Turney,2002&diff=3088Turney,20022010-12-01T19:00:19Z<p>PastStudents: </p>
<hr />
<div>== Citation ==<br />
<br />
Turney, P., 2002, Thumbs up or Thumbs Down? Semantic Orientation Applied to Unsupervised Classification of Reviews, ACL'02<br />
<br />
== Online version ==<br />
<br />
[[http://www.ldc.upenn.edu/acl/P/P02/P02-1053.pdf|paper]]<br />
<br />
== Summary ==<br />
<br />
This [[Category::paper]] presents a simple unsupervised learning algorithm for [[AddressesProblem:: Opinion mining]] problem. The system is able to classify the reviews as recommended ("thumbs up") or not-recommended ("thumbs down"). The idea is to measure the semantic orientation of phrases in a review and classify it to an appropriate class based on the average semantic orientation. The semantic orientation is measured by mutual information between the given phrase and word "excellent" minus the mutual information between the input phrase and the word "poor". <br />
<br />
== Description of the method ==<br />
The algorithm takes a written review as the input. First they assign a POS tag to each word in the document to identify adjective or adverb phrases in the input review. They have used PMI-IR algorithm to estimate the semantic orientation of a phrase. The Pointwise Mutual Information (PMI) between two words <math> w_1 </math> and <math> w_2 </math> is defined as follow:<br />
<br />
<math><br />
PMI(w_1,w_2)=log_2(p(w_1 .and. w_2)/p(w_1)p(w_2))<br />
</math><br />
<br />
where <math> p(w_1,w_2) </math> is the probability that <math> w_1 </math> and <math> w_2 </math> co-occur. They have defined the semantic orientation of a phrase as follow:<br />
<br />
<math><br />
SO(phrase)=PMI(phrase,'excellent')-PMI(phrase,'poor')<br />
</math><br />
<br />
We can modify the above definition to obtain the following formula:<br />
<br />
<math><br />
SO(phrase)=log_2(\frac{hits(phrase NEAR "excellent")hits("excellent")}{hits(phrase NEAR "poor")hits("excellent")} )<br />
</math><br />
<br />
<br />
<br />
<br />
Then they estimate the semantic orientation of each phrase in the document. The last step<br />
<br />
== Evaluation Results == <br />
They have tested their system on reviews different cameras that are chosen from Amazon.com. They have manually annotated reviews of 6 cameras to use as the training data. The system is tested using 4-fold validation. They have used the system that is developed by [[Turney,2002]] as the baseline for comparisons. The results have shown that they can increase accuracy of the system by a factor of 2 comparing to the baseline system.</div>PastStudentshttp://curtis.ml.cmu.edu/w/courses/index.php?title=Turney,2002&diff=3087Turney,20022010-12-01T18:59:44Z<p>PastStudents: </p>
<hr />
<div>== Citation ==<br />
<br />
Turney, P., 2002, Thumbs up or Thumbs Down? Semantic Orientation Applied to Unsupervised Classification of Reviews, ACL'02<br />
<br />
== Online version ==<br />
<br />
[[http://www.ldc.upenn.edu/acl/P/P02/P02-1053.pdf|paper]]<br />
<br />
== Summary ==<br />
<br />
This [[Category::paper]] presents a simple unsupervised learning algorithm for [[AddressesProblem:: Opinion mining]] problem. The system is able to classify the reviews as recommended ("thumbs up") or not-recommended ("thumbs down"). The idea is to measure the semantic orientation of phrases in a review and classify it to an appropriate class based on the average semantic orientation. The semantic orientation is measured by mutual information between the given phrase and word "excellent" minus the mutual information between the input phrase and the word "poor". <br />
<br />
== Description of the method ==<br />
The algorithm takes a written review as the input. First they assign a POS tag to each word in the document to identify adjective or adverb phrases in the input review. They have used PMI-IR algorithm to estimate the semantic orientation of a phrase. The Pointwise Mutual Information (PMI) between two words <math> w_1 </math> and <math> w_2 </math> is defined as follow:<br />
<br />
<math><br />
PMI(w_1,w_2)=log_2(p(w_1 .and. w_2)/p(w_1)p(w_2))<br />
</math><br />
<br />
where <math> p(w_1,w_2) </math> is the probability that <math> w_1 </math> and <math> w_2 </math> co-occur. They have defined the semantic orientation of a phrase as follow:<br />
<br />
<math><br />
SO(phrase)=PMI(phrase,\"excellent\")-PMI(phrase,\"poor\")<br />
</math><br />
<br />
We can modify the above definition to obtain the following formula:<br />
<br />
<math><br />
SO(phrase)=log_2(\frac{hits(phrase NEAR "excellent")hits("excellent")}{hits(phrase NEAR "poor")hits("excellent")} )<br />
</math><br />
<br />
<br />
<br />
<br />
Then they estimate the semantic orientation of each phrase in the document. The last step<br />
<br />
== Evaluation Results == <br />
They have tested their system on reviews different cameras that are chosen from Amazon.com. They have manually annotated reviews of 6 cameras to use as the training data. The system is tested using 4-fold validation. They have used the system that is developed by [[Turney,2002]] as the baseline for comparisons. The results have shown that they can increase accuracy of the system by a factor of 2 comparing to the baseline system.</div>PastStudentshttp://curtis.ml.cmu.edu/w/courses/index.php?title=Turney,2002&diff=3086Turney,20022010-12-01T18:59:20Z<p>PastStudents: </p>
<hr />
<div>== Citation ==<br />
<br />
Turney, P., 2002, Thumbs up or Thumbs Down? Semantic Orientation Applied to Unsupervised Classification of Reviews, ACL'02<br />
<br />
== Online version ==<br />
<br />
[[http://www.ldc.upenn.edu/acl/P/P02/P02-1053.pdf|paper]]<br />
<br />
== Summary ==<br />
<br />
This [[Category::paper]] presents a simple unsupervised learning algorithm for [[AddressesProblem:: Opinion mining]] problem. The system is able to classify the reviews as recommended ("thumbs up") or not-recommended ("thumbs down"). The idea is to measure the semantic orientation of phrases in a review and classify it to an appropriate class based on the average semantic orientation. The semantic orientation is measured by mutual information between the given phrase and word "excellent" minus the mutual information between the input phrase and the word "poor". <br />
<br />
== Description of the method ==<br />
The algorithm takes a written review as the input. First they assign a POS tag to each word in the document to identify adjective or adverb phrases in the input review. They have used PMI-IR algorithm to estimate the semantic orientation of a phrase. The Pointwise Mutual Information (PMI) between two words <math> w_1 </math> and <math> w_2 </math> is defined as follow:<br />
<br />
<math><br />
PMI(w_1,w_2)=log_2(p(w_1 .and. w_2)/p(w_1)p(w_2))<br />
</math><br />
<br />
where <math> p(w_1,w_2) </math> is the probability that <math> w_1 </math> and <math> w_2 </math> co-occur. They have defined the semantic orientation of a phrase as follow:<br />
<br />
<math><br />
SO(phrase)=PMI(phrase,"excellent")-PMI(phrase,"poor")<br />
</math><br />
<br />
We can modify the above definition to obtain the following formula:<br />
<br />
<math><br />
SO(phrase)=log_2(\frac{hits(phrase NEAR "excellent")hits("excellent")}{hits(phrase NEAR "poor")hits("excellent")} )<br />
</math><br />
<br />
<br />
<br />
<br />
Then they estimate the semantic orientation of each phrase in the document. The last step<br />
<br />
== Evaluation Results == <br />
They have tested their system on reviews different cameras that are chosen from Amazon.com. They have manually annotated reviews of 6 cameras to use as the training data. The system is tested using 4-fold validation. They have used the system that is developed by [[Turney,2002]] as the baseline for comparisons. The results have shown that they can increase accuracy of the system by a factor of 2 comparing to the baseline system.</div>PastStudentshttp://curtis.ml.cmu.edu/w/courses/index.php?title=Turney,2002&diff=3085Turney,20022010-12-01T18:58:55Z<p>PastStudents: </p>
<hr />
<div>== Citation ==<br />
<br />
Turney, P., 2002, Thumbs up or Thumbs Down? Semantic Orientation Applied to Unsupervised Classification of Reviews, ACL'02<br />
<br />
== Online version ==<br />
<br />
[[http://www.ldc.upenn.edu/acl/P/P02/P02-1053.pdf|paper]]<br />
<br />
== Summary ==<br />
<br />
This [[Category::paper]] presents a simple unsupervised learning algorithm for [[AddressesProblem:: Opinion mining]] problem. The system is able to classify the reviews as recommended ("thumbs up") or not-recommended ("thumbs down"). The idea is to measure the semantic orientation of phrases in a review and classify it to an appropriate class based on the average semantic orientation. The semantic orientation is measured by mutual information between the given phrase and word "excellent" minus the mutual information between the input phrase and the word "poor". <br />
<br />
== Description of the method ==<br />
The algorithm takes a written review as the input. First they assign a POS tag to each word in the document to identify adjective or adverb phrases in the input review. They have used PMI-IR algorithm to estimate the semantic orientation of a phrase. The Pointwise Mutual Information (PMI) between two words <math> w_1 </math> and <math> w_2 </math> is defined as follow:<br />
<br />
<br />
<math><br />
PMI(w_1,w_2)=log_2(p(w_1 \& w_2))<br />
</math><br />
<br />
<br />
<math><br />
PMI(w_1,w_2)=log_2(p(w_1\&w_2)/p(w_1)p(w_2))<br />
</math><br />
<br />
where <math> p(w_1,w_2) </math> is the probability that <math> w_1 </math> and <math> w_2 </math> co-occur. They have defined the semantic orientation of a phrase as follow:<br />
<br />
<math><br />
SO(phrase)=PMI(phrase,"excellent")-PMI(phrase,"poor")<br />
</math><br />
<br />
We can modify the above definition to obtain the following formula:<br />
<br />
<math><br />
SO(phrase)=log_2(\frac{hits(phrase NEAR "excellent")hits("excellent")}{hits(phrase NEAR "poor")hits("excellent")} )<br />
</math><br />
<br />
<br />
<br />
<br />
Then they estimate the semantic orientation of each phrase in the document. The last step<br />
<br />
== Evaluation Results == <br />
They have tested their system on reviews different cameras that are chosen from Amazon.com. They have manually annotated reviews of 6 cameras to use as the training data. The system is tested using 4-fold validation. They have used the system that is developed by [[Turney,2002]] as the baseline for comparisons. The results have shown that they can increase accuracy of the system by a factor of 2 comparing to the baseline system.</div>PastStudentshttp://curtis.ml.cmu.edu/w/courses/index.php?title=Turney,2002&diff=3084Turney,20022010-12-01T18:58:35Z<p>PastStudents: </p>
<hr />
<div>== Citation ==<br />
<br />
Turney, P., 2002, Thumbs up or Thumbs Down? Semantic Orientation Applied to Unsupervised Classification of Reviews, ACL'02<br />
<br />
== Online version ==<br />
<br />
[[http://www.ldc.upenn.edu/acl/P/P02/P02-1053.pdf|paper]]<br />
<br />
== Summary ==<br />
<br />
This [[Category::paper]] presents a simple unsupervised learning algorithm for [[AddressesProblem:: Opinion mining]] problem. The system is able to classify the reviews as recommended ("thumbs up") or not-recommended ("thumbs down"). The idea is to measure the semantic orientation of phrases in a review and classify it to an appropriate class based on the average semantic orientation. The semantic orientation is measured by mutual information between the given phrase and word "excellent" minus the mutual information between the input phrase and the word "poor". <br />
<br />
== Description of the method ==<br />
The algorithm takes a written review as the input. First they assign a POS tag to each word in the document to identify adjective or adverb phrases in the input review. They have used PMI-IR algorithm to estimate the semantic orientation of a phrase. The Pointwise Mutual Information (PMI) between two words <math> w_1 </math> and <math> w_2 </math> is defined as follow:<br />
<br />
<br />
<math><br />
PMI(w_1,w_2)=log_2(p(w_1 & w_2))<br />
</math><br />
<br />
<br />
<math><br />
PMI(w_1,w_2)=log_2(p(w_1\&w_2)/p(w_1)p(w_2))<br />
</math><br />
<br />
where <math> p(w_1,w_2) </math> is the probability that <math> w_1 </math> and <math> w_2 </math> co-occur. They have defined the semantic orientation of a phrase as follow:<br />
<br />
<math><br />
SO(phrase)=PMI(phrase,"excellent")-PMI(phrase,"poor")<br />
</math><br />
<br />
We can modify the above definition to obtain the following formula:<br />
<br />
<math><br />
SO(phrase)=log_2(\frac{hits(phrase NEAR "excellent")hits("excellent")}{hits(phrase NEAR "poor")hits("excellent")} )<br />
</math><br />
<br />
<br />
<br />
<br />
Then they estimate the semantic orientation of each phrase in the document. The last step<br />
<br />
== Evaluation Results == <br />
They have tested their system on reviews different cameras that are chosen from Amazon.com. They have manually annotated reviews of 6 cameras to use as the training data. The system is tested using 4-fold validation. They have used the system that is developed by [[Turney,2002]] as the baseline for comparisons. The results have shown that they can increase accuracy of the system by a factor of 2 comparing to the baseline system.</div>PastStudentshttp://curtis.ml.cmu.edu/w/courses/index.php?title=Turney,2002&diff=3083Turney,20022010-12-01T18:57:35Z<p>PastStudents: </p>
<hr />
<div>== Citation ==<br />
<br />
Turney, P., 2002, Thumbs up or Thumbs Down? Semantic Orientation Applied to Unsupervised Classification of Reviews, ACL'02<br />
<br />
== Online version ==<br />
<br />
[[http://www.ldc.upenn.edu/acl/P/P02/P02-1053.pdf|paper]]<br />
<br />
== Summary ==<br />
<br />
This [[Category::paper]] presents a simple unsupervised learning algorithm for the [[AddressesProblem:: Opinion mining]] problem. The system classifies reviews as recommended ("thumbs up") or not recommended ("thumbs down"). The idea is to measure the semantic orientation of the phrases in a review and assign the review to a class based on their average semantic orientation. The semantic orientation of a phrase is measured as the pointwise mutual information between the phrase and the word "excellent" minus the pointwise mutual information between the phrase and the word "poor". <br />
<br />
== Description of the method ==<br />
The algorithm takes a written review as input. First, it assigns a POS tag to each word in the review and uses part-of-speech tag patterns to extract the two-word phrases that are likely to carry sentiment. The PMI-IR algorithm is then used to estimate the semantic orientation of each extracted phrase. The Pointwise Mutual Information (PMI) between two words <math> w_1 </math> and <math> w_2 </math> is defined as follows:<br />
<br />
<br />
<math><br />
PMI(w_1,w_2)=\log_2\left(\frac{p(w_1\ \&\ w_2)}{p(w_1)\,p(w_2)}\right)<br />
</math><br />
<br />
where <math> p(w_1\ \&\ w_2) </math> is the probability that <math> w_1 </math> and <math> w_2 </math> co-occur. The semantic orientation (SO) of a phrase is then defined as follows:<br />
<br />
<math><br />
SO(phrase)=PMI(phrase,"excellent")-PMI(phrase,"poor")<br />
</math><br />
<br />
Estimating these probabilities from search-engine hit counts, with the NEAR operator used to count co-occurrences, the definition can be rewritten as the following formula:<br />
<br />
<math><br />
SO(phrase)=\log_2\left(\frac{hits(phrase\ NEAR\ "excellent")\,hits("poor")}{hits(phrase\ NEAR\ "poor")\,hits("excellent")}\right)<br />
</math><br />
<br />
The SO of every phrase extracted from the review is estimated in this way. The last step is to average the SO over all extracted phrases and label the review recommended ("thumbs up") if the average is positive and not recommended ("thumbs down") otherwise.<br />
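The phrase-extraction step can be sketched as follows. The pattern list is a simplified subset of the paper's five two-word tag patterns (the paper additionally constrains the tag of the third word, which is omitted here), and the pre-tagged input sentence is invented for illustration.

```python
# Simplified subset of the two-word tag patterns (JJ = adjective,
# RB = adverb, NN = singular noun); the paper's third-word constraint
# is omitted for brevity.
PATTERNS = {("JJ", "NN"), ("RB", "JJ"), ("JJ", "JJ"), ("NN", "JJ")}

def extract_phrases(tagged):
    """Return the two-word phrases whose POS tags match one of PATTERNS."""
    phrases = []
    for (w1, t1), (w2, t2) in zip(tagged, tagged[1:]):
        if (t1, t2) in PATTERNS:
            phrases.append(f"{w1} {w2}")
    return phrases

tagged_review = [("the", "DT"), ("camera", "NN"), ("takes", "VBZ"),
                 ("very", "RB"), ("sharp", "JJ"), ("pictures", "NNS")]
print(extract_phrases(tagged_review))  # ['very sharp']
```

Each extracted phrase would then be scored with the SO formula above.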
<br />
== Evaluation Results == <br />
The method was evaluated on 410 reviews collected from Epinions in four domains (automobiles, banks, movies, and travel destinations). It reaches an average accuracy of 74%, ranging from 84% on automobile reviews down to 66% on movie reviews, without using any labeled training data.</div>PastStudentshttp://curtis.ml.cmu.edu/w/courses/index.php?title=Jin_et_al,_2009&diff=3075Jin et al, 20092010-12-01T17:52:12Z<p>PastStudents: </p>
<hr />
<div>== Citation ==<br />
<br />
Jin, W., Ho, H., Srihari, R., 2009, OpinionMiner: A Novel Machine Learning System for Web Opinion Mining and Extraction, KDD'09<br />
<br />
== Online version ==<br />
<br />
[http://portal.acm.org/citation.cfm?id=1557148 OpinionMiner]<br />
<br />
== Summary ==<br />
<br />
This [[Category::paper]] introduces a system that mines customer reviews of a product and extracts product features from the reviews. The system returns the opinion expressions extracted from each review together with their orientation. [[AddressesProblem:: Opinion mining]] has been studied widely in the machine learning and information extraction communities, and most approaches have used statistical or rule-based learning to extract opinion expressions. In this work, Jin et al. introduce a new technique that uses a lexicalized HMM for opinion mining. <br />
<br />
== System Architecture ==<br />
The architecture of their system is as follows: <br />
<br />
- Pre-processing: The system first crawls web pages from the Web, cleans the HTML files, and segments sentences. The technique used to extract the reviews of a product is not described in the paper; the reviews are assumed to be given as input to the learning system.<br />
<br />
- Entity types and tag sets: They define four entity types for each product review: components (e.g. the physical parts of a camera), functions (e.g. the zoom of a camera), features (e.g. color), and opinions (e.g. ideas and thoughts). For each of these types they define a set of tags that are used in the annotation process. <br />
<br />
- Lexicalized HMMs: Given a product review as input, the goal of the lexicalized HMM is to assign the appropriate tag to each part of the review. For classification they maximize the conditional probability <math> P(T|W,S) </math>, where T is the sequence of tags to assign to the parts of the review, W is the sequence of words in the review, and S is the POS tag of each word. Maximum likelihood estimation is used to learn the parameters of the model. <br />
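The tag-assignment step can be illustrated with a standard first-order HMM decoder. This is a sketch only: the probability tables and tag names below are invented, and the paper's lexicalized variant additionally conditions the transitions and emissions on neighboring words and POS tags.

```python
def viterbi(words, states, start, trans, emit):
    """Most likely tag sequence under a simple first-order HMM."""
    # Forward pass: best probability of each state at each position.
    V = [{s: start[s] * emit[s].get(words[0], 1e-6) for s in states}]
    back = []
    for w in words[1:]:
        col, ptr = {}, {}
        for s in states:
            prev, p = max(((r, V[-1][r] * trans[r][s]) for r in states),
                          key=lambda x: x[1])
            col[s], ptr[s] = p * emit[s].get(w, 1e-6), prev
        V.append(col)
        back.append(ptr)
    # Backtrack from the best final state.
    best = max(states, key=lambda s: V[-1][s])
    path = [best]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return list(reversed(path))

# Invented toy model with illustrative tag names.
states = ["OPINION", "FEATURE", "BG"]
start = {"OPINION": 0.3, "FEATURE": 0.3, "BG": 0.4}
trans = {s: {"OPINION": 0.3, "FEATURE": 0.3, "BG": 0.4} for s in states}
emit = {"OPINION": {"good": 0.5}, "FEATURE": {"quality": 0.5},
        "BG": {"the": 0.5, "is": 0.5}}
print(viterbi(["the", "quality", "is", "good"], states, start, trans, emit))
# ['BG', 'FEATURE', 'BG', 'OPINION']
```

The decoder tags "quality" as a product feature and "good" as an opinion word under these toy tables.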
<br />
- Information propagation: The goal of this step is to reduce the amount of training data the system requires. Suppose the sentence "Good picture quality" is part of a review in the training data, and the word "good" is tagged as "<opinion_pos_exp>". The system then looks the word up in a dictionary and substitutes "good" with its synonyms. Applying this idea to all the words in the training data multiplies the number of examples.<br />
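This synonym substitution can be sketched as follows. The synonym dictionary here is a toy stand-in for the real lexicon, and the "<prod_feature>" tag name is illustrative; only "<opinion_pos_exp>" comes from the paper's tag set as quoted above.

```python
# Toy synonym dictionary standing in for a real lexicon.
SYNONYMS = {"good": ["great", "nice"], "bad": ["poor", "awful"]}

def propagate(example):
    """Create extra training examples by swapping tagged opinion words
    for their synonyms, keeping the annotations unchanged."""
    words, tags = example
    new_examples = []
    for i, (w, t) in enumerate(zip(words, tags)):
        if t in ("<opinion_pos_exp>", "<opinion_neg_exp>"):
            for syn in SYNONYMS.get(w, []):
                swapped = words[:i] + [syn] + words[i + 1:]
                new_examples.append((swapped, tags))
    return new_examples

example = (["good", "picture", "quality"],
           ["<opinion_pos_exp>", "<prod_feature>", "<prod_feature>"])
for words, tags in propagate(example):
    print(words)  # ['great', ...] then ['nice', ...]
```

One annotated sentence thus yields several training examples at no extra annotation cost.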
<br />
- Bootstrapping: The main contribution of this system is the bootstrapping step. The idea is to partition the training set into two disjoint sets and train an HMM on each. Then, for each instance of the unannotated test data, if the two HMMs assign it the same class and the confidence value is above a threshold T, the instance is added to the training examples. This can significantly decrease the amount of time a human must spend annotating training data.<br />
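One round of this agreement check can be sketched as follows. The classifiers here are toy stand-ins (each "model" maps an instance to a (label, confidence) pair) rather than the paper's HMMs, and the data is invented.

```python
def bootstrap(labeled_a, labeled_b, unlabeled, train, threshold=0.9):
    """One bootstrapping round: train a model on each half of the
    labeled data, then promote unlabeled instances on which both
    models agree with confidence above `threshold`."""
    model_a, model_b = train(labeled_a), train(labeled_b)
    promoted, remaining = [], []
    for x in unlabeled:
        (la, ca), (lb, cb) = model_a(x), model_b(x)
        if la == lb and min(ca, cb) >= threshold:
            promoted.append((x, la))
        else:
            remaining.append(x)
    return promoted, remaining

# Toy classifier factory: confident on seen positives, unsure otherwise.
def train(data):
    positives = {x for x, y in data if y == "pos"}
    return lambda x: ("pos", 0.95) if x in positives else ("neg", 0.6)

labeled_a = [("great zoom", "pos")]
labeled_b = [("great zoom", "pos")]
promoted, remaining = bootstrap(labeled_a, labeled_b,
                                ["great zoom", "blurry photos"], train)
print(promoted)   # [('great zoom', 'pos')]
print(remaining)  # ['blurry photos']
```

"blurry photos" stays unlabeled because both models are below the confidence threshold on it, which is exactly the case a human annotator would still need to handle.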
<br />
== Evaluation Results == <br />
They have tested their system on reviews different cameras that are chosen from Amazon.com. They have manually annotated reviews of 6 cameras to use as the training data. The system is tested using 4-fold validation. They have used the system that is developed by [[Turney,2002]] as the baseline for comparisons. The results have shown that they can increase accuracy of the system by a factor of 2 comparing to the baseline system.</div>PastStudentshttp://curtis.ml.cmu.edu/w/courses/index.php?title=Jin_et_al,_2009&diff=3074Jin et al, 20092010-12-01T17:11:21Z<p>PastStudents: </p>
<hr />
<div>== Citation ==<br />
<br />
Jin, W., Ho, H.,Srihari, R., 2009, OpinionMiner: A Novel Machine Learning System for Web Opinion Mining and Extraction, KDD'09<br />
<br />
== Online version ==<br />
<br />
[[http://portal.acm.org/citation.cfm?id=1557148|Oponion_Mining]]<br />
<br />
== Summary ==<br />
<br />
This [[Category::paper]] introduces a system that mines customer reviews of a product and extract product features from the review. The system return opinion expression that are extracted from product review as well as opinion direction. [[AddressesProblem:: Opinion mining]] have been studied widely in machine learning and information extraction community. Most of these approaches have used statistical or rule-based learning to extract opinion expression. Jin et al. in this work have introduced a new technique that uses lexicalized HMM for opinion mining. <br />
<br />
== System Architecture ==<br />
The architecture of their system is as follow: <br />
<br />
- Pre-processing: The system first crawls web pages from the Web, clean HTML files, and segments sentences. The technique that has been used to extract reviews of a product is not described in the paper and they have assumed that the reviews are given to the input of the learning system.<br />
<br />
- Entity types and tag sets: They have defined four entity types for each product review: components (e.g. physical object of a camera), functions (e.g. zoom in a camera), features (e.g. color), and opinions (e.g. ideas and thoughts). For each if these types they have defined a set of tags that are used in annotation process. <br />
<br />
- Lexicalized HMMs: Given a review of a product as an input of the system, the goal of lexicalized HMM is to assign appropriate tag type to each part of product review. For classification they maximize conditional probability <math> P(T|W,S) </math> where T is the tags that we want to assign to different parts of product review, W is all the words in the review and S is the POS tag for each word. They have used MLE to learn parameters of the system. <br />
<br />
- Information propagation: The goal of this part is to decrease the number of training data that this system requires. Suppose that we have sentence "Good picture quality" as part of a review in the training data. Word "good" is tagged as "<opinion_pos_exp>" in the training data. The system then adds more information by looking at a dictionary and substitute word "good" with it's synonyms. This idea is applied to all the words in the training data to extend the number of examples.<br />
<br />
- Bootstrapping: The main contribution of this system is the bootstrapping part. The idea is to partition the training set to two different disjoint sets and train a HMM using each of these sets. Then for each instance of the test data (which is non annotated by the human), if two HMMs classify the input review to the same class and if the confidence value is above a threshold T then we add this new instance to the training example. This can significantly decrease the amount of time that human should spend to annotate training data.<br />
<br />
== Evaluation Results == <br />
They have tested their system</div>PastStudentshttp://curtis.ml.cmu.edu/w/courses/index.php?title=Jin_et_al,_2009&diff=3073Jin et al, 20092010-12-01T17:07:02Z<p>PastStudents: </p>
<hr />
<div>== Citation ==<br />
<br />
Jin, W., Ho, H.,Srihari, R., 2009, OpinionMiner: A Novel Machine Learning System for Web Opinion Mining and Extraction, KDD'09<br />
<br />
== Online version ==<br />
<br />
[[http://portal.acm.org/citation.cfm?id=1557148|Oponion_Mining]]<br />
<br />
== Summary ==<br />
<br />
This [[Category::paper]] introduces a system that mines customer reviews of a product and extract product features from the review. The system return opinion expression that are extracted from product review as well as opinion direction. [[AddressesProblem:: Opinion mining]] have been studied widely in machine learning and information extraction community. Most of these approaches have used statistical or rule-based learning to extract opinion expression. Jin et al. in this work have introduced a new technique that uses lexicalized HMM for opinion mining. <br />
<br />
== System Architecture ==<br />
The architecture of their system is as follow: <br />
<br />
- Pre-processing: The system first crawls web pages from the Web, clean HTML files, and segments sentences. The technique that has been used to extract reviews of a product is not described in the paper and they have assumed that the reviews are given to the input of the learning system.<br />
<br />
- Entity types and tag sets: They have defined four entity types for each product review: components (e.g. physical object of a camera), functions (e.g. zoom in a camera), features (e.g. color), and opinions (e.g. ideas and thoughts). For each if these types they have defined a set of tags that are used in annotation process. <br />
<br />
- Lexicalized HMMs: Given a review of a product as an input of the system, the goal of lexicalized HMM is to assign appropriate tag type to each part of product review. For classification they maximize conditional probability <math> P(T|W,S) </math> where T is the tags that we want to assign to different parts of product review, W is all the words in the review and S is the POS tag for each word. They have used MLE to learn parameters of the system. <br />
<br />
- Information propagation: The goal of this part is to decrease the number of training data that this system requires. Suppose that we have sentence "Good picture quality" as part of a review in the training data. Word "good" is tagged as "<opinion_pos_exp>" in the training data. The system then adds more information by looking at a dictionary and substitute word "good" with it's synonyms. This idea is applied to all the words in the training data to extend the number of examples.<br />
<br />
- Bootstrapping: The main contribution of this system is the bootstrapping part. The idea is to partition the training set to two different disjoint sets and train a HMM using each of these sets. Then for each instance of the test data (which is non annotated by the human), if two HMMs classify the input review to the same class and if the confidence value is above a threshold T then we add this new instance to the training example. <br />
<br />
<br />
The intuition behind their technique is to use global features to infer rules about the local features. For example suppose that we know the name of a set of books. Then by looking at webpages of Amazon.com and by searching the name of the books that we already have we can infer the position and font of the book title. We can then use these two features (position and font of book title in web pages) to extract new book titles from other web pages. <br />
<br />
They have described both generative and discriminative approaches for classification and extraction tasks. Global features are governed by the parameters that are shared by all the data and local features are shared only by a subset of data. For example in information extraction task, all the words in a webpage (without considering formatting) can be considered as global features. On the other hand, features such as position of a text or color of text are local features. <br />
<br />
In generative model they have modeled each document by introducing a random variable that governs local features. The parameters of the model are:<br />
<br />
- N words of documents are shown by <math> w=\{w_1,w_2,...,w_N\}</math><br />
<br />
- Formatting features are shown by <math> f=\{f_1,f_2,...,f_N\} </math><br />
<br />
- Class labels are shown by <math> c=\{c_1,c_2,...,c_N\} </math><br />
<br />
The model can be shown by the following joint distribution over local parameters, class labels, words, and formatting features:<br />
<br />
<math> p(\phi,c,w,f)=p(\phi)\prod_{i=1}^N p(c_n)p(w_n|c_n)p(f_n|c_n,\phi)</math><br />
<br />
The parameters are estimated using maximum likelihood estimation on a set of training documents. For inference, one approach is to approximate parameter <math> \phi </math> with a point estimation <math> \hat{\phi} </math> and infer the class label using MAP estimation. We can label each pair by the following formula:<br />
<br />
<math> \hat{c_n}=argmax_{c_n}p(w_n|c_n)p(f_n|c_n,\hat{\phi})p(c_n) </math> <br />
<br />
<math> \hat{\phi} </math> can be approximated by <math> \hat{\phi}=argmax_{\phi}p(\phi|f,w) </math>. They have used EM algorithm to maximize the expected log likelihood of formatting features.<br />
<br />
They test their method on two datasets. The first contains 1,000 HTML documents; each document is automatically divided into text segments with similar layout characteristics, and each segment is hand-labeled as containing or not containing a job title. The local and global features for this domain are as described above. The second dataset contains 42,548 web pages from 330 web sites, where each page is hand-labeled as being a press release or not. Here the global features are the words in each web page and the local feature is the URL of the page. Their experimental results show that this approach obtains high precision with low-to-moderate recall.</div>PastStudentshttp://curtis.ml.cmu.edu/w/courses/index.php?title=Jin_et_al,_2009&diff=3072Jin et al, 20092010-12-01T16:40:44Z<p>PastStudents: Created page with '== Citation == Jin, W., Ho, H.,Srihari, R., 2009, OpinionMiner: A Novel Machine Learning System for Web Opinion Mining and Extraction, KDD'09 == Online version == [[http://por…'</p>
<hr />
<div>== Citation ==<br />
<br />
Jin, W., Ho, H., Srihari, R., 2009, OpinionMiner: A Novel Machine Learning System for Web Opinion Mining and Extraction, KDD'09<br />
<br />
== Online version ==<br />
<br />
[http://portal.acm.org/citation.cfm?id=1557148 Opinion Mining]<br />
<br />
== Summary ==<br />
<br />
This [[Category::paper]] introduces a system that mines customer reviews of a product and extracts product features from the reviews. The system returns opinion expressions extracted from the reviews along with their opinion orientations. [[AddressesProblem::Opinion mining]] has been studied widely in the machine learning and information extraction communities, mostly using statistical or rule-based learning to extract opinion expressions. Jin et al. instead introduce a technique that uses a lexicalized HMM for opinion mining. <br />
<br />
== System Architecture ==<br />
<br />
<br />
<br />
The intuition behind their technique is to use global features to infer rules about local features. For example, suppose we know the names of a set of books. Then, by searching Amazon.com web pages for those known titles, we can infer the position and font in which book titles appear. We can then use these two local features (position and font of the book title on a web page) to extract new book titles from other web pages. <br />
<br />
They describe both generative and discriminative approaches for classification and extraction tasks. Global features are governed by parameters shared by all the data, while local features are governed by parameters shared only by a subset of the data. For example, in an information extraction task, the words in a web page (ignoring formatting) can be considered global features, whereas features such as the position or color of text are local features. <br />
<br />
In the generative model, each document is modeled by introducing a random variable <math> \phi </math> that governs its local features. The notation is as follows:<br />
<br />
- The N words of a document are denoted <math> w=\{w_1,w_2,...,w_N\}</math><br />
<br />
- The formatting features are denoted <math> f=\{f_1,f_2,...,f_N\} </math><br />
<br />
- The class labels are denoted <math> c=\{c_1,c_2,...,c_N\} </math><br />
<br />
The model can be shown by the following joint distribution over local parameters, class labels, words, and formatting features:<br />
<br />
<math> p(\phi,c,w,f)=p(\phi)\prod_{n=1}^N p(c_n)p(w_n|c_n)p(f_n|c_n,\phi)</math><br />
<br />
The parameters are estimated using maximum likelihood estimation on a set of training documents. For inference, one approach is to approximate the parameter <math> \phi </math> with a point estimate <math> \hat{\phi} </math> and infer the class labels by MAP estimation. Each word–formatting pair is then labeled by:<br />
<br />
<math> \hat{c}_n=\arg\max_{c_n}p(w_n|c_n)p(f_n|c_n,\hat{\phi})p(c_n) </math> <br />
<br />
<math> \hat{\phi} </math> itself is approximated as <math> \hat{\phi}=\arg\max_{\phi}p(\phi|f,w) </math>. The EM algorithm is used to maximize the expected log likelihood of the formatting features.<br />
<br />
They test their method on two datasets. The first contains 1,000 HTML documents; each document is automatically divided into text segments with similar layout characteristics, and each segment is hand-labeled as containing or not containing a job title. The local and global features for this domain are as described above. The second dataset contains 42,548 web pages from 330 web sites, where each page is hand-labeled as being a press release or not. Here the global features are the words in each web page and the local feature is the URL of the page. Their experimental results show that this approach obtains high precision with low-to-moderate recall.</div>PastStudentshttp://curtis.ml.cmu.edu/w/courses/index.php?title=Riloff_and_Jones_1999&diff=3071Riloff and Jones 19992010-12-01T14:20:49Z<p>PastStudents: </p>
<hr />
<div>== Citation == <br />
<br />
Riloff, E. and Jones., R. Learning Dictionaries for Information Extraction by Multi-Level Bootstrapping. Proceedings of the Sixteenth National Conference on Artificial Intelligence (AAAI-99). 1999. <br />
<br />
== Online Version ==<br />
<br />
[http://reference.kfupm.edu.sa/content/l/e/learning_dictionaries_for_information_ex_2607.pdf]<br />
<br />
== Summary == <br />
<br />
This [[Category::paper]] was one of the earliest uses of [[UsesMethod::bootstrapping]] to expand entity lists given only a few seed instances. It leverages unlabeled data to iteratively find patterns around the seeds, use those patterns to find new entities, and repeat to find more patterns and entities. This exploits redundancy in the unlabeled data, using the repeated presence of patterns to infer entities and vice versa.<br />
<br />
Prior to this, most of the work in entity extraction required extensive training data to learn reliable patterns. Lexicons were hand-constructed. This work constructs both patterns and lexicons, using very limited training data, by a mutual bootstrapping procedure over unlabeled data. Candidate patterns generated by a program called AutoSlog are assessed by how many of the seed instances they extract. Top patterns are used to extract new entities, which lead to other patterns becoming highly ranked, etc.<br />
<br />
When bootstrapping data like this, there is a risk that the original concept of the list becomes distorted as its membership drifts in the wrong direction. To maintain the quality of the expanding lists, two stages of filtering are used. On each iteration of the inner procedure, only the highest scoring pattern is used to infer new entities; however, all entities that pattern extracts are added to the list. The expanded list is then used to find more patterns, and the cycle repeats. <br />
<br />
To further improve the quality of the sets, there is an outer, meta-bootstrapping procedure. The expanded lists from the inner bootstrap are filtered to only keep the five best instances, as measured by the number of different patterns that extract those instances. These five are added to the seed set, and the entire process starts anew.<br />
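The two nested loops can be sketched roughly as follows (hypothetical toy corpus and pattern-to-entity maps; AutoSlog and the paper's exact pattern-scoring formula are not reproduced):

```python
# Sketch of mutual bootstrapping with a meta-bootstrapping outer loop
# (hypothetical data and helpers; not the paper's implementation).

def mutual_bootstrap(seeds, patterns, extracts, n_iters=3):
    """Inner loop: repeatedly pick the single best pattern and add all
    entities it extracts to the lexicon."""
    lexicon, used = set(seeds), set()
    for _ in range(n_iters):
        # Score a pattern by how many known lexicon entries it extracts.
        best = max(
            (p for p in patterns if p not in used),
            key=lambda p: len(extracts[p] & lexicon),
            default=None,
        )
        if best is None:
            break
        used.add(best)
        lexicon |= extracts[best]   # all entities from the best pattern
    return lexicon

def meta_bootstrap(seeds, patterns, extracts, n_rounds=2, keep=5):
    """Outer loop: keep only the entities extracted by the most distinct
    patterns, add them to the seeds, and restart."""
    seeds = set(seeds)
    for _ in range(n_rounds):
        lexicon = mutual_bootstrap(seeds, patterns, extracts)
        candidates = lexicon - seeds
        # Rank candidates by how many different patterns extract them.
        ranked = sorted(
            candidates,
            key=lambda e: sum(e in extracts[p] for p in patterns),
            reverse=True,
        )
        seeds |= set(ranked[:keep])
    return seeds

# Tiny worked example: patterns map to the sets of entities they extract.
extracts = {
    "attack in <x>": {"baghdad", "kabul"},
    "offices in <x>": {"kabul", "boston"},
}
print(meta_bootstrap({"baghdad"}, list(extracts), extracts))
```

The filtering shows up in two places: the inner loop trusts only one pattern per iteration, and the outer loop keeps only the candidates supported by the most patterns.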
<br />
The lists created were found to vary significantly depending on the domain on which they were trained. A list for vehicles, when trained on terrorism news articles, expanded to include weapons, as vehicles are often used as weapons in this area. <br />
<br />
== Related Papers ==<br />
<br />
[[RelatedPaper::Hearst, COLING 1992]] similarly uses patterns between entities of interest to extract facts. <br />
<br />
These iterative self-training methods show up repeatedly, such as with [[RelatedPaper::Collins and Singer, EMNLP 1999]], who use it with co-training, and [[RelatedPaper::Brin, WebDb 1998]], who uses it with relation extraction from the web.</div>PastStudentshttp://curtis.ml.cmu.edu/w/courses/index.php?title=Riloff_and_Jones_1999&diff=3070Riloff and Jones 19992010-12-01T14:18:14Z<p>PastStudents: </p>
<hr />
<div>== Citation == <br />
<br />
Riloff, E. and Jones., R. Learning Dictionaries for Information Extraction by Multi-Level Bootstrapping. Proceedings of the Sixteenth National Conference on Artificial Intelligence (AAAI-99). 1999. <br />
<br />
== Online Version ==<br />
<br />
[http://reference.kfupm.edu.sa/content/l/e/learning_dictionaries_for_information_ex_2607.pdf]<br />
<br />
== Summary == <br />
<br />
This [[Category::paper]] was one of the earliest uses of [[UsesMethod::bootstrapping]] to expand entity lists given only a few seed instances. It leverages unlabeled data to iteratively find patterns around the seeds, use those patterns to find new entities, and repeat to find more patterns and entities. This exploits redundancy in the unlabeled data, using the repeated presence of patterns to infer entities and vice versa.<br />
<br />
Prior to this, most of the work in entity extraction required extensive training data to learn reliable patterns. Lexicons were hand-constructed. This work constructs both patterns and lexicons, using very limited training data, by a mutual bootstrapping procedure over unlabeled data. Candidate patterns generated by a program called AutoSlog are assessed by how many of the seed instances they extract. Top patterns are used to extract new entities, which lead to other patterns becoming highly ranked, etc.<br />
<br />
When bootstrapping data like this, there is a risk that the original concept of the list becomes distorted as its membership drifts in the wrong direction. To maintain the quality of the expanding lists, two stages of filtering are used. On each iteration of the inner procedure, only the highest scoring pattern is used to infer new entities; however, all entities that pattern extracts are added to the list. The expanded list is then used to find more patterns, and the cycle repeats. <br />
<br />
To further improve the quality of the sets, there is an outer, meta-bootstrapping procedure. The expanded lists from the inner bootstrap are filtered to only keep the five best instances, as measured by the number of different patterns that extract those instances. These five are added to the seed set, and the entire process starts anew.<br />
<br />
The lists created were found to vary significantly depending on the domain on which they were trained. A list for vehicles, when trained on terrorism news articles, expanded to include weapons, as vehicles are often used as weapons in this area. <br />
<br />
== Related Papers ==<br />
<br />
[[RelatedPaper::Hearst, COLING 1992]] similarly uses patterns between entities of interest to extract facts. <br />
<br />
These iterative self-training methods show up repeatedly, such as with [[Collins and Singer, EMNLP 1999]], who use it with co-training, and [[Brin, 1998]], who uses it with relation extraction from the web.</div>PastStudentshttp://curtis.ml.cmu.edu/w/courses/index.php?title=Talukdar_et_al_CoNLL_2006&diff=3069Talukdar et al CoNLL 20062010-12-01T12:29:37Z<p>PastStudents: </p>
<hr />
<div>== Citation == <br />
<br />
Talukdar, T., Brants, T., Liberman, M. and Pereira, F. "A Context Pattern Induction Method for Named Entity Extraction." Computational Natural Language Learning (CoNLL-X), 2006.<br />
<br />
== Online Version ==<br />
<br />
[http://acl.ldc.upenn.edu/W/W06/W06-29.pdf#page=157]<br />
<br />
== Summary == <br />
<br />
This [[Category::paper]] extends previous methods for [[UsesMethod::pattern induction]] and uses the patterns to find new instances of interest, which then assist in [[AddressesProblem::named entity recognition]]. This is a form of [[UsesMethod::semi-supervised learning]], using unlabeled data to derive new features. The method is language independent, focusing on word and transition frequencies rather than chunking or parsing information. <br />
<br />
The method starts with seed instances, using them to find contexts frequently associated with the seeds. Rather than use the contexts directly, it then uses IDF to find trigger words that are rare in the corpus overall yet frequent in the seed contexts. These dominating words are later used to define patterns. Simply using IDF, without accounting for a word's frequency in ''relevant'' contexts, would lead to lower precision.<br />
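That scoring idea can be sketched as follows (hypothetical toy corpus and weighting; the paper's exact formula may differ): score each context word by its frequency in the seed contexts times its IDF over the whole corpus.

```python
import math
from collections import Counter

# Sketch of trigger-word selection: words frequent in seed contexts but
# rare corpus-wide score highest. (Illustrative weighting; the paper's
# exact formula may differ.)

corpus_docs = [
    {"the", "ceo", "of", "acme", "said"},
    {"the", "weather", "was", "mild"},
    {"shares", "of", "acme", "rose"},
    {"initech", "acquired", "globex"},
    {"the", "minister", "resigned"},
]
seed_contexts = ["ceo of acme said", "ceo of initech resigned"]

def idf(word):
    df = sum(word in doc for doc in corpus_docs)
    return math.log(len(corpus_docs) / (1 + df))

ctx_freq = Counter(w for ctx in seed_contexts for w in ctx.split())

# Combine context frequency with IDF: plain IDF alone would also reward
# rare words that appear only once near the seeds.
scores = {w: ctx_freq[w] * idf(w) for w in ctx_freq}
best = max(scores, key=scores.get)
print(best)  # → "ceo"
```

Common function words like "of" get an IDF near zero, while "ceo" is both rare in the corpus and frequent in the seed contexts, so it dominates.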
<br />
The dominating words denote the start of phrases surrounding the entity of interest. These phrases are used to induce finite state automata in an effort to generalize from the phrases. The automata are pruned to remove transitions that few paths pass through (as opposed to transitions that merely have a low local weight).<br />
<br />
The resulting patterns from the FSMs are used to find new instances of entities to populate lists. During this process, the patterns are further filtered to encourage higher precision at the cost of recall. High quality entities from high quality patterns are added to the seed lists and the procedure then starts over.<br />
<br />
The induced lists were used as features to improve the performance of [[UsesMethod::CRF]] based entity taggers. The authors showed that inducing lists from extra unlabeled data improved generalization performance of the taggers. When lists were taken only from training data, there was a strong tendency to overfit.<br />
<br />
== Related Papers ==<br />
<br />
[[RelatedPaper::Riloff and Jones, NCAI 1999]] and [[RelatedPaper::Etzioni, AIJ 2005]] use pattern induction with noun phrases, which are more language dependent than this method.<br />
<br />
[[RelatedPaper::Agichtein and Gravano, ICDL 2000]] induce patterns but apply this to tasks of relation extraction.<br />
<br />
[[RelatedPaper::Wang and Cohen, ICDM 2007]] introduce a method for set-expansion which is also language independent, relying on lists in the pages it is extracting from.</div>PastStudentshttp://curtis.ml.cmu.edu/w/courses/index.php?title=Talukdar_et_al_CoNLL_2006&diff=3068Talukdar et al CoNLL 20062010-12-01T12:29:00Z<p>PastStudents: </p>
<hr />
<div>== Citation == <br />
<br />
Talukdar, T., Brants, T., Liberman, M. and Pereira, F. "A Context Pattern Induction Method for Named Entity Extraction." Computational Natural Language Learning (CoNLL-X), 2006.<br />
<br />
== Online Version ==<br />
<br />
[http://acl.ldc.upenn.edu/W/W06/W06-29.pdf#page=157]<br />
<br />
== Summary == <br />
<br />
This [[Category::paper]] extends previous methods for [[UsesMethod::pattern induction]] and uses the patterns to find new instances of interest, which then assist in [[AddressesProblem::named entity recognition]]. This is a form of [[UsesMethod::semi-supervised learning]], using unlabeled data to derive new features. The method is language independent, focusing on word and transition frequencies rather than chunking or parsing information. <br />
<br />
The method starts with seed instances, using them to find contexts frequently associated with the seeds. Rather than use the contexts directly, it then uses IDF to find trigger words that are rare in the corpus overall yet frequent in the seed contexts. These dominating words are later used to define patterns. Simply using IDF, without accounting for a word's frequency in ''relevant'' contexts, would lead to lower precision.<br />
<br />
The dominating words denote the start of phrases surrounding the entity of interest. These phrases are used to induce finite state automata in an effort to generalize from the phrases. The automata are pruned to remove transitions that few paths pass through (as opposed to transitions that merely have a low local weight).<br />
<br />
The resulting patterns from the FSMs are used to find new instances of entities to populate lists. During this process, the patterns are further filtered to encourage higher precision at the cost of recall. High quality entities from high quality patterns are added to the seed lists and the procedure then starts over.<br />
<br />
The induced lists were used as features to improve the performance of [[UsesMethod::CRF]] based entity taggers. The authors showed that inducing lists from extra unlabeled data improved generalization performance of the taggers. When lists were taken only from training data, there was a strong tendency to overfit.<br />
<br />
== Related Papers ==<br />
<br />
[[RelatedPaper::Riloff and Jones, NCAI 1999]] and [[RelatedPaper::Etzioni, AIJ 2005]] use pattern induction with noun phrases, which are more language dependent than this method.<br />
<br />
[[RelatedPaper::Agichtein and Gravano, ICDL 2000]] induce patterns but apply this to tasks of relation extraction.<br />
<br />
[[RelatedPaper::Wang and Cohen, ICDM 2007]] introduce a method for set-expansion which is also language independent, relying on lists in the pages it is extracting from.<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
<br />
This [[Category::paper]] extends standard CRFs to allow each state to correspond to more than one word or token, similar to the way semi-HMMs extend HMMs. This allows for a richer feature set to be modeled, as features can now correspond to multiple words rather than just one word. These features are quite beneficial in a range of applications where the entities tend to be longer than just one word, including NP-chunking and NER.<br />
<br />
Similar to CRFs, a [[UsesMethod::semi-CRF]] applies one exponential model over the whole sequence. However, instead of modeling a sequence of words, we model a sequence of segments, which each are multiple words belonging to the same state. This expands the space to be explored, so that when performing inference, the Viterbi-like recursion algorithm must also maximize over the segment boundaries. The consequence of this is relatively minor, with inference still taking polynomial time. This cost is less than higher order CRFs, which consider all combinations of the L previous states, whereas semi-CRFs only consider where the L previous states are the same. Training the model is not much harder either. The likelihood is still convex and a recursion step will yield the normalizer.<br />
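The segment-level Viterbi recursion can be sketched roughly as below (a toy scoring function stands in for the model's exponential potentials; `semi_crf_viterbi`, `score`, and the labels are all hypothetical names for illustration):

```python
# Sketch of the Viterbi-style recursion for a semi-CRF (toy scorer; a
# real model would use exp(w . f(segment)) potentials).
# V[j][y] = best score of a segmentation of tokens[:j] whose last
# segment is labeled y; the recursion also maximizes over segment
# boundaries via the length d.

def semi_crf_viterbi(tokens, labels, score, max_len):
    n = len(tokens)
    V = [{y: float("-inf") for y in labels} for _ in range(n + 1)]
    back = [{} for _ in range(n + 1)]
    V[0] = {y: 0.0 for y in labels}
    for j in range(1, n + 1):
        for y in labels:
            # Maximize over segment lengths d and previous labels.
            for d in range(1, min(max_len, j) + 1):
                for y_prev in labels:
                    s = V[j - d][y_prev] + score(tokens[j - d:j], y, y_prev)
                    if s > V[j][y]:
                        V[j][y] = s
                        back[j][y] = (j - d, y_prev)
    # Follow back-pointers to recover the segmentation.
    segs, j, y = [], n, max(V[n], key=V[n].get)
    while j > 0:
        i, y_prev = back[j][y]
        segs.append((i, j, y))
        j, y = i, y_prev
    return list(reversed(segs))

# Toy segment-level scorer: reward long all-capitalized "PER" segments.
def score(seg, y, y_prev):
    caps = all(w[0].isupper() for w in seg)
    if y == "PER":
        return len(seg) ** 2 if caps else -5.0
    return 0.5

print(semi_crf_viterbi(["John", "Smith", "spoke"], ["PER", "O"], score, 2))
# → [(0, 2, 'PER'), (2, 3, 'O')]
```

The extra loop over `d` is exactly the cost the summary describes: polynomial inference, bounded by the maximum segment length rather than by all combinations of previous states.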
<br />
The method was then tested on various datasets for NER tasks and compared to standard CRFs. The key ingredient was the choice of richer features in the semi-CRF models. These segment-level features included the number of capital letters in a segment, the segment lengths, and dictionaries that allowed for non-exact matchings. Segment lengths, particularly, can be modeled as any distribution (such as Gaussian or exponential) depending upon how this feature is defined, which is a commonly touted benefit of semi-HMMs over regular HMMs. The results indicate that the semi-CRFs outperformed the regular CRFs in almost all cases, sometimes by quite large margins. <br />
<br />
== Related Papers ==<br />
<br />
[[RelatedPaper::Skounakis, IJCAI 2003]] applies hierarchical HMMs to IE, which model segments like semi-CRFs, but where the segments are themselves Markov processes.<br />
<br />
[[RelatedPaper::Okanoharu, ACL 2006]] improve the speed of semi-CRFs when the entities are very long by using a filtering process and a feature forest model.<br />
<br />
[[RelatedPaper::Andrew, ENMLP 2006]] combine semi-CRFs with traditional CRFs in order to use segment and word level features. Some word level features are not well represented in the semi-CRF model. He demonstrates improved performance on the task of Chinese word segmentation.</div>PastStudentshttp://curtis.ml.cmu.edu/w/courses/index.php?title=Satpal_and_Sarawagi_PKDD_2007&diff=3067Satpal and Sarawagi PKDD 20072010-12-01T10:50:32Z<p>PastStudents: </p>
<hr />
<div>== Citation == <br />
<br />
Satpal, S. and Sarawagi, S. Domain adaptation of conditional probability models via feature subsetting. Proceedings of PKDD’07 (2007).<br />
<br />
== Online Version ==<br />
<br />
[http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.99.8784&rep=rep1&type=pdf]<br />
<br />
== Summary == <br />
<br />
This [[Category::paper]] introduces a method for [[AddressesProblem::transfer learning]] that encourages a model trained in one domain to use features that it shares in common with a target domain. This proves useful in instances where there is abundant training data in a domain (such as news wire articles) but little in a domain of interest (such as blogs).<br />
<br />
The key challenge in transfer learning is how to reconcile the differences between two distributions. This method addresses this by searching for the subset of features that are present in both domains where the expected values for the features are closest. For instance, if we were trying to recognize names of people, we might take capital letters to be a useful, maybe even required, feature, but that feature may not be reliable in informal blogs, and should be ignored.<br />
<br />
Selecting the best feature subset to use is accomplished not by exploring the power set of all feature combinations but by converting the problem into a soft selection problem, where we strongly down-weight features which diverge greatly between the two domains. The formulation is quite similar to regularizing a standard [[UsesMethod::CRF]] with a Gaussian prior whose variance for each feature changes depending on how different that feature is between domains.<br />
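The soft-selection idea can be sketched as a per-feature penalty (illustrative distance on hypothetical expectation tables; the paper derives its weights from model expectations, not raw counts like these):

```python
# Sketch of feature subsetting as a soft penalty (hypothetical numbers).
# Features whose expected values diverge across domains get a large L2
# penalty coefficient, i.e. a small prior variance, pushing them to 0.

source_expect = {"is_capitalized": 0.90, "has_digit": 0.10, "prev_word=mr": 0.05}
target_expect = {"is_capitalized": 0.40, "has_digit": 0.11, "prev_word=mr": 0.04}

def penalty_weights(src, tgt, scale=10.0):
    """Map per-feature divergence to an L2 penalty coefficient:
    larger divergence -> stronger down-weighting."""
    return {f: scale * abs(src[f] - tgt[f]) for f in src}

weights = penalty_weights(source_expect, target_expect)

def regularizer(theta, weights):
    # Analogue of a Gaussian prior with feature-specific variance.
    return sum(weights[f] * theta.get(f, 0.0) ** 2 for f in weights)

# Capitalization behaves very differently in the target domain (e.g.
# blogs), so it is penalized hardest.
theta = {"is_capitalized": 1.0, "has_digit": 1.0}
print(max(weights, key=weights.get))
print(regularizer(theta, weights))
```

This mirrors the summary's point: the objective stays close to standard CRF training with a Gaussian prior, except the prior's strength varies per feature with cross-domain divergence.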
<br />
The algorithm can be trained using standard optimization approaches, but the objective function must be treated specially since it is non-convex and its gradient requires quadratic time to evaluate. The authors work around this by using a nested iterative approach, holding some expectations constant at each inner iteration. The method is tested on several combinations of training and target domains and is shown to bring improvements over unadapted models.<br />
<br />
== Related Papers ==<br />
<br />
The authors compare their method to [[RelatedPaper::Blitzer, EMNLP 2006]] (structural correspondence learning) and find that their method performs better on the majority of the tasks. The methods are orthogonal though and can be combined to yield even stronger performance.<br />
<br />
The method bears resemblance to generalized expectations [[RelatedPaper::Mann, ACL 2008]], which also seeks to use unlabeled data (but from the same domain) and constrains expectations. These constraints though are sourced from experts.<br />
<br />
[[RelatedPaper::Do and Ng, ANIPS 2006]] present a transfer learning approach which utilizes soft-max regression to train a meta-learner effective across domains.</div>PastStudentshttp://curtis.ml.cmu.edu/w/courses/index.php?title=Satpal_and_Sarawagi_PKDD_2007&diff=3066Satpal and Sarawagi PKDD 20072010-12-01T10:49:30Z<p>PastStudents: </p>
<hr />
<div>== Citation == <br />
<br />
Satpal, S. and Sarawagi, S. Domain adaptation of conditional probability models via feature subsetting. Proceedings of PKDD’07 (2007).<br />
<br />
== Online Version ==<br />
<br />
[http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.99.8784&rep=rep1&type=pdf]<br />
<br />
== Summary == <br />
<br />
This [[Category::paper]] introduces a method for transfer learning that encourages a model trained in one domain to use features that it shares in common with a target domain. This proves useful in instances where there is abundant training data in a domain (such as news wire articles) but little in a domain of interest (such as blogs).<br />
<br />
The key challenge in transfer learning is how to reconcile the differences between two distributions. This method addresses this by searching for the subset of features that are present in both domains where the expected values for the features are closest. For instance, if we were trying to recognize names of people, we might take capital letters to be a useful, maybe even required, feature, but that feature may not be reliable in informal blogs, and should be ignored.<br />
<br />
Selecting the best feature subset to use is accomplished not by exploring the power set of all feature combinations but by converting the problem into a soft selection problem, where we strongly down-weight features which diverge greatly between the two domains. The formulation is quite similar to regularizing a standard [[UsesMethod::CRF]] with a Gaussian prior whose variance for each feature changes depending on how different that feature is between domains.<br />
<br />
The algorithm can be trained using standard optimization approaches, but the objective function must be treated specially since it is non-convex and its gradient requires quadratic time to evaluate. The authors work around this by using a nested iterative approach, holding some expectations constant at each inner iteration. The method is tested on several combinations of training and target domains and is shown to bring improvements over unadapted models.<br />
<br />
== Related Papers ==<br />
<br />
The authors compare their method to [[RelatedPaper::Blitzer, EMNLP 2006]] (structural correspondence learning) and find that their method performs better on the majority of the tasks. The methods are orthogonal though and can be combined to yield even stronger performance.<br />
<br />
The method bears resemblance to generalized expectations [[RelatedPaper::Mann, ACL 2008]], which also seeks to use unlabeled data (but from the same domain) and constrains expectations. These constraints though are sourced from experts.<br />
<br />
[[RelatedPaper::Do and Ng, ANIPS 2006]] present a transfer learning approach which utilizes soft-max regression to train a meta-learner effective across domains.</div>PastStudentshttp://curtis.ml.cmu.edu/w/courses/index.php?title=Smith_and_Osborne_CoNLL_2006&diff=3065Smith and Osborne CoNLL 20062010-12-01T10:01:35Z<p>PastStudents: Replaced content with '== Citation ==
Smith, A. and Osborne, M. "Using Gazetteers in Discriminative Information Extraction." Computational Natural Language Learning (CoNLL-X), 2006.
== Online Ve…'</p>
<hr />
<div>== Citation == <br />
<br />
Smith, A. and Osborne, M. "Using Gazetteers in Discriminative Information Extraction." Computational Natural Language Learning (CoNLL-X), 2006.<br />
<br />
== Online Version ==<br />
<br />
[http://acl.ldc.upenn.edu/W/W06/W06-29.pdf#page=149]<br />
<br />
<br />
== Summary ==</div>PastStudentshttp://curtis.ml.cmu.edu/w/courses/index.php?title=Riloff_and_Jones_1999&diff=3064Riloff and Jones 19992010-12-01T10:01:13Z<p>PastStudents: Created page with '== Citation == Riloff, E. and Jones., R. Learning Dictionaries for Information Extraction by Multi-Level Bootstrapping. Proceedings of the Sixteenth National Conference on Arti…'</p>
<hr />
<div>== Citation == <br />
<br />
Riloff, E. and Jones., R. Learning Dictionaries for Information Extraction by Multi-Level Bootstrapping. Proceedings of the Sixteenth National Conference on Artificial Intelligence (AAAI-99). 1999. <br />
<br />
== Online Version ==<br />
<br />
[http://reference.kfupm.edu.sa/content/l/e/learning_dictionaries_for_information_ex_2607.pdf]<br />
<br />
== Summary == <br />
<br />
This [[Category::paper]] extends standard CRFs to allow each state to correspond to more than one word or token, similar to the way semi-HMMs extend HMMs. This allows for a richer feature set to be modeled, as features can now correspond to multiple words rather than just one word. These features are quite beneficial in a range of applications where the entities tend to be longer than just one word, including NP-chunking and NER.<br />
<br />
Similar to CRFs, a [[UsesMethod::semi-CRF]] applies one exponential model over the whole sequence. However, instead of modeling a sequence of words, we model a sequence of segments, which each are multiple words belonging to the same state. This expands the space to be explored, so that when performing inference, the Viterbi-like recursion algorithm must also maximize over the segment boundaries. The consequence of this is relatively minor, with inference still taking polynomial time. This cost is less than higher order CRFs, which consider all combinations of the L previous states, whereas semi-CRFs only consider where the L previous states are the same. Training the model is not much harder either. The likelihood is still convex and a recursion step will yield the normalizer.<br />
<br />
The method was then tested on various datasets for NER tasks and compared to standard CRFs. The key ingredient was the choice of richer features in the semi-CRF models. These segment-level features included the number of capital letters in a segment, the segment lengths, and dictionaries that allowed for non-exact matchings. Segment lengths, particularly, can be modeled as any distribution (such as Gaussian or exponential) depending upon how this feature is defined, which is a commonly touted benefit of semi-HMMs over regular HMMs. The results indicate that the semi-CRFs outperformed the regular CRFs in almost all cases, sometimes by quite large margins. <br />
<br />
== Related Papers ==<br />
<br />
[[RelatedPaper::Skounakis, IJCAI 2003]] applies hierarchical HMMs to IE, which model segments like semi-CRFs, but where the segments are themselves Markov processes.<br />
<br />
[[RelatedPaper::Okanoharu, ACL 2006]] improve the speed of semi-CRFs when the entities are very long by using a filtering process and a feature forest model.<br />
<br />
[[RelatedPaper::Andrew, ENMLP 2006]] combine semi-CRFs with traditional CRFs in order to use segment and word level features. Some word level features are not well represented in the semi-CRF model. He demonstrates improved performance on the task of Chinese word segmentation.</div>PastStudentshttp://curtis.ml.cmu.edu/w/courses/index.php?title=User:Jdang&diff=3063User:Jdang2010-12-01T10:00:54Z<p>PastStudents: </p>
<hr />
<div>== ''' James Dang ''' ==<br />
<br />
<br />
<br />
[[File:james_white_stupa.jpg]]<br />
<br />
== Background ==<br />
<br />
I'm a second year masters student in the Computational Biology program at CMU, hoping to learn more and more ways of acquiring all the world's knowledge and using it to automatically cure cancer! Sort of.<br />
<br />
My research interests include epigenetics, regulatory networks, protein interactions, and building databases of these through text mining.<br />
<br />
== Project ==<br />
<br />
[[Clinical IE Project F10]]<br />
<br />
[http://dangjc.webs.com/JamesDang_1108_pres.pdf mid project powerpoint]<br />
<br />
== Paper summaries ==<br />
<br />
September<br />
<br />
* [[Jiao et al COLING 2006]]<br />
* [[Ratnaparkhi EMNLP 1996]]<br />
* [[Morante and Daelemans CoNLL 2009]]<br />
<br />
October<br />
<br />
* [[Sarawagi and Cohen NIPS 2004]]<br />
* [[Mann and McCallum, ICML 2007]]<br />
* [[Kuksa and Qi, SIAM 2010]]<br />
<br />
November<br />
<br />
* [[Satpal and Sarawagi PKDD 2007]]<br />
* [[Talukdar et al CoNLL 2006]]<br />
* [[Riloff and Jones 1999]]</div>PastStudentshttp://curtis.ml.cmu.edu/w/courses/index.php?title=Talukdar_et_al_CoNLL_2006&diff=3062Talukdar et al CoNLL 20062010-12-01T09:21:41Z<p>PastStudents: Created page with '== Citation == Talukdar, T., Brants, T., Liberman, M. and Pereira, F. "A Context Pattern Induction Method for Named Entity Extraction." Computational Natural Language Learning …'</p>
<hr />
<div>== Citation == <br />
<br />
Talukdar, T., Brants, T., Liberman, M. and Pereira, F. "A Context Pattern Induction Method for Named Entity Extraction." Computational Natural Language Learning (CoNLL-X), 2006.<br />
<br />
== Online Version ==<br />
<br />
[http://acl.ldc.upenn.edu/W/W06/W06-29.pdf#page=157]<br />
<br />
== Summary == <br />
<br />
This [[Category::paper]] extends standard CRFs to allow each state to correspond to more than one word or token, similar to the way semi-HMMs extend HMMs. This allows for a richer feature set to be modeled, as features can now correspond to multiple words rather than just one word. These features are quite beneficial in a range of applications where the entities tend to be longer than just one word, including NP-chunking and NER.<br />
<br />
Similar to CRFs, a [[UsesMethod::semi-CRF]] applies one exponential model over the whole sequence. However, instead of modeling a sequence of words, we model a sequence of segments, each of which is a run of consecutive words belonging to the same state. This expands the space to be explored, so that when performing inference, the Viterbi-like recursion must also maximize over segment boundaries. The consequence of this is relatively minor, with inference still taking polynomial time. This cost is less than that of higher-order CRFs, which consider all combinations of the L previous states, whereas semi-CRFs only consider the case where the L previous states are the same. Training the model is not much harder either: the likelihood is still convex, and a recursion step yields the normalizer.<br />
<br />
The method was then tested on various datasets for NER tasks and compared to standard CRFs. The key ingredient was the choice of richer features in the semi-CRF models. These segment-level features included the number of capital letters in a segment, the segment length, and dictionaries that allow non-exact matches. Segment length, in particular, can be modeled as any distribution (such as Gaussian or exponential) depending upon how the feature is defined, which is a commonly touted benefit of semi-HMMs over regular HMMs. The results indicate that the semi-CRFs outperformed the regular CRFs in almost all cases, sometimes by quite large margins. <br />
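The segment-level Viterbi recursion described above can be sketched as follows. This is an illustrative reading of semi-Markov decoding, not the paper's implementation: the function name and the `score(seg, y, y_prev)` interface (standing in for the learned feature score of a labeled segment) are assumptions for exposition.

```python
# Illustrative sketch of segment-level Viterbi decoding for a semi-CRF.
# `score(seg, y, y_prev)` is a hypothetical stand-in for the learned
# feature score of labeling segment `seg` with state y after state y_prev.

def semi_viterbi(tokens, labels, score, max_len):
    n = len(tokens)
    # best[j][y]: best score of any segmentation of tokens[:j] whose
    # last segment carries label y
    best = [{y: float("-inf") for y in labels} for _ in range(n + 1)]
    back = [{y: None for y in labels} for _ in range(n + 1)]
    for y in labels:
        best[0][y] = 0.0
    for j in range(1, n + 1):
        for y in labels:
            # unlike token-level Viterbi, also maximize over the length
            # d of the last segment (bounded by max_len)
            for d in range(1, min(max_len, j) + 1):
                seg = tokens[j - d:j]
                for y_prev in labels:
                    s = best[j - d][y_prev] + score(seg, y, y_prev)
                    if s > best[j][y]:
                        best[j][y] = s
                        back[j][y] = (j - d, y_prev)
    # recover the labeled segments from the back-pointers
    y = max(labels, key=lambda lab: best[n][lab])
    j, segments = n, []
    while j > 0:
        i, y_prev = back[j][y]
        segments.append((i, j, y))
        j, y = i, y_prev
    return list(reversed(segments))
```

Inference stays polynomial — O(n · max_len · |labels|²) — which is the "relatively minor" extra cost noted above compared to token-level Viterbi.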
<br />
== Related Papers ==<br />
<br />
[[RelatedPaper::Skounakis, IJCAI 2003]] applies hierarchical HMMs to IE, which model segments like semi-CRFs, but where the segments are themselves Markov processes.<br />
<br />
[[RelatedPaper::Okanoharu, ACL 2006]] improves the speed of semi-CRFs when the entities are very long by using a filtering process and a feature forest model.<br />
<br />
[[RelatedPaper::Andrew, ENMLP 2006]] combine semi-CRFs with traditional CRFs in order to use segment and word level features. Some word level features are not well represented in the semi-CRF model. He demonstrates improved performance on the task of Chinese word segmentation.</div>PastStudentshttp://curtis.ml.cmu.edu/w/courses/index.php?title=Smith_and_Osborne_CoNLL_2006&diff=3061Smith and Osborne CoNLL 20062010-12-01T09:21:37Z<p>PastStudents: Created page with '== Citation == Smith, A. and Osborne, M. "Using Gazetteers in Discriminative Information Extraction." Computational Natural Language Learning (CoNLL-X), 2006. == Online Versio…'</p>
<hr />
<div>== Citation == <br />
<br />
Smith, A. and Osborne, M. "Using Gazetteers in Discriminative Information Extraction." Computational Natural Language Learning (CoNLL-X), 2006.<br />
<br />
== Online Version ==<br />
<br />
[http://acl.ldc.upenn.edu/W/W06/W06-29.pdf#page=149]<br />
<br />
<br />
== Summary == <br />
<br />
This [[Category::paper]] extends standard CRFs to allow each state to correspond to more than one word or token, similar to the way semi-HMMs extend HMMs. This allows for a richer feature set to be modeled, as features can now correspond to multiple words rather than just one word. These features are quite beneficial in a range of applications where the entities tend to be longer than just one word, including NP-chunking and NER.<br />
<br />
Similar to CRFs, a [[UsesMethod::semi-CRF]] applies one exponential model over the whole sequence. However, instead of modeling a sequence of words, we model a sequence of segments, each of which is a run of consecutive words belonging to the same state. This expands the space to be explored, so that when performing inference, the Viterbi-like recursion must also maximize over segment boundaries. The consequence of this is relatively minor, with inference still taking polynomial time. This cost is less than that of higher-order CRFs, which consider all combinations of the L previous states, whereas semi-CRFs only consider the case where the L previous states are the same. Training the model is not much harder either: the likelihood is still convex, and a recursion step yields the normalizer.<br />
<br />
The method was then tested on various datasets for NER tasks and compared to standard CRFs. The key ingredient was the choice of richer features in the semi-CRF models. These segment-level features included the number of capital letters in a segment, the segment length, and dictionaries that allow non-exact matches. Segment length, in particular, can be modeled as any distribution (such as Gaussian or exponential) depending upon how the feature is defined, which is a commonly touted benefit of semi-HMMs over regular HMMs. The results indicate that the semi-CRFs outperformed the regular CRFs in almost all cases, sometimes by quite large margins. <br />
<br />
== Related Papers ==<br />
<br />
[[RelatedPaper::Skounakis, IJCAI 2003]] applies hierarchical HMMs to IE, which model segments like semi-CRFs, but where the segments are themselves Markov processes.<br />
<br />
[[RelatedPaper::Okanoharu, ACL 2006]] improves the speed of semi-CRFs when the entities are very long by using a filtering process and a feature forest model.<br />
<br />
[[RelatedPaper::Andrew, ENMLP 2006]] combine semi-CRFs with traditional CRFs in order to use segment and word level features. Some word level features are not well represented in the semi-CRF model. He demonstrates improved performance on the task of Chinese word segmentation.</div>PastStudentshttp://curtis.ml.cmu.edu/w/courses/index.php?title=Satpal_and_Sarawagi_PKDD_2007&diff=3060Satpal and Sarawagi PKDD 20072010-12-01T09:18:09Z<p>PastStudents: Created page with '== Citation == Satpal, S. and Sarawagi, S. Domain adaptation of conditional probability models via feature subsetting. Proceedings of PKDD’07 (2007). == Online Version == […'</p>
<hr />
<div>== Citation == <br />
<br />
Satpal, S. and Sarawagi, S. Domain adaptation of conditional probability models via feature subsetting. Proceedings of PKDD’07 (2007).<br />
<br />
== Online Version ==<br />
<br />
[http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.99.8784&rep=rep1&type=pdf]<br />
<br />
<br />
== Summary == <br />
<br />
This [[Category::paper]] extends standard CRFs to allow each state to correspond to more than one word or token, similar to the way semi-HMMs extend HMMs. This allows for a richer feature set to be modeled, as features can now correspond to multiple words rather than just one word. These features are quite beneficial in a range of applications where the entities tend to be longer than just one word, including NP-chunking and NER.<br />
<br />
Similar to CRFs, a [[UsesMethod::semi-CRF]] applies one exponential model over the whole sequence. However, instead of modeling a sequence of words, we model a sequence of segments, each of which is a run of consecutive words belonging to the same state. This expands the space to be explored, so that when performing inference, the Viterbi-like recursion must also maximize over segment boundaries. The consequence of this is relatively minor, with inference still taking polynomial time. This cost is less than that of higher-order CRFs, which consider all combinations of the L previous states, whereas semi-CRFs only consider the case where the L previous states are the same. Training the model is not much harder either: the likelihood is still convex, and a recursion step yields the normalizer.<br />
<br />
The method was then tested on various datasets for NER tasks and compared to standard CRFs. The key ingredient was the choice of richer features in the semi-CRF models. These segment-level features included the number of capital letters in a segment, the segment length, and dictionaries that allow non-exact matches. Segment length, in particular, can be modeled as any distribution (such as Gaussian or exponential) depending upon how the feature is defined, which is a commonly touted benefit of semi-HMMs over regular HMMs. The results indicate that the semi-CRFs outperformed the regular CRFs in almost all cases, sometimes by quite large margins. <br />
<br />
== Related Papers ==<br />
<br />
[[RelatedPaper::Skounakis, IJCAI 2003]] applies hierarchical HMMs to IE, which model segments like semi-CRFs, but where the segments are themselves Markov processes.<br />
<br />
[[RelatedPaper::Okanoharu, ACL 2006]] improves the speed of semi-CRFs when the entities are very long by using a filtering process and a feature forest model.<br />
<br />
[[RelatedPaper::Andrew, ENMLP 2006]] combine semi-CRFs with traditional CRFs in order to use segment and word level features. Some word level features are not well represented in the semi-CRF model. He demonstrates improved performance on the task of Chinese word segmentation.</div>PastStudentshttp://curtis.ml.cmu.edu/w/courses/index.php?title=User:Jdang&diff=3059User:Jdang2010-12-01T09:09:34Z<p>PastStudents: </p>
<hr />
<div>== ''' James Dang ''' ==<br />
<br />
<br />
<br />
[[File:james_white_stupa.jpg]]<br />
<br />
== Background ==<br />
<br />
I'm a second year masters student in the Computational Biology program at CMU, hoping to learn more and more ways of acquiring all the world's knowledge and using it to automatically cure cancer! Sort of.<br />
<br />
My research interests include epigenetics, regulatory networks, protein interactions, and building databases of these through text mining.<br />
<br />
== Project ==<br />
<br />
[[Clinical IE Project F10]]<br />
<br />
[http://dangjc.webs.com/JamesDang_1108_pres.pdf mid project powerpoint]<br />
<br />
== Paper summaries ==<br />
<br />
September<br />
<br />
* [[Jiao et al COLING 2006]]<br />
* [[Ratnaparkhi EMNLP 1996]]<br />
* [[Morante and Daelemans CoNLL 2009]]<br />
<br />
October<br />
<br />
* [[Sarawagi and Cohen NIPS 2004]]<br />
* [[Mann and McCallum, ICML 2007]]<br />
* [[Kuksa and Qi, SIAM 2010]]<br />
<br />
November<br />
<br />
* [[Satpal and Sarawagi PKDD 2007]]<br />
* [[Talukdar et al CoNLL 2006]]<br />
* [[Smith and Osborne CoNLL 2006]]</div>PastStudentshttp://curtis.ml.cmu.edu/w/courses/index.php?title=Huang_et_al,_ACL_2009:_Profile_Based_Cross-Document_Coreference_Using_Kernelized_Fuzzy_Relational_Clustering&diff=3058Huang et al, ACL 2009: Profile Based Cross-Document Coreference Using Kernelized Fuzzy Relational Clustering2010-12-01T05:45:38Z<p>PastStudents: </p>
<hr />
<div>== Citation ==<br />
<br />
Jian Huang, Sarah M. Taylor, Jonathan L. Smith, Konstantinos A. Fotiadis and C. Lee Giles. 2009. Profile Based Cross-Document Coreference Using Kernelized Fuzzy Relational Clustering. In Proceedings of the 47th Annual Meeting of the ACL and the 4th IJCNLP of the AFNLP, pages 414–422.<br />
<br />
== Online version ==<br />
<br />
An online version of this paper is available [http://acl.eldoc.ub.rug.nl/mirror/P/P09/P09-1047.pdf].<br />
<br />
== Summary ==<br />
<br />
This [[Category::paper]] solves the problem of [[AddressesProblem::Cross Document Coreference (CDC)]] by using Information Extraction tools to make profiles of entities, measuring the distance between profiles by a learned distance function, and finally clustering them using kernelized fuzzy relational clustering. <br />
<br />
== Constructing entity profiles using IE and WDC ==<br />
<br />
An information extraction tool first extracts Named Entities and their relationships. For the NEs of interest, a [[AddressesProblem::Within Document Coreference (WDC)]] module then links the entities deemed to refer to the same underlying identity into a WDC chain. The authors use the information extraction tool AeroText for this purpose. AeroText extracts two types of information for an entity: attribute information about the person named entity (first/middle/last name, gender, mention type, etc.), and relationship information between named entities, such as Family, List, Employment, Ownership and Citizen-Resident-Religion-Ethnicity, as specified in the ACE evaluation. AeroText resolves references to entities within a document and produces the entity profiles used as input to the CDC system. Each entity is represented as a profile containing the NE, its attributes and its associated relationships.<br />
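The profile produced for each WDC chain can be pictured as a simple record. This is an illustrative sketch only — the field names are assumptions for exposition, not AeroText's actual output schema:

```python
from dataclasses import dataclass, field

@dataclass
class EntityProfile:
    """Hypothetical container mirroring the paper's entity profiles:
    the NE's coreferent mentions, its attributes, and its relationships.
    Field names are illustrative, not AeroText's actual schema."""
    mentions: list = field(default_factory=list)       # WDC chain of coreferent mentions
    attributes: dict = field(default_factory=dict)     # e.g. {"first": "John", "gender": "M"}
    relationships: list = field(default_factory=list)  # e.g. [("Employment", "CMU")]
```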
<br />
== Kernelized Fuzzy Relational Clustering ==<br />
<br />
For clustering the entities, they use the Kernelized Fuzzy Relational Clustering algorithm (KARC). This algorithm is based on the Any Relation Clustering Algorithm (ARCA), which represents relational data as object data using their mutual relation strength and uses Fuzzy C-Means for clustering. Each chained entity is represented as a vector of its relation strengths with all the other entities. Fuzzy clusters can then be obtained by grouping closely related patterns with an object clustering algorithm.<br />
<br />
The kernelized fuzzy clustering algorithm KARC works as follows. The chained entities E are first objectified into a relation strength matrix R using Specialist Exponentiated Gradient (SEG). A Gram matrix K is then computed from the relation strength vectors using the kernel function. For a given number of clusters C, initialization is done by randomly picking C patterns as cluster centers. The kernel distance matrix D is initialized, and KARC then alternately updates the membership matrix U and the kernel distance matrix D until convergence, or until a maximum number of iterations is exceeded. Finally, the soft partition is generated from the membership matrix U, which is the desired cross-document coreference result.<br />
<br />
The number of true underlying identities may vary depending on the entities’ level of ambiguity (e.g. name frequency). To select the optimal number of clusters, the authors adopt the Xie-Beni Index (XBI) (Xie and Beni, 1991) as in ARCA, which is one of the most popular cluster validity indices for fuzzy clustering algorithms.<br />
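Read as kernelized fuzzy c-means over the Gram matrix K, the alternation between D and U can be sketched like this. This is a hedged reconstruction: the update formulas are the textbook kernel fuzzy c-means steps, not necessarily the paper's exact derivation, and cluster-number selection via the Xie-Beni index is omitted.

```python
import numpy as np

def karc_sketch(K, n_clusters, m=2.0, n_iter=50, seed=0):
    """Minimal reading of the KARC loop: kernelized fuzzy c-means over a
    precomputed Gram matrix K, alternating the kernel distance matrix D
    and the membership matrix U until convergence (textbook updates,
    not necessarily the paper's exact derivation)."""
    rng = np.random.default_rng(seed)
    n = K.shape[0]
    U = rng.random((n, n_clusters))
    U /= U.sum(axis=1, keepdims=True)          # rows: fuzzy memberships
    for _ in range(n_iter):
        W = U ** m
        W /= W.sum(axis=0, keepdims=True)      # normalized cluster weights
        # squared feature-space distance from each pattern to each cluster
        # center: ||phi(x_i) - v_c||^2 = K_ii - 2(KW)_ic + (W'KW)_cc
        D = (np.diag(K)[:, None]
             - 2.0 * (K @ W)
             + np.einsum("jc,jl,lc->c", W, K, W)[None, :])
        D = np.maximum(D, 1e-12)
        inv = (1.0 / D) ** (1.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)  # membership update
        if np.abs(U_new - U).max() < 1e-7:
            U = U_new
            break
        U = U_new
    return U
```

The soft partition is then read off the rows of U, hardening each pattern to its highest-membership cluster where a crisp answer is needed.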
<br />
== Learning Distance Functions from a suite of similarity measures ==<br />
<br />
A suite of similarity functions is designed to determine whether the attributes and relationships in a pair of entity profiles match: SoftTFIDF, JC Semantic Similarity, rule-based metrics, etc. The authors treat each similarity function as a specialist that specializes in computing the similarity of a particular type of relationship. They use a specialist ensemble learning framework (SEG) to combine these component similarities into the relation strength used for clustering. Here, a specialist is awakened for prediction only when the same type of relationship is present in both chained entities. A specialist can choose not to make a prediction if it is not confident enough on an instance. Specialists also carry different weights (in addition to their predictions) in the final relation strength.<br />
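The awake/asleep specialist combination can be sketched as follows. This is a simplified sketch of the specialist exponentiated-gradient idea under square loss — the function names, the learning rate, and the exact update are assumptions for exposition, not the paper's precise SEG formulation.

```python
import math

def seg_combine(weights, predictions):
    """Combine only the awake specialists: `predictions` maps a
    specialist name to its similarity in [0, 1]; sleeping specialists
    simply don't appear in it."""
    total = sum(weights[s] for s in predictions)
    return sum(weights[s] * p for s, p in predictions.items()) / total

def seg_update(weights, predictions, target, eta=0.5):
    """One exponentiated-gradient step on the awake specialists under
    square loss, preserving the sleeping specialists' weight mass.
    A simplified sketch, not the paper's exact update."""
    y_hat = seg_combine(weights, predictions)
    awake_mass = sum(weights[s] for s in predictions)
    new = dict(weights)
    for s, p in predictions.items():
        new[s] = weights[s] * math.exp(-2.0 * eta * (y_hat - target) * p)
    z = sum(new[s] for s in predictions)
    for s in predictions:            # renormalize awake specialists only
        new[s] *= awake_mass / z
    return new
```

A specialist that predicted in the direction of the target gains weight relative to its awake peers; sleeping specialists are untouched, matching the abstention behavior described above.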
<br />
== Experiments and Evaluation ==<br />
<br />
They use the ACL SemEval-2007 web person search task ([[UsesDataset::WePS]]). The authors use the standard purity and inverse purity clustering metrics as in the WePS evaluation. The test collection consists of three sets of 10 different names, sampled from ambiguous names from English Wikipedia (famous people), participants of the ACL 2006 conference (computer scientists) and common names from the US Census data, respectively. For each name, the top 100 documents retrieved from the Yahoo! Search API are used. <br />
<br />
The authors report a macro-averaged purity of 0.657, an inverse purity of 0.795 and an F score of 0.740. These results compare favorably with those of the first-tier systems in the WePS 2007 official evaluation. <br />
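The purity / inverse purity metrics used above can be sketched for a single name as follows (the paper macro-averages these over names; the harmonic-mean F combination is my reading of the standard WePS scoring, stated here as an assumption):

```python
from collections import Counter

def purity(clusters, gold):
    """Fraction of items falling in their cluster's majority gold class.
    Both arguments map item -> id; swapping them yields inverse purity."""
    by_cluster = {}
    for item, c in clusters.items():
        by_cluster.setdefault(c, []).append(gold[item])
    return sum(max(Counter(v).values()) for v in by_cluster.values()) / len(clusters)

def purity_scores(clusters, gold):
    p = purity(clusters, gold)
    ip = purity(gold, clusters)               # roles swapped
    return p, ip, 2.0 * p * ip / (p + ip)     # harmonic-mean F score
```

Purity penalizes merging distinct identities into one cluster, while inverse purity penalizes splitting one identity across clusters, so reporting both (and their F combination) guards against trivial solutions.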
<br />
==Conclusion==<br />
The authors present interesting learning (SEG) and clustering (KARC) methods to solve the problem of [[AddressesProblem::Cross Document Coreference (CDC)]].<br />
<br />
== Relevant Papers ==<br />
<br />
{{#ask: [[AddressesProblem::Cross Document Coreference (CDC)]]<br />
| ?UsesMethod<br />
| ?UsesDataset<br />
}}</div>PastStudentshttp://curtis.ml.cmu.edu/w/courses/index.php?title=User:Rnshah&diff=3057User:Rnshah2010-12-01T05:45:06Z<p>PastStudents: </p>
<hr />
<div>== Rushin Shah ==<br />
[[File:Rushin.jpg]]<br />
<br />
[http://www.cs.cmu.edu/~rnshah/ Home Page] [http://www.cs.cmu.edu/~rnshah/resume.pdf Resume]<br />
<br />
<br />
''' About me '''<br />
<br />
My name is Rushin Shah, and I'm a second year LTI Master's student. I want to get an in-depth understanding of the various challenges, ideas and techniques covered in the field of information extraction. I'm currently working with [http://www.cs.cmu.edu/~ref/ Dr. Robert Frederking] on multilingual named entity extraction and co-reference resolution. One particular problem that we're working on right now is cross-document co-reference resolution, and I hope to be able to apply the knowledge that I get from this course towards furthering our research.<br />
<br />
This is my [http://www.cs.cmu.edu/~rnshah/ homepage] and here's my [http://www.cs.cmu.edu/~rnshah/resume.pdf resume]. My areas of interest are machine learning, information extraction, natural language processing, social media and recommendation systems.<br />
<br />
Papers added to the wiki for September:<br />
<br />
[[Frietag 2000 Maximum Entropy Markov Models for Information Extraction and Segmentation]]<br />
<br />
[[Lafferty 2001 Conditional Random Fields]]<br />
<br />
[[Within Document Coreference (WDC)]]<br />
<br />
Pages added to the wiki for October:<br />
<br />
[[Cross Document Coreference (CDC)]]<br />
<br />
[[ACE 2005 Dataset]]<br />
<br />
[[Relation Extraction]]<br />
<br />
Pages added to the wiki for November: <br />
<br />
[[Ravichandran and Hovy, ACL 2002: Learning Surface Text Patterns for a Question Answering System]]<br />
<br />
[[Huang et al, ACL 2009: Profile Based Cross-Document Coreference Using Kernelized Fuzzy Relational Clustering]]<br />
<br />
[[Huang et al, Coling 2010: Enhancing Cross Document Coreference of Web Documents with Context Similarity and Very Large Scale Text Categorization]]</div>PastStudents