http://curtis.ml.cmu.edu/w/courses/index.php?title=Rao,_D.,_D._Yarowsky,_A._Shreevats,_and_M._Gupta._2010._Classifying_latent_user_attributes_in_twitter._In_Proceedings_of_the_2nd_international_workshop_on_Search_and_mining_user-generated_contents,_37%E2%80%9344.&feed=atom&action=history
Rao, D., D. Yarowsky, A. Shreevats, and M. Gupta. 2010. Classifying latent user attributes in twitter. In Proceedings of the 2nd international workshop on Search and mining user-generated contents, 37–44. - Revision history
2024-03-29T13:57:22Z
Revision history for this page on the wiki
MediaWiki 1.33.1
http://curtis.ml.cmu.edu/w/courses/index.php?title=Rao,_D.,_D._Yarowsky,_A._Shreevats,_and_M._Gupta._2010._Classifying_latent_user_attributes_in_twitter._In_Proceedings_of_the_2nd_international_workshop_on_Search_and_mining_user-generated_contents,_37%E2%80%9344.&diff=5230&oldid=prev
Rnshah at 03:01, 1 April 2011
2011-04-01T03:01:55Z
<p></p>
<table class="diff diff-contentalign-left" data-mw="interface">
<col class="diff-marker" />
<col class="diff-content" />
<col class="diff-marker" />
<col class="diff-content" />
<tr class="diff-title" lang="en">
<td colspan="2" style="background-color: #fff; color: #222; text-align: center;">← Older revision</td>
<td colspan="2" style="background-color: #fff; color: #222; text-align: center;">Revision as of 03:01, 1 April 2011</td>
</tr><tr><td colspan="2" class="diff-lineno" id="mw-diff-left-l8" >Line 8:</td>
<td colspan="2" class="diff-lineno">Line 8:</td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>== Dataset ==</div></td><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>== Dataset ==</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td></tr>
<tr><td class='diff-marker'>−</td><td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>The authors build distinct datasets for each attribute mentioned above in a semi-supervised manner. For gender, they get their seed set for a crawl from initial sources including sororities, fraternities, and male and female hygiene products. This produces around 500 users in each class. For age, they manually classify seed users as being below or above 30. For regional origins, they selected seed users from cities with low cross-migration and performed manual annotation. Finally, for political orientation,they looked at twitter lists for the National Rife Association (NRA),keyword searches like "support Palin" or "proud democrat" and hashtags related to</div></td><td class='diff-marker'>+</td><td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>The authors build distinct datasets for each attribute mentioned above in a semi-supervised manner. For gender, they get their seed set for a crawl from initial sources including sororities, fraternities, and male and female hygiene products. This produces around 500 users in each class. For age, they manually classify seed users as being below or above 30. For regional origins, they selected seed users from cities with low cross-migration and performed manual annotation <ins class="diffchange diffchange-inline">of users as being from either North or South India</ins>. Finally, for political orientation,they looked at twitter lists for the National Rife Association (NRA),keyword searches like "support Palin" or "proud democrat" and hashtags related to current news events <ins class="diffchange diffchange-inline">and classified users as being either Republicans or Democrats</ins>.</div></td></tr>
<tr><td class='diff-marker'>−</td><td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>current news events. </div></td><td colspan="2"> </td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>== Network Structure and Communication Behavior ==</div></td><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>== Network Structure and Communication Behavior ==</div></td></tr>
<tr><td colspan="2" class="diff-lineno" id="mw-diff-left-l15" >Line 15:</td>
<td colspan="2" class="diff-lineno">Line 14:</td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>For each of the classes in each of the attributes, the authors studied if there was any difference</div></td><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>For each of the classes in each of the attributes, the authors studied if there was any difference</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>in the distribution of: </div></td><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>in the distribution of: </div></td></tr>
<tr><td class='diff-marker'>−</td><td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>1. The follower-following ratio: The ratio of number of</div></td><td class='diff-marker'>+</td><td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins class="diffchange diffchange-inline">* </ins>1. The follower-following ratio: The ratio of number of</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>followers of an user to the number of users he/she is</div></td><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>followers of an user to the number of users he/she is</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>following.</div></td><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>following.</div></td></tr>
<tr><td class='diff-marker'>−</td><td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>2. The follower frequency: The number of followers</div></td><td class='diff-marker'>+</td><td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins class="diffchange diffchange-inline">* </ins>2. The follower frequency: The number of followers</div></td></tr>
<tr><td class='diff-marker'>−</td><td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>3. The following frequency: The number of followees</div></td><td class='diff-marker'>+</td><td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins class="diffchange diffchange-inline">* </ins>3. The following frequency: The number of followees</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>They found that these features did not correlate with the attributes under consideration. They similarly considered distributions of:</div></td><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>They found that these features did not correlate with the attributes under consideration. They similarly considered distributions of:</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td></tr>
<tr><td class='diff-marker'>−</td><td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>1. Response frequency: percentage of tweets from the user</div></td><td class='diff-marker'>+</td><td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins class="diffchange diffchange-inline">* </ins>1. Response frequency: percentage of tweets from the user</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>that are replies</div></td><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>that are replies</div></td></tr>
<tr><td class='diff-marker'>−</td><td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>2. Retweet frequency: percentage of tweets that are retweets.</div></td><td class='diff-marker'>+</td><td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins class="diffchange diffchange-inline">* </ins>2. Retweet frequency: percentage of tweets that are retweets.</div></td></tr>
<tr><td class='diff-marker'>−</td><td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>3. Tweet frequency: percentage of tweets that are from</div></td><td class='diff-marker'>+</td><td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins class="diffchange diffchange-inline">* </ins>3. Tweet frequency: percentage of tweets that are from</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>the user, uninitiated.</div></td><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>the user, uninitiated.</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>Again,they observed no exploitable difference in the communication behavior on these dimensions by male and female users, or for classifying age, regional origin, and political orientation. This is in contrast to the results of similar experiments for speech and conversation related data.</div></td><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>Again,they observed no exploitable difference in the communication behavior on these dimensions by male and female users, or for classifying age, regional origin, and political orientation. This is in contrast to the results of similar experiments for speech and conversation related data.</div></td></tr>
<tr><td colspan="2" class="diff-lineno" id="mw-diff-left-l32" >Line 32:</td>
<td colspan="2" class="diff-lineno">Line 31:</td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>The authors used [[UsesMethod::Support Vector Machines]] to train binary classifiers for each attribute. They used three sets of features, all derived from the tweet messages alone, and trained the following three SVM classifiers:</div></td><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>The authors used [[UsesMethod::Support Vector Machines]] to train binary classifiers for each attribute. They used three sets of features, all derived from the tweet messages alone, and trained the following three SVM classifiers:</div></td></tr>
<tr><td class='diff-marker'>−</td><td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>1) Sociolinguistic features based model</div></td><td class='diff-marker'>+</td><td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div> </div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins class="diffchange diffchange-inline">* </ins>1) Sociolinguistic features based model</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>The authors rely on prior research in sociolinguistics that shows differences in lexical choice and other linguistic features in discourse conditioned on age, gender, and social class. For example, in speech it is well known that certain utterances like "umm", "uh-huh", and back channel responses like laughter and lip smacking are more prevalent among female speakers than their male counterparts. </div></td><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>The authors rely on prior research in sociolinguistics that shows differences in lexical choice and other linguistic features in discourse conditioned on age, gender, and social class. For example, in speech it is well known that certain utterances like "umm", "uh-huh", and back channel responses like laughter and lip smacking are more prevalent among female speakers than their male counterparts. </div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td></tr>
<tr><td class='diff-marker'>−</td><td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>2) Ngram-features</div></td><td class='diff-marker'>+</td><td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins class="diffchange diffchange-inline">* </ins>2) Ngram-features <ins class="diffchange diffchange-inline">based model</ins></div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>These models have been used earlier for identification of gender from telephone conversations. The authors utilized this as one class of approaches for Twitter classification by deriving the unigrams and bigrams of the tweet text. The text was segmented and normalized to preserve emoticons and other punctuation sequences as first-class lexical items. The authors find that emoticons serve as important features, and that TFIDF did worse for this task on a development set than TF alone. </div></td><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>These models have been used earlier for identification of gender from telephone conversations. The authors utilized this as one class of approaches for Twitter classification by deriving the unigrams and bigrams of the tweet text. The text was segmented and normalized to preserve emoticons and other punctuation sequences as first-class lexical items. The authors find that emoticons serve as important features, and that TFIDF did worse for this task on a development set than TF alone. </div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td></tr>
<tr><td class='diff-marker'>−</td><td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>3) A stacked model that combines SVMs based on the above two feature sets</div></td><td class='diff-marker'>+</td><td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins class="diffchange diffchange-inline">* </ins>3) A stacked model that combines SVMs based on the above two feature sets</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>Finally,the authors employed a stacked SVM model to do simple classifier stacking. Its features are the predictions from the Ngram-feature and SocioLinguistic models along with their prediction weights</div></td><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>Finally,the authors employed a stacked SVM model to do simple classifier stacking. Its features are the predictions from the Ngram-feature and SocioLinguistic models along with their prediction weights</div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins style="font-weight: bold; text-decoration: none;"></ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins style="font-weight: bold; text-decoration: none;">== Results ==</ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins style="font-weight: bold; text-decoration: none;"></ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins style="font-weight: bold; text-decoration: none;">=== Gender ===</ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins style="font-weight: bold; text-decoration: none;"></ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins style="font-weight: bold; text-decoration: none;">The authors achieved an accuracy of 72.33% for gender using their stacked model, significantly beating the 50% baseline of random chance. They found that their sociolinguistic model did better than their n-gram model for this task. They found that sociolinguistic features such as emoticons, ellipses, alphabetic character repetitions inside words and exlcamantion marks were very helpful in discriminating females from males.</ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins style="font-weight: bold; text-decoration: none;"></ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins style="font-weight: bold; text-decoration: none;">=== Age ===</ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins style="font-weight: bold; text-decoration: none;"></ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins style="font-weight: bold; text-decoration: none;">The authors achieved an accuracy of 74.11% in classifying users as above or below 30 with the stacked model, and for this task found that their n-gram based model did significantly better than their socio-linguistic feature based model. </ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins style="font-weight: bold; text-decoration: none;"></ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins style="font-weight: bold; text-decoration: none;">=== Regional Origin ===</ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins style="font-weight: bold; text-decoration: none;"></ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins style="font-weight: bold; text-decoration: none;">The authors achieved an accuracy of 77% for this task using their sociolinguistic model, and it outperformed their stacked as well as n-gram based models. They found that for some unknown reason, north Indian users tended to use emoticons, repeat characters, show excitement (repeated exclamation, puzzled punctuation) more than south Indian users.</ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins style="font-weight: bold; text-decoration: none;"></ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins style="font-weight: bold; text-decoration: none;">=== Political Orientation ===</ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins style="font-weight: bold; text-decoration: none;"></ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins style="font-weight: bold; text-decoration: none;">The authors achieved an accuracy of 82% for this task and in this case, their n-gram based model beat their stacked and sociolinguistic models. They found that certain words such as handgun or vegetarianism were almost always indicative of Republican or Democrat respectively.</ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins style="font-weight: bold; text-decoration: none;"></ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins style="font-weight: bold; text-decoration: none;">== Conclusion ==</ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins style="font-weight: bold; text-decoration: none;">This paper presents a novel set of features and approaches for automatically classifying latent user attributes including gender, age, regional origin, and political orientation solely from the user language of informal communication such as Twitter.</ins></div></td></tr>
</table>
Rnshah
http://curtis.ml.cmu.edu/w/courses/index.php?title=Rao,_D.,_D._Yarowsky,_A._Shreevats,_and_M._Gupta._2010._Classifying_latent_user_attributes_in_twitter._In_Proceedings_of_the_2nd_international_workshop_on_Search_and_mining_user-generated_contents,_37%E2%80%9344.&diff=5228&oldid=prev
Rnshah at 02:50, 1 April 2011
2011-04-01T02:50:25Z
<p></p>
<table class="diff diff-contentalign-left" data-mw="interface">
<col class="diff-marker" />
<col class="diff-content" />
<col class="diff-marker" />
<col class="diff-content" />
<tr class="diff-title" lang="en">
<td colspan="2" style="background-color: #fff; color: #222; text-align: center;">← Older revision</td>
<td colspan="2" style="background-color: #fff; color: #222; text-align: center;">Revision as of 02:50, 1 April 2011</td>
</tr><tr><td colspan="2" class="diff-lineno" id="mw-diff-left-l3" >Line 3:</td>
<td colspan="2" class="diff-lineno">Line 3:</td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>== Summary ==</div></td><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>== Summary ==</div></td></tr>
<tr><td class='diff-marker'>−</td><td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>The authors investigate the use of rich feature sets and stacked [[UsesMethod:: Support Vector <del class="diffchange diffchange-inline">Machine </del>| SVM]] based classifiers to classify latent user attributes, including gender, age, regional origin, and political orientation solely from Twitter user language.</div></td><td class='diff-marker'>+</td><td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>The authors investigate the use of rich feature sets and stacked [[UsesMethod:: Support Vector <ins class="diffchange diffchange-inline">Machines </ins>| SVM]] based classifiers to classify latent user attributes, including gender, age, regional origin, and political orientation solely from Twitter user language.</div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins class="diffchange diffchange-inline">They also include an analysis of features and approaches that are effective and not effective in classifying such user attributes in Twitter-style data as opposed to spoken genres previously studied in the user-property classiffication literature.</ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div> </div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins class="diffchange diffchange-inline">== Dataset ==</ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div> </div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins class="diffchange diffchange-inline">The authors build distinct datasets for each attribute mentioned above in a semi-supervised manner. For gender, they get their seed set for a crawl from initial sources including sororities, fraternities, and male and female hygiene products. This produces around 500 users in each class. For age, they manually classify seed users as being below or above 30. For regional origins, they selected seed users from cities with low cross-migration and performed manual annotation. Finally, for political orientation,they looked at twitter lists for the National Rife Association (NRA),keyword searches like "support Palin" or "proud democrat" and hashtags related to</ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins class="diffchange diffchange-inline">current news events. </ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div> </div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins class="diffchange diffchange-inline">== Network Structure and Communication Behavior ==</ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div> </div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins class="diffchange diffchange-inline">For each of the classes in each of the attributes, the authors studied if there was any difference</ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins class="diffchange diffchange-inline">in the distribution of: </ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins class="diffchange diffchange-inline">1. The follower-following ratio: The ratio of number of</ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins class="diffchange diffchange-inline">followers of an user to the number of users he/she is</ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins class="diffchange diffchange-inline">following.</ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins class="diffchange diffchange-inline">2. The follower frequency: The number of followers</ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins class="diffchange diffchange-inline">3. The following frequency: The number of followees</ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins class="diffchange diffchange-inline">They found that these features did not correlate with the attributes under consideration. They similarly considered distributions of:</ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div> </div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins class="diffchange diffchange-inline">1. Response frequency: percentage of tweets from the user</ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins class="diffchange diffchange-inline">that are replies</ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins class="diffchange diffchange-inline">2. Retweet frequency: percentage of tweets that are retweets.</ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins class="diffchange diffchange-inline">3. Tweet frequency: percentage of tweets that are from</ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins class="diffchange diffchange-inline">the user, uninitiated.</ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins class="diffchange diffchange-inline">Again,they observed no exploitable difference in the communication behavior on these dimensions by male and female users, or for classifying age, regional origin, and political orientation. This is in contrast to the results of similar experiments for speech and conversation related data.</ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div> </div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins class="diffchange diffchange-inline">== Classification Models ==</ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div> </div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins class="diffchange diffchange-inline">The authors used [[UsesMethod::Support Vector Machines]] to train binary classifiers for each attribute. They used three sets of features, all derived from the tweet messages alone, and trained the following three SVM classifiers:</ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins class="diffchange diffchange-inline">1) Sociolinguistic features based model</ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins class="diffchange diffchange-inline">The authors rely on prior research in sociolinguistics that shows differences in lexical choice and other linguistic features in discourse conditioned on age, gender, and social class. For example, in speech it is well known that certain utterances like "umm", "uh-huh", and back channel responses like laughter and lip smacking are more prevalent among female speakers than their male counterparts. </ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div> </div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins class="diffchange diffchange-inline">2) Ngram-features</ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins class="diffchange diffchange-inline">These models have been used earlier for identification of gender from telephone conversations. The authors utilized this as one class of approaches for Twitter classification by deriving the unigrams and bigrams of the tweet text. The text was segmented and normalized to preserve emoticons and other punctuation sequences as first-class lexical items. The authors find that emoticons serve as important features, and that TFIDF did worse for this task on a development set than TF alone. </ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div> </div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins class="diffchange diffchange-inline">3) A stacked model that combines SVMs based on the above two feature sets</ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins class="diffchange diffchange-inline">Finally,the authors employed a stacked SVM model to do simple classifier stacking. Its features are the predictions from the Ngram-feature and SocioLinguistic models along with their prediction weights</ins></div></td></tr>
</table>
Rnshah
http://curtis.ml.cmu.edu/w/courses/index.php?title=Rao,_D.,_D._Yarowsky,_A._Shreevats,_and_M._Gupta._2010._Classifying_latent_user_attributes_in_twitter._In_Proceedings_of_the_2nd_international_workshop_on_Search_and_mining_user-generated_contents,_37%E2%80%9344.&diff=5206&oldid=prev
Rnshah: Created page with '== Online Version == An online version of this Category::paper is available here: [http://www.cs.jhu.edu/~delip/smuc.pdf] == Summary == The authors investigate the use of ri…'
2011-04-01T02:26:21Z
<p>Created page with '== Online Version == An online version of this [[Category::paper]] is available here: [http://www.cs.jhu.edu/~delip/smuc.pdf] == Summary == The authors investigate the use of ri…'</p>
<p><b>New page</b></p><div>== Online Version ==<br />
An online version of this [[Category::paper]] is available here: [http://www.cs.jhu.edu/~delip/smuc.pdf]<br />
<br />
== Summary ==<br />
The authors investigate the use of rich feature sets and stacked [[UsesMethod:: Support Vector Machine | SVM]] based classifiers to classify latent user attributes, including gender, age, regional origin, and political orientation solely from Twitter user language.</div>
Rnshah