Difference between revisions of "Class Meeting for 10-707 10/20/2010"

From Cohen Courses
(Created page with 'This is one of the class meetings on the schedule for the course Information Extraction 10-707 in Fall 2010. === …')
 
 
(5 intermediate revisions by 3 users not shown)
Line 1: Line 1:
 
This is one of the class meetings on the [[Syllabus for Information Extraction 10-707 in Fall 2010|schedule]] for the course [[Information Extraction 10-707 in Fall 2010]].
 
  
=== IE and Reasoning 1 - WHIRL ===
+
=== Overview of Bootstrapping and KnowItAll ===
  
* [http://www.cs.cmu.edu/~wcohen/10-707/10-20-whirl.ppt Slides].
+
* [http://www.cs.cmu.edu/~wcohen/10-707/10-20-semisupervised.ppt Slides]
  
 
=== Required Readings ===
 
  
* [[required::cohen_2000_whirl_a_word_based_information_representation_language | {{MyCitejournal| date = 2000| doi = http://dx.doi.org/10.1016/S0004-3702(99)00102-2| first = William W| issn = 0004-3702| issue = 1-2| journal = Artif. Intell.| last = Cohen| pages = 163–196| title = WHIRL: a word-based information representation language| volume = 118}}]]
+
* [[etzioni_2004_methods_for_domain_independent_information_extraction_from_the_web_an_experimental_comparison | {{MyCiteconference| booktitle = Proceedings of the national conference on artificial intelligence| coauthors = M. Cafarella, D. Downey, A. M Popescu, T. Shaked, S. Soderland, D. S Weld, A. Yates| date = 2004| first = O.| last = Etzioni| pages = 391–398| title = Methods for domain-independent information extraction from the web: An experimental comparison}}]].  About KnowItAll.
* [[required::cohen_2000_hardening_soft_information_sources | {{MyCiteconference| booktitle = KDD '00: Proceedings of the sixth ACM SIGKDD international conference on Knowledge discovery and data mining| coauthors = Henry Kautz, David McAllester| date = 2000| doi = http://doi.acm.org/10.1145/347090.347141| first = William W| isbn = 1-58113-233-6| last = Cohen| location = New York, NY, USA| pages = 255–259| publisher = ACM| title = Hardening soft information sources}}]]
 
* [[required::cohen_2003_a_comparison_of_string_distance_metrics_for_name_matching_tasks | {{MyCiteconference| booktitle = Proceedings of the IJCAI-2003 Workshop on Information Integration on the Web (IIWeb-03)| coauthors = P. Ravikumar, S. E Fienberg| date = 2003| first = W. W| last = Cohen| title = A comparison of string distance metrics for name-matching tasks}}]]
 
  
This is a lot of material, I know, and it's awkward to critique the instructor's work.  You don't need to read the journal paper in all its glorious detail, since I'll cover it in class, but I do recommend looking it over first.  If you prefer, you can just write down one or two questions about each paper.
+
=== Optional Readings ===
  
+
* [[tomita_2006_expanding_the_recall_of_relation_extraction_by_bootstrapping | {{MyCiteconference| booktitle = Adaptive Text Extraction and Mining (ATEM 2006)| coauthors = S. S.O Etzioni| date = 2006| first = J.| last = Tomita| pages = 56| title = Expanding the recall of relation extraction by bootstrapping}}]]
 +
* [[pasca_2007_weakly_supervised_discovery_of_named_entities_using_web_search_queries | {{MyCiteconference| booktitle = CIKM '07| date = 2007| first = M.| last = Pasca| title = Weakly-supervised discovery of named entities using web search queries}}]]
 +
* [[schoenmackers_2008_scaling_textual_inference_to_the_web | {{MyCiteconference| booktitle = Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing| coauthors = O. Etzioni, D. S Weld, T. Center| date = 2008| first = S.| last = Schoenmackers| pages = 79–88| title = Scaling textual inference to the Web}}]]
 +
* [[hovy_2009_toward_completeness_in_concept_extraction_and_classification | {{MyCiteconference| booktitle = Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing| coauthors = Z. Kozareva, E. Riloff| date = 2009| first = E.| last = Hovy| title = Toward Completeness in Concept Extraction and Classification}}]]
 +
* [[mota_2009_updating_a_name_tagger_using_contemporary_unlabeled_data | {{MyCiteconference| booktitle = Proceedings of the ACL-IJCNLP 2009 Conference Short Papers| coauthors = R. Grishman| date = 2009| first = C.| last = Mota| pages = 353–356| title = Updating a Name Tagger Using Contemporary Unlabeled Data}}]]
 +
* [[pantel_2009_web_scale_distributional_similarity_and_entity_set_expansion | {{MyCitejournal| coauthors = E. Crestan, A. Borkovsky, A. M Popescu, V. Vyas| date = 2009| first = P.| journal = Proceedings of EMNLP-09, Singapore| last = Pantel| title = Web-scale distributional similarity and entity set expansion}}]]
 +
* [[tomanek_noyear_semi_supervised_active_learning_for_sequence_labeling | {{MyCiteconference| booktitle = Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP| coauthors = U. Hahn| first = K.| last = Tomanek| pages = 1039–1047| date = 2009 | title = Semi-Supervised Active Learning for Sequence Labeling}}]]
 +
* [[yan_noyear_unsupervised_relation_extraction_by_mining_wikipedia_texts_using_information_from_the_web | {{MyCiteconference| booktitle = Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP| date = 2009 | coauthors = N. Okazaki, Y. Matsuo, Z. Yang, M. Ishizuka| first = Y.| last = Yan| pages = 1021-1029| title = Unsupervised Relation Extraction by Mining Wikipedia Texts Using Information from the Web}}]].  Combines dependency patterns from parsed Wikipedia text with bootstrapping-style surface patterns from Web text.
 +
* [[talukdar_noyear_a_context_pattern_induction_method_for_named_entity_extraction | {{MyCiteconference | booktitle = Tenth Conference on Computational Natural Language Learning| coauthors = T. Brants, M. L.F Pereira| first = P. P| last = Talukdar| title = A context pattern induction method for named entity extraction }}]].
 +
* Talukdar, P. P, and F. Pereira. 2010. Experiments in Graph-based Semi-Supervised Learning Methods for Class-Instance Acquisition. In 48th Annual Meeting of the Association for Computational Linguistics (ACL). Vol. 45.  Comparison of different graph-based semi-supervised learning methods for information extraction tasks.
 +
* Druck, G., and A. McCallum. 2010. High-Performance Semi-Supervised Learning using Discriminatively Constrained Generative Models. In ICML 2010.  Constrains a generative HMM training procedure to also satisfy the feature expectations associated with a CRF model.
 +
* Bollegala, D. T, Y. Matsuo, and M. Ishizuka. 2010. Relational duality: unsupervised extraction of semantic relations between entities on the web. In Proceedings of the 19th international conference on World wide web, 151–160.  A fast and effective method for simultaneously clustering entity-pairs into relations and entities into classes.
 +
* Yin, X., W. Tan, X. Li, and Y. C Tu. 2010. Automatic extraction of clickable structured web contents for name entity queries. In Proceedings of the 19th international conference on World wide web, 991–1000. Finds seeds to use for site-specific wrappers by analyzing query logs.
  
* [[artiles_2009_the_role_of_named_entities_in_web_people_search | {{MyCiteconference | booktitle = Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing| coauthors = S. Madrid, E. Amigó, J. Gonzalo| date = 2009| first = J.| last = Artiles| pages = 534-542| title = The role of named entities in Web People Search }}]]
+
== Student Presentation ==
* [[bhattacharya_2006_a_latent_dirichlet_model_for_unsupervised_entity_resolution | {{MyCiteconference | booktitle = SIAM International Conference on Data Mining| coauthors = L. Getoor| date = 2006| first = I.| last = Bhattacharya| pages = 47-58| title = A latent dirichlet model for unsupervised entity resolution }}]]
+
[http://malt.ml.cmu.edu/mw/index.php/User:Rnshah Rushin Shah]
* [[gravano_2003_text_joins_in_an_rdbms_for_web_data_integration | {{MyCiteconference | accessdate = 2009-08-03| booktitle = Proceedings of the 12th international conference on World Wide Web| coauthors = Panagiotis G. Ipeirotis, Nick Koudas, Divesh Srivastava| date = 2003| doi = 10.1145/775152.775166| first = Luis| isbn = 1-58113-680-3| last = Gravano| location = Budapest, Hungary| pages = 90-101| publisher = ACM| title = Text joins in an RDBMS for web data integration| url = http://portal.acm.org/citation.cfm?id=775166 }}]]
 
* [[li_2004_robust_reading_identification_and_tracing_of_ambiguous_names | {{MyCiteconference | booktitle = Proc. of NAACL| coauthors = P. Morie, D. Roth| date = 2004| first = X.| last = Li| pages = 17-24| title = Robust reading: Identification and tracing of ambiguous names }}]]
 
* [[moreau_2008_robust_similarity_measures_for_named_entities_matching | {{MyCiteconference | booktitle = Proceedings of the 22nd International Conference on Computational Linguistics (Coling 2008)| coauthors = F. Yvon, O. Cappe| date = 2008| first = E.| last = Moreau| title = Robust Similarity Measures for Named Entities Matching }}]]
 

Latest revision as of 11:14, 20 October 2010

This is one of the class meetings on the schedule for the course Information Extraction 10-707 in Fall 2010.

Overview of Bootstrapping and KnowItAll

Required Readings

Optional Readings

Student Presentation

Rushin Shah