Revision as of 15:32, 7 September 2011

Identifying Abbreviations in Biomedical Text

Idea

Abbreviations, synonyms and acronyms are heavily used in biomedical literature to name genes, diseases, biological processes and more. Recognizing short or alternative name forms and mapping them to their full forms is important for fully understanding scientific text. In information extraction tasks, recognizing abbreviated forms can substantially increase recall. The task is especially challenging because abbreviations are often reused (for example, names of genes and systems are shared across species) and because researchers often do not adhere to standard naming conventions. In this project we aim to provide a model that links abbreviated or short-form biomedical terms to their full terms and recognizes abbreviations that may refer to more than one entity.

Team

Dana Movshovitz-Attias

Dataset

MEDSTRACT is a collection of automatically extracted acronym pairs from MEDLINE databases. The data includes:

Gold Standard Data: Sentences including abbreviations.
Gold Standard Results: Pairs of abbreviations and their full-form names that appear in the data.

Related Work

A simple algorithm for identifying abbreviation definitions in biomedical text by A. S. Schwartz and M. A. Hearst
An Automatic Identification and Resolution System for Protein-Related Abbreviations in Scientific Papers by Paolo Atzeni, Fabio Polticelli and Daniele Toti
Mapping Abbreviations to Full Forms in Biomedical Articles by Hong Yu, George Hripcsak and Carol Friedman
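
To make the task concrete, the following is a minimal Python sketch in the spirit of the right-to-left character matching used by Schwartz and Hearst, listed above. It is an illustration only, not the model proposed in this project and not a faithful reimplementation of the published algorithm. The function names (find_long_form, extract_pairs), the "long form (SHORT)" pattern, and the short-form length limits are assumptions made for the example.

```python
import re


def find_long_form(short_form, candidate):
    """Match short-form characters right to left against the candidate text.

    Returns the shortest trailing span of `candidate` that covers every
    alphanumeric character of `short_form` in order, or None if no match.
    """
    s = len(short_form) - 1
    l = len(candidate) - 1
    while s >= 0:
        ch = short_form[s].lower()
        if not ch.isalnum():
            s -= 1  # ignore punctuation such as '-' in the short form
            continue
        # Move left until the character matches; the first short-form
        # character must additionally match at the start of a word.
        while (l >= 0 and candidate[l].lower() != ch) or (
            s == 0 and l > 0 and candidate[l - 1].isalnum()
        ):
            l -= 1
        if l < 0:
            return None
        l -= 1
        s -= 1
    # Extend back to the beginning of the word where matching stopped.
    start = candidate.rfind(" ", 0, l + 1) + 1
    return candidate[start:]


def extract_pairs(sentence):
    """Find 'long form (SHORT)' patterns and return (short, long) pairs."""
    pairs = []
    for m in re.finditer(r"\(([^()]+)\)", sentence):
        short_form = m.group(1).strip()
        # Assumed plausibility filter: 2-10 characters, starts with a letter
        # or digit, and contains at least one letter.
        if not (
            2 <= len(short_form) <= 10
            and short_form[0].isalnum()
            and any(c.isalpha() for c in short_form)
        ):
            continue
        words = sentence[: m.start()].split()
        # Bound the search window by the short-form length.
        max_words = min(len(short_form) + 5, len(short_form) * 2)
        candidate = " ".join(words[-max_words:])
        long_form = find_long_form(short_form, candidate)
        if long_form:
            pairs.append((short_form, long_form))
    return pairs


if __name__ == "__main__":
    print(extract_pairs("We trained a hidden Markov model (HMM) on the data."))
```

A matcher like this only handles abbreviations that are defined in the same sentence. It does not address the ambiguity case described in the Idea section, where one short form may refer to several entities.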

@@ Line 12: / Line 12: @@
 == Dataset ==
-[http://medstract.com/ MEDSTRACT] is a tool for automated extraction of acronym pairs from MEDLINE databases.
+[http://medstract.com/ MEDSTRACT] is a collection of automatically extracted acronym pairs from MEDLINE databases.
-If includes:
+The data includes:
 :* [http://medstract.com/index.php?f=gold-standard Gold Standard Data]: Sentences including abbreviations.
 :* [http://medstract.com/index.php?f=gold-result Gold Standard Results]: Pairs of abbreviation and full form name, that appear in the data.

Difference between revisions of "Cohen Courses:Dmovshov abbreviations"
