Reynar et al, A maximum entropy approach to identifying sentence boundaries. 1997

From Cohen Courses
Revision as of 19:42, 27 September 2011 by Tkumar (talk | contribs) (→‎Summary)

Citation

Jeffrey C. Reynar and Adwait Ratnaparkhi. A maximum entropy approach to identifying sentence boundaries. In Proceedings of the Fifth Conference on Applied Natural Language Processing, Washington, DC, USA, March–April 1997.

Online Version

http://www.aclweb.org/anthology-new/A/A97/A97-1004.pdf

Summary

This paper addresses the problem of identifying sentence boundaries in raw text. It uses contextual information to decide whether an occurrence of '?', '.', or '!' (or any other annotated sentence-boundary mark) is a valid sentence boundary. The features used are not domain specific, which means the model can easily be retrained for another domain.

Method

First, each candidate token is identified. The features described below are then used to classify whether the candidate is a valid sentence boundary.
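The candidate-identification step can be sketched roughly as follows. This is only an illustration, not the authors' implementation: the names `BOUNDARY_MARKS` and `find_candidates` are hypothetical, tokens are assumed to be whitespace-delimited, and the prefix and suffix are assumed to be the parts of the token before and after the punctuation mark.

```python
# Marks treated as potential sentence boundaries (taken from the summary above).
BOUNDARY_MARKS = ".?!"


def find_candidates(text):
    """Return one record per potential sentence boundary in `text`.

    Assumption: a token is a whitespace-delimited string, and every
    occurrence of a boundary mark inside a token is its own candidate.
    """
    candidates = []
    for token in text.split():
        for i, ch in enumerate(token):
            if ch in BOUNDARY_MARKS:
                candidates.append({
                    "token": token,           # token containing the mark
                    "mark": ch,               # the punctuation mark itself
                    "prefix": token[:i],      # part of the token before the mark
                    "suffix": token[i + 1:],  # part of the token after the mark
                })
    return candidates


if __name__ == "__main__":
    for cand in find_candidates("Mr. Smith paid 3.5 dollars. Why?"):
        print(cand)
```

Each record would then be passed to a classifier that decides whether the mark really ends a sentence. Cases such as "Mr." and "3.5" show why the decision cannot rest on the punctuation alone.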

The paper describes two systems, each using a different set of features:

1. The first system takes advantage of the structure of the English language, which makes it domain specific. It uses the prefix and suffix of the candidate token, along with some domain-specific features such as whether it is an honorific (Mr., Dr., etc.) or