Difference between revisions of "Reynar et al, A maximum entropy approach to identifying sentence boundaries. 1997"

Revision as of 19:42, 27 September 2011

Citation

Jeffrey C. Reynar and Adwait Ratnaparkhi. A maximum entropy approach to identifying sentence boundaries. In Proceedings of the Fifth Conference on Applied Natural Language Processing, Washington, DC, USA, March–April 1997.

Online Version

http://www.aclweb.org/anthology-new/A/A97/A97-1004.pdf

Summary

In this paper the authors address the problem of finding sentence boundaries in raw text. The approach uses contextual information to decide whether an occurrence of '?', '.', or '!' (or any other annotated sentence-boundary character) is a valid sentence boundary. The features used are not domain specific, which means the model can easily be retrained for another domain.

Method

First, each candidate token is identified; then features of that token are used to classify whether the candidate is a valid sentence boundary or not.

The paper describes two systems, each using a different set of features.

1. The first system takes advantage of the structure of the English language, which makes it domain specific. It uses the prefix and suffix of the candidate token, along with some domain-specific features such as whether it is an honorific (Mr., Dr., etc.) or
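
As a rough illustration of the candidate-identification and feature-extraction steps described above, here is a minimal Python sketch. It is not the authors' implementation: the function names (`find_candidates`, `extract_features`, `candidate_features`), the whitespace tokenization, and the short `HONORIFICS` list are all assumptions made for this example, and the maximum entropy classifier itself is left out. The sketch only shows how a token containing '.', '?', or '!' could be split into a prefix and a suffix around the candidate character, and how an honorific check on the prefix might be expressed as a feature.

```python
import re

# Hypothetical honorific list for illustration only; not the list used in the paper.
HONORIFICS = {"Mr", "Mrs", "Ms", "Dr", "Prof"}

# Characters treated as potential sentence boundaries.
BOUNDARY_CHARS = ".?!"


def find_candidates(text):
    """Yield (token, offset) for every boundary character inside a whitespace-delimited token."""
    for match in re.finditer(r"\S+", text):
        token = match.group()
        for offset, ch in enumerate(token):
            if ch in BOUNDARY_CHARS:
                yield token, offset


def extract_features(token, offset):
    """Build a small feature dictionary for one candidate (assumed feature names)."""
    prefix = token[:offset]       # characters before the candidate punctuation
    suffix = token[offset + 1:]   # characters after the candidate punctuation
    return {
        "prefix": prefix,
        "suffix": suffix,
        "prefix_is_honorific": prefix in HONORIFICS,
    }


def candidate_features(text):
    """Return one feature dictionary per candidate boundary found in the text."""
    return [extract_features(tok, off) for tok, off in find_candidates(text)]
```

Calling `candidate_features("Dr. Smith left.")` yields two candidates, one for "Dr." and one for "left."; only the first has `prefix_is_honorific` set. In a full system, feature dictionaries like these would be passed to a trained classifier that labels each candidate as a boundary or a non-boundary.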

