Wu and Weld ACL 2010
Wu, F. and Weld, D. S. 2010. Open information extraction using Wikipedia. In Proceedings of the 48th Annual Meeting of the Association For Computational Linguistics (Uppsala, Sweden, July 11 - 16, 2010). ACL Workshops. Association for Computational Linguistics, Morristown, NJ, 118-127.
This is a recent paper that addresses the Open Information Extraction problem. The authors propose an extraction system, WOE. Training data is first extracted from Wikipedia using KYLIN (Wu_and_Weld_CIKM_2007) and then processed to train an unlexicalized extractor, as in TEXTRUNNER (Banko_et_al_IJCAI_2007). WOE shares many similarities with these two systems.
There are three components in the system:
- For each attribute-value pair (relation) in an infobox, the matcher heuristically looks for a sentence in the article that expresses it. DBpedia was used as the source of clean infobox data.
- Two options were explored for learning the extractor. The first was to train a classifier that decides whether the shortest dependency path between two NPs expresses a relation. The second was to train a CRF, as in TEXTRUNNER, that tags whether each word between two NPs is part of a relation.
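The matching step and the shortest-dependency-path feature can be illustrated with a minimal Python sketch. It is not WOE's implementation: the function names `match_sentence` and `shortest_dependency_path`, the substring-based matching rule, the example sentences, and the hand-written dependency edges are all assumptions made for illustration. A real system would use a parser's output and more careful matching of subject mentions.

```python
from collections import deque


def match_sentence(sentences, subject_aliases, value):
    """Return the first sentence mentioning both the subject and the value.

    Hypothetical heuristic: plain case-insensitive substring matching.
    """
    value_lc = value.lower()
    for sentence in sentences:
        text = sentence.lower()
        if value_lc in text and any(a.lower() in text for a in subject_aliases):
            return sentence
    return None


def shortest_dependency_path(edges, source, target):
    """Breadth-first search over an undirected view of (head, dependent) edges."""
    graph = {}
    for head, dep in edges:
        graph.setdefault(head, []).append(dep)
        graph.setdefault(dep, []).append(head)
    queue = deque([[source]])
    seen = {source}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == target:
            return path
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None  # the two words are not connected


# Illustrative data (assumed, not taken from the paper).
article = [
    "Ada Lovelace was an English mathematician.",
    "Lovelace was born in London in 1815.",
]
matched = match_sentence(article, ["Ada Lovelace", "Lovelace"], "London")

# Hand-written dependency edges for the matched sentence, as (head, dependent).
edges = [("born", "Lovelace"), ("born", "was"), ("born", "in"), ("in", "London")]
path = shortest_dependency_path(edges, "Lovelace", "London")
print(matched)
print(path)
```

In this sketch the path between the two NPs would be the input to a classifier that decides whether it expresses a relation. The CRF option would instead label the token sequence between the NPs.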