Siddharth writeup of two stage CRF

From Cohen Courses

A review of Krishnan_2006_an_effective_two_stage_model_for_exploiting_non_local_dependencies_in_named_entity_recognition by user:sgopal1

  • This paper proposes an alternative to skip-chain CRFs. Instead of connecting nodes together and performing inference on the resulting model, the authors train a two-stage CRF. The first CRF is a standard CRF with some predefined features. The output of the first CRF is then used to build input features for the second CRF. They describe a few methods for generating these features, which essentially summarize the first CRF's output labeling, such as the number of times a word has been assigned a particular label. The second CRF then learns the weights of these features and outputs the final prediction.
  • Criticism:
    • The method seems a little less principled than a skip-chain CRF.
    • What happens if the first CRF is not a good model? In that case, I'm not sure the second CRF would learn anything better than the first.
    • I don't see the connection between the statistics of the training data and the sufficient statistics of the exponential model in this two-stage CRF.
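
The feature-generation step from the summary above (counting how often the first CRF assigned each label to a word) can be sketched roughly as follows. This is only an illustration, not the paper's actual implementation: the function names, the feature-key strings, the lowercasing of words, and the toy sentence are all assumptions made for the example, and no real CRF is trained here. The first-stage labels are simply given as a list.

```python
from collections import Counter, defaultdict


def label_counts(tokens, stage1_labels):
    """Count how often each (lowercased) word received each first-stage label."""
    counts = defaultdict(Counter)
    for word, label in zip(tokens, stage1_labels):
        counts[word.lower()][label] += 1
    return counts


def second_stage_features(tokens, stage1_labels):
    """Build one feature dict per token for a hypothetical second-stage CRF."""
    counts = label_counts(tokens, stage1_labels)
    features = []
    for word, label in zip(tokens, stage1_labels):
        word_counts = counts[word.lower()]
        # Label most often assigned to this word by the first stage.
        majority_label, _ = word_counts.most_common(1)[0]
        feats = {
            "word": word.lower(),
            "stage1_label": label,
            "stage1_majority_label": majority_label,
        }
        # One count feature per label the first stage gave to this word.
        for lab, n in word_counts.items():
            feats["stage1_count=" + lab] = n
        features.append(feats)
    return features


# Toy example: hypothetical first-stage output with one inconsistent label.
tokens = ["Paris", "visited", "Paris", "Hilton", "in", "Paris"]
stage1 = ["LOC", "O", "PER", "PER", "O", "LOC"]
feats = second_stage_features(tokens, stage1)
```

In this toy input, the third token "Paris" was labeled PER by the first stage, but its majority label across the sequence is LOC. A second-stage model that sees both features has the information it needs to correct the inconsistency, which is the non-local evidence the two-stage approach is meant to exploit.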