Nlao writeup of Borthwick 1998

From Cohen Courses
Revision as of 10:43, 21 September 2009 by Nlao

This is a review of Borthwick_1998_exploiting_diverse_knowledge_sources_via_maximum_entropy_in_named_entity_recognition by user:Nlao.

Are decision trees dead?

The task presented here seems quite discrete and non-linear: it feels very natural to write a bunch of rules to do the extraction (e.g., "New York" is a CITY, but "New York Times" is a NEWSPAPER). Vintage classifiers like decision trees seem well suited to expressing such rules. However, people still prefer log-linear models.
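To make the point concrete, here is a minimal sketch of that kind of hand-written rule system as a greedy longest-match lookup. The rule table and labels below are illustrative only; they are not from the paper.

```python
# Illustrative rule table: longer patterns must win over their prefixes
# ("New York Times" should not be tagged as CITY + extra token).
RULES = {
    ("new", "york", "times"): "NEWSPAPER",
    ("new", "york"): "CITY",
}

def tag(tokens):
    """Greedy longest-match tagging over a token list."""
    tags = []
    i = 0
    while i < len(tokens):
        for n in (3, 2):  # try the longest rule first
            key = tuple(t.lower() for t in tokens[i:i + n])
            if key in RULES:
                tags.append((tokens[i:i + n], RULES[key]))
                i += n
                break
        else:
            tags.append(([tokens[i]], "O"))  # outside any entity
            i += 1
    return tags
```

The longest-match-first loop is exactly the discrete, non-linear behavior the text describes: the decision flips from CITY to NEWSPAPER based on one extra token, which is easy for rules (or a tree split) and awkward for a linear score over independent word features.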

Log-linear models pose well-defined optimization problems and are good at parameter tuning. But that's it: they leave all the knowledge-discovery work to humans. We have to provide good features (which may require great domain knowledge and labor) to make them work well.
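A minimal sketch of what "providing good features" means for a maxent-style (log-linear) tagger. The feature functions and the hand-set weights below are hypothetical simplifications; Borthwick's actual feature set is much richer, and in a real model the weights are learned by maximizing conditional log-likelihood rather than set by hand.

```python
import math

# Hypothetical binary feature functions of the kind a human must design
# and feed to the model; this is where the domain knowledge goes in.
def features(word, prev_label):
    return {
        f"cap={word[0].isupper()}",
        f"word={word.lower()}",
        f"prev={prev_label}",
    }

LABELS = ["NAME", "O"]

# Illustrative weights for (feature, label) pairs; a maxent trainer
# would estimate these from labeled data.
WEIGHTS = {
    ("cap=True", "NAME"): 1.5,
    ("cap=False", "O"): 1.0,
    ("word=york", "NAME"): 2.0,
}

def predict(word, prev_label="O"):
    """Conditional label distribution p(y | x) of a log-linear model."""
    feats = features(word, prev_label)
    scores = {y: sum(WEIGHTS.get((f, y), 0.0) for f in feats) for y in LABELS}
    z = sum(math.exp(s) for s in scores.values())
    return {y: math.exp(s) / z for y, s in scores.items()}
```

Note that all the discriminative power lives in `features` and `WEIGHTS`: the model itself is just a softmax over feature sums, which is exactly why the feature-engineering burden falls on the human.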

The main problem with decision trees is the lack of a training procedure that finds an optimal model: greedy tree induction is not a well-defined global optimization problem. I guess it is time to fix this defect of decision trees and revive them.

  • Ni, you might want to consider presenting this paper, if you're interested in this idea.


[minor points]

-- It does not seem well motivated to combine different systems. Would people do that in real applications?

-- Writing feature-selection rules requires human labor and can be suboptimal. Nowadays we would use feature-induction methods to select features automatically.