Sgardine writesup Sha and Pereira

From Cohen Courses

This will be a review of Sha_2003_shallow_parsing_with_conditional_random_fields by User:sgardine.

Summary

CRFs make use of diverse features, are trained discriminatively, and can make local decisions based on the entire input sequence, thus addressing many flaws of earlier generative models. Here they are applied to chunking, where they outperform the previous best model, which was more complex and ad hoc. A Gaussian prior on the weights is used to prevent overfitting. When CRFs were introduced they were trained with GIS; here much faster convex optimization techniques are explored, including conjugate gradient (CG) and a quasi-Newton method (L-BFGS), with the voted perceptron also considered for comparison. CRFs were competitive with SVMs and outperformed MEMMs, but did not significantly outperform voted perceptrons. Training converged much faster when approximations to the Hessian were used, as in preconditioned CG and especially L-BFGS; the original GIS algorithm was slowest of all.
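To make the training objective concrete, here is a minimal sketch of the Gaussian-penalized conditional log-likelihood of a linear-chain CRF, which is the function that CG or L-BFGS would maximize. It is not the paper's implementation: the feature templates (`"emit"` and `"trans"`), the toy sentence, the weight values, and the variance `sigma2` are all assumptions chosen for illustration, and only the standard library is used.

```python
import itertools
import math

def logsumexp(values):
    """Numerically stable log(sum(exp(v))) over a list of floats."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def score(weights, x, y_prev, y, t):
    """Local log-potential at position t: emission plus transition weight.
    The two feature templates are hypothetical, chosen for illustration."""
    s = weights.get(("emit", x[t], y), 0.0)
    if y_prev is not None:
        s += weights.get(("trans", y_prev, y), 0.0)
    return s

def path_score(weights, x, y):
    """Unnormalized log-score of one complete label sequence."""
    return sum(score(weights, x, y[t - 1] if t > 0 else None, y[t], t)
               for t in range(len(x)))

def log_partition(weights, x, labels):
    """log Z(x), computed with the forward algorithm in log space."""
    alpha = {y: score(weights, x, None, y, 0) for y in labels}
    for t in range(1, len(x)):
        alpha = {y: logsumexp([alpha[yp] + score(weights, x, yp, y, t)
                               for yp in labels])
                 for y in labels}
    return logsumexp(list(alpha.values()))

def penalized_log_likelihood(weights, x, y, labels, sigma2=10.0):
    """log p(y | x) minus the Gaussian-prior penalty sum(w^2) / (2 * sigma2)."""
    penalty = sum(w * w for w in weights.values()) / (2.0 * sigma2)
    return path_score(weights, x, y) - log_partition(weights, x, labels) - penalty

# Toy chunking-style example; the labels and weights are made up.
labels = ["B", "I", "O"]
x = ["the", "cat", "sat"]
y = ["B", "I", "O"]
weights = {
    ("emit", "the", "B"): 1.5,
    ("emit", "cat", "I"): 1.0,
    ("emit", "sat", "O"): 2.0,
    ("trans", "B", "I"): 0.8,
    ("trans", "O", "I"): -2.0,
}
print(penalized_log_likelihood(weights, x, y, labels))
```

The forward recursion is what lets each local decision depend on the whole input while keeping the cost polynomial in the sequence length; a gradient-based optimizer would additionally need the expected feature counts, which this sketch omits.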

Commentary

Good summary of CRFs

I proved to myself that the log-likelihood function is concave
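One standard way to see the concavity is sketched below; it is not necessarily the argument the reviewer had in mind. The notation is an assumption: \lambda is the weight vector, F(y, x) the global feature vector, (x_k, y_k) the training pairs, and \sigma^2 the variance of the Gaussian prior.

```latex
\begin{align*}
\mathcal{L}(\lambda)
  &= \sum_k \bigl[\lambda \cdot F(y_k, x_k) - \log Z_\lambda(x_k)\bigr]
     - \frac{\lVert \lambda \rVert^2}{2\sigma^2},
  \qquad
  Z_\lambda(x) = \sum_{y} \exp\bigl(\lambda \cdot F(y, x)\bigr) \\
\nabla \mathcal{L}(\lambda)
  &= \sum_k \bigl[F(y_k, x_k) - \mathbb{E}_{p_\lambda(Y \mid x_k)} F(Y, x_k)\bigr]
     - \frac{\lambda}{\sigma^2} \\
\nabla^2 \mathcal{L}(\lambda)
  &= -\sum_k \operatorname{Cov}_{p_\lambda(Y \mid x_k)}\bigl[F(Y, x_k)\bigr]
     - \frac{1}{\sigma^2} I
\end{align*}
```

Each covariance matrix is positive semidefinite, so the Hessian is negative semidefinite without the prior and negative definite with it. The penalized log-likelihood is therefore strictly concave and has a unique maximum.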