K. Seymore et al, AAAI-99

Citation

K. Seymore, A. McCallum, and R. Rosenfeld. Learning Hidden Markov Model Structure for Information Extraction. In Papers from the AAAI-99 Workshop on Machine Learning for Information Extraction, pages 37-42, 1999.

Online Version

[1]

Summary

In this paper the authors explore the use of Hidden Markov Models (HMMs) for information extraction tasks. The paper focuses on two questions: how to learn the model structure from the data itself, and what role labeled and unlabeled data play in model training. It also shows that a model with multiple states per field outperforms one with a single state per field. The resulting model is applied to extracting fields from research paper headers.

Method

The paper argues that an HMM with multiple states per class performs considerably better than one with a single state per class, and that the HMM structure can alternatively be learned from the data itself. Initially, every word in the training data is treated as its own state, with a transition to the state of the neighboring (following) word. There is a single start state with a transition to the first word's state, and an end state with a transition from the last word's state. The paper proposes two techniques for merging the states of this initial model (a small sketch of both operations follows the list below):

1: Neighbor merging - merge two states if they are associated with the same class and there is a transition between them.

2: V-merging - merge two states if they belong to the same class and they transition to the same common state.
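
A minimal sketch of the two merge operations, assuming a toy adjacency-set representation (each state holds a class label and a set of successor states); all names are illustrative and this is not the authors' implementation:

```python
# Toy representation of HMM structure for illustrating the two merge operations.
class State:
    def __init__(self, label):
        self.label = label   # field/class whose words this state emits
        self.next = set()    # outgoing transitions (successor states)

def merge(a, b, states):
    """Collapse state b into state a (both assumed to have the same label).
    A transition between a and b becomes a self-transition on a, so the
    merged state can emit several consecutive words of the same field."""
    a.next |= b.next
    if b in a.next:                      # a<->b transition becomes a self-loop
        a.next.discard(b)
        a.next.add(a)
    for s in states:
        if s is not a and b in s.next:   # redirect incoming transitions to a
            s.next.discard(b)
            s.next.add(a)
    states.remove(b)

def neighbor_merge_candidates(states):
    """Neighbor merging: same label and a direct transition between the two."""
    return [(s, t) for s in states for t in s.next
            if t is not s and s.label == t.label]

def v_merge_candidates(states):
    """V-merging: same label and a transition into a common successor state."""
    return [(s, t) for i, s in enumerate(states) for t in states[i + 1:]
            if s.label == t.label and (s.next & t.next)]
```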

These merges are applied one at a time, and the model structure is chosen that maximizes the probability of the model given the data, P(M|D). States are merged until an optimal model is reached.

According to Bayes rule, P(M|D) is proportional to P(D|M) * P(M); the denominator P(D) is the same for every candidate model and can be ignored when comparing structures.
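
To make the search concrete, here is a hedged sketch of a greedy merge loop, assuming illustrative helpers log_likelihood(model, data) (the Viterbi-based log P(D|M) discussed in the next paragraph), log_prior(model) (a prior favouring smaller models), and a hypothetical copy_with_merge method; none of these names come from the paper, and the paper's exact search schedule may differ.

```python
def greedy_merge_search(model, data, candidate_pairs):
    """Greedily apply single merges while they improve
    log P(M|D) = log P(D|M) + log P(M) (up to a constant)."""
    best = log_likelihood(model, data) + log_prior(model)
    improved = True
    while improved:
        improved = False
        for a, b in candidate_pairs(model):      # e.g. neighbor- or V-merge pairs
            trial = model.copy_with_merge(a, b)  # hypothetical: copy with a, b merged
            score = log_likelihood(trial, data) + log_prior(trial)
            if score > best:
                model, best, improved = trial, score, True
                break                            # restart the scan from the new model
    return model
```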

Here P(D|M) can be calculated from the data using the Viterbi algorithm, and P(M) is a prior chosen to give more weight to smaller (simpler) models. Once the structure is selected, its parameters can be estimated; when only unlabeled data is available, the Baum-Welch algorithm can be used to learn the model parameters. The paper also discusses another useful source of information, distantly labeled data, which was labeled for a different purpose but can still be used in the present setting.
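
As an illustration of the P(D|M) term, below is a self-contained Viterbi sketch in log space (dictionary-based parameters with illustrative names; not the paper's code). It returns the best-path log-probability of one word sequence; log P(D|M) for a labeled corpus is then the sum over all training sequences (the transition into the end state is omitted for brevity).

```python
import math

def viterbi_log_prob(obs, states, log_start, log_trans, log_emit):
    """Best-path log-probability of one word sequence `obs`.
    log_start[s], log_trans[r][s], log_emit[s][word] hold log-probabilities;
    missing entries count as log 0."""
    NEG = -math.inf
    # best log-prob of any path ending in state s after the first word
    v = {s: log_start.get(s, NEG) + log_emit.get(s, {}).get(obs[0], NEG)
         for s in states}
    for word in obs[1:]:
        v = {s: max(v[r] + log_trans.get(r, {}).get(s, NEG) for r in states)
                + log_emit.get(s, {}).get(word, NEG)
             for s in states}
    return max(v.values())
```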

Experimentation

The goal was to extract relevant fields from research paper headers. The corpus consisted of labeled (L) as well as distantly labeled (D) data.

The experimental results were as follows:

Model         # States   # Links   Accuracy (L), %   Accuracy (L+D), %
Baseline      17         149       77.9              88.6
Multi-state   36         164       78.7              90.1
V-merged      155        402       77.7              89.1

Clearly, the model with multiple states per class achieves better accuracy than the one-state-per-class model.