Philgoo project status report

What dataset will you be using? What does it look like?
- I am using MUC-7, as in (Borthwick, 1998).
- The dataset is divided into a training set, a dry-run set, and a formal set.
- Each set contains multiple <DOC></DOC> blocks, but I am not using features based on position in the document, so I merged them into a single stream.

What did you do to get acquainted with the data?
- I built a parser for the training set and the test set.
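
As an illustration of what such a parser has to do, here is a minimal sketch that turns MUC-7 style inline markup into (token, label) pairs. It assumes the standard ENAMEX, TIMEX and NUMEX tags with a TYPE attribute, splits on whitespace only (punctuation stays attached to tokens), and skips every other tag, so <DOC> boundaries are ignored and documents are effectively merged. The names labelTokens, typeAttribute and isEntityTag are hypothetical and are not taken from the actual project code.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// One token paired with its entity label ("O" means outside any entity).
using LabeledToken = std::pair<std::string, std::string>;

// Returns the value of TYPE="..." inside a tag body, or "" if absent.
static std::string typeAttribute(const std::string& tag) {
    const std::string key = "TYPE=\"";
    std::size_t start = tag.find(key);
    if (start == std::string::npos) return "";
    start += key.size();
    std::size_t end = tag.find('"', start);
    if (end == std::string::npos) return "";
    return tag.substr(start, end - start);
}

// Assumption: only these three tag names carry entity annotations.
static bool isEntityTag(const std::string& name) {
    return name == "ENAMEX" || name == "TIMEX" || name == "NUMEX";
}

// Scans the text once, tracking the label set by the innermost entity tag.
std::vector<LabeledToken> labelTokens(const std::string& text) {
    std::vector<LabeledToken> out;
    std::string label = "O";
    std::string token;
    auto flush = [&]() {
        assert(!label.empty());  // the label is always "O" or an entity type
        if (!token.empty()) {
            out.emplace_back(token, label);
            token.clear();
        }
    };
    for (std::size_t i = 0; i < text.size(); ++i) {
        const char c = text[i];
        if (c == '<') {
            const std::size_t close = text.find('>', i);
            if (close == std::string::npos) {  // stray '<': keep it as text
                token += c;
                continue;
            }
            flush();
            const std::string tag = text.substr(i + 1, close - i - 1);
            const bool closing = !tag.empty() && tag[0] == '/';
            std::string name = tag.substr(closing ? 1 : 0);
            name = name.substr(0, name.find(' '));
            if (isEntityTag(name)) {
                if (closing) {
                    label = "O";
                } else {
                    const std::string type = typeAttribute(tag);
                    label = type.empty() ? name : type;
                }
            }
            i = close;  // skip the whole tag, including <DOC> and </DOC>
        } else if (std::isspace(static_cast<unsigned char>(c))) {
            flush();
        } else {
            token += c;
        }
    }
    flush();
    return out;
}
```

Because non-entity tags are simply skipped, feeding the concatenated contents of a whole set to labelTokens yields one continuous token sequence.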

Do you plan on looking at the same problem, or have you changed your plans?
- I am looking at the same problem. However, I am realizing that getting scores comparable to published models is not trivial.
- In-depth analysis of the model structure, training method, and data characteristics will require stable output. I may consider using off-the-shelf code later, but not now.
- Tasks other than implementing and running classifiers, such as morphological processing, feature selection, and gathering feature data, are much bigger than expected.

If you plan on writing code, what have you written so far, and in what languages?
- Preprocessor for MUC-7 NE data: Ruby
- Parser for MUC-7 NE data: C++
- HMM with joint likelihood: C++
- HMM with conditional likelihood: C++ (in progress)
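
For reference, training an HMM by joint likelihood on fully labeled data reduces to counting: the start, transition, and emission probabilities are relative frequencies. The sketch below shows that idea together with Viterbi decoding. It is not the project's implementation; the struct name Hmm, its members, and the add-one smoothing (with one extra slot reserved for unseen words) are assumptions made for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

using Sentence = std::vector<std::pair<std::string, std::string>>;  // (word, tag)
using Pair = std::pair<std::string, std::string>;

// Looks up a count, returning 0 for keys that were never observed.
template <typename Map>
double countOf(const Map& m, const typename Map::key_type& k) {
    auto it = m.find(k);
    return it == m.end() ? 0.0 : it->second;
}

struct Hmm {
    std::vector<std::string> tags;             // tag inventory, sorted
    std::set<std::string> vocab;               // words seen in training
    std::map<std::string, double> startCount;  // tag at sentence start
    std::map<std::string, double> tagCount;    // tag occurrences
    std::map<std::string, double> fromCount;   // tag followed by another tag
    std::map<Pair, double> transCount;         // (previous tag, current tag)
    std::map<Pair, double> emitCount;          // (tag, word)
    double numSentences = 0;

    // Joint-likelihood training on labeled data: just collect counts.
    void train(const std::vector<Sentence>& data) {
        std::set<std::string> tagSet;
        for (const Sentence& s : data) {
            if (s.empty()) continue;
            numSentences += 1;
            startCount[s[0].second] += 1;
            for (std::size_t i = 0; i < s.size(); ++i) {
                tagSet.insert(s[i].second);
                vocab.insert(s[i].first);
                tagCount[s[i].second] += 1;
                emitCount[Pair(s[i].second, s[i].first)] += 1;
                if (i + 1 < s.size()) {
                    transCount[Pair(s[i].second, s[i + 1].second)] += 1;
                    fromCount[s[i].second] += 1;
                }
            }
        }
        tags.assign(tagSet.begin(), tagSet.end());
    }

    // Add-one smoothed log-probabilities (the smoothing is an assumption).
    double logStart(const std::string& t) const {
        return std::log((countOf(startCount, t) + 1) / (numSentences + tags.size()));
    }
    double logTrans(const std::string& p, const std::string& c) const {
        return std::log((countOf(transCount, Pair(p, c)) + 1) /
                        (countOf(fromCount, p) + tags.size()));
    }
    double logEmit(const std::string& t, const std::string& w) const {
        return std::log((countOf(emitCount, Pair(t, w)) + 1) /
                        (countOf(tagCount, t) + vocab.size() + 1));
    }

    // Viterbi: most probable tag sequence under the joint model.
    std::vector<std::string> decode(const std::vector<std::string>& words) const {
        assert(!tags.empty() && "train() must be called before decode()");
        if (words.empty()) return {};
        const std::size_t n = words.size(), T = tags.size();
        std::vector<std::vector<double>> score(n, std::vector<double>(T));
        std::vector<std::vector<std::size_t>> back(n, std::vector<std::size_t>(T, 0));
        for (std::size_t t = 0; t < T; ++t)
            score[0][t] = logStart(tags[t]) + logEmit(tags[t], words[0]);
        for (std::size_t i = 1; i < n; ++i) {
            for (std::size_t t = 0; t < T; ++t) {
                double best = -std::numeric_limits<double>::infinity();
                std::size_t arg = 0;
                for (std::size_t p = 0; p < T; ++p) {
                    const double v = score[i - 1][p] + logTrans(tags[p], tags[t]);
                    if (v > best) { best = v; arg = p; }
                }
                score[i][t] = best + logEmit(tags[t], words[i]);
                back[i][t] = arg;
            }
        }
        std::size_t cur = 0;
        for (std::size_t t = 1; t < T; ++t)
            if (score[n - 1][t] > score[n - 1][cur]) cur = t;
        std::vector<std::string> out(n);
        for (std::size_t i = n; i-- > 0;) {  // follow back-pointers from the end
            out[i] = tags[cur];
            cur = back[i][cur];
        }
        return out;
    }
};
```

Working in log space avoids underflow on long sequences. A conditionally trained model could reuse the same decode step with different parameter estimates, although that part is not sketched here.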

What do you still need to do?
- Implement more features
- CRF with joint likelihood
- CRF with conditional likelihood

If you've run a baseline system on the data and gotten some results, what are they?
- HMM with joint likelihood will function as the baseline.
- The resulting accuracy is near 30%, so there is much to improve.
- Processing at the morphological level is required. CMU and CMUs should be treated as the same organization, which is not currently the case.
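
One possible starting point for the CMU/CMUs case is a small normalization step applied to tokens before feature extraction. The sketch below is only a heuristic under stated assumptions: it treats a run of two or more uppercase ASCII letters as an acronym and strips a trailing "s" or "'s" from it. The function names normalizeAcronym and isAcronym are hypothetical, and real morphological processing would need to cover far more cases.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// True if s has at least two characters and all are uppercase ASCII letters.
static bool isAcronym(const std::string& s) {
    if (s.size() < 2) return false;
    for (unsigned char c : s)
        if (!std::isupper(c)) return false;
    return true;
}

// Maps plural or possessive acronym forms to the bare acronym,
// e.g. "CMUs" and "CMU's" both become "CMU". Other tokens are unchanged.
std::string normalizeAcronym(const std::string& token) {
    const std::string suffixes[] = {"'s", "s"};
    for (const std::string& suffix : suffixes) {
        if (token.size() > suffix.size() &&
            token.compare(token.size() - suffix.size(), suffix.size(), suffix) == 0) {
            const std::string stem = token.substr(0, token.size() - suffix.size());
            assert(!stem.empty());  // guaranteed by the size check above
            if (isAcronym(stem)) return stem;
        }
    }
    return token;
}
```

Ordinary words ending in "s", such as "Paris", are left alone because the remaining stem is not all uppercase.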
