Philgoo project status report
- What dataset will you be using? What does it look like?
- I am using MUC-7 as in (Borthwick, 1998)
- The dataset is divided into a training set, a dry-run set, and a formal set.
- Each set contains multiple <DOC></DOC> sections, but since I am not using document-position features, I merged them all into one stream.
- What did you do to get acquainted with the data?
- I built a parser for the training set and test set (a toy sketch of the format appears below).
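For concreteness, the MUC-7 NE data marks names inline with SGML elements such as <ENAMEX TYPE="ORGANIZATION">...</ENAMEX>. Below is a minimal toy extractor, not the actual parser: it only pulls (type, text) pairs out of ENAMEX elements and ignores TIMEX/NUMEX, <DOC> boundaries, and tokenization, all of which the real parser has to handle.

 #include <iostream>
 #include <regex>
 #include <string>
 
 int main() {
   // Toy input; real MUC-7 documents are wrapped in <DOC>...</DOC>.
   std::string doc =
       "<DOC><ENAMEX TYPE=\"ORGANIZATION\">Carnegie Mellon</ENAMEX> hired "
       "<ENAMEX TYPE=\"PERSON\">John Smith</ENAMEX>.</DOC>";
   // One ENAMEX element: capture the TYPE attribute and the inner text.
   std::regex enamex("<ENAMEX TYPE=\"([^\"]+)\">([^<]*)</ENAMEX>");
   for (std::sregex_iterator it(doc.begin(), doc.end(), enamex), end; it != end; ++it)
     std::cout << (*it)[1] << '\t' << (*it)[2] << '\n';
   // Prints: ORGANIZATION  Carnegie Mellon / PERSON  John Smith
 }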
- Do you plan on looking at the same problem, or have you changed your plans?
- I am looking at the same problem. However, I am realizing that getting scores comparable to published models is not trivial.
- Analyzing model structure, training methods, and data characteristics in depth will require stable output first. I may consider using off-the-shelf code later, but not now.
- Tasks other than implementing and running classifiers, such as morphological processing, feature selection, and gathering feature data, are much bigger than expected.
- If you plan on writing code, what have you written so far, in what languages
- Preprocessor for MUC-7 NE data: Ruby
- Parser for MUC-7 NE data: C++
- HMM with joint likelihood: C++
- HMM with conditional likelihood: C++ (in progress)
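Since the joint-likelihood HMM serves as the baseline below, a minimal sketch of what joint-likelihood training means for it may help: maximizing P(words, tags) over a labeled corpus has a closed-form solution in relative-frequency counts. The names here (JointHmm, Train, etc.) are illustrative, not the report's actual code.

 #include <map>
 #include <string>
 #include <utility>
 #include <vector>
 
 // (word, NE tag) pairs for one sentence.
 using Tagged = std::vector<std::pair<std::string, std::string>>;
 
 class JointHmm {
   std::map<std::pair<std::string, std::string>, double> trans_;  // count(prev tag, tag)
   std::map<std::pair<std::string, std::string>, double> emit_;   // count(tag, word)
   std::map<std::string, double> prevTotal_;  // outgoing transitions per tag
   std::map<std::string, double> tagTotal_;   // emissions per tag
 
  public:
   // Joint-likelihood training: the MLE of P(words, tags) is just
   // relative-frequency estimation over the labeled corpus.
   void Train(const std::vector<Tagged>& corpus) {
     for (const Tagged& sent : corpus) {
       std::string prev = "<s>";  // sentence-initial pseudo-tag
       for (const auto& [word, tag] : sent) {  // C++17 structured bindings
         ++trans_[{prev, tag}];
         ++prevTotal_[prev];
         ++emit_[{tag, word}];
         ++tagTotal_[tag];
         prev = tag;
       }
     }
   }
 
   // P(tag | prev tag); unsmoothed for brevity.
   double TransProb(const std::string& prev, const std::string& tag) const {
     auto t = trans_.find({prev, tag});
     auto d = prevTotal_.find(prev);
     return (t == trans_.end() || d == prevTotal_.end()) ? 0.0 : t->second / d->second;
   }
 
   // P(word | tag); unsmoothed for brevity.
   double EmitProb(const std::string& tag, const std::string& word) const {
     auto e = emit_.find({tag, word});
     auto d = tagTotal_.find(tag);
     return (e == emit_.end() || d == tagTotal_.end()) ? 0.0 : e->second / d->second;
   }
 };

Decoding is then standard Viterbi over products of TransProb and EmitProb. The conditional-likelihood variant keeps the same model structure but optimizes P(tags | words) instead, which has no closed form and needs iterative training.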
- What do you still need to do?
- Implement more features
- CRF with joint likelihood
- CRF with conditional likelihood
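For reference, the two training criteria recurring in these lists differ only in the objective. Writing θ for the model parameters, x^(i) for the i-th word sequence, and y^(i) for its tag sequence, these are the standard definitions:

 \hat{\theta}_{\mathrm{joint}} = \arg\max_{\theta} \prod_{i} p_{\theta}(x^{(i)}, y^{(i)})
 \hat{\theta}_{\mathrm{cond}} = \arg\max_{\theta} \prod_{i} p_{\theta}(y^{(i)} \mid x^{(i)})

For the HMM, the joint objective is maximized in closed form by counts (as in the baseline sketch above), while the conditional objective generally requires iterative gradient-based optimization.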
- If you've run a baseline system on the data and gotten some results, what are they?
- The HMM with joint likelihood will serve as the baseline.
- Its accuracy is near 30%, so there is much to improve.
- Processing at the morphological level is required: "CMU" and "CMUs" should resolve to the same organization, which is not the case now.
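As a naive sketch of that normalization problem, the toy function below maps surface variants such as "CMU", "CMUs", and "CMU's" to one canonical key before organization matching. The suffix rules are illustrative assumptions, not the project's actual morphological processing.

 #include <cctype>
 #include <iostream>
 #include <string>
 
 std::string Canonicalize(std::string token) {
   // Strip a possessive marker: "CMU's" -> "CMU".
   if (token.size() > 2 && token.compare(token.size() - 2, 2, "'s") == 0)
     token.erase(token.size() - 2);
   // Strip a plural-like trailing 's' after an uppercase letter, so the
   // rule targets acronyms ("CMUs" -> "CMU") but leaves "cats" alone.
   if (token.size() > 1 && token.back() == 's' &&
       std::isupper(static_cast<unsigned char>(token[token.size() - 2])))
     token.pop_back();
   return token;
 }
 
 int main() {
   std::cout << Canonicalize("CMUs") << ' ' << Canonicalize("CMU's") << '\n';
   // Prints: CMU CMU
 }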