NiLao LiuLiu YandongLiu project midterm report

From Cohen Courses
Revision as of 21:53, 3 November 2009 by Nlao (talk | contribs)
  • What dataset will you be using? What does it look like (e.g., how many entities are there, how many tokens, etc)? Looking over the data is always a good first step before you start working with it, what did you do to get acquainted with the data?

We will be using the data made available by Kok & Domingos (2007). One of the team members has previously run some experiments with this data.

Kinship.

This dataset contains kinship relationships among members of the Alyawarra tribe from Central Australia (Denham, 1973). Predicates are of the form k(p, p′), where k is a kinship relation and p, p′ are persons. There are 26 kinship terms and 104 persons, for a total of 281,216 ground atoms, of which 10,686 are true.

Nations.

This dataset contains a set of relations among nations and their features (Rummel, 1999). It consists of binary and unary predicates. The binary predicates are of the form r(n, n′), where n, n′ are nations, and r is a relation between them (e.g., ExportsTo, GivesEconomicAidTo). The unary predicates are of the form f(n), where n is a nation and f is a feature (e.g., Communist, Monarchy). There are 14 nations, 56 relations and 111 features, for a total of 12,530 ground atoms, of which 2,565 are true.

UMLS.

UMLS contains data from the Unified Medical Language System, a biomedical ontology (McCray, 2003). It consists of binary predicates of the form r(c, c′), where c and c′ are biomedical concepts (e.g., Antibiotic, Disease), and r is a relation between them (e.g., Treats, Diagnoses). There are 49 relations and 135 concepts, for a total of 893,025 ground atoms, of which 6,529 are true.
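
As a sanity check on the dataset sizes quoted above, the ground-atom totals can be recomputed from the number of predicates and constants: a binary predicate over N constants has N × N groundings, and a unary predicate has N. The following is a minimal sketch in Java; the class and method names are hypothetical and are not taken from our project code.

```java
// Sketch only: class and method names are hypothetical, not from the project code.
final class GroundAtomCounts {

    // Each binary predicate r(x, y) over `constants` objects has constants^2 groundings.
    static long binaryAtoms(long relations, long constants) {
        return relations * constants * constants;
    }

    // Each unary predicate f(x) has one grounding per constant.
    static long unaryAtoms(long features, long constants) {
        return features * constants;
    }

    // Percentage of ground atoms that are true.
    static double percentTrue(long trueAtoms, long totalAtoms) {
        return 100.0 * trueAtoms / totalAtoms;
    }

    public static void main(String[] args) {
        // Counts of predicates, constants and true atoms are those stated in the text.
        long kinship = binaryAtoms(26, 104);
        long nations = binaryAtoms(56, 14) + unaryAtoms(111, 14);
        long umls = binaryAtoms(49, 135);

        System.out.println("Kinship: " + kinship + " atoms, " + percentTrue(10686, kinship) + "% true");
        System.out.println("Nations: " + nations + " atoms, " + percentTrue(2565, nations) + "% true");
        System.out.println("UMLS:    " + umls + " atoms, " + percentTrue(6529, umls) + "% true");
    }
}
```

The computed totals match the figures above (281,216, 12,530 and 893,025). The true atoms make up roughly 3.8%, 20.5% and 0.7% of each dataset respectively, so all three datasets are sparse, UMLS most of all.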


  • Do you plan on looking at the same problem, or have you changed your plans?

We are working on the same problem, as defined by Kok & Domingos (2007).

  • If you plan on writing code, what have you written so far, in what languages, and what do you still need to do?

We have a basic implementation of relational graphical model training and structure learning in Java 6.0. We still need to write code for generating and evaluating candidate hidden variables (together with their related features). We are also setting up an SVN repository to help team members collaborate.

  • If you plan on using off-the-shelf code, what have you installed, what experiences have you had with it?

We are not using any off-the-shelf code; we build on our own basic implementations described above.

  • If you've run a baseline system on the data and gotten some results, what are they? are they consistent with what you expected?

We have three baseline results reported by Kok & Domingos (2007), and one baseline result from our own system without hidden variable detection.