GNAT (Grounded NELL-like AKBC Toolkit) is a software platform
developed at CMU for building and extending knowledge bases (i.e.,
"automatic knowledge-base construction", or AKBC). It is loosely
based on the architecture used in NELL, with one main difference: the
seed instances for the ontology of concepts and relations are
"grounded", i.e., identified with objects in an external ontology.
The work included in GNAT has been generously funded by the National
Science Foundation (NSF) under awards CCF-1414030 and IIS-1250956 and
by two corporate sponsors: Google and Baidu USA LLC.
The overall architecture for GNAT is based on a few key constructs:
- "Entities" have ids, may or may not link to external databases, and
usually have one or more "surface forms", which are strings. Entities
can be tagged as "temporary", which means that their true identity is
uncertain; two non-temporary entities with distinct ids, however, should
refer to distinct things in the real world. "Relation instances" are
encoded as pairs of entity ids. "Concepts" are sets of entities, and
"relations" are sets of relation instances.
- A GNAT ontology consists of a "nomenclature", i.e., a set of
concepts and relations; "seeds", i.e., entities known to belong to
certain concepts or relation instances known to belong to certain
relations; and "constraints". Constraints include conditions similar
to those used in NELL, e.g., that the domain of relation R is a subset
of concept C, or that a relation R is reflexive.
- Semi-supervised learning tasks are represented by a feature-based
encoding (for entities or relation instances) together with "couplings",
which are pairs of entities (or relation instances) that can be assumed
to be members of the same concepts (or relations). "Coupling nodes" are
objects Z such that if X and Y are both coupled to Z, then X and Y are
assumed to be coupled to each other.
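The entity and relation constructs in the first bullet above can be pictured as a small data model. The following is a minimal sketch, not GNAT's actual API: the class names (`Entity`, `Concept`, `Relation`), the field names (`surface_forms`, `external_id`, `temporary`), the helper `known_distinct`, and all ids are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, Optional, Set, Tuple


@dataclass(frozen=True)
class Entity:
    """Hypothetical GNAT-style entity; field names are illustrative only."""
    id: str
    surface_forms: FrozenSet[str] = frozenset()  # strings naming the entity
    external_id: Optional[str] = None  # link to an external database, if any
    temporary: bool = False  # True when the entity's true identity is uncertain


# A relation instance is encoded as a pair of entity ids.
RelationInstance = Tuple[str, str]


@dataclass
class Concept:
    """A concept is a set of entities (stored here by id)."""
    name: str
    members: Set[str] = field(default_factory=set)


@dataclass
class Relation:
    """A relation is a set of relation instances."""
    name: str
    instances: Set[RelationInstance] = field(default_factory=set)


def known_distinct(a: Entity, b: Entity) -> bool:
    """True if a and b must denote different real-world things:
    both are non-temporary and their ids differ (hypothetical helper)."""
    return a.id != b.id and not a.temporary and not b.temporary


# Usage with made-up ids.
pgh = Entity("e1", frozenset({"Pittsburgh", "Pgh"}))
usa = Entity("e2", frozenset({"United States", "USA"}))
guess = Entity("t1", frozenset({"Steel City"}), temporary=True)

city = Concept("city", {pgh.id})
city_in_country = Relation("cityInCountry", {(pgh.id, usa.id)})

print(known_distinct(pgh, usa))    # True
print(known_distinct(pgh, guess))  # False: "t1" might turn out to be "e1"
```

Storing concept members and relation instances by id (rather than by surface form) mirrors the description above, where relation instances are pairs of entity ids and distinct non-temporary ids denote distinct things.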
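The ontology described in the second bullet above (nomenclature, seeds, and constraints) can be sketched as a container that checks its seeds against the two constraint kinds mentioned there. This is an illustrative sketch under stated assumptions, not GNAT's implementation: `Ontology`, `DomainConstraint`, `ReflexiveConstraint`, and `violations` are hypothetical names, and reflexivity is checked only over the entities that actually appear in the relation's seed instances, which is an assumption of this sketch.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple, Union

Pair = Tuple[str, str]


@dataclass
class DomainConstraint:
    """domain(relation) must be a subset of concept."""
    relation: str
    concept: str


@dataclass
class ReflexiveConstraint:
    """relation must be reflexive (checked here over the entities it mentions)."""
    relation: str


Constraint = Union[DomainConstraint, ReflexiveConstraint]


@dataclass
class Ontology:
    concepts: Set[str]   # nomenclature: concept names
    relations: Set[str]  # nomenclature: relation names
    concept_seeds: Dict[str, Set[str]] = field(default_factory=dict)
    relation_seeds: Dict[str, Set[Pair]] = field(default_factory=dict)
    constraints: List[Constraint] = field(default_factory=list)

    def violations(self) -> List[str]:
        """Report seeds that break a constraint (hypothetical checker)."""
        problems: List[str] = []
        for c in self.constraints:
            pairs = self.relation_seeds.get(c.relation, set())
            if isinstance(c, DomainConstraint):
                domain = {x for x, _ in pairs}
                members = self.concept_seeds.get(c.concept, set())
                for x in sorted(domain - members):
                    problems.append(
                        f"{x} is in domain({c.relation}) but not in {c.concept}")
            elif isinstance(c, ReflexiveConstraint):
                mentioned = {e for pair in pairs for e in pair}
                for e in sorted(mentioned):
                    if (e, e) not in pairs:
                        problems.append(f"{c.relation} lacks ({e}, {e})")
        return problems


# Usage with made-up seeds: "paris" is not seeded as a city,
# and sameAs is missing the pair (pittsburgh, pittsburgh).
onto = Ontology(
    concepts={"city", "country"},
    relations={"cityInCountry", "sameAs"},
    concept_seeds={"city": {"pittsburgh"}, "country": {"usa", "france"}},
    relation_seeds={
        "cityInCountry": {("pittsburgh", "usa"), ("paris", "france")},
        "sameAs": {("pgh", "pgh"), ("pgh", "pittsburgh")},
    },
    constraints=[DomainConstraint("cityInCountry", "city"),
                 ReflexiveConstraint("sameAs")],
)
print(onto.violations())
# ['paris is in domain(cityInCountry) but not in city',
#  'sameAs lacks (pittsburgh, pittsburgh)']
```

Representing constraints as plain data objects keeps them separate from the seeds, so the same checker could be rerun whenever new seeds are added.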
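The coupling-node rule in the third bullet above (if X and Y are both coupled to Z, then X and Y are coupled) can be made concrete by expanding direct couplings and coupling-node links into the full set of coupled pairs. The sketch below is hypothetical: the function name `coupled_pairs`, the input format, and the node name `Z1` are illustrative assumptions, and it deliberately does not take a transitive closure, because the text only states the coupling-node rule.

```python
from collections import defaultdict
from itertools import combinations
from typing import DefaultDict, Iterable, Set, Tuple

Pair = Tuple[str, str]


def coupled_pairs(direct: Iterable[Pair], node_links: Iterable[Pair]) -> Set[Pair]:
    """Return every coupled pair of items, each as an alphabetically sorted tuple.

    direct:     explicit couplings between two items (entities or relation instances)
    node_links: (item, coupling_node) links; any two items linked to the same
                coupling node Z are coupled to each other.
    """
    result: Set[Pair] = set()
    for x, y in direct:
        if x != y:
            result.add((min(x, y), max(x, y)))
    members: DefaultDict[str, Set[str]] = defaultdict(set)
    for item, node in node_links:
        members[node].add(item)
    for group in members.values():
        # combinations over a sorted list yields (x, y) with x < y
        result.update(combinations(sorted(group), 2))
    return result


# Usage with made-up item and node names.
pairs = coupled_pairs(
    direct=[("pittsburgh", "cleveland")],
    node_links=[("pittsburgh", "Z1"), ("boston", "Z1"), ("denver", "Z1")],
)
print(sorted(pairs))
# [('boston', 'denver'), ('boston', 'pittsburgh'),
#  ('cleveland', 'pittsburgh'), ('denver', 'pittsburgh')]
```

Note that "boston" and "cleveland" are not coupled here, since they share no coupling node and no direct coupling. One reading of why coupling nodes are useful: a group of n mutually coupled items needs only n links to a shared node instead of n(n-1)/2 explicit pairs; whether this is GNAT's actual motivation is our assumption.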