Semantic affinity of a linguistic pattern is a measure of how effective that pattern is at extracting the noun phrases that belong to a given semantic class. A linguistic pattern for information extraction (IE) is a frame (usually a verb phrase) with certain role-filling slots. The aim of an IE task is to find appropriate fillers (usually noun phrases) for these slots. An example frame or pattern is "<NP1> is the CEO of <NP2>", where <NP1> and <NP2> are role slots to be filled by the extracted entities. To compute semantic affinity, this frame can be split into two patterns that each keep a single role-filling slot, "<NP> is the CEO" and "is the CEO of <NP>", so that one type of entity is considered at a time.
The term was first used by Patwardhan and Riloff in their paper on learning domain-specific IE patterns from the web.
To use this metric for information extraction, a mapping is defined between semantic classes and the event roles relevant to the IE task. For example, in the terrorism domain, a role usually under consideration is the physical target of the attack. Most physical targets fall into one of two general semantic categories: BUILDING or VEHICLE. Consequently, a mapping “Target → BUILDING, VEHICLE” could be defined. The semantic category of a noun phrase is generally looked up in a dictionary. In the "is the CEO of" frame above, the semantic category for NP1 would be HUMAN, and for NP2 it would be ORGANIZATION.
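The role mapping and dictionary lookup described above can be sketched in a few lines of Python. This is only an illustration: the names ROLE_TO_CLASSES, SEMANTIC_DICTIONARY, semantic_class and roles_for are hypothetical, the dictionary entries are made up, and taking the last word of a noun phrase as its head noun is a simplifying assumption.

```python
# Hypothetical role-to-class mapping, following the terrorism example:
# Target -> BUILDING, VEHICLE
ROLE_TO_CLASSES = {"Target": {"BUILDING", "VEHICLE"}}

# Toy semantic dictionary; these entries are illustrative only.
SEMANTIC_DICTIONARY = {
    "embassy": "BUILDING",
    "bus": "VEHICLE",
    "mayor": "HUMAN",
}


def semantic_class(noun_phrase):
    """Look up the semantic class of a noun phrase via its head noun.

    Assumption: the head noun is the last whitespace-separated token.
    Returns None when the head noun is not in the dictionary.
    """
    tokens = noun_phrase.lower().split()
    if not tokens:
        return None
    return SEMANTIC_DICTIONARY.get(tokens[-1])


def roles_for(noun_phrase):
    """Return the event roles whose semantic classes include this phrase's class."""
    cls = semantic_class(noun_phrase)
    return sorted(role for role, classes in ROLE_TO_CLASSES.items() if cls in classes)
```

With these toy entries, roles_for("the embassy") yields the Target role, while a HUMAN phrase such as "the mayor" maps to no role.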
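Mathematically, the semantic affinity of a pattern for a semantic class is defined as:
$$\mathrm{semaff}(\mathit{pattern}, \mathit{class}) = \frac{f_{\mathit{class}}}{f_{\mathit{all}}} \log_2 f_{\mathit{class}}$$
where $f_{\mathit{class}}$ is the frequency of occurrence of the pattern where it had a noun phrase from the semantic class "class", and $f_{\mathit{all}}$ is the total frequency of occurrence of that pattern in the corpus.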
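As a rough illustration, the score can be computed from a list of the semantic classes of the noun phrases a pattern extracted. The sketch below assumes the form (f_class / f_all) * log2(f_class) from Patwardhan and Riloff's definition. The function name semantic_affinity and its input format are hypothetical, and returning 0.0 when the pattern never extracts the class (where the logarithm is undefined) is an assumption made here, not part of the definition.

```python
import math
from collections import Counter


def semantic_affinity(extracted_classes, target_class):
    """Score a pattern's affinity for target_class.

    extracted_classes: one semantic class label per noun phrase that the
    pattern extracted from the corpus (hypothetical input format).
    """
    f_all = len(extracted_classes)                       # total occurrences of the pattern
    f_class = Counter(extracted_classes)[target_class]   # occurrences with an NP of target_class
    if f_class == 0:
        # Assumption: treat the undefined log2(0) case as zero affinity.
        return 0.0
    return (f_class / f_all) * math.log2(f_class)


# A pattern such as "<NP> is the CEO" that extracted 8 HUMAN phrases
# and 2 ORGANIZATION phrases:
labels = ["HUMAN"] * 8 + ["ORGANIZATION"] * 2
human_score = semantic_affinity(labels, "HUMAN")
org_score = semantic_affinity(labels, "ORGANIZATION")
```

In this toy case the HUMAN score is roughly 2.4 (0.8 times log2 of 8) and the ORGANIZATION score is 0.2, so the pattern is scored as a much better extractor of HUMAN phrases. The logarithmic factor rewards patterns that extract the class often, not just in a high proportion.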