Semantic Affinity

From Cohen Courses
The semantic affinity of a linguistic pattern measures how effectively that pattern extracts noun phrases belonging to a given semantic class. A linguistic pattern for Information Extraction (IE) is a frame (usually a verb phrase) with one or more role-filling slots. The aim of an IE task is to find appropriate fillers (usually noun phrases) for these slots. An example pattern is "<NP1> is the CEO of <NP2>", where <NP1> and <NP2> are role slots to be filled by the extracted entities. To compute semantic affinity, this frame can be split into two patterns with one slot each, "<NP> is the CEO" and "is the CEO of <NP>", so that each pattern targets a single type of entity.
 
The term was first used by Patwardhan and Riloff in their [[Learning Domain-Specific Information Extraction Patterns from the Web|paper]] on learning domain-specific IE patterns from the web.
 
 
To use this metric for information extraction, a mapping is defined between the event roles relevant to the IE task and semantic classes. For example, in the terrorism domain, a commonly considered role is the physical target of the attack. Most physical targets fall into one of two general semantic categories: BUILDING or VEHICLE. Consequently, a mapping “Target → BUILDING, VEHICLE” can be defined. The semantic category of a noun phrase is generally looked up in a dictionary. In the "<NP1> is the CEO of <NP2>" frame above, the semantic category for NP1 would be HUMAN, and for NP2 it would be ORGANIZATION.
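
The role-to-class mapping and dictionary lookup described above might be sketched as follows. This is only an illustration: the dictionary entries, the function names `semantic_class` and `can_fill`, and the choice of the last token as the head noun are all assumptions, not part of the original method.

```python
# Role -> semantic classes, following the "Target -> BUILDING, VEHICLE" example.
ROLE_TO_CLASSES = {"Target": {"BUILDING", "VEHICLE"}}

# Toy semantic dictionary (illustrative entries only): head noun -> class.
SEMANTIC_DICTIONARY = {
    "embassy": "BUILDING",
    "bus": "VEHICLE",
    "mayor": "HUMAN",
    "company": "ORGANIZATION",
}


def semantic_class(noun_phrase):
    """Look up the class of a noun phrase by its head noun.

    Assumption: the head noun is the last whitespace-separated token.
    Returns None when the phrase is empty or the head is not in the dictionary.
    """
    tokens = noun_phrase.lower().split()
    if not tokens:
        return None
    return SEMANTIC_DICTIONARY.get(tokens[-1])


def can_fill(role, noun_phrase):
    """True if the noun phrase's class is one of the classes mapped to the role."""
    return semantic_class(noun_phrase) in ROLE_TO_CLASSES.get(role, set())
```

With these toy entries, `can_fill("Target", "a city bus")` is true, while `can_fill("Target", "the mayor")` is false because HUMAN is not mapped to the Target role.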
 
 
Mathematically, the semantic affinity of a pattern is defined as:<br>
 
 
  <math>affinity_{pattern} = \frac{f_{class}}{f_{all}} \cdot \log_2 f_{class}</math><br>
 
 
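where <math>f_{class}</math> is the number of times the pattern occurs with a noun phrase from the semantic class "class" in its slot, and <math>f_{all}</math> is the total number of occurrences of the pattern in the corpus. The ratio rewards patterns whose fillers mostly belong to the class, while the logarithmic factor favors patterns that extract many members of the class.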
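
As a rough illustration of the formula, the score could be computed from per-pattern counts as in the sketch below. The function names, the input format (pairs of pattern and filler class), and the choice to return 0 when the class frequency is 0 (where the logarithm is undefined) are assumptions made for this example.

```python
import math
from collections import Counter


def semantic_affinity(f_class, f_all):
    """Compute (f_class / f_all) * log2(f_class) for one pattern."""
    if f_class > f_all:
        raise ValueError("f_class cannot exceed f_all")
    if f_class <= 0 or f_all <= 0:
        # Assumption: a pattern that never extracts the class scores 0.
        return 0.0
    return (f_class / f_all) * math.log2(f_class)


def rank_patterns(extractions, target_class):
    """Rank patterns by affinity for target_class, highest first.

    extractions: iterable of (pattern, filler_class) pairs, one per occurrence
    of the pattern in the corpus (hypothetical input format).
    """
    extractions = list(extractions)
    f_all = Counter(pattern for pattern, _ in extractions)
    f_class = Counter(
        pattern for pattern, cls in extractions if cls == target_class
    )
    scores = {p: semantic_affinity(f_class[p], f_all[p]) for p in f_all}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

For instance, a pattern seen 10 times with 8 fillers from the class scores about 0.8 × 3 = 2.4, whereas a pattern seen once with one class filler scores 0, since log<sub>2</sub> 1 = 0.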
 
Semantic Affinity is a [[category::method]].

Latest revision as of 15:34, 13 October 2011
