Automatic Detection and Classification of Social Events

From Cohen Courses

== Citation ==

Apoorv Agarwal, Owen Rambow, "Automatic Detection and Classification of Social Events", ACL 2010.

== Online version ==

Click here to download

== Introduction ==
This [[Category::paper]] aims at detecting and classifying social events. The approach makes use of [[UsesMethod::Tree Kernels]] and Sequence Kernels. Two types of social events were targeted: the Interaction Event (INR), i.e., an event in which both the agent and the patient of the action are involved (e.g., inform, tell, meet), and the Observation Event (OBS), i.e., an event in which only the agent is involved (e.g., see, run by, think of, watch). The OBS category is further divided into three subtypes: the Physical Proximity Event (PPR), i.e., an event in which the agent and the patient are near each other (e.g., see, run by), the Perception Event (PCR), i.e., an event in which the agent and the theme are not in proximity (e.g., watch somebody on TV, read about somebody in a magazine), and the Cognition Event (COG), which encompasses all OBS events that are of neither PPR nor PCR type.<br>
The authors solved the problem using tree kernels applied to varied structural representations of a sentence. The problem was solved in two tiers: detecting social events in tier one (a one-versus-all classification) and then performing a binary INR-versus-OBS classification in tier two. The same kernels and structures were used in both tiers. Because the annotated data was heavily skewed, the authors ran their experiments after rebalancing the data with several sampling methods, and observed that models trained on the sampled data performed considerably better than the baseline trained on the unsampled data.
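As a minimal sketch of how such a two-tier scheme could be wired together, the snippet below uses precomputed kernel matrices with scikit-learn's SVC (kernel="precomputed"). The toy overlap kernel and the tiny data set are stand-ins for the paper's tree and sequence kernels and the ACE data, not the authors' implementation:

<pre>
# Hypothetical sketch of the two-tier classification scheme using
# precomputed kernel matrices with scikit-learn SVMs. The kernel()
# function and the toy data below are placeholders, not the authors' code.
import numpy as np
from sklearn.svm import SVC

def kernel(a, b):
    # Stand-in for a tree/sequence kernel between two examples;
    # here: a trivial word-overlap (set intersection) kernel.
    return float(len(set(a) & set(b)))

def gram(examples_a, examples_b):
    # Gram matrix K[i, j] = kernel(examples_a[i], examples_b[j]).
    return np.array([[kernel(a, b) for b in examples_b] for a in examples_a])

# Toy training data: token lists with tier-one labels
# (event vs. no event) and tier-two labels (INR vs. OBS).
train = [(["john", "met", "mary"], 1, "INR"),
         (["john", "saw", "mary"], 1, "OBS"),
         (["john", "told", "mary"], 1, "INR"),
         (["john", "watched", "mary"], 1, "OBS"),
         (["the", "sky", "is", "blue"], 0, None),
         (["it", "rained", "today"], 0, None)]
X = [x for x, _, _ in train]

# Tier one: detect whether a social event is present at all.
tier1 = SVC(kernel="precomputed").fit(gram(X, X), [y for _, y, _ in train])

# Tier two: classify detected events as INR vs. OBS,
# trained only on the positive examples.
X_pos = [x for x, y, _ in train if y == 1]
tier2 = SVC(kernel="precomputed").fit(gram(X_pos, X_pos),
                                      [t for _, y, t in train if y == 1])

test = [["john", "met", "sue"]]
if tier1.predict(gram(test, X))[0] == 1:
    print(tier2.predict(gram(test, X_pos))[0])
</pre>

Keeping the learner behind a precomputed Gram matrix makes it independent of the kernel, so tree or sequence kernels can be swapped in without changing either classifier.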
  
 
==Dataset Used==
A part of the [[UsesDataset::Automatic Content Extraction]] (ACE) corpus was used. The authors had the data annotated with social-event-type information, along with the entities involved in each identified event.
  
==Various Structural Representations Used==
The following tree- and sequence-based structures were used as input to the kernels:<br>
PET: This refers to the smallest common phrase structure tree that contains the two target entities.<br>
Dependency Words (DW) tree: This is the smallest common dependency tree that contains the two target entities.<br>
Grammatical Relation (GR) tree: Obtained by replacing the words at the nodes of the DW tree with their relation to their corresponding parent.<br>
Grammatical Relation Word (GRW) tree: Obtained by adding the grammatical relations as separate nodes between a node and its parent.<br>
Sequence Kernel of words (SK1): This is the sequence of words between the two entities.<br>
Sequence in GRW tree (SqGRW): This is the sequence of nodes from one target entity to the other in the GRW tree.<br>
Combinations of these structures were also used. For example, PET_GRW means that both structures were used, with a kernel that computes the similarity between forests.
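As an illustration, here is a small sketch of how two of these representations, SK1 and a SqGRW-style sequence, could be read off a dependency parse. The hand-written toy parse (parent indices and relation labels) stands in for a real parser's output, and the helper functions are hypothetical, not the authors' code:

<pre>
# Illustrative sketch (not the authors' code): building the SK1 word
# sequence and a SqGRW-style node sequence from a toy dependency parse.

# "John suddenly saw Mary" -- parents given as token indices (-1 = root).
tokens = ["John", "suddenly", "saw", "Mary"]
parent = [2, 2, -1, 2]                        # all tokens headed by "saw"
deprel = ["nsubj", "advmod", "root", "dobj"]  # relation to each parent

def sk1(i, j):
    # SK1: the sequence of words strictly between the two entities.
    lo, hi = sorted((i, j))
    return tokens[lo + 1:hi]

def path_to_root(i):
    # Node indices from token i up to the root.
    path = [i]
    while parent[path[-1]] != -1:
        path.append(parent[path[-1]])
    return path

def sq_grw(i, j):
    # SqGRW-style sequence: walk from entity i up to the lowest common
    # ancestor and down to entity j, inserting each node's grammatical
    # relation as a separate element between it and its parent.
    up, down = path_to_root(i), path_to_root(j)
    lca = next(n for n in up if n in down)
    left = up[:up.index(lca)]             # i ... just below the LCA
    right = down[:down.index(lca)][::-1]  # just below the LCA ... j
    seq = []
    for n in left:
        seq += [tokens[n], deprel[n]]
    seq.append(tokens[lca])
    for n in right:
        seq += [deprel[n], tokens[n]]
    return seq

print(sk1(0, 3))     # ['suddenly', 'saw']
print(sq_grw(0, 3))  # ['John', 'nsubj', 'saw', 'dobj', 'Mary']
</pre>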
  
==Tree Kernels Used==
The authors used the Partial Tree (PT) kernel (Moschitti, 2006a) for structures derived from dependency trees, and the Subset Tree (SST) kernel (Collins and Duffy, 2002) for structures derived from phrase structure trees. The SST kernel measures the similarity between two phrase structure trees by counting all subtrees common to the two trees, under one constraint: all daughter nodes of a node must be included together. The PT kernel is a relaxed version of the SST kernel in which this constraint is removed; in contrast to the SST kernel, it therefore compares many more substructures. Tree kernels were used successfully by Moschitti (2004) for the task of semantic role labeling.
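For concreteness, below is a compact sketch of the SST kernel recursion of Collins and Duffy (2002) on toy phrase structure trees encoded as nested tuples; the decay factor lam and the example trees are illustrative choices, not taken from the paper:

<pre>
# Sketch of the Subset Tree (SST) kernel recursion of Collins and
# Duffy (2002). Trees are nested tuples (label, child, ...);
# leaves are plain strings.

def production(node):
    # The grammar production at a node, e.g. ('NP', 'DT', 'NN').
    return (node[0],) + tuple(c if isinstance(c, str) else c[0]
                              for c in node[1:])

def is_preterminal(node):
    return all(isinstance(c, str) for c in node[1:])

def C(n1, n2, lam):
    # Weighted count of common subtree fragments rooted at n1 and n2.
    if production(n1) != production(n2):
        return 0.0
    if is_preterminal(n1):
        return lam
    prod = lam
    for c1, c2 in zip(n1[1:], n2[1:]):
        prod *= 1.0 + C(c1, c2, lam)
    return prod

def nodes(tree):
    # All internal nodes (including preterminals) of the tree.
    yield tree
    for c in tree[1:]:
        if not isinstance(c, str):
            yield from nodes(c)

def sst_kernel(t1, t2, lam=0.4):
    # K(T1, T2) = sum over all node pairs of C(n1, n2).
    return sum(C(n1, n2, lam) for n1 in nodes(t1) for n2 in nodes(t2))

t1 = ("S", ("NP", ("NNP", "John")),
           ("VP", ("VBD", "saw"), ("NP", ("NNP", "Mary"))))
t2 = ("S", ("NP", ("NNP", "John")),
           ("VP", ("VBD", "met"), ("NP", ("NNP", "Mary"))))
print(sst_kernel(t1, t2))
</pre>

The PT kernel, loosely speaking, generalizes this recursion by also matching partial sequences of daughter nodes rather than requiring all daughters to be included at once.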
 
