Automatic Detection and Classification of Social Events
Apoorv Agarwal, Owen Rambow, "Automatic Detection and Classification of Social Events", ACL 2010.
This paper aims at detecting and classifying social events using tree kernels. Two types of social events are targeted: the Interaction Event (INR), i.e., an event in which both the agent and the patient of the action are involved (e.g., inform, tell, meet), and the Observation Event (OBS), i.e., an event in which only the agent is involved (e.g., see, run by, think of, watch). The OBS category is further divided into three subtypes: the Physical Proximity Event (PPR), in which the agent and the patient are nearby (see, run by, etc.); the Perception Event (PCR), in which the agent and the theme are not in proximity (e.g., watch somebody on TV, read about somebody in a magazine); and the Cognition Event (COG), which covers all OBS events that are of neither PPR nor PCR type.
The authors solved this classification problem by applying tree kernels to varied structural representations of a sentence. The problem was solved in two tiers: tier one detects social events (a one-vs-all classification), and tier two performs a subsequent binary classification of the detected events. The same kernels and structures were used in both tiers. Because the annotated data was heavily skewed, the authors repeated their experiments after resampling the data with several sampling methods, and observed that the sampled data performed considerably better than the unsampled baseline.
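The two-tier cascade described above can be sketched as a simple control-flow skeleton. This is purely illustrative, with hypothetical stand-in predicates for the kernel classifiers, not the authors' implementation:

```python
def classify_pair(example, detect, subtype):
    """Two-tier scheme: `detect` is a stand-in for the tier-one
    social-event detector; `subtype` stands in for the tier-two
    classifier (e.g. INR vs. OBS), run only on detected events."""
    if not detect(example):
        return "NONE"
    return subtype(example)


# Toy usage with trivial stand-in classifiers:
always_inr = lambda e: "INR"
print(classify_pair("pair1", detect=lambda e: True, subtype=always_inr))   # INR
print(classify_pair("pair2", detect=lambda e: False, subtype=always_inr))  # NONE
```

The point of the cascade is that the tier-two classifier never sees pairs that tier one rejects, so both tiers can reuse the same kernels and structures on progressively smaller candidate sets.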
A part of the Automatic Content Extraction (ACE) corpus was used. The authors had the data annotated with the social event type, along with the entities involved in each identified event.
Various Structural Representations Used
The kernel machines were applied to the following tree- and sequence-based structures:
Path-Enclosed Tree (PET): This is the smallest common phrase structure subtree that contains the two target entities.
Dependency Words (DW) tree: This is the smallest common dependency tree that contains the two target entities.
Grammatical Relation (GR) tree: Obtained by replacing the words at the nodes by their relation to their corresponding parent in DW.
Grammatical Relation Word (GRW) tree: Obtained by adding the grammatical relations as separate nodes between a node and its parent.
Sequence Kernel of words (SK1): This is the sequence of words between the two entities.
Sequence in GRW tree (SqGRW): It is the sequence of nodes from one target to the other in the GRW tree.
Combinations of these structures were also used; for example, PET_GRW means that both structures were used with a kernel that computes similarity between forests.
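As an illustration of the sequence structures above, the SqGRW-style node sequence between two entities can be read off a dependency tree by walking both entities up to their lowest common ancestor and interleaving words with grammatical-relation labels. This is a sketch on a toy parent-pointer representation (the token ids, relation names, and helper names are assumptions, not the authors' code):

```python
def ancestors(tok, parent):
    # Chain of tokens from `tok` up to the root of the dependency tree.
    chain = [tok]
    while tok in parent:
        tok = parent[tok][0]
        chain.append(tok)
    return chain


def grw_path(e1, e2, parent, word):
    # parent maps token -> (head token, grammatical relation);
    # word maps token -> surface word.
    a1, a2 = ancestors(e1, parent), ancestors(e2, parent)
    lca = next(t for t in a1 if t in set(a2))  # lowest common ancestor
    up = a1[:a1.index(lca)]      # tokens strictly below the LCA on e1's side
    down = a2[:a2.index(lca)]    # tokens strictly below the LCA on e2's side
    seq = []
    for t in up:                 # climb from e1 to the LCA
        seq += [word[t], parent[t][1]]
    seq.append(word[lca])
    for t in reversed(down):     # descend from the LCA to e2
        seq += [parent[t][1], word[t]]
    return seq


# Toy parse of "John saw Mary" (token ids and relations are invented):
parent = {1: (2, "nsubj"), 3: (2, "dobj")}
word = {1: "John", 2: "saw", 3: "Mary"}
print(grw_path(1, 3, parent, word))  # ['John', 'nsubj', 'saw', 'dobj', 'Mary']
```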
Tree Kernels Used
The authors used the Partial Tree (PT) kernel (Moschitti, 2006) for structures derived from dependency trees and the Subset Tree (SST) kernel (Collins and Duffy, 2002) for structures derived from phrase structure trees (PSTs). PT is a relaxed version of SST: the SST kernel measures the similarity between two PSTs by counting the subtrees common to both, under the constraint that whenever a node is included, all of its daughter nodes must be included as well. The PT kernel removes this constraint and therefore compares many more substructures. Moschitti (2004) studied convolution kernels for the task of semantic role labeling.
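The SST counting recursion of Collins and Duffy can be sketched in a few lines. This is an illustrative reimplementation, not the authors' code: trees are nested tuples whose leaves are words, and the decay parameter `lam` is fixed at 1 so the kernel simply counts common subtrees:

```python
def nodes(t):
    # Yield every internal node (nested tuple) of the tree.
    if isinstance(t, tuple):
        yield t
        for c in t[1:]:
            yield from nodes(c)


def production(t):
    # A node's production: its label plus the labels of its children.
    return (t[0], tuple(c[0] if isinstance(c, tuple) else c for c in t[1:]))


def C(n1, n2, lam):
    # Number of common subtrees rooted at n1 and n2 (Collins-Duffy recursion).
    if production(n1) != production(n2):
        return 0.0
    if all(not isinstance(c, tuple) for c in n1[1:]):
        return lam  # pre-terminal: the single subtree (label, word)
    prod = lam
    for c1, c2 in zip(n1[1:], n2[1:]):
        if isinstance(c1, tuple):
            prod *= 1.0 + C(c1, c2, lam)
    return prod


def sst_kernel(t1, t2, lam=1.0):
    # Sum the rooted counts over all node pairs of the two trees.
    return sum(C(a, b, lam) for a in nodes(t1) for b in nodes(t2))


t = ("NP", ("D", "the"), ("N", "dog"))
print(sst_kernel(t, t))  # 6.0: six subtrees shared by the tree with itself
```

With `lam=1` the self-kernel counts all subtrees of the tree; a `lam` below 1 downweights larger subtrees, which is the usual remedy for the kernel being dominated by near-identical trees.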
Sampling Methods Used
As the annotated data was skewed (many more non-social-event instances than social-event instances), the authors tried several sampling methods: random under-sampling and random over-sampling (Kotsiantis et al., 2006; Japkowicz, 2000; Weiss and Provost, 2001), as well as a method that generates synthetic examples for the minority class (Ha and Bunke, 1997), so as to contain the underfitting and overfitting effects of the under- and over-sampler, respectively. To generate synthetic minority-class examples for this third sampler, the authors applied a tree transformation to COG event instances that moves the second participant two levels up the tree; they had empirically observed that in COG cases the second participant is more likely to be embedded in a dense subtree, which makes the kernel consider many subtrees.
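The two random samplers are straightforward; a minimal sketch (function names and the balanced-to-equal-size target are assumptions, not the paper's exact procedure) might look like this:

```python
import random


def under_sample(majority, minority, rng):
    # Discard random majority-class examples until the classes are balanced.
    return rng.sample(majority, len(minority)) + list(minority)


def over_sample(majority, minority, rng):
    # Draw minority-class examples with replacement until the classes are balanced.
    return list(majority) + [rng.choice(minority) for _ in range(len(majority))]


rng = random.Random(0)
maj = [("no-event", i) for i in range(10)]
mino = [("event", i) for i in range(3)]
print(len(under_sample(maj, mino, rng)))  # 6  (3 + 3)
print(len(over_sample(maj, mino, rng)))   # 20 (10 + 10)
```

Under-sampling throws away majority data (risking underfitting), while over-sampling repeats minority examples verbatim (risking overfitting), which is exactly the gap the synthetic tree-transformation sampler is meant to fill.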
Experiments and Results
Experiments were performed with all the structural representations, first on the unsampled data to establish a baseline, and then with all variations of the sampled data. The results are shown in the table below.
References
Alessandro Moschitti. 2004. A study on convolution kernels for shallow semantic parsing. In Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics.
Alessandro Moschitti. 2006. Efficient convolution kernels for dependency and constituent syntactic trees. In Proceedings of the 17th European Conference on Machine Learning.
Michael Collins and Nigel Duffy. 2002. Convolution kernels for natural language. In Advances in Neural Information Processing Systems.
Sotiris Kotsiantis, Dimitris Kanellopoulos, and Panayiotis Pintelas. 2006. Handling imbalanced datasets: A review. In GESTS International Transactions on Computer Science and Engineering.
T. M. Ha and H. Bunke. 1997. Off-line, handwritten numeral recognition by perturbation method. IEEE Transactions on Pattern Analysis and Machine Intelligence.