Information Extraction to Predict Judgement

From Cohen Courses

Relevant Information Extraction from Court-room Hearings To Predict Judgement

The broader idea is to analyze conversational speech transcripts of court-room hearings and to extract the information that influences the outcome of those hearings. A possible approach is to first identify, from the relevant law, the bases on which a case is decided (e.g. objective of the crime, manner of the crime, etc.), and then to use supervised or semi-supervised learning to find the portions of the conversation that discuss these bases and to determine whether each portion tends to work in favor of the accused or against.

Team: Manaj Srivastava, Mridul Gupta


The dataset that we will be working on consists of the hearing transcripts for "amnesty cases" heard before the Truth and Reconciliation Commission (TRC). The TRC was set up by the South African government in the post-apartheid era to give perpetrators of apartheid-era human-rights violations a chance to seek amnesty. The commission arranged hearings for the perpetrators who applied for amnesty (henceforth, the applicants). The online repository for these transcripts can be accessed at

Unsupervised Approach

We intend to use an unsupervised approach as the baseline method. We plan to first apply topic modeling to the conversations in order to find the topics latent in them. The motivation is that the decision on a hearing depends on how the conversation on the different topics unfolds. We then plan to use the resulting topic distributions to cluster the hearing conversations into two classes, "favorable" and "unfavorable".

Multiple techniques have previously been applied to general topic modeling and to topic modeling in conversations. Of particular interest to us are topic-modeling techniques that take domain knowledge into consideration. One such approach is Bayesian topic segmentation, as described in Eisenstein and Barzilay, 2008. A variant of LDA using Dirichlet "forest priors" (Andrzejewski et al.) has also been used to incorporate domain knowledge into topic modeling. An attempt to include conversational features in conversational topic modeling is discussed in Joty et al. Here, the topic distribution is estimated using a combination of HMM and Gaussian mixtures. Another relevant paper is by Zhu et al., which detects topic transitions by incorporating domain knowledge into LDA. Although the LDA-based methods look more promising, we plan to start with the Bayesian segmentation and HMM/mixture-model approaches in order to establish a baseline.

For the subsequent clustering, we plan to use some lexico-syntactic features, as described at the bottom of this page. The clustering problem can also be modeled as a graph-cut problem, although we first intend to try non-graphical clustering using simple distance metrics.
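A minimal sketch of the non-graphical clustering step, assuming each hearing has already been reduced to a topic-distribution vector. The function names, the choice of plain k-means with k = 2 and Euclidean distance, and the toy vectors are all illustrative assumptions, not a method fixed by this proposal.

```python
import math
from itertools import combinations


def euclidean(p, q):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def two_means(vectors, iterations=20):
    """Split vectors into two clusters with plain k-means (k = 2).

    Centroids start at the two vectors that are farthest apart, which
    keeps the sketch deterministic (an assumption made for simplicity).
    """
    first, second = max(combinations(vectors, 2), key=lambda pair: euclidean(*pair))
    centroids = [list(first), list(second)]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: attach each vector to its nearest centroid.
        labels = [
            min((0, 1), key=lambda k: euclidean(vector, centroids[k]))
            for vector in vectors
        ]
        # Update step: move each centroid to the mean of its members.
        for k in (0, 1):
            members = [v for v, label in zip(vectors, labels) if label == k]
            if members:
                centroids[k] = mean_vector(members)
    return labels


# Hypothetical topic distributions (3 topics) for four hearings.
hearings = [
    [0.7, 0.2, 0.1],
    [0.6, 0.3, 0.1],
    [0.1, 0.2, 0.7],
    [0.2, 0.1, 0.7],
]
labels = two_means(hearings)
```

The cluster indices 0 and 1 carry no meaning by themselves; deciding which cluster corresponds to "favorable" would still require inspecting the hearings in each group.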

Semi-Supervised Approach

Here again the approach will be similar to the unsupervised variant, with the advantage of having some labeled data. We plan to use bootstrapping methods that start with some seed examples and grow the training data by extracting more examples with similar feature values. First, we intend to classify the portions of the hearing dialog into relevant topics. These topics will be along the lines of the parameters laid down in the Promotion of National Unity and Reconciliation Act (henceforth, the TRC act) for deciding whether amnesty should be granted. After this classification, a binary classification will be done for the portions pertaining to each topic, the two classes being "favorable for amnesty" and "unfavorable for amnesty". For the portions in each topic, a confidence score can be calculated for their falling into the "favorable" or "unfavorable" class. Lastly, based on these confidence scores, the final decision of the hearing can be predicted, possibly using a regression model.
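A minimal sketch of the bootstrapping loop, under loud assumptions: the word-overlap scorer is only a stand-in for whatever classifier is eventually used, the margin threshold is arbitrary, and the seed and unlabeled phrases are invented for illustration rather than taken from the transcripts or the TRC act. All function names are hypothetical.

```python
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenization (a deliberately crude assumption)."""
    return text.lower().split()


def class_scores(tokens, word_counts):
    """Score each class by the fraction of tokens seen in its labeled examples."""
    scores = {}
    for label, counts in word_counts.items():
        hits = sum(1 for token in tokens if token in counts)
        scores[label] = hits / len(tokens) if tokens else 0.0
    return scores


def bootstrap(seeds, unlabeled, threshold=0.5, max_rounds=5):
    """Grow the labeled set from seeds; expects seeds for exactly two classes.

    In each round, an unlabeled portion is accepted only when the score of
    the best class exceeds that of the other class by at least `threshold`.
    """
    labeled = list(seeds)   # (text, label) pairs
    pool = list(unlabeled)  # texts that are still unlabeled
    for _ in range(max_rounds):
        # Rebuild per-class vocabularies from everything labeled so far.
        word_counts = {}
        for text, label in labeled:
            word_counts.setdefault(label, Counter()).update(tokenize(text))
        accepted, remaining = [], []
        for text in pool:
            scores = class_scores(tokenize(text), word_counts)
            ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
            (best_label, best_score), (_, runner_up_score) = ranked[0], ranked[1]
            if best_score - runner_up_score >= threshold:
                accepted.append((text, best_label))
            else:
                remaining.append(text)
        if not accepted:
            break  # nothing confident enough was found; stop early
        labeled.extend(accepted)
        pool = remaining
    return labeled, pool


# Invented seed and unlabeled portions, for illustration only.
seeds = [
    ("full disclosure of the facts", "favorable"),
    ("acted for personal gain", "unfavorable"),
]
unlabeled = ["made full disclosure", "personal gain motive", "the weather was cold"]
labeled, pool = bootstrap(seeds, unlabeled)
```

The margin between the two class scores plays the role of the confidence score mentioned above; portions that never clear the threshold stay in the unlabeled pool instead of being forced into a class.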

Some of the features we plan to use for the classification tasks (both the topic-based and the binary classification) are lexico-syntactic features such as keywords/key-phrases (word n-grams, skip n-grams), POS tags, and dependency features, together with dialog features such as the number of utterances, the number of speakers, and the type of speakers (the applicant, the victim, the advocate, the judge, etc.). The simplest of these features, possibly keywords/key-phrases, can serve as the baseline.
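The simpler features above can be sketched as follows. This is an illustration only: the function names, the `(speaker_id, role, text)` utterance format, and the sample hearing are assumptions, and POS-tag and dependency features are left out because they would need an external NLP toolkit.

```python
from collections import Counter


def word_ngrams(tokens, n):
    """Contiguous word n-grams, returned as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def skip_bigrams(tokens, max_skip):
    """Ordered word pairs with at most `max_skip` tokens between them."""
    pairs = []
    for i, left in enumerate(tokens):
        last = min(i + 2 + max_skip, len(tokens))
        for j in range(i + 1, last):
            pairs.append((left, tokens[j]))
    return pairs


def dialog_features(utterances):
    """Counts over a list of (speaker_id, role, text) utterances."""
    speakers = {speaker_id for speaker_id, _, _ in utterances}
    role_counts = Counter(role for _, role, _ in utterances)
    features = {
        "num_utterances": len(utterances),
        "num_speakers": len(speakers),
    }
    # One count per speaker type (applicant, judge, ...).
    for role, count in role_counts.items():
        features["utterances_by_" + role] = count
    return features


# Invented example portion of a hearing.
tokens = "the applicant made full disclosure".split()
bigrams = word_ngrams(tokens, 2)
skips = skip_bigrams(tokens, max_skip=1)

hearing = [
    ("A1", "applicant", "I acted on orders"),
    ("J1", "judge", "Who gave the orders"),
    ("A1", "applicant", "My commander"),
]
features = dialog_features(hearing)
```

For the keyword/key-phrase baseline, the n-gram and skip-bigram lists could simply be counted per portion and used as sparse feature vectors.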