Information Extraction to Predict Judgement

From Cohen Courses
Revision as of 12:13, 26 September 2011 by Manajs

Relevant Information Extraction from Court-room Hearings To Predict Judgement

The bigger idea is to analyze conversational speech transcripts of court-room hearings and extract the information that influences the outcome of a hearing. A possible approach is to first identify, from the relevant law, the bases on which a case is decided (e.g. the objective of the crime, the manner of the crime, etc.), and then use supervised or semi-supervised learning to identify the portions of the conversation that discuss these bases and to determine whether each portion tends to work in favor of the accused or against.

Team: Manaj Srivastava, Mridul Gupta


Dataset

The dataset we will work on is the set of hearing transcripts for "amnesty cases" heard before the Truth and Reconciliation Commission. The commission was set up by the post-apartheid South African government to give perpetrators of human-rights violations committed during the apartheid era a chance to seek amnesty. The commission arranged hearings for the perpetrators who applied for amnesty (henceforth, the applicants). The online repository of these transcripts can be accessed at http://www.justice.gov.za/trc/amntrans/index.htm.


Baseline

The approach we plan to take for judgement prediction has several stages. First, we intend to classify portions of the hearing dialog into relevant topics. These topics will follow the criteria laid down in the Promotion of National Unity and Reconciliation Act (henceforth, the TRC act) for deciding whether amnesty should be granted. Once this classification is done, a binary classification will be performed on the portions pertaining to each topic, the two classes being "favorable for amnesty" and "unfavorable for amnesty". For each topic, a confidence score can be calculated for its portions falling into the "favorable" or "unfavorable" class. Lastly, based on these confidence scores, the final decision of the hearing can be predicted, possibly using a regression model.
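
The last stage can be illustrated with a minimal sketch. It assumes the earlier stages have already produced, for each topic, a list of "favorable" confidences in [0, 1] for the portions assigned to that topic. The topic names, the averaging step, the hand-set weights, and the choice of a logistic model are all illustrative assumptions; in practice the topics would come from the TRC act and the weights would be learned from decided cases.

```python
import math

# Hypothetical topic names; the real list would come from the TRC act.
TOPICS = ["objective", "manner", "disclosure"]


def topic_score(portion_confidences):
    """Average the 'favorable' confidences of all portions in one topic.

    Returns 0.5 (neutral) when the topic was never discussed.
    """
    if not portion_confidences:
        return 0.5
    return sum(portion_confidences) / len(portion_confidences)


def predict_amnesty(hearing, weights, bias=0.0):
    """Combine per-topic scores into one probability that amnesty is granted.

    hearing: dict mapping a topic name to a list of confidences in [0, 1].
    weights: dict mapping a topic name to its (assumed) importance.
    """
    z = bias
    for topic in TOPICS:
        score = topic_score(hearing.get(topic, []))
        # Centre each score at 0.5 so a neutral topic contributes nothing.
        z += weights[topic] * (score - 0.5)
    return 1.0 / (1.0 + math.exp(-z))


# Illustrative values only, not taken from any real hearing.
example_weights = {"objective": 2.0, "manner": 2.0, "disclosure": 2.0}
example_hearing = {"objective": [0.9, 0.7], "manner": [0.8]}
probability = predict_amnesty(example_hearing, example_weights)
```

A probability above 0.5 would be read as a predicted grant of amnesty. Centring the scores at 0.5 is one possible design choice: it keeps topics that never came up in a hearing from pushing the prediction in either direction.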

Some of the features we plan to use for the classification tasks (both the topic-based and the binary classification) are lexico-syntactic features such as keywords/key-phrases (word n-grams, skip n-grams), POS tags, and dependency features, and dialog features such as the number of utterances, the number of speakers, and the type of speaker (the applicant, the victim, the advocate, the judge, etc.). We can use the simplest of these features, possibly keywords/key-phrases, as the baseline.
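
As a rough sketch of the simpler features, the code below extracts word n-grams, skip bigrams, and the dialog counts from one portion of a hearing. It assumes each utterance is available as a (speaker name, speaker role, text) triple and uses plain whitespace tokenization; the function names and feature-name prefixes are hypothetical. POS-tag and dependency features are left out because they need an external tagger and parser.

```python
from collections import Counter


def word_ngrams(tokens, n):
    """Return all contiguous n-word sequences as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def skip_bigrams(tokens, max_skip):
    """Return ordered word pairs with at most max_skip words between them."""
    pairs = []
    for i, left in enumerate(tokens):
        last = min(i + 2 + max_skip, len(tokens))
        for j in range(i + 1, last):
            pairs.append((left, tokens[j]))
    return pairs


def extract_features(utterances, max_skip=1):
    """Build a bag of features for one portion of a hearing.

    utterances: list of (speaker_name, speaker_role, text) triples.
    """
    features = Counter()
    for _name, role, text in utterances:
        tokens = text.lower().split()
        # Keyword / key-phrase features: unigrams and bigrams.
        for n in (1, 2):
            for gram in word_ngrams(tokens, n):
                features["ngram=" + " ".join(gram)] += 1
        for pair in skip_bigrams(tokens, max_skip):
            features["skip=" + " ".join(pair)] += 1
        # Dialog feature: how often each type of speaker talks.
        features["role=" + role] += 1
    # Dialog features: size of the portion and number of distinct speakers.
    features["num_utterances"] = len(utterances)
    features["num_speakers"] = len({name for name, _role, _text in utterances})
    return features


# Made-up utterances for illustration, not taken from the TRC transcripts.
sample = [
    ("Speaker A", "applicant", "I acted on orders"),
    ("Speaker B", "judge", "Whose orders"),
]
sample_features = extract_features(sample)
```

Restricting `extract_features` to the `ngram=` entries would correspond to the keyword/key-phrase baseline mentioned above.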