Stylistic Structure in Historic Legal Text

From Cohen Courses

This will be the project page for Elijah Mayfield and William Y. Wang.


The Background

In this project, we are interested in understanding the stylistic differences of judges in historical legal opinions. We specifically focus on cases regarding slaves as property. Slaves remained the largest source of wealth until the 1840s, so variation in judicial preferences and styles could generate variation in the security of slave property.

We are interested in studying how these cases were handled in different regions of the United States with varying views towards slavery. Because this is a longitudinal data set, we are also interested in understanding how styles change over the course of decades.

To do this, we will use a comparable, aligned corpus of judicial opinions and overviews of the same cases. Our belief is that once we capture the topical overlap between an opinion and a neutral overview, the non-content word structure that remains in the judge's opinion will be indicative of the style in which that information is being presented.

To measure this, we will utilize local structured prediction tasks to generate a feature representation of a text based on those stylistic cues. We will then compare that representation to a simpler, unigram or LDA-based feature space at a classification task (region identification) and a regression task (year identification). Our belief is that our stylistic model will be more accurate quantitatively (by measuring accuracy at these tasks) and more interesting qualitatively (by leveraging features other than topic-based cue words to make a classification).


The Dataset

We have collected a corpus of slave-related and property-related US state supreme court opinions from LexisNexis. The dataset includes 6,014 slave-related state supreme court cases from 24 states, covering the period 1730-1866. It also includes 14,580 property-related cases from the same period. Most of the cases consist of the following data fields: Parties, Court, Date, Judge Opinion, Previous Court and Judges, Disposition, Case Overview, Procedural Posture, Outcome, Core Terms Generated by Lexis, Headnote, Counsel, and Judge(s).


The Theory

We focus on the issue of author engagement, an attempt to describe the extent to which an author aligns themselves with the content of what they are writing. Examples of low engagement may be signalled by distancing with modal phrases ("it may be the case that...") or by attribution to another source ("the defendant claims that..."). High engagement may be signalled by pronouncement ("Of course it's true that...") or explicit endorsement of a third-party claim ("The defendant has demonstrated that..."). On the other hand, speakers may make statements with no engagement (simply stating a fact), suggesting that they believe that fact will be taken for granted or is entirely obvious to any reader.

These levels of engagement with the facts of a case demonstrate alignment with certain facts or sides in a legal case. Our belief is that the way in which facts, entities, and events are referenced by a judge in an opinion will be influenced heavily by other factors surrounding the judgment, such as the location, time period, and outcome of the verdict. Therefore, if we can extract these behaviors in a systematic way, we can then use them as observed features in a generative model. Moreover, these features are likely to be more informative and interesting for social scientists than simpler n-gram features, even if they perform no better at classification, due to their more descriptive nature.


The Approach

Qualitative analysis of our data set immediately showed a major disparity between the two largest text fields in each case - Judge Opinion and Case Overview. The first, written by the judge in delivering a verdict, is littered with examples of author engagement, with markers for opinionated, convincing, judgmental, or attributed facts. This is only natural for a judgment that must collect myriad testimonies and sources of evidence into a single verdict. On the other hand, the Case Overview section of each case lacks author engagement entirely. Facts and testimonies are recorded impassively, with no attempt to persuade the reader - it is a simple summary.

Most intriguingly, these texts are about the same pieces of evidence, the same testimonies, the same series of events. This means that we have, in effect, fairly large pseudo-parallel corpora for engaged and disengaged authors. However, these texts are not the same length - on average, an overview is roughly 10% of the size of the judge's opinion. Therefore, it is not practical to attempt sentence-by-sentence alignment.


Evaluation

Our task is to build structured representations of text which are informative for describing the stylistic structure of a written text. To test whether we are, in fact, getting any signal from our structured representation, we will attempt a classification task and a regression task. The first will be to predict whether an opinion was written in a slave state, free state, or border state. The second will attempt to predict the year in which an opinion was written.

We can then measure these results both quantitatively (mean squared error over predicted years for the regression task, and accuracy or kappa for the classification task) and qualitatively (by checking that the distribution of features in different categories is indeed informative). For this latter interpretation and analysis, we will be collaborating with a historian from Columbia University and an economist from American University, from whom we received access to this corpus.
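
As a rough illustration of the quantitative measures named above, the sketch below computes mean squared error, accuracy, and Cohen's kappa in plain Python. The function names and the toy labels are assumptions made for this example; they are not part of the project's code or data.

```python
from collections import Counter


def mean_squared_error(y_true, y_pred):
    """Average squared difference between true and predicted years."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def accuracy(y_true, y_pred):
    """Fraction of opinions whose predicted region matches the true region."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def cohen_kappa(y_true, y_pred):
    """Agreement between true and predicted labels, corrected for chance."""
    n = len(y_true)
    p_observed = accuracy(y_true, y_pred)
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    # Chance agreement: probability that both label sequences pick the same class.
    p_expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if p_expected == 1.0:
        # Kappa is undefined when only one class ever occurs; this sketch reports 0.
        return 0.0
    return (p_observed - p_expected) / (1.0 - p_expected)


# Toy example with made-up labels.
years_true, years_pred = [1820, 1840], [1825, 1830]
regions_true = ["slave", "free", "border", "slave"]
regions_pred = ["slave", "free", "slave", "slave"]
mse = mean_squared_error(years_true, years_pred)
acc = accuracy(regions_true, regions_pred)
kappa = cohen_kappa(regions_true, regions_pred)
```

Note that mean squared error is expressed in squared years; taking its square root gives an error in years, which may be easier to interpret.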


Baseline

We will attempt two baselines. The first will be a bag-of-words representation of an opinion. The second will be based on LDA topic modeling, using default settings.
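
A minimal sketch of the first baseline, assuming a simple lowercase alphabetic tokenizer (the tokenizer and the function names are choices made for this example, not a settled design). The LDA baseline is not shown here.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep runs of alphabetic characters."""
    return re.findall(r"[a-z]+", text.lower())


def bag_of_words(text):
    """Map each word in an opinion to the number of times it occurs."""
    return Counter(tokenize(text))


def to_vector(bow, vocabulary):
    """Project a bag of words onto a fixed, ordered vocabulary."""
    return [bow[word] for word in vocabulary]


# Made-up sentence standing in for a Judge Opinion field.
bow = bag_of_words("The court held that the slave was property.")
vector = to_vector(bow, ["court", "the", "verdict"])
```

The resulting count vectors could be passed to any standard classifier or regressor for the region and year tasks.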

It is possible that these baselines will perform well. However, we believe that if they do, it will be because of shallow features which are not informative for social scientists. By contrast, stylistic features which describe a deeper level of linguistic structure may still be interesting even if they perform slightly worse at the overall tasks.


Engagement Structure Extraction

A key aspect of our representation will be finding the features that surround content words, and describing them succinctly. We have three categories of spans of text in each sentence:

  • Content words - words discussing entities or events in a case, or specific case numbers, etc. We may be able to identify these automatically or through straightforward TF-IDF measures.
  • Engagement words - words surrounding the content words in a section which show how that information is being construed by the author. These are what we are interested in.
  • Uninteresting words - words which are not contentful and do not relate to the author's positioning.
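
The TF-IDF idea mentioned for content words can be sketched as follows. This is one straightforward variant (term frequency times log inverse document frequency), treating each opinion as a document; the function names and toy texts are assumptions for illustration only.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep runs of alphabetic characters."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_scores(documents):
    """Return one {word: tf-idf score} dictionary per document."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each word appears.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))
    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        scores.append({
            word: (count / total) * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        })
    return scores


# Made-up miniature corpus.
docs = [
    "the plaintiff sold the horse",
    "the defendant claims the land",
    "the court may find the deed valid",
]
scores = tfidf_scores(docs)
# Words with the highest scores in a document are candidate content words.
top_word = max(scores[0], key=scores[0].get)
```

Words that occur in every document score zero, so a score threshold would separate candidate content words from the rest; where that threshold should sit is left open.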

Our goal is to identify Engagement words in a sentence. This can be viewed as a superset of the hedge detection problem from the NLP literature. Our work will be largely unsupervised, but we will start with a seed word list from linguistic literature. We can then label those key terms as engagement indicators with high confidence in our training data.

We can also label content terms based on words that overlap, by some metric to be decided, with the overview text. Those texts do not have any of the stylistic indicators of engagement that we wish to annotate, so the words that overlap most strongly will come from either the content word category or the uninteresting word category.
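
The two labeling steps above (seed words and overlap with the overview) could be combined roughly as in the sketch below. The seed list is a hypothetical stand-in drawn from the example phrases earlier on this page, and exact word match is used as the overlap metric only because the real metric is still to be decided.

```python
import re

# Hypothetical seed list for illustration; the real list would come from the
# linguistic literature.
SEED_ENGAGEMENT = {"may", "might", "claims", "demonstrated"}


def tokenize(text):
    """Lowercase the text and keep runs of alphabetic characters."""
    return re.findall(r"[a-z]+", text.lower())


def partial_labels(opinion, overview, seeds=SEED_ENGAGEMENT):
    """Label each opinion token, leaving tokens we cannot decide as None."""
    overview_vocab = set(tokenize(overview))
    labels = []
    for token in tokenize(opinion):
        if token in seeds:
            labels.append((token, "ENGAGEMENT"))
        elif token in overview_vocab:
            # Shared with the neutral overview: content or uninteresting.
            labels.append((token, "CONTENT_OR_UNINTERESTING"))
        else:
            labels.append((token, None))
    return labels


# Made-up opinion and overview for the same case.
labels = partial_labels(
    "The defendant claims that the deed may be void",
    "The deed was void",
)
```

Tokens labeled None are the ones a later bootstrapping or tagging step would need to resolve.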

These steps give us a partially labeled training corpus. We may explore bootstrapping approaches, starting from those seed words, to get more labeled data. We will also do a qualitative analysis of the output of this step, to ensure that there are no systematic mistakes in the way data is being labeled. In particular, we need to watch for citations of prior court decisions: they are specific to this domain and are not considered by the sociolinguistic literature, which focuses on more general text and, in many cases, on classroom interactions in particular.

We then need to find patterns of text which we can extract into features. There are several options, and none has been settled on yet. We can treat this as a sequence tagging problem, in the same sense as named entity recognition or hedge detection. We can use features based on dependency parses of our corpus. A third option, relevant to recent research in my advisor's group, would be to adapt or enhance the stretchy pattern framework described in (Gianfortoni et al., 2011).

The resulting feature space, which will be stripped of content words and will represent the output of this engagement-based feature extraction process, will be passed to the next stage.
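
Whichever extraction method is chosen, its output should contain no content words. The sketch below shows one very simple way to meet that requirement: mask content words with a slot marker and count the n-grams that remain. This is an illustrative stand-in, not the stretchy pattern framework or any of the other options under consideration; the function name and slot marker are hypothetical.

```python
from collections import Counter


def masked_ngrams(tokens, content_words, n=3, slot="<CONTENT>"):
    """Count n-grams after replacing content words with a slot marker."""
    masked = [slot if token in content_words else token for token in tokens]
    # Collapse runs of adjacent slots so a multi-word entity counts once.
    collapsed = []
    for token in masked:
        if token == slot and collapsed and collapsed[-1] == slot:
            continue
        collapsed.append(token)
    grams = Counter()
    for i in range(len(collapsed) - n + 1):
        gram = tuple(collapsed[i:i + n])
        # Skip n-grams made only of slots; they carry no stylistic signal.
        if any(token != slot for token in gram):
            grams[gram] += 1
    return grams


# Made-up sentence and content word set.
tokens = ["the", "defendant", "claims", "that", "the", "old", "deed", "is", "void"]
content = {"defendant", "old", "deed", "void"}
features = masked_ngrams(tokens, content)
```

A feature such as ("<CONTENT>", "claims", "that") records how a fact is attributed without recording which fact it is.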

Judge-Year-Region Topic Model

The second stage of our research project is not just to define a feature space, but also to use it for classification in a more intentionally designed way than a simple linear model combining features. We propose an extension of Latent Dirichlet Allocation based on the Author-Topic model (Rosen-Zvi et al., 2004) which incorporates judge, year, and region rather than just author.

[Figure: Slide1.png]

One way to use this is to assign each individual word in a document an explicit variable corresponding to its judge, region, and year. This is similar to the idea of heteroglossia, where some words come from other speakers (representing another's opinion), but it is not clear how closely the sociolinguistic insight matches the actual use of the model in a real setting.

It's not clear how this works when the year feature is numeric. I'm not sure whether this matches nicely with the expectations of LDA.
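
A minimal sketch of one possible generative story, under assumptions that go beyond the text above. It follows the Author-Topic recipe (for each word, pick one of the document's authors, then a topic from that author's topic distribution, then a word from that topic) but substitutes the document's judge, region, and year labels for the author set. The year is bucketed into a decade label to sidestep the numeric-year question; that bucketing, the label strings, and the names generate_document, theta, and phi are all hypothetical choices for illustration.

```python
import random


def generate_document(labels, theta, phi, length, rng):
    """Sample (label, topic, word) triples for one document.

    labels: metadata labels attached to the document
    theta:  label -> list of topic probabilities
    phi:    list indexed by topic, each entry a {word: probability} dict
    """
    tokens = []
    for _ in range(length):
        # Attribute this word to one of the document's metadata labels.
        label = rng.choice(labels)
        # Draw a topic from that label's topic distribution.
        topic_ids = list(range(len(theta[label])))
        topic = rng.choices(topic_ids, weights=theta[label])[0]
        # Draw a word from the chosen topic's word distribution.
        words = list(phi[topic].keys())
        probs = list(phi[topic].values())
        word = rng.choices(words, weights=probs)[0]
        tokens.append((label, topic, word))
    return tokens


# Made-up parameters: two topics, three labels for a single document.
labels = ["judge:A", "region:slave", "decade:1830"]
theta = {
    "judge:A": [1.0, 0.0],
    "region:slave": [0.0, 1.0],
    "decade:1830": [1.0, 0.0],
}
phi = [{"may": 1.0}, {"held": 1.0}]
sample = generate_document(labels, theta, phi, length=20, rng=random.Random(0))
```

Inference would run in the opposite direction, estimating theta and phi from observed words; that part is not sketched here.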

This idea isn't fully fleshed out yet.