
Open Data and the Digital Panopticon

Of all historical periods and subjects, crime and justice in eighteenth- and nineteenth-century London is the most extensively digitised. Through the digitisation of countless numbers of court records, transportation registers, prison archives, trial reports, criminal biographies, last dying speeches and newspapers (amongst many other things), we can access a wealth of information about crime, policing and punishment in the metropolis, and about the fates of the offenders tried there, all at the click of a mouse.

To our great benefit, much of this data is openly available, the product of the dogged efforts of public bodies, academics, data developers, volunteers and enthusiasts, often (but certainly not always) supported by public funding. In the process, this work has opened up seemingly boundless possibilities for research.

Indeed, without several of these open datasets the Digital Panopticon could not be realised. In our efforts to trace the life courses and subsequent offending histories of London convicts transported to Australia or imprisoned in Britain in the late eighteenth and nineteenth centuries, we will be reliant on a number of open datasets such as the British Convict Transportation Registers and Female Prison Licences.

It seems timely, therefore, on Open Data Day, to celebrate these fantastic, freely-accessible resources, and to highlight just a couple of ways in which they will be useful to us on the Digital Panopticon. Taking place on 21 February 2015, Open Data Day will involve a series of events and gatherings which seek to develop support for, and to encourage, the adoption of open data policies by the world’s local, regional and national governments.

I have talked in a previous post about the ways in which visualisations of the openly-available British Convict Transportation Registers database can be used to put transportation under the ‘macroscope’ – to chart the complex patterns and interactions of penal transportation in their entirety, spanning the breadth of Australia and the length of a century, taking in the lives of tens of thousands of individuals along the way.

In this post I briefly want to highlight another open dataset which will be at the heart of the project – the prison licence records of females incarcerated in British jails in the nineteenth century, held by the National Archives (under the catalogue reference PCOM 4), the metadata for which is openly available on the Archives’ online catalogue.

The licences almost without exception record the age of the offender on conviction, a potentially useful piece of information for record linkage on the Digital Panopticon. But, as with our other datasets, we want to know how accurately ages were recorded, and once again visualising the data from the female licences suggests some interesting things for us to think about.

Not least, it again reveals a tendency towards age heaping: ages were recorded disproportionately at round numbers such as 20, 30 and 40, suggesting that they were regularly rounded up or down rather than representing the true age of the offender. If ages had been recorded accurately, we would expect to see a smooth distribution. As the graph below shows, however, this was far from the case for female prisoners in the nineteenth century, with spikes at ages 20, 30, 40 and 50, and dips at ages 29, 31, 39 and 41.
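One simple way to put a number on this heaping is to compare the count at each round age with the average count at its two neighbours (for example, 30 against 29 and 31). The sketch below assumes the recorded ages are available as a plain list of integers; the function name `heaping_ratios` is illustrative rather than part of the project's tooling, and demographers have more formal measures, such as Whipple's index, for the same purpose.

```python
from collections import Counter

def heaping_ratios(ages, round_ages=(20, 30, 40, 50)):
    """For each round age, divide its count by the mean count of the two
    neighbouring ages (e.g. 29 and 31 for age 30)."""
    counts = Counter(ages)  # missing ages count as zero
    ratios = {}
    for age in round_ages:
        neighbours = (counts[age - 1] + counts[age + 1]) / 2
        # If neither neighbouring age occurs, the ratio is undefined; report infinity.
        ratios[age] = counts[age] / neighbours if neighbours else float("inf")
    return ratios

# Hypothetical toy ages, for illustration only (not drawn from PCOM 4).
example = heaping_ratios([29, 30, 30, 30, 30, 31, 39, 40, 40, 40, 41, 41], (30, 40))
```

A ratio near 1 suggests no heaping at that age; a ratio well above 1 suggests that ages were being rounded to it.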

Age on Conviction as Recorded in the PCOM4 Female Prison Licences


Does this mean, therefore, that we should disregard recorded ages as entirely inaccurate? Not necessarily. As the graph below demonstrates, when we compare the distribution of ages across different sets of records, it suggests that recorded ages perhaps broadly reflected real age patterns. Offender ages are typically younger in the Old Bailey Proceedings (OBP) and in the Convict Indents (CIN, the records of those transported to Australia) than among females imprisoned in Britain (PCOM4), which is certainly what we would expect given the nature of criminal justice policy at the time.

Ages of Female Offenders as Recorded across each Dataset
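A coarse summary per dataset is one way to make this kind of comparison without leaning on individual heaped ages. The sketch below assumes each dataset's recorded ages are held in a list keyed by the abbreviations used above (OBP, CIN, PCOM4); the cut-off of 25 and the toy ages are hypothetical, not figures from the records.

```python
from statistics import median

def summarise_ages(datasets, young_cutoff=25):
    """Return the median recorded age and the share of offenders recorded
    as younger than `young_cutoff` for each non-empty dataset."""
    summary = {}
    for name, ages in datasets.items():
        if not ages:
            continue  # nothing to summarise
        young = sum(1 for age in ages if age < young_cutoff)
        summary[name] = {"median": median(ages), "share_young": young / len(ages)}
    return summary

# Hypothetical toy ages, for illustration only.
example = summarise_ages({"OBP": [18, 20, 22, 30], "PCOM4": [25, 30, 40, 45]})
```

Rounding an age to the nearest ten shifts individual values but barely moves a median or a broad share, which is why such summaries can still be informative when exact ages are not.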

These are just a couple of ways in which the Digital Panopticon will be drawing upon the wealth of open data available to criminal justice historians. We are indebted to the hard work of all those who have contributed to the creation and dissemination of this embarrassment of riches which, in combination with the powerful digital technologies now at our fingertips, is opening up a whole new realm of research opportunities.


What’s in a Name?: Details and Data Linkage

A year into the Digital Panopticon project, we have begun record linkage with some of our key sources relating to transportation. With several innovative iterations of initial linkage completed, thanks to Jamie McLaughlin, we have been able to trace more than three quarters of those sentenced to transportation at the Old Bailey, linking them to their voyage details in the British Convict Transportation Registers. For some, we have also been able to link onwards to the Convict Indents compiled for them on board the convict ships and on their arrival in Australia. This iterative process has taught us much about the nature of our different record sets, and about the complex job of connecting them together.

One of the biggest challenges in the linking process has been differentiating between multiple cases of identical names and trials in the Old Bailey. Over the coming months, our schedule of record linkage will connect not just our transportation datasets but also imprisonment data and, eventually, civil data such as the census and birth, marriage and death records. As it does so, deciding with certainty what to link, and how, becomes increasingly difficult.

When confronted with a sea of names, and no consistency in the recording of other contextual information between our diverse datasets, how are we to make the right choices and make sure that the correct history is connected to the right offender?

Between 1780 and 1900 there was only one Mary Ann Dring convicted at the Old Bailey. She was sentenced to five years’ penal servitude in 1865 for feloniously uttering counterfeit coin. She had appeared at the Old Bailey once before, in 1863, as a witness in the coining trial of another woman, and twenty years after her conviction, in 1885, she may well have acted as a witness in a manslaughter case.

From a linkage perspective we are fortunate: in all of our criminal datasets there should be only one Old Bailey Mary Ann Dring. This is very lucky, because the report of her own trial runs to just two lines of text, so the information we start with in order to trace her is minimal:

Name: Mary Ann Dring

Approximate year of birth: 1817

Location: London

Step one is to link to the next big dataset for those who stayed in England to be imprisoned. In this case that is the PCOM 4 female licences for parole. Searching with the information on Mary Ann Dring that we took from the Old Bailey data, we had no problem locating her licence. Those familiar with the licences will know that these documents give us the opportunity to collect a great deal more information about her. Confident that the right link has been made, we can collect some key contextual details that will allow us to identify Mary Ann Dring in further datasets.
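This first step can be sketched as a matching rule over the three details available from the Old Bailey. It is a minimal illustration under stated assumptions, not the project's linkage code: the `Record` type, the normalisation, and the two-year tolerance on birth year (to absorb rounded or misremembered ages) are all hypothetical choices.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Record:
    # Hypothetical minimal record: the three details available from the trial.
    name: str
    approx_birth_year: int
    location: str

def normalise(text):
    # Ignore differences of case and spacing, e.g. "Mary Ann  DRING" vs "Mary Ann Dring".
    return " ".join(text.lower().split())

def is_plausible_link(a, b, year_tolerance=2):
    """Hypothetical rule: names and locations agree after normalisation and
    the approximate birth years differ by no more than `year_tolerance`."""
    return (
        normalise(a.name) == normalise(b.name)
        and normalise(a.location) == normalise(b.location)
        and abs(a.approx_birth_year - b.approx_birth_year) <= year_tolerance
    )

old_bailey = Record("Mary Ann Dring", 1817, "London")
# Hypothetical licence-side values, for illustration only.
licence = Record("Mary Ann  DRING", 1818, "london")
```

With only a name to go on, a rule like this works because the name is unique in the criminal records; it would not separate two people of the same name and similar age.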

Licence fields

The future datasets we link to will not, of course, contain most of this information, so we must use a few key details that will help us link to new records. For civil data we could certainly use the fact that Mary Ann Dring was recorded as married with two children in 1865. She worked as a charwoman, and had been resident in London, under her married name, since at least 1863, when she received her first conviction.

In the census nearest to Mary Ann’s 1865 Old Bailey conviction (that of 1861) there are 183 returns for a Mary Ann Dring born in or around 1817. If we make the not unreasonable assumption that our Mary Ann Dring was living in London for the five years prior to her Old Bailey appearance, we can, rather luckily, reduce that to four viable matches. To most academic researchers or family historians, this is a small and manageable selection from which to choose.
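The narrowing from a long list of returns to a handful can be expressed as a sequence of filters applied in order, keeping a count after each step so the reasoning can be checked later. In the sketch below the census returns are hypothetical dictionaries with `name`, `born` and `place` keys, the two-year birth window is an assumed tolerance, and the rows are invented for illustration; none of this reflects the actual 1861 returns.

```python
def narrow(candidates, filters):
    """Apply (label, predicate) filters in order, recording how many
    candidates survive each step."""
    remaining = list(candidates)
    trail = [("all returns", len(remaining))]
    for label, keep in filters:
        remaining = [c for c in remaining if keep(c)]
        trail.append((label, len(remaining)))
    return remaining, trail

# Hypothetical census rows, for illustration only.
returns = [
    {"name": "Mary Ann Dring", "born": 1816, "place": "London"},
    {"name": "Mary Ann Dring", "born": 1818, "place": "London"},
    {"name": "Mary Ann Dring", "born": 1817, "place": "Norfolk"},
    {"name": "Mary Ann Dring", "born": 1830, "place": "London"},
]
filters = [
    ("born within 2 years of 1817", lambda r: abs(r["born"] - 1817) <= 2),
    ("living in London", lambda r: r["place"] == "London"),
]
kept, trail = narrow(returns, filters)
```

Keeping the trail means a later reviewer can see which assumption (here, residence in London) did most of the narrowing.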

Census entries for Mary Ann Dring

Yet even though we know she was married with two children, we are faced with four married women, two with two children and two with three, all living in London (none with any occupation listed, which is not unusual for a census entry with a male head of household). Given the parameters of most automated systems that might be used to make such a match, any of these census entries could be considered a valid match. Working manually, an individual researcher can reduce the choices to two viable matches, but from a linkage point of view these are almost indistinguishable. The dates of birth of the two most likely candidates fall one year either side of 1817. Both are married, both have two children, both are residents of London, and both have identical names.
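The tie described here can be reproduced with a simple agreement count: score each remaining census entry by how many contextual details it shares with what the licence tells us, treating a blank field (such as a missing occupation) as no evidence either way. The field names and candidate entries below are hypothetical, and a real system would weight fields rather than simply count them.

```python
FIELDS = ("married", "children", "occupation")

def agreement_score(profile, entry, fields=FIELDS):
    # A field counts only if the entry records it and it matches the profile;
    # a blank (None) field is treated as no evidence either way.
    return sum(
        1 for field in fields
        if entry.get(field) is not None and entry.get(field) == profile.get(field)
    )

def best_candidates(profile, entries):
    """Return every entry sharing the highest agreement score."""
    if not entries:
        return []
    scores = [agreement_score(profile, e) for e in entries]
    top = max(scores)
    return [e for e, s in zip(entries, scores) if s == top]

# Details from the 1865 licence; the candidate entries are hypothetical.
profile = {"married": True, "children": 2, "occupation": "charwoman"}
candidates = [
    {"id": "A", "married": True, "children": 2, "occupation": None},
    {"id": "B", "married": True, "children": 2, "occupation": None},
    {"id": "C", "married": True, "children": 3, "occupation": None},
    {"id": "D", "married": True, "children": 3, "occupation": None},
]
tied = best_candidates(profile, candidates)
```

Both top candidates score the same, so a rule that simply picks the highest score has no principled way to choose between them; the occupation from the licence contributes nothing because the census leaves it blank.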

In the 1871 census, six years after Mary Ann’s conviction and four years after her release from prison, there are no records that directly match either of the 1861 census entries. Instead there is a choice of five women who all fall within five years of the original Mary Ann Dring’s birth year, but who have notable differences in their personal information. Furthermore, depending on which links are made to census data, and what extra contextual information is added to Mary Ann’s case, there are potentially relevant death records from London and the surrounding counties spanning a fifteen-year period.

If we searched simply for Mary Dring, without the middle name Ann, we would face several times as many choices. If we looked for a Mary Smith with the same level of contextual detail, we could well be faced with hundreds of potential matches and no way to choose between them.
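One established way to capture why a rare name is more informative than a common one, borrowed here from the Fellegi-Sunter model of record linkage rather than from the project's own method, is to make the agreement weight for a name depend on how often it occurs: weight = log2(m / u), where m is the probability that a true match agrees on the name and u, the probability that two unrelated records agree on it by chance, is approximated by the name's relative frequency. The value of m and the name index below are assumptions for illustration.

```python
import math
from collections import Counter

def name_weights(names, m=0.95):
    """Frequency-based agreement weight for each distinct name:
    log2(m / u), with u approximated by the name's relative frequency.
    `m` is an assumed probability that a true match agrees on name."""
    counts = Counter(names)
    total = len(names)
    return {name: math.log2(m / (count / total)) for name, count in counts.items()}

# Hypothetical name index, for illustration only.
index = ["Mary Smith"] * 6 + ["Mary Dring"] * 3 + ["Mary Ann Dring"]
weights = name_weights(index)
```

A name held by one entry in ten earns a much larger weight than one held by six in ten, so agreement on a rare name counts as stronger evidence, while for a very common name the weight falls towards zero.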

Each individual record linked to a convict has ramifications for future links. At the micro level, this is the dilemma faced by every genealogist or family historian: the difficult decisions that have to be made in matching records to individuals. However, the Digital Panopticon’s task of linking almost 90,000 convicts across multiple datasets is not a micro history, nor a task that can be managed manually. Designing an automated system that can navigate and discern between multiple similar (or even identical) entries in a given dataset is essential. Or perhaps it is a question of ranking and displaying the multiple possible links in cases of conflict?
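The second option, ranking candidates rather than forcing a single choice, can be sketched as a decision rule that only links automatically when one candidate clearly leads, and otherwise keeps the ranked list for review. The scores, the threshold and the margin are hypothetical parameters; in practice they would need calibrating against links checked by hand.

```python
def decide_link(scores, threshold=2.0, margin=1.0):
    """Rank candidates by score and return (decision, ranking), where the
    decision is 'link' when one candidate clearly leads, 'review' when the
    top candidates are too close to call, and 'none' when nothing reaches
    the threshold."""
    # sorted() is stable even with reverse=True, so tied candidates keep input order.
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    if not ranked or ranked[0][1] < threshold:
        return "none", ranked
    if len(ranked) > 1 and ranked[0][1] - ranked[1][1] < margin:
        return "review", ranked
    return "link", ranked

# Hypothetical scores for two near-identical census entries and a weaker one.
decision, ranking = decide_link({"entry A": 3.0, "entry B": 3.0, "entry C": 1.0})
```

Returning the ranking alongside a "review" decision preserves the conflicting possibilities rather than discarding all but one of them.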

Our challenge now, it would seem, is to develop a suitably sophisticated data linkage system: one that maintains a high rate of matches we can be confident in, while at the same time allowing us to incorporate possible, contradictory and conflicting data. Those with common names will no doubt prove our greatest challenge, but even someone as seemingly unique as Mary Ann Dring raises questions about how we match, what we match, what we keep, and how to store and rank conflicting information across such a wide variety of datasets.