Professor Barry Godfrey, University of Liverpool
Dr Larissa Allwork, University of Sheffield
Dr Sharon Howard, University of Sheffield
The project was made possible by a large grant from the Arts and Humanities Research Council Digital Transformations Theme.
While most of the data included in the DP website already existed in some form, the project created a number of new datasets, including both full-text and summary data. Where possible these datasets are made available as open data under a Creative Commons licence.
However, other records were too large for this to be feasible. In these cases the approach depended on the nature of the original source. For printed sources, optical character recognition (OCR) was used, while handwritten sources were manually typed using the process known as double rekeying, in which two different typists type the text, the two versions are compared by computer and discrepancies resolved manually.
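The comparison step in double rekeying can be sketched as follows. This is a minimal illustration only, not the project's actual tooling: the function name `find_discrepancies` and the sample strings are hypothetical, and the comparison is done word by word using Python's standard `difflib` module.

```python
import difflib


def find_discrepancies(version_a: str, version_b: str):
    """Return pairs of word runs where two independent keyings disagree."""
    words_a = version_a.split()
    words_b = version_b.split()
    matcher = difflib.SequenceMatcher(a=words_a, b=words_b, autojunk=False)
    discrepancies = []
    for tag, a_start, a_end, b_start, b_end in matcher.get_opcodes():
        # "equal" spans are where both typists agree; anything else
        # (replace, insert, delete) needs a human to resolve it.
        if tag != "equal":
            discrepancies.append((words_a[a_start:a_end], words_b[b_start:b_end]))
    return discrepancies


# Hypothetical sample: the same line typed by two different typists.
typist_one = "John Smyth aged 23 labourer born Liverpool"
typist_two = "John Smith aged 28 labourer born Liverpool"

for from_a, from_b in find_discrepancies(typist_one, typist_two):
    print("typist one:", from_a, "| typist two:", from_b)
```

Only the flagged spans need to be checked against the original image, which is what makes the method practical for large handwritten series.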
Two printed sources were digitised and transcribed using OCR: Middlesex House of Detention Calendars 1836-1889 and Metropolitan Police Register of Habitual Criminals 1881-1925. The Middlesex House of Detention Calendars are held by the London Metropolitan Archives, who digitised high-quality images for us. The Metropolitan Police Register of Habitual Criminals is held by The National Archives, which supplied us with images. In both cases, the original records were generally in good condition and the text was clear. However, their complex tabular layouts made OCR more difficult; in particular, they often lacked clear horizontal boundaries between rows of information, which could lead to errors. The overall accuracy of the transcription is at least 99%.
Selected documents from UK Licences for the Parole of Convicts 1853-1925 were rekeyed (in addition to the basic Prison Licence Registers metadata transcribed by the project). In addition, a number of Western Australia record series have been rekeyed and will be added to the website in the near future. The accuracy of rekeying is affected by the sometimes difficult handwriting, particularly in the case of the Western Australia records, and the overall accuracy is approximately 98-99%.
For effective record linkage, data needs to be highly structured and standardised. The project received data in a variety of forms and formats, which required varying degrees of processing to be usable. Each dataset received was audited and documented in detail, paying particular attention to the presence (or lack) of key information for linkage (in particular: name, age or date of birth, gender, occupation, address, birth place, offence, date and place of trial, sentence), and to whether it was already structured so as to facilitate linkage or would require further processing and cleaning.
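One part of such an audit, checking how often each key linkage field is actually filled in, might look like the sketch below. The field names in `KEY_FIELDS`, the function `audit_dataset` and the sample records are all assumptions made for illustration; the project's own audit was documented in more detail than a simple coverage count.

```python
# Hypothetical field names standing in for the key linkage information
# listed above; real datasets would use their own column names.
KEY_FIELDS = [
    "name", "age", "date_of_birth", "gender", "occupation", "address",
    "birth_place", "offence", "trial_date", "trial_place", "sentence",
]


def audit_dataset(records):
    """Return, for each key field, the fraction of records with a non-empty value."""
    total = len(records)
    coverage = {}
    for field in KEY_FIELDS:
        present = 0
        for record in records:
            value = record.get(field)
            # Treat None and blank strings as missing; keep values such as 0.
            if value is not None and str(value).strip() != "":
                present += 1
        coverage[field] = present / total if total else 0.0
    return coverage


# Hypothetical sample records with uneven coverage.
sample = [
    {"name": "Ann Brown", "age": 34, "offence": "larceny"},
    {"name": "", "age": None, "offence": "forgery"},
]

for field, fraction in audit_dataset(sample).items():
    print(f"{field}: {fraction:.0%}")
```

A report like this makes it easy to see at a glance which datasets lack the fields needed for linkage and which will need further cleaning.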
The records were linked using a mixture of automated scripts and manual work. The automatic linking was implemented using Node.js. Any record sets we felt might contain the same individuals were first filtered to remove impossible combinations of date, criminal sentence, place of birth, age or similar. Then names with compatible attributes were compared using a variety of string comparison algorithms such as Levenshtein distance, the Dice coefficient and Jaro-Winkler. We found that Soundex techniques were not effective for us, probably because we were not usually dealing with material which had been transcribed from speech, but rather records which had been copied from other written sources. Most missed links were due to small clerical errors such as missing letters or slightly different spellings.
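The two-stage approach described above (filter out impossible pairs, then compare names) can be sketched as follows. This is an illustration under stated assumptions, not the project's code: the project used Node.js, whereas this sketch uses Python; the record fields (`id`, `name`, `birth_year`), the thresholds and the function names are hypothetical; and only Levenshtein distance and the Dice coefficient are shown, with Jaro-Winkler omitted for brevity.

```python
from collections import Counter


def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def dice(a: str, b: str) -> float:
    """Dice coefficient over character bigrams (1.0 means identical bigram sets)."""
    bigrams_a = [a[i:i + 2] for i in range(len(a) - 1)]
    bigrams_b = [b[i:i + 2] for i in range(len(b) - 1)]
    if not bigrams_a or not bigrams_b:
        return 1.0 if a == b else 0.0
    overlap = sum((Counter(bigrams_a) & Counter(bigrams_b)).values())
    return 2 * overlap / (len(bigrams_a) + len(bigrams_b))


def compatible(record_a, record_b, tolerance=2):
    """Filter step: reject pairs whose birth years could not belong to one person."""
    return abs(record_a["birth_year"] - record_b["birth_year"]) <= tolerance


def candidate_links(set_a, set_b, max_distance=2, min_dice=0.6):
    """Return id pairs that pass the filter and have sufficiently similar names."""
    links = []
    for record_a in set_a:
        for record_b in set_b:
            if not compatible(record_a, record_b):
                continue
            name_a = record_a["name"].lower()
            name_b = record_b["name"].lower()
            if levenshtein(name_a, name_b) <= max_distance or dice(name_a, name_b) >= min_dice:
                links.append((record_a["id"], record_b["id"]))
    return links


# Hypothetical records from two datasets.
set_a = [{"id": "A1", "name": "William Thompson", "birth_year": 1820}]
set_b = [
    {"id": "B1", "name": "William Thomson", "birth_year": 1821},   # clerical variant
    {"id": "B2", "name": "William Thompson", "birth_year": 1850},  # impossible age
    {"id": "B3", "name": "Mary Jones", "birth_year": 1820},        # different name
]
print(candidate_links(set_a, set_b))
```

Filtering before comparing keeps the number of string comparisons manageable, and edit-distance measures suit the kind of copying errors described here (a dropped letter, a variant spelling) better than phonetic codes do.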
Manual linking was undertaken through a bespoke web interface. This was implemented using MySQL and Java servlets, and allowed our researchers to display records which conformed to certain criteria (for example, not being linked to a specific other dataset) and then search again for records which might provide a match.
The public Digital Panopticon website is published by the Digital Humanities Institute Sheffield. The website is implemented using Java servlets. The historical background and project guides were developed using MediaWiki. The search is implemented using Elasticsearch. The data visualisations use D3.js. The HTML and CSS use Bootstrap.
Some of the resources searched by Digital Panopticon are only accessible via subscription. While Digital Panopticon allows users to search these resources and examine summary results free of charge, with links to the transcriptions and images at the relevant websites, we cannot provide full access to these resources for non-subscribers. To obtain access, it will be necessary to sign up for an account at their websites.
This project would not have been possible without substantial amounts of data created by several public and commercial organisations which have worked with us during the course of the project. We are very grateful to these providers for sharing their data with us.
Findmypast provided several crucial datasets from their major collection of Crime and Punishment records from The National Archives and their Census Returns for England and Wales 1841-1911. They also provided access to their images of male UK Licences for the Parole of Convicts 1853-1925 to facilitate transcription.
Ancestry provided a number of UK and Australian convict datasets.
The National Archives UK supplied high-quality images of female UK Licences for the Parole of Convicts 1853-1925 and Metropolitan Police Register of Habitual Criminals 1881-1925, in addition to data obtained via their Discovery catalogue.
The Founders and Survivors projects at the University of Tasmania and the University of Melbourne provided the key Tasmanian data for VDL Founders and Survivors Convicts 1802-1853 and VDL Founders and Survivors Convict Biographies 1812-1853.
State Records Office of Western Australia supplied images of several series of convict records for transcription, which will appear on this website in due course.
Free UK Genealogy provided transcription data from the Civil Registration index of births, marriages and deaths for England and Wales, 1837-1925, which will appear on this website in due course.
Professor Simon Devereaux of the University of Victoria provided a copy of his Capital Convictions at the Old Bailey 1760-1837 database.
We are grateful to the following for their help in bringing this project to completion:
For information on how to contact the project please see Contact Us.
This page was written by Sharon Howard and Jamie McLaughlin, with additional contributions by other members of the Digital Panopticon project team.