This project will examine the overall distribution of the data, adopting a ‘big data’ approach to identify hitherto unrecognised patterns and correlations. Working with structured data and related primary texts, the project will create a range of visualisation environments that allow interactive exploration. For qualitative data, simple measures of text length and data density will be visualised (building on the methodologies developed in the Criminal Intent project). For structured data, faceted browsing and automated clustering methodologies (K-means in the first instance) will be employed to create new ways of exploring the data: in effect, to create what Katy Börner terms a ‘macroscope’.
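
To give a sense of what K-means clustering does with structured records, the following is a minimal sketch in Python rather than the project's own implementation. The two fields (age and sentence length), the sample values, and the `kmeans` helper are all hypothetical and chosen only for illustration; a real analysis would need to rescale the fields so that no single one dominates the distance measure, and would usually try several starting points.

```python
from math import dist
from statistics import fmean


def kmeans(points, k, iterations=20):
    """Plain Lloyd's algorithm; the first k points seed the centroids."""
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each record to its nearest centroid.
        labels = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(tuple(fmean(col) for col in zip(*members)))
            else:
                # An empty cluster keeps its previous centroid.
                new_centroids.append(centroids[c])
        if new_centroids == centroids:
            break  # Nothing moved, so the clustering has converged.
        centroids = new_centroids
    return labels, centroids


# Hypothetical records: (age in years, sentence length in months).
records = [(19, 3), (21, 4), (20, 2), (45, 84), (50, 120), (47, 96)]
labels, centroids = kmeans(records, k=2)
print(labels)
print(centroids)
```

The cluster labels could then drive colour or grouping in a visualisation, so that records the algorithm judges similar appear together and invite closer reading.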

In the first instance, this visualisation environment will be used to identify the ideological and practical constraints which shaped the creation of the data itself. By comparing the information provided about individuals who appear in more than one dataset, this project will allow us to critique the image of the individual encoded in the systematic structures of the record systems we work with. Just as importantly, it will allow us to explore how the transformation of these paper records into electronic form has distorted or altered the meanings we derive from them. This project is an exercise in re-presenting the archive, and in re-assessing the factors that shaped both data collection in the 19th century and historical writing in the 21st. This is worth doing not only as essential preparation for research on penal outcomes, but also as a way of interrogating how digitisation itself transforms the systems of knowledge it purports to represent.