Calculating Sameness: Identifying Image Reuse In Early Modern books

Our research is concerned with the dissemination and transformation of scientific knowledge across Europe. Our investigations are based on a corpus of, to date, 343 books printed between 1472 and 1650. We assembled the corpus around a specific text: the Tractatus de Sphaera by Johannes de Sacrobosco. This 13th-century treatise on cosmology describes the spheres of the universe according to the geocentric worldview. Until the 17th century it was repeatedly published as part of university textbooks. In these, the treatise is included in original, commented or translated form and is accompanied by other texts from disciplines such as medicine, astronomy or mathematics that were seen as relevant for the study of cosmology. As many of these textbooks were part of the mandatory curriculum at European universities, we regard their contents as representative of the scientific knowledge that was taught and considered relevant at the time the books were published. We extract several markers from the individual books that form the material evidence of our research. In addition to bibliographic data such as publishers, printers, and date and place of publication, we identified the content structure of every book: which texts it contains and, if applicable, whether the texts are commented or translated versions of existing texts. In doing so we can identify not only how the content of the books changed and, by extension, how certain disciplines gained or lost importance, but also which publishers might be responsible for certain changes. The books also contain various types of visuals: diagrams, illustrations, decorative elements, etc. Like texts, these visuals can offer insights into the kind of knowledge that was being distributed. By identifying and analyzing recurring images, we can evaluate the 'success' of certain imagery.
If we find the same images being used by different printers, for example, this might indicate that one printer was influenced by another, or even point to a physical exchange of wooden printing blocks. In this paper we present our approach for analyzing the more than 16,000 illustrations that we have annotated in our corpus. We employ an image hashing algorithm to identify recurring images and existing visualization tools to analyze the results. As the algorithm we use is independent of the visual material and, unlike machine learning algorithms, does not need to be trained, it can readily be applied to arbitrary image collections. As part of this paper we will offer the entire analysis and visualization workflow for others to reuse.
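To make the idea of training-free image hashing concrete, the following is a minimal sketch and not the workflow described in this abstract. The abstract does not name the specific hashing algorithm, so a simple average hash is assumed here for illustration. The function names (`downsample`, `average_hash`, `hamming_distance`, `find_reuse`), the 8x8 grid size and the distance threshold are all hypothetical choices, and images are assumed to be already loaded as rows of grayscale values.

```python
from typing import Dict, List, Sequence, Tuple

# Assumption: an image is given as rows of grayscale values (0-255).
Image = Sequence[Sequence[float]]


def downsample(img: Image, size: int = 8) -> List[List[float]]:
    """Reduce an image to a size x size grid by averaging pixel blocks."""
    h, w = len(img), len(img[0])
    if h < size or w < size:
        raise ValueError("image must be at least size x size pixels")
    grid = []
    for by in range(size):
        y0, y1 = by * h // size, (by + 1) * h // size
        row = []
        for bx in range(size):
            x0, x1 = bx * w // size, (bx + 1) * w // size
            block = [img[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            row.append(sum(block) / len(block))
        grid.append(row)
    return grid


def average_hash(img: Image, size: int = 8) -> int:
    """Average hash: one bit per grid cell, set if the cell is above the mean."""
    cells = [v for row in downsample(img, size) for v in row]
    mean = sum(cells) / len(cells)
    bits = 0
    for v in cells:
        bits = (bits << 1) | (1 if v > mean else 0)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


def find_reuse(hashes: Dict[str, int], max_distance: int = 5) -> List[Tuple[str, str]]:
    """Return pairs of image ids whose hashes differ by at most max_distance bits.

    The threshold is an illustrative assumption, not a value from the paper.
    """
    ids = sorted(hashes)
    return [
        (a, b)
        for i, a in enumerate(ids)
        for b in ids[i + 1:]
        if hamming_distance(hashes[a], hashes[b]) <= max_distance
    ]
```

In such a sketch, each annotated illustration would be hashed once and candidate reuse would be read off from pairs with a small Hamming distance. The pairwise comparison is quadratic in the number of images, so a collection of many thousands of illustrations would likely call for an index or a more robust hash than this one.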
Florian Kräutli, Daan Lockhorst
Kräutli, F., Valleriani, M. & Lockhorst, D. (2019) Calculating Sameness: Identifying Image Reuse In Early Modern books. Digital Humanities 2019. Utrecht.