Deep Learning for Heritage Visualisation
Deep Learning can be employed in a variety of cultural heritage visualisation and reconstruction tasks. It is part of a family of machine learning approaches based upon the idea of ‘learning’ data representations, encoding high-dimensional abstractions within many-layered (‘deep’) artificial neural networks (ANNs). There are many sophisticated stratagems for doing this. As a research field it has undergone explosive growth in the last decade, thanks largely to the massive parallelism of GPU/TPU hardware, and has established its broad applicability to a wide range of previously computationally intractable problems in computer vision, natural language processing, audio generation/recognition, game playing and scientific image analysis, amongst others.
Deep Learning (DL) architectures take a variety of forms and are the object of much current research. Applications in the humanities and cultural heritage hold great potential and interest, given that they are frequently data-rich areas. However, because of the ‘black-box’ problem (i.e. degrees of indeterminacy and inscrutability in exactly what the network has ‘learned’) there remains much to be done in accounting for and analysing the semiosic processes undertaken by DL systems – if, indeed, they are susceptible to quantification and analysis in novel ways. To me this is a subject of great interest: an interplay between well-curated datasets, ‘well trained’ systems and knowledgeable human interlocution.
At this stage I’ve mainly explored ‘supervised’, human-in-the-loop workflows, rather than fully-automated systems. In an important way this enables some control over the exploration and constraints of the parameter space and maintains focus upon the output objectives: a kind of fitness function, reining in the more baroque elaborations of the neural ‘imagination.’
A simple example: Frank Hurley’s movie ‘Home of the Blizzard’ (1913) contains a number of panning shots across Cape Denison, Antarctica – these can be exported as individual frames and stitched together to create a single panoramic image:
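For anyone wanting to reproduce the basic extract-and-stitch step, a minimal sketch using OpenCV follows. The file names and the frame-sampling interval are placeholders rather than the settings actually used:

```python
import cv2

# Pull frames from the panning shot (file name and sampling interval are placeholders).
cap = cv2.VideoCapture("home_of_the_blizzard.mp4")
frames = []
index = 0
while True:
    ok, frame = cap.read()
    if not ok:
        break
    if index % 5 == 0:          # sample every 5th frame to limit overlap between views
        frames.append(frame)
    index += 1
cap.release()

# OpenCV's built-in stitcher handles feature matching and blending;
# panorama mode (the default) suits a rotating/panning camera.
stitcher = cv2.Stitcher_create()
status, panorama = stitcher.stitch(frames)
if status == cv2.Stitcher_OK:
    cv2.imwrite("cape_denison_panorama.png", panorama)
else:
    print(f"Stitching failed with status {status}")
```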
The source material is low resolution (an .mp4 file) and black and white, but this can be modified in a couple of steps.
1] Firstly, a convolutional neural network (CNN) trained upon learned deep priors can be used to apply synthetic colour (sketched in code after this list):
2] Super-resolution applied via a CNN image model gives a satisfactory resolution increase – in this instance superior to conventional nearest-neighbour interpolation (also sketched after this list):
3] Finally, the result can (rather ironically) be seen in detail, re-made into a panning shot – but without any of the camera artefacts and film degradation of the source material.
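As a sketch of the colourisation step (1] above), the example below uses the publicly released Zhang et al. (2016) colourisation network through OpenCV’s DNN module, following the layer names of the standard OpenCV colourisation sample. The file names are placeholders, and any comparable CNN colouriser trained upon learned priors could be substituted:

```python
import cv2
import numpy as np

# Local copies of the Zhang et al. colourisation network (placeholder paths).
proto = "colorization_deploy_v2.prototxt"
weights = "colorization_release_v2.caffemodel"
pts = np.load("pts_in_hull.npy")                # cluster centres for the ab channels

net = cv2.dnn.readNetFromCaffe(proto, weights)
# Load the ab cluster centres into the network as 1x1 convolution kernels.
pts = pts.transpose().reshape(2, 313, 1, 1).astype(np.float32)
net.getLayer(net.getLayerId("class8_ab")).blobs = [pts]
net.getLayer(net.getLayerId("conv8_313_rh")).blobs = [np.full((1, 313), 2.606, dtype=np.float32)]

# The stitched black-and-white panorama from the previous sketch (placeholder name).
bgr = cv2.imread("cape_denison_panorama.png")
lab = cv2.cvtColor(bgr.astype(np.float32) / 255.0, cv2.COLOR_BGR2LAB)
L = cv2.resize(lab[:, :, 0], (224, 224)) - 50   # network expects mean-centred L at 224x224
net.setInput(cv2.dnn.blobFromImage(L))
ab = net.forward()[0].transpose(1, 2, 0)        # predicted ab channels
ab = cv2.resize(ab, (bgr.shape[1], bgr.shape[0]))

# Recombine the original-resolution L channel with the predicted colour channels.
colourised = cv2.cvtColor(np.concatenate([lab[:, :, :1], ab], axis=2), cv2.COLOR_LAB2BGR)
cv2.imwrite("panorama_colour.png", (np.clip(colourised, 0, 1) * 255).astype(np.uint8))
```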
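A corresponding sketch of the super-resolution step (2] above), here standing in for the CNN image model with a pretrained FSRCNN x4 network via OpenCV’s dnn_superres module (opencv-contrib); the model and file names are assumptions, and the nearest-neighbour upscale is included only for the comparison described above:

```python
import cv2

# CNN-based super-resolution via a pretrained FSRCNN x4 model (placeholder path).
sr = cv2.dnn_superres.DnnSuperResImpl_create()
sr.readModel("FSRCNN_x4.pb")
sr.setModel("fsrcnn", 4)

# The colourised panorama/frame from the previous step (placeholder name).
low_res = cv2.imread("panorama_colour.png")
upscaled = sr.upsample(low_res)                 # CNN-based 4x upscale

# Conventional nearest-neighbour interpolation, for side-by-side comparison.
h, w = low_res.shape[:2]
nearest = cv2.resize(low_res, (w * 4, h * 4), interpolation=cv2.INTER_NEAREST)

cv2.imwrite("panorama_sr.png", upscaled)
cv2.imwrite("panorama_nn.png", nearest)
```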
The resulting image is remarkable: a previously unseen colour panorama of Cape Denison in 1911, based upon some reasonable assumptions (both human and machine). I’ve added some remarks by Hurley to bring it to life again. It’s a robust proof-of-concept.
Having undertaken extensive site documentation at Cape Denison myself, it is compelling to compare the newly synthesised/retrieved near-natural state with the site 100+ years later.
Similarly, portrait photographs of AAE expeditioners can be synthetically colourised using Deep Learning, with quite satisfactory results:
The implications of this approach are very interesting: colourisation and super-resolution of synthetic panoramas and still imagery are achievable. ANNs dedicated to situationally-focussed archival heritage restoration and re-imagination of AAD Historic Site materials can be created – with much greater specificity and fidelity than generically trained CNNs achieve, given the substantial source materials available (both historic and contemporary). This approach is generalisable to similar heritage applications where there are sufficient digitised source materials – potentially extending the purview of forensic media well back into the early 19th century and the origins of photography.
Despite the current spatiotemporal (interframe) limitations of ANNs, they suggest interesting possibilities for moving-image and photogrammetric source generation, given spatial constraints. This last is work-in-progress: there has to be a generative process (e.g. a generative adversarial network, GAN) suitable for mapping generated texture views onto an actual three-dimensional model – in itself presenting a series of interesting and challenging training questions. How would one confirm that ANNs actually ‘map’ (i.e. internally abstract/predict) Cartesian space? Can they generalise spatial projections? The old aesthetic saw: to what degree can content be separated from form? The state-of-the-art systems I am aware of (2018) seem very limited in spatiotemporal prediction in some instances and super-capable in others. Perhaps it’s an inferential/inverse problem that requires interaction with other approaches.
This converges (in my imagination) with DL approaches to 3D representations of AAE expeditioners and the fascinating possibility of creating Digital Ghosts: models trained upon source moving-image and audio materials, in concert with a Deep Fake transfer approach, to attempt to bring this history and its protagonists to a new digital life, in their own words. This might have significant advantages over conventional approaches to character animation, given that motion data could be derived from historical film footage, capturing the characteristic physical movement of historical individuals and redeploying it within near-photorealistic simulated immersive environments.
Work-in-progress.