Generative Adversarial Networks and Cultural Heritage

The movie above shows thousands of permutations of two neural networks fighting each other – a Forger and a Detective, each pursuing opposite goals. We call these Generative Adversarial Networks, and I've been interested in seeing what they can do on cultural heritage material.

Generative Models are, broadly speaking, a form of Unsupervised Machine Learning. Given enough training data – "ground truth," or things as they are in the real world – a network can be trained and then sampled, generating new examples that look as though they came from the same distribution as the observed data.
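To make "train, then sample" concrete, here is about the smallest generative model I can think of, sketched in Python with NumPy. It is only an illustration: the "observed data" are made-up brightness values, and the function name fit_and_sample is hypothetical. The model is a single bell curve rather than a neural network, but the two steps are the same: estimate a distribution from real examples, then draw new examples from it.

```python
import numpy as np

def fit_and_sample(observed, n_new, rng):
    """Fit a one-dimensional Gaussian to the observed values, then sample from it."""
    mu = observed.mean()      # "training": estimate the centre of the data
    sigma = observed.std()    # ...and its spread
    new_examples = rng.normal(mu, sigma, size=n_new)  # "sampling" the fitted model
    return mu, sigma, new_examples

rng = np.random.default_rng(0)
# Made-up stand-in for ground truth: 5,000 brightness values centred on 120.
observed = rng.normal(loc=120.0, scale=30.0, size=5000)
mu, sigma, new_examples = fit_and_sample(observed, n_new=10, rng=rng)
print(new_examples.round(1))
```

The new values were never in the observed data, but they are statistically plausible neighbours of it. A GAN attempts the same thing for whole images, where no simple bell curve will do.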

To put this in concrete terms, we can take 22,000 photographs from early 20th-century Lund as ground truth. We ask one network, the Forger, to generate random images. Meanwhile, we'll ask our second network, the Detective, to look at a mixture of these random images and the "real" photos and try to determine which is which.

The first try at both these tasks is pretty much guaranteed to be awful. The Forger will produce images that look indistinguishable from static, and the Detective will be likewise clueless about what's real and what's not. But we're not going to run this experiment once. We'll run it again and again, thousands of times per second – for days and days. Each time we do so, we'll tell each network what its adversary was up to. The Detective will learn how good it was at its job – which images it successfully identified as fake, and which it let slip through. Conversely, we'll tell the Forger which of its images passed muster, and which were immediately caught. We'll ask it to learn from its mistakes, and intuit what will slip by the digital gatekeeper.
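The back-and-forth described above can be sketched as a toy loop in Python with NumPy. This is a minimal illustration under heavy assumptions, not the code that produced the images here: the "photographs" are single numbers drawn around 4.0, the Forger is a two-parameter line, the Detective is a two-parameter logistic classifier, and every name and setting (train_toy_gan, the learning rate, the batch size) is hypothetical.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def train_toy_gan(steps=2000, batch=64, lr=0.05, seed=0):
    rng = np.random.default_rng(seed)
    a, b = 1.0, 0.0   # Forger: fake = a * noise + b
    w, c = 0.1, 0.0   # Detective: P(real) = sigmoid(w * x + c)
    for _ in range(steps):
        real = rng.normal(4.0, 1.0, batch)   # stand-in for the real photos
        noise = rng.normal(0.0, 1.0, batch)
        fake = a * noise + b                 # the Forger's current attempts

        # Detective's turn: lower the loss -log D(real) - log(1 - D(fake)),
        # i.e. score real examples high and forgeries low.
        d_real = sigmoid(w * real + c)
        d_fake = sigmoid(w * fake + c)
        w -= lr * np.mean(-(1.0 - d_real) * real + d_fake * fake)
        c -= lr * np.mean(-(1.0 - d_real) + d_fake)

        # Forger's turn: lower the loss -log D(fake),
        # i.e. nudge the forgeries toward whatever the Detective calls real.
        d_fake = sigmoid(w * fake + c)
        grad_fake = -(1.0 - d_fake) * w      # gradient of the loss w.r.t. each fake
        a -= lr * np.mean(grad_fake * noise)
        b -= lr * np.mean(grad_fake)
    return a, b, w, c

a, b, w, c = train_toy_gan()
print(f"Forger offset b = {b:.2f} (the real data are centred on 4.0)")
```

Each pass through the loop is one round of the game: the Detective is corrected using both real and forged examples, then the Forger is corrected using only the Detective's verdict on its forgeries. Toy setups like this one can circle around the target rather than settle on it, so I make no promise about the final number it prints. The same structure, scaled up to convolutional networks and real images, is what runs for days.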

This can seem pretty amazing (and it is!) – but to the algorithms, early 20th-century photographs are really just a distribution of pixel values that humans happen to interpret as people's faces. There's no real understanding of human anatomy, which explains why you'll occasionally see three-eyed folks pop up in the illustration above. ("Two eyes are good," the algorithm may think, "so three must be even better!")

Pretty bad (early epoch)

Pretty good! (late epoch)

Too many eyes...

Still, it’s interesting to explore what neural networks can do with the enormous amount of digitized cultural heritage material now available online. In my work at the Yale DHLab we often use certain kinds of convolutional neural networks for image analysis, putting these algorithms to work on NVIDIA cards that execute hundreds of thousands of linear algebra operations every second, for days and days at a time. I like to think of Generative approaches as a chance for all this hardware and software to put down its work for a moment and dream.

Sample from a Generative Adversarial Network run on tens of thousands of portraits taken from 1860-1890.

Ethical considerations

The surfeit of high-quality, large-scale digital cultural heritage online can inspire an untold number of “data experiments”, all of which are dependent on the lived experiences of individuals and the artifacts they left behind. I’m certain that somewhere in most 19th-century photographic collections are photographs of caucasians in blackface makeup. How should we treat those pictures? Photographs of Native Americans from this same era are similarly problematic — I don’t think one can use such objects in good conscience for this kind of work without an enormous amount of prior research. Yet they exist alongside other photos in our digital libraries, most often without any kind of contextualizing information.

The best digital projects in this vein, such as The Real Face of White Australia, exist on the border between documentary history and artwork. What can we learn from how Kate Bagnall and Tim Sherratt treat their own difficult subject material – photographs of “people whose lives were monitored and restricted because of the colour of their skin”? The dramatic cascade of non-White faces highlights both these men’s and women’s lives as “invisible Australians” and the unequal power relationship between individuals and the state that compelled the creation of such an archive. As digitization now enables not just the re-contextualization of images, but their wholesale creation, cultural heritage institutions and individual researchers alike might consider the way that generative models are dependent on the digitization of archival “excess”, but also capable of de-naturalizing such collections. As a carnival-mirror reflection of the visual archive, GANs ask us to consider who stares back from the screen.