In September 2025 I gave a talk at Aarhus University showcasing some early efforts to build a Low-Rank Adaptation (LoRA) for a diffusion model that reflects the style of Danish Golden Age paintings.

LoRAs are a way of teaching a base model new concepts, styles, characters, or objects. As such, they are a kind of “fine-tune” of a foundation model. In this case, we’re fine-tuning the Flux.dev diffusion model, an open-weight neural network that generates images from text prompts, similar to OpenAI’s closed DALL·E models.
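At its core, a LoRA freezes the base model’s weights and learns a pair of much smaller low-rank matrices whose product is added on top of each frozen weight matrix. Here’s a minimal sketch of that arithmetic in Python; the shapes and rank are illustrative, not Flux’s actual dimensions:

```python
import numpy as np

# Frozen base weight for one layer (illustrative sizes, not Flux's real ones).
d_out, d_in, rank = 1024, 1024, 16
W = np.random.randn(d_out, d_in)

# LoRA trains only these two small matrices. Following the LoRA paper,
# B starts at zero so training begins exactly at the base model's behavior.
B = np.zeros((d_out, rank))
A = np.random.randn(rank, d_in) * 0.01
alpha = 1.0  # the "strength" knob revisited later in this post

# Effective weight at inference: base plus the scaled low-rank update.
W_eff = W + alpha * (B @ A)

# ~1.05M frozen parameters vs. ~33k trainable ones for this layer.
print(W.size, A.size + B.size)
```

Because only A and B are saved, the resulting LoRA file is typically tens or hundreds of megabytes, rather than the many gigabytes of the full model.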

Creating this low-rank adaptation means we don’t have to do the expensive and time-consuming work of training a text-to-image model from scratch. Instead, we’ll show the existing model a number of examples of “ground truth”: real Golden Age paintings, combined with descriptions that capture what’s shown in each artwork. We’ll tell the model that one painting represents a cow standing in a field, and that another shows two women sitting in a bourgeois living room. Crucially, we won’t tell the model that the images represent any kind of oil painting or Danish art movement — because that commonality is what we want the model to learn across all our ground truth examples. By captioning everything that isn’t necessarily a part of “Golden Age æsthetics”, we get the model to learn what is.

Training Data

Here’s a look at some of the 50 images I used for training, together with the captions I created to describe each image:

[Gallery: training images, each paired with its caption]
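As an aside on mechanics: most LoRA trainers (kohya_ss and ai-toolkit, for example) expect each training image to sit next to a same-named plain-text caption file. Here’s a minimal sketch of that convention, with hypothetical file names and the captions from the examples above (the guld8lder prefix is explained in the next section):

```python
from pathlib import Path

# Hypothetical dataset folder; file names are illustrative.
DATASET_DIR = Path("golden_age_dataset")
DATASET_DIR.mkdir(exist_ok=True)

# Captions describe content only, never the style we want the model to learn.
captions = {
    "cow_in_field.jpg": "guld8lder, a cow standing in a field",
    "two_women.jpg": "guld8lder, two women sitting in a bourgeois living room",
}

# Each image gets a sibling .txt file with the same stem.
for image_name, caption in captions.items():
    (DATASET_DIR / Path(image_name).with_suffix(".txt")).write_text(caption)
```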

Captioning Strategy

You’ll notice each caption is preceded by a “trigger” word, guld8lder. Trigger words used to be more important in the era of previous models such as SDXL; it’s unclear if they are as crucial today. But the idea is to fix the notion of the style to an arbitrary string of characters, ideally one that’s easy to remember but not an actual word or phrase. When we generate new images in the future with this LoRA, we’ll invoke the style by including the word guld8lder in our prompt.

Why the need for this obfuscation? The model has already learned the relationship between real words and phrases and their expression in pixels. If we used “golden age”, we might end up with objects made out of literal gold, or possibly with other English-language connotations of the phrase outside of Denmark. guld8lder is not in the model’s existing vocabulary, so it makes a good proxy for the style we’re trying to capture.
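One quick way to sanity-check a candidate trigger word is to run it through the model’s tokenizer and confirm it isn’t already a known token. Flux.dev pairs a CLIP text encoder with T5, so a sketch using the CLIP tokenizer from Hugging Face transformers might look like this:

```python
from transformers import CLIPTokenizer

# Flux.dev uses a CLIP-L text encoder (alongside T5), so its tokenizer is a
# reasonable proxy for how "novel" a candidate trigger word is.
tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")

print(tokenizer.tokenize("golden age"))  # real words map to familiar tokens
print(tokenizer.tokenize("guld8lder"))   # splits into rare sub-word pieces
```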

Style vs. Content

Although the images above all read as traditional paintings, it’s worth pointing out that the Golden Age included artists who painted strikingly modern subjects. As Michael Prodger has written, “Eckersberg’s encouragement led to a startling variety of unusual subjects… a dead rat and a nondescript section of water-filled ditch; … a tumorous knee…”. Although my initial dataset includes just 50 paintings, I’d like to expand it in the future to include these more prosaic subjects. The more diverse the dataset, the better the LoRA can abstract away from the actual subjects and find the right vectors to represent the style.

Outputs from the Golden Age LoRA

Let’s see how our model performs after a few hours of training. First, we’ll generate an image on the left without the LoRA, to see what the model thinks of our prompt before any adaptation. Then, we’ll add the LoRA on the right, while keeping the random seed the same. This should let us visualize just the changes the LoRA is responsible for.
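For anyone who wants to reproduce this kind of A/B comparison, here’s a hedged sketch using the diffusers library; the LoRA path is a placeholder for wherever your trained weights live:

```python
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
).to("cuda")

prompt = ("guld8lder, a man complains about the price of pizza, "
          "standing at the counter of Pizza Hut")
seed = 42  # reusing the same seed isolates the LoRA's contribution

# Baseline: no LoRA loaded yet.
before = pipe(prompt, generator=torch.Generator("cuda").manual_seed(seed)).images[0]

# Same prompt, same seed, with the LoRA applied (path is a placeholder).
pipe.load_lora_weights("path/to/guld8lder_lora.safetensors")
after = pipe(prompt, generator=torch.Generator("cuda").manual_seed(seed)).images[0]

before.save("before.png")
after.save("after.png")
```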

A man complains about the price of pizza, standing at the counter of Pizza Hut.

This looks like success to me. The basic poses and composition are the same, but the figures and setting have transformed from 2020 to 1840. Notably, the pizza is still in the same place! Modern baseball caps have been replaced by 19th century hats, and even the hanging lamps look older. More broadly, there’s a slight de-contrasting, as the model learns that oil paintings didn’t have the same dynamic range as modern HDR smartphone images.

Bernie Sanders drinking champagne topless in a hot tub, with an anteater.

This also shows a progression towards the painterly, together with the removal of the shallow depth of field (or bokeh) present in the original image on the left. The model has learned that Golden Age paintings tend to have more things in focus than camera images do. (As an aside, I take no responsibility for the model’s concept of an anteater. It seems to have morphed from a seal to a kangaroo in the two pictures.)

A maniac with his diabolical weather machine.

This is perhaps the most dramatic difference I’ve seen when applying the Golden Age LoRA. The figure is in pretty much the same place, but everything around him has changed. If I had to guess, I’d say “machine” is responsible for the metal tools in the man’s hands in the original, whereas I don’t quite know what to make of the steampunk-chicken device in the second, Golden Age image.

A sloth drinking a latte in a cozy cafe.

I will admit there’s a slight tendency for animals to look taxidermied in the Golden Age model, but I’ve come to think of that as an acceptable aspect of 19th century visual culture. Notice again the reduction of shallow depth-of-field; this helps us read the image as a painting.

Variable LoRA Strength

Although knowing when a LoRA is trained well enough is as much an art as a science, we also have some control over how much the resulting model is applied to any given generation. Here’s a spectrum of images, on a continuum from “not very much” to “a whole lot”:

A yellow Minion drinking a latte in a busy cafe.

In this case, I’d say the LoRA is a bit under-baked: we need to apply it at a strength of 1.75 to push the image into the 19th century. Ideally, LoRAs should have a visible effect between 0.7 and 1.0. This suggests I should train the model for a while longer. (Or better, monitor the loss to see if it goes up or down and choose a saved epoch with the lowest loss possible. Training doesn’t always progress towards a better result.)
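With diffusers’ PEFT integration, that strength is just a per-generation knob. Here’s a sketch of the kind of sweep shown above; the adapter name, LoRA path, and strength values are illustrative:

```python
import torch
from diffusers import FluxPipeline
from diffusers.utils import make_image_grid

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
).to("cuda")
pipe.load_lora_weights("path/to/guld8lder_lora.safetensors",
                       adapter_name="golden_age")

prompt = "guld8lder, a yellow Minion drinking a latte in a busy cafe"
strengths = [0.25, 0.75, 1.25, 1.75]  # a well-baked LoRA should bite by ~0.7-1.0

images = []
for s in strengths:
    # Scale the adapter's contribution for this generation only.
    pipe.set_adapters(["golden_age"], adapter_weights=[s])
    images.append(
        pipe(prompt, generator=torch.Generator("cuda").manual_seed(42)).images[0]
    )

make_image_grid(images, rows=1, cols=len(strengths)).save("strength_sweep.png")
```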

Subject Matter Interference

I’ve also noticed a difference in how much strength is needed to effect a Golden Age style, depending on the subject matter. Images that are closer to the 19th century need far less impact from the LoRA. This makes sense: if you prompt for a 19th century man wearing a stovepipe hat, nearly all the training data for the underlying model is already of paintings, and the Golden Age LoRA just has to nudge the pixels a little bit:

A 19th century man wearing a hat and smoking a pipe holds up a sign with the Bitcoin logo.

(Although unlabeled, this progression of images demonstrates the steady increase in the strength of the Golden Age LoRA from left to right.)
The first two images already look like oil paintings, although they also demonstrate features typical of the underlying Flux model, such as overly ruddy complexions. The third and fourth images get much closer to a Golden Age æsthetic, and the final picture shows a kind of “overbaking” effect, with a spurious vignette and the beginnings of a loss of facial features. The point is, it was easy to get to a Golden Age look — and also easy to overshoot it.

Taylor Swift taking a selfie in front of the Little Mermaid statue in Copenhagen.

In contrast, this “selfie” test never quite achieved an oil painting look. I think this is because two of the concepts (the singer herself and the concept of a selfie) are only attested in modern DSLR/smartphone images, and so there’s a lot more inertia the LoRA has to overcome. My colleague pointed out that the women’s clothing does become less modern and more traditional from left to right, even if the style remains relatively photographic. We do get contrast compression, which is good, but I think we also see signs of distortion by the seventh and final image, which is cranked to a very high strength of 2.0. This LoRA probably needs more training to be effective.

Some Criticisms

I think this experiment raises three obvious questions:

  • Haven’t you just made a bad oil painting LoRA?
    It’s true that there are far more effective LoRAs shared online that can make nearly any subject into an oil painting. These are probably trained on far more images, with more diverse content, and are thus more robust. However (based on my observations of the training strategies), they lump together all Western oil painting, from the Renaissance to Thomas Kinkade. This effort, as preliminary as it may be, is focused on one national movement. (And given Eckersberg’s influence on his students, one could almost think of it as the long tail of his style alone.)

  • Aren’t content and style intermingled?
    This is an interesting philosophical question. Let’s say I trained a style LoRA on 1970s films — many folks have in fact done just that. What makes a given image “70s Cinema”? If I prompt for a woman, with no other characteristics, should she have feathered bangs? (Spoiler: she probably will, if the LoRA was trained on enough data.) The same would hold for short flapper haircuts in the 1920s. Of course, you should always prompt for something uncommon or unexpected, and see how the model handles that. But by default, if I apply this LoRA to a prompt such as “a man walking down the street”, I sort of expect both a painterly style and a man in a 19th century suit and hat. I think this is as much a feature as it is a bug.

  • Isn’t the base Flux model already trained on Danish art?
    This is an important question — given the amount of digitization that Statens Museum for Kunst and Ny Carlsberg Glyptotek have done, it’s clear that any diffusion model trained recently knows a little bit about the Danish Golden Age. I haven’t ever tried prompting the base model for “Danish Golden Age painting of a person scrolling through TikTok on her iPhone”; that would be an interesting experiment. But not all paintings are captioned with their genre, and certainly very few are captioned in the way that most models expect. It’s common for an artwork to be titled “Heath field”, for example, rather than any more descriptive text.

By performing the acts of curation, captioning, training, and evaluation, we are asking ourselves, no less than the diffusion model, to think intently about what makes something representative of the Danish Golden Age. As ephemeral or amusing as the outputs are, they are also a chance to consider how we ourselves see, as we try to make an electronic proxy come closer to our own perception.