The Aesthetics of AI

How Different Models Create Unique Visual Styles

Mar 07, 2025

AI Art is no longer just a novelty; it is a vibrant and growing art movement that has aroused excitement and anger in equal measure. Many discussions, however, treat “AI generated art” as a sort of universal singular. It’s not unusual to hear something along the lines of “AI Art has no soul, no human emotion to it.”

Leaving aside whether there’s any validity to that particular claim (guessing my position on that is left as an entry-level exercise for the reader), implied by the claim and others like it is the idea that AI Art is just one thing. There is a plethora of models generating AI Art and, whilst they do share similarities, they can also feel quite different. Certainly their marketing would claim that each stands above the others.

So that is the question we’ll be addressing today. How much difference is there really between generative art models? Are all art models created equal? Join me as we run some experiments and find out!

Before we begin, a reminder that The High-Tech Creative is an independent arts and technology journalism and research venture entirely supported by readers like you. The most important assistance you can provide is to recommend us to your friends and help spread the word.

Biological Analogues

Working with text-generating AIs at length and combining them with character-card personalities, I’ve come to think of models as an analogue for a physical brain. The structures laid upon it set the deep-structure knowledge, memories and biases of the AI. A personality card laid across the top can then be thought of as a simplified version of “life experience”. It is the nature/nurture debate written in tech - the model lays out the “biological” structure of what is possible and the personality alters how it presents to the user.

It’s something of a clumsy analogy but it works. Character card personalities can significantly change the presentation and behaviour of an AI despite the model remaining fixed, but each personality also manifests with both subtle and sometimes extreme differences depending on the model underlying it. It takes both to create the overall system.

An art model, by contrast, has only one of these two components by default. Despite being restricted to whatever emergent “personality” arises from its base training, however, every model has been through its own separate training method and possesses its own, slightly different architecture. This has been reinforced even further with the release of the new Stable Diffusion and Flux models, which have moved from the original U-Net-based architecture to a transformer-based one much more similar to those used by leading text generation models.

It would be fair to assume then that each model would output slightly differently. As training datasets and model architectures are different and usually kept secret, we would expect the same prompt to be interpreted differently by each model. To test this, we’ve devised a series of prompts we’re going to run against a series of models, head to head, to try and determine whether models have an internal “aesthetic” of their own. This is a set of prompts that should allow us to see where the models’ own internal biases alter their understanding of what is being requested by the user - and more interestingly, where they coincide as well.

On Bias and Aesthetic Sense

This experiment will be unusual in that we will not be looking for the same sort of results from the models that we might usually be judging them on in a regular “head to head” comparison.

There is a lot of discussion regarding bias in AI models, usually focused on methods to reduce or eliminate it, as in most cases it is unwanted. When AI is judging credit scores or resumes, bias isn’t just unwanted; it can be dangerous, potentially life-destroying.

In Art models however I would argue that bias is not only desirable, it’s necessary. When we ask the question, “does a model have an aesthetic preference” we are asking, in effect, if the model is biased in particular ways. That’s what a preference is, after all, a bias towards something.

Completely unbiased, an AI model will likely be a versatile tool but a fairly sterile one. This may be exactly what is needed in some situations and we may see models like that in the future, particularly as our ability to control the output increases. A model with bias however is one that will necessarily be more interesting, bringing its own preferences to bear on the instructions it is given. This could be the difference between AI as paintbrush, or AI as artistic collaborator.

Ask Boris Vallejo to paint you a beautiful woman and you will likely receive a stunning painting of an incredibly muscular female body-builder type. A very different response than you would get should you make the same request of Rubens. We should expect the same from our art models.

This is a tightrope to walk, however. Too much bias and we risk lock-in: a convergence of preferences towards a single image, which is potentially worse than too little bias. It reduces the ability of the model to produce diverse images and thus its utility for tasks beyond its obsessive speciality. This happens in humans as well; I know an artist who has spent decades painting the same painting over and over again obsessively. A bias so strong might be interesting from a psychological point of view, but AI models are designed to be collaborators, and other artists won’t exactly line up to collaborate in those circumstances.

A Quick Glossary

Closed Source Model: A model whose entire architecture and implementation is a proprietary secret. ChatGPT, Claude, Gemini and Grok are all closed source models, as are their image equivalents DALL-E, Midjourney and Leonardo Phoenix.
Open-Weight Models: Models where at minimum the weight files that make up the “brain” of the model are openly available to home users. Meta’s Llama models, Stable Diffusion and Flux are all examples of Open-Weight models.
Cloud-based models: Those available via API or website for a subscription fee. This includes OpenAI’s DALL-E, Midjourney, Leonardo.ai and others.
Local models: Open-weight models available for download to be run by the user on their own hardware. Note, open-weight models can also be available on cloud services.
LoRA: A separate, smaller weights file that was trained with a specific model in order to perform a modification to that model. Often used in the community to improve perceived faults or weaknesses in a model, to add support for niche topics or styles, or to enable a model to reliably produce consistent named characters.
Fine-tunes: An open-weight model that has been subjected to additional training by the user community after release, modifying its outputs, adding or modifying features. Fine-tunes result in a completely changed model weights file and generally contain broader changes than LoRAs.
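
The size difference between a LoRA and a fine-tune can be made concrete with a little arithmetic. The sketch below assumes the standard low-rank formulation, in which a LoRA stores two thin matrices B and A for a layer and the modified weight is W + scale * (B x A), whereas a fine-tune replaces W outright. The function names and the layer dimensions are illustrative only and are not taken from any of the models tested here.

```python
def full_param_count(d_out: int, d_in: int) -> int:
    """Values stored when a fine-tune rewrites a whole d_out x d_in weight matrix."""
    return d_out * d_in


def lora_param_count(d_out: int, d_in: int, rank: int) -> int:
    """Values stored by a LoRA of the given rank for the same matrix (B plus A)."""
    return rank * (d_out + d_in)


def matmul(a, b):
    """Plain-Python matrix product, enough for a toy demonstration."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def apply_lora(W, B, A, scale=1.0):
    """Return W + scale * (B x A): the base weights nudged by the low-rank update."""
    delta = matmul(B, A)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(W, delta)]


if __name__ == "__main__":
    # Hypothetical layer size, chosen only to show the scale of the difference.
    print(full_param_count(4096, 4096))      # 16777216
    print(lora_param_count(4096, 4096, 16))  # 131072

    # Toy 2x2 example: the base matrix is left intact on disk, the LoRA adds to it.
    W = [[1, 0], [0, 1]]
    B = [[1], [2]]   # 2 x 1
    A = [[3, 4]]     # 1 x 2
    print(apply_lora(W, B, A))  # [[4.0, 4.0], [6.0, 9.0]]
```

This is why a LoRA can be shared as a small add-on file while a fine-tune ships as a full replacement set of weights.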

The Models

Unfortunately given the number of images I was going to need to generate, most of the cloud-based services were unable to be tested for this article. The free plans generally restrict generation enough that it would be weeks if not months before I would be able to generate enough images to run the experiment, and I am the impatient sort. Sadly thus, Midjourney, DALL-E and Leonardo will not be joining the roster of models tested.

Imagen 3 (via Google ImageFX) and Project Recraft both offered enough generation for a free account to be included however, and I was impressed by the quality both were able to provide on what were quite simple instructions. The rest of the models tested were those that could be run locally, both baseline models and some fine-tunes, giving us the ability to see if post-release finetune training significantly alters the aesthetic of the underlying art model.

The full list of models tested:

SDXL (base model)
Juggernaut XI (SDXL finetune)
Dreamshaper XL (SDXL finetune)
Stable Diffusion 3.5 Large
Recraft
Imagen 3
Flux.1 Dev (Base model)
Pixelwave (Flux.1 Dev finetune)

The Experiment

In order to probe for the defining aesthetic of the underlying models I decided to keep the prompts very simple to give the models the most freedom to express their underlying bias while still providing some guidance in order to produce something suitable for comparison. I decided to break the testing into two parts: Style and Substance.

The style section will feature very simple prompts requesting a style and a subject but giving no further information. Disregarding all the advice I gave in my earlier article on prompting, we provide no environment data, no lighting or mood, just a simple subject and the style to render it in. This will give us a chance to compare and see if there are significant differences between how each model is able to render the style in question.

The substance section will forgo all style information altogether and just provide a simple subject to the model. This provides the model with a huge amount of creative freedom and gives it very little to guide the sort of image to create; just the subject. This alone is enough to provide significant bias from input - certain subjects are overwhelmingly more likely to be presented in certain styles - but it should also provide us with some interesting insight into what the model thinks about those subjects in the images generated.

The Prompts

In order to guide this experiment we needed some simple but interesting prompts. Even though the first set of prompts is focused primarily on comparing styles, the differences in how the models interpret subjects are interesting to see as well. For this reason we use a small grouping of popular generative art styles and two subjects for each style: one that would be a common use of that style and likely represented in the training data many times over, and one that is likely far less commonly represented in the requested style and so has more chance of being a “zero-shot” or “few-shot” request of the model.

The substance section, pared down to what is a near minimal example of what a prompt can consist of, consisted of fairly simple subject prompts designed to test a potential range of styles depending on the model’s preferences.

Style

A high-resolution photograph of a sunset over a waterfall.
A high-resolution photograph of a griffon.
Bokeh Photography of a golf ball lying amongst tall grass.
Bokeh Photography of a tiny fairy, armed for battle.
An impressionist oil painting of a medieval market day.
An impressionist oil painting of the control room of an interstellar star ship.
A picture of students walking to school in an anime style.
A picture of a man riding a buffalo in an anime style.

Substance

Beautiful woman
Handsome man
Downtown Tokyo
A space station orbitting a jungle planet
Bart Simpson

These are the prompts we shall explore with each model to determine if models have their own, internal aesthetic sense. Note the spelling mistake in the prompt “orbitting”; I didn’t notice this until I had already run generations with several models so decided to leave it in as an additional check on how the models interpreted it. On the whole, AI models seem fairly good at understanding minor typos so I didn’t expect a huge difference.

Aspect Ratio

One final note before we begin. A third aspect that has a huge effect on generated output, after prompt and model, is the aspect ratio used. Certain aspect ratios seem to lead the model towards certain types of image - widescreen ratios tend to lead to more cinematic landscape-style imagery, tall thin ratios lend themselves to close full-body portraits, etc.

This might be worth following up on in future but to simplify an already complicated set of tests I decided to control for this by restricting all generations to a basic 1:1 aspect ratio. A simple square.
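
To give a sense of the scale of the experiment, the setup described above can be sketched as a simple job grid. This is only an illustration, not the tooling actually used: the `Job` class and `build_jobs` function are hypothetical names, no image generation call is made, and the figure of four images per prompt is an assumption carried over from the four-image sets discussed in the results below.

```python
from dataclasses import dataclass
from itertools import product

MODELS = [
    "SDXL", "Juggernaut XI", "Dreamshaper XL", "Stable Diffusion 3.5 Large",
    "Recraft", "Imagen 3", "Flux.1 Dev", "Pixelwave",
]

STYLE_PROMPTS = [
    "A high-resolution photograph of a sunset over a waterfall.",
    "A high-resolution photograph of a griffon.",
    "Bokeh Photography of a golf ball lying amongst tall grass.",
    "Bokeh Photography of a tiny fairy, armed for battle.",
    "An impressionist oil painting of a medieval market day.",
    "An impressionist oil painting of the control room of an interstellar star ship.",
    "A picture of students walking to school in an anime style.",
    "A picture of a man riding a buffalo in an anime style.",
]

SUBSTANCE_PROMPTS = [
    "Beautiful woman",
    "Handsome man",
    "Downtown Tokyo",
    "A space station orbitting a jungle planet",  # typo kept deliberately, as in the test
    "Bart Simpson",
]

IMAGES_PER_PROMPT = 4   # assumption: four images per model and prompt
ASPECT_RATIO = (1, 1)   # every generation is restricted to a square


@dataclass(frozen=True)
class Job:
    model: str
    section: str        # "style" or "substance"
    prompt: str
    index: int          # which of the repeated generations this is
    aspect_ratio: tuple


def build_jobs():
    """Return one Job per model, prompt and repeat, all at the fixed aspect ratio."""
    prompts = [("style", p) for p in STYLE_PROMPTS]
    prompts += [("substance", p) for p in SUBSTANCE_PROMPTS]
    return [
        Job(model, section, prompt, i, ASPECT_RATIO)
        for model, (section, prompt), i in product(MODELS, prompts, range(IMAGES_PER_PROMPT))
    ]


if __name__ == "__main__":
    jobs = build_jobs()
    print(len(jobs))  # 8 models x 13 prompts x 4 images = 416
```

Under those assumptions the full run comes to 416 images, which goes some way to explaining why free-tier generation limits ruled out most of the cloud services.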

Step One: Styles

Prompt One

A high-resolution photograph of a sunset over a waterfall.

This set of photos was quite interesting in the elements that were repeated across all the models. If you consider that the models have been trained across gigantic quantities of publicly available art, we should assume that their generic biases lean towards the “average”, with outliers the unique sign of creativity from the models themselves, showing where their training or architecture has diverged from other models. Quite possibly then, from the repeated compositional, palette and subject elements we see across all the models, we can arrive at a sort of gestalt “common unconscious” version of what everyone expects from a “photograph of a sunset over a waterfall”; a Platonic ideal-form photograph, perhaps.

A more typical “landscape photograph” subject would be difficult to find. I myself have taken many photos over the years of both sunsets and waterfalls, light on water being one of my favourite visual elements of all time. Perhaps it comes from living my life on an island.

Leaving aside my biography, let’s see what the models thought.

SDXL (Base Model)

Starting with the oldest models, the SDXL series, we can see that while they lack a little in the detail department (particularly when compared to the newer models, as we’ll see in a moment), they are still pleasant enough that it is easy to see why SDXL remains in very active community use despite newer alternatives now being available.

Over the four images generated we can see a pattern in both the stylings and the compositions of the images. The preference seems to be for high angle shots looking down at the waterfall, and for a deep orange/red wash across the whole image to highlight the sunset. There is a little variation in the images (one is a front-on rather than high-angle shot for instance), but I think we get an idea of what it thinks of “photograph” from here.

Juggernaut XI (SDXL Finetune)

Juggernaut XI (11) is a community fine-tune of the SDXL model, building on years of work across the Juggernaut line adding to the realism and detail the model is capable of producing. It’s one of the most popular finetunes around and still in heavy use.

I was expecting a finetune of such lineage to have developed its own style. However, despite some better detailing and perhaps a more subtle hand on the colour palette (the sunset wash isn’t quite so overpowering, which may be a plus or a negative depending on your preference), the overall style of the base SDXL can be seen shining through quite strongly here.

Despite the lighter touch, the colour palette shown is fundamentally the same as with SDXL, a bit more vibrant on the greens and blues, but the same style, same wash. We also see the same basic composition in use here too, indicating the fine tune hasn’t changed its idea of what a “photograph of a waterfall” should look like. If anything it’s doubled down on the high-angle shot of water flowing into a river.

Dreamshaper XL (SDXL Finetune)

Dreamshaper XL is another long-lasting, well developed community fine-tune of SDXL, this one with a long pedigree having started life as an SD 1.5 model before being recreated for SDXL when the newer model came out. No sign of a Dreamshaper for 3.5 yet that I could see however.

The Dreamshaper model series is designed to “push” the images more towards fantasy and a sort of dreamy, east-Asian inspired painterly aesthetic, so I was quite surprised to find that it produced the most realistic photographic images we’d seen so far.

We still see the compositional elements from the base model shining through. We’re still talking rivers, high angle shots, but the colour palette here is more realistic, and more noticeably the scenery has changed significantly. Rather than the somewhat stylised landscape of the first two models, here we’re seeing something more detailed, rougher, and more natural looking. This could well be a river in the more wilderness areas of my own state.

Stable Diffusion 3.5 Large

Stable Diffusion 3.5 Large is the latest and greatest version of the SD series, after the disappointing sequel to SDXL that was SD3. Despite the name continuation this is a completely retrained model and it shows; absent in the images below are the compositional and colour elements we can point to as SDXL’s “understanding” of what a photograph of a waterfall should look like.

What we do see is significantly more variety. Unlike the SDXL-based images, we see multiple versions of the sunset in these photos, not just the classic “at the horizon star-pattern lens-flare” but more subtle variations such as the first one here, where the sun lurks just below the horizon and we are still surrounded by the twilight of early evening.

Both the colours and the detail are more realistic and it seems this model believes a waterfall should be shot either face on or at a slight low angle, rather than the high angle preferred by SDXL. What is interesting though is that despite the additional detail, we are still seeing that stylised, fairly barren landscape that was on show in SDXL. It’s not quite as stark, but I still find myself preferring Dreamshaper’s interpretation. There’s no denying that the rest of the details are far better though, from the addition of further small waterfall streams to the much better rendering of the mist-spray, SD 3.5 is an impressive model. Should enough of the community forgive them for the debacle that was SD 3 and start showing the model the sort of love that SDXL received, we could expect big things from community finetunes.

Recraft

We leave Stable Diffusion behind now and turn to the first of the cloud models. The model used by Recraft isn’t specified, leading many to assume that, like some of the other providers, they have their own proprietary model. (Most open models require attribution and licensing to run as a service.)

We can see some definite changes from the standard elements we saw in the previous models, though one in particular is apparently popular enough to cross systems: the high-angle composition shot.

We do see significantly more variety here than in the SDXL images, however, with several other compositions shown (a low-angle and a direct shot) and, interestingly, a more varied landscape than we have seen in the other models. Whereas SDXL and SD 3.5 both leaned towards fairly stylised, stark green landscapes (and one of the images given us by Recraft fits that bill), we also see two significantly more detailed, greenery-filled examples and the one selected below, a very different dirt-and-stone landscape that reminds me somewhat of old mine workings left over from a strip mining operation. It’s stark, lonely, and very different to the others.

We’re also seeing significantly better realism than any of the others so far, including 3.5. Remember that we asked only for a “high-resolution photograph” - not necessarily a realistic one, leaving the idea of how realistic to be up to the bias of the model. I’ve noticed a definite bias towards realism in many of the cloud models (Midjourney being the notable exception).

We do see a repeated colour palette and the same style of dramatic sky across all four pictures, suggesting that these elements form a core part of the prompt; likely what the model considers to be a “sunset” photograph.

Imagen 3

Google’s “Imagen 3” model can produce some absolutely beautiful photos and that’s what we see on display here. Whilst we’d likely need to go through many more generations to be sure, with our limited test range here it looks like Imagen is deciding between two different types of waterfall to give us. The first, highlighted below, shares a lot of common elements with what we’ve seen so far. High angle shot, fairly narrow waterfall, river flow in and out of the falls. Here the rock detail is much better than we’ve seen earlier and the landscape itself is more rocky, scrubby. This looks like a high altitude location, or perhaps a high latitude one.

The final image of the set looks very similar though with a landscape more similar to what we saw from SD 3.5. Green but fairly sparse. The water looks more “streamy”, slightly stylised or at least from a fairly obstacle free waterfall. The first one here however is a beautiful representation of a white-water fall, showing layers of pace change as the water pounds through the rocky obstacles at the top of the falls.

The middle two images are of a different sort of waterfall - a much bigger, wider though shorter falls. The variety is interesting and speaks to a wider range of training perhaps for Imagen. For the first time we see significant changes in colour palette too; though all the pictures share a penchant for dramatic skies, we have two different palettes in use here - a softer evening orange of the sort we’ve seen in the previous images, and a much more vivid and dramatic red and blue sunset.

It definitely feels like Imagen is drawing from a deeper toolkit of elements when deciding what a “photograph of a sunset waterfall” looks like than the other models.

Flux.1 D

Flux.1 Dev is the new darling of the open-weights community and it makes a fairly impressive initial entry to our little experiment here. Immediately, like with Imagen, we see a wider variety shown amongst the four images; we have two different sorts of waterfall, two primary colour palettes and two types of sunset sky. It shares a fair bit in common with Imagen but doesn’t have quite the same level of realism in the landscape though I think its variety outshines even Imagen’s attempt - the last image in the series in particular is a very different landscape than any we’ve seen presented so far.

There is one constant across the images, most obvious in the latter three, which makes sense in terms of physical properties. Flux is rendering the mist from the falls remarkably well in those images. We mentioned the mist and spray while looking at the SD 3.5 images, but none of those images do it as well as Flux does here. The second image in particular: you can really feel the speed and violence of those falls throwing up so much spray and mist, and the rendering shows real volume. You can almost hear the pounding of the water against the rocks below. These are some of my favourites of this set.

Pixelwave (Flux.1 D Finetune)

Pixelwave is a community finetune of Flux.1 D and, like Dreamshaper earlier, quite a surprise. Unlike Dreamshaper, the intention with this tune is not to introduce a specific bias but to generally expand Flux’s repertoire in all areas, “art styles, photography and anime” according to the creator.

If you look at the images below however, you’ll see a first and only for this set. It’s present in all of the images but most noticeable in the one I selected for full size.

It’s a beautiful picture, but it doesn’t have the same realism as even the SDXL fine tunes. It looks a bit washed out, a bit stylised. In fact if you look closely…

It’s a painting. Pixelwave was the only model that took our prompt, which requested a photograph, and decided the lack of “realistic” or other modifier afforded it the creative space to do what it wanted to do; and apparently this model wants to paint. The composition elements and colour palettes of the base model still shine through, particularly with those wide canyon waterfalls and the variations between a dramatic orange and a more serene blue sunset. The styling, though, is unique among all the model generations, showing that a finetune can diverge quite significantly from its base on an aesthetic level.

Prompt Two

A high-resolution photograph of a griffon.

If the first prompt was designed as an “archetypal landscape photo”, this prompt was intended to be the opposite - a photograph of a mythical creature that doesn’t actually exist.

Which proved to be a stumbling block, as out of all the models tested only one demonstrated enough knowledge of what a griffon is to actually provide us with a photo of one. All of the others failed, and mostly in exactly the same way, which is itself interesting. The way they failed suggests a partial but not complete understanding of a griffon, and the fact they all failed in that way suggests some similarity in training is the root cause of it. I noticed previously that Flux.1 D was unable or unwilling to produce a satyr for me, so perhaps we have a huge blind spot in these models when it comes to classical mythology? If so, that could be an excellent potential future project for us.