Keeping things in Kontext

An early look at the new Flux.1 Kontext Dev model

Jul 04, 2025

It’s been a month since Flux.1 Kontext was released to much fanfare however at the time only hosted API versions were available, with the neutered “open weight” version “on its way.”

This week, on its way became “here and ready”, and so I took it through its paces to see if it lives up to the hype.

What is it?

Flux.1 Kontext is an instructional image editing model, or as Black Forest Labs would have it an “in-context editing model”. At its core, Flux.1 Kontext is an image generation model that takes not just a textual prompt to decide what to generate, but is trained to take and, crucially, understand the visual details of an input image, using this additional context to guide image design.

The key goal here is to provide more powerful, contextual editing features. Black Forest themselves identify the following key benefits:

Local Editing (targeted modifications)

Targeted, specific modifications during image editing has, to this point, been mostly done using inpainting methods. This involves masking an area of the image that you wish to change, writing a specific generative prompt applying to only this restricted area, and rerunning the generative process.

An instructional editing model purports to make this simpler, allowing the user to simply enter the image to be edited and describe the changes directly in a prompt, referring to the original image with a prompt like “change the woman’s hat into a beret”.

Character Consistency

Whilst certainly possible, continually reproducing a specific character in a range of different images has been an involved and sometimes complicated process, involving either model finetuning or more complicated “face switching” or inpainting techniques.

An instructional model allows an image of a character to be passed in along with a text prompt like “Generate an image of the man in this picture trekking through the jungle.” A much simpler prospect.

Style References

A wide range of techniques, from simple prompting to complicated ipadapter based workflows, have been invented in order to generate pictures in a particular style. With an instructional model, it should be as simple as passing in a reference pictures and the prompt, “create an image of x in the style of this image.”

Iterative Editing

Flux.1 Kontext has been specifically trained with the idea of incremental editing in mind; taking the output of each prompt and using it as the input of the next, continually reworking the image until it is the masterpiece desired. This has always been a core workflow for creating AI Art, however anyone who has worked on anything complicated requiring dozens of steps will know that the possibility of significant image degradation is real and omnipresent. Flux.1 Kontext purports to be trained to minimise this effect, lowering the cost of “repair” of consistently reworked images.

On the whole, Black Forest are promising a lot, lets see how it delivers.

The Experiments

All tests described were run on a several-years old NVIDIA gpu with 10gb of video ram. Like most of the Flux.1 local models, this isn’t enough ram to run the full fp16 release in video memory so a quantised version was used.

For the tests run here, a Q5_K_S quantisation was used, allowing for reasonable running speed with 10gb of ram at a quant level known to produce good results in the standard Flux.1 dev local model. There has been some discussion suggesting that the effects of quantisation might be slightly more noticable in Kontext than the usual model; I may run some tests with a less scaled version (FP8 unquantised for instance) in order to test this.

Character Consistency: The Celtic Princess

In order to carry out the tests we needed to start with an image, so I chose an image I created earlier. As an image it is interesting, contains a number of errors already, and has a distinctive character as the core subject and background with a number of recognisable features. All good points for an image designed to test iterative editing techniques.

As an initial test of character consistency, we attempt to change the pose of the character initially with the following prompt:

The woman leans against one of the large standing stones, resting her face against the stone.

It’s not entirely clear how detailed a prompt needs to be for Kontext, so we’ll keep it simple and see how it goes.

An interesting response; the character is recognisable, clothing and facial features all share enough similarity that, seen in a sequence, the viewer would understand the subject as the same person. What was imagined when prompting was the character being moved to one of the stones in the background, however the model has chosen to add a new stone and leave the rest of the image largely untouched. Given the simplicity of the prompt, this could be forgiven.

*The woman stands with her back to the camera, looking back over her shoulder*

Starting from the same image, this is quite a good rework. The background remains almost identical to the original image, but quite tellingly the reposed subject appears to be just as integral to the scene as the original; look at the way the dress settles on the ground, and is obscured by the tree in the foreground. Likewise the shadow cast on the ground by the subject’s foot. The integration of character in to scene is carried off quite well.

It is interesting to see that the overall tone of the image has shifted slightly, becoming less sharp and more dreamy, with a lower degree of detail in the background. It’s possible this could be avoided with direct prompting to retain details, but it’s worth keeping in mind when using these editing models.

*A close up portrait of the woman's face*

Another quite good rework; the transition to a portrait has included applying depth of field blur to the background and significantly sharpened up the detail on the character. Details are recognisably similar to the original image, a believable version though not exact. For such a simple prompt, it’s impressive.

First Failures

Pushing the model, at this point of testing I attempted to create a “top-down” rendering of the character. This is an unusual camera angle for a character, but there are some very niche uses (tokens in RPG mapping applications, for instance) that might be useful.

It did reveal the first serious hole in model training that could not be overcome: regardless of how specific the prompt, the model simply would not produce the camera angle. The closest it was able to achieve was about a 45 degree angle above. After several dozen attempts all ending in failure, this test was abandoned. It could potentially serve as a fine-tuning experiment to see if it is possible to train kontext to produce images in this angle with a Lora.

The closest attempts shown below:

Facial Expressions

We’ve had some success with pose changes, lets see how the model deals with facial expressions. Starting once again from the original image.

*A close up portrait of the woman with her back to the camera, looking back over her shoulder with an angry expression on her face.*

This first attempt is a quite reasonable one. The pose is good, the character recognisable and the expression, whilst perhaps not ragingly angry, doesn’t look happy by any means.

One interesting point, subtle enough that it went unnoticed throughout multiple other generations is that at this point the model has decided to change the hair colour of the original image, from a much lighter dirty blonde to a light brunette here. In some later images it gets even darker. This appeared to be a bias of the model; on multiple occasions during testing if not told specifically to retain the original hair colour, the model opted to darken the hair of a character.

Change the woman's expression to look frightened

Changing the expression to “frightened” was far less successful, creating an oddly exaggerated face that was more reminiscent of Gollum from the lord of the rings with bulging eyes oversized for the face. it’s not pleasant.

*Change the woman's expression to a small smile*

This was slightly more successful but still mostly unsatisfying. Something about the expression feels wrong. Examined closely, it’s possible that the expression change is focusing primarily on the lips, with brow and eyes remaining far less changed; causing a “not smiling with the eyes” effect that feels fake.

Several other emotions were tried, such as “sultry” and “seductive” that produced images identical to the original input image, suggesting a failure to understand what was required. On the whole, it would appear that Flux.1 Kontext dev is struggling to understand and replicate facial expressions.

Atmosphere

We will cast the edits a little wider this time; so far our focus has been primarily on the character, where the model has shown considerable skill in retaining a mostly recognisably consistent character. This time we’ll look at changing the atmosphere of the background.

Starting from the initial expression image above to demonstrate “incremental editing”, we try to change the setting with a simple prompt first.

*Change the background to be a dark, overgrown oak forest, wreathed in mist. Add green foiliage and undergrowth, remove the flowers.*

This was at least a partial success. The background has changed successfully, becoming darker, greener, less “fairy-tale”. The subject has remained more or less unchanged however and whilst this might be considered a success as well; after all, the prompt did not ask for changes to the character; a change of atmosphere is only going to ever be partial unless it integrates the character deeply into the scene; lighting and all. As a test, we addressed this directly in a follow up prompt and were given two interesting and slightly different takes on the request.

Adjust the lighting to dim twilight with strong, visible shadows falling across the woman, creating a dimmer, more natural and less even lighting on the woman.

This result was much more atmospheric and integrated the subject into the scene far better, and demonstrated a sense of what could be possible in terms of mood editing for images. This is a significant lighting change with shadows and features well considered and modified appropriately.

To keep the “chain” of edits ongoing and further test the incremental editing abilities of the model, at this point we took the second of these two images and made a slight content addition to see how well the model could adapt.

*Add a celtic pattern tattoo to the woman's shoulderblade*

Whilst the tattoo design itself leaves a bit to be desired, the way it is integrated into the image is done quite well and believably, following the contours of the shoulders accurately and appropriately shadowed and shaded. It would be interesting to see what could be achieved with more detailed prompting in situations like this.

Stylisation

The next series of tests were around stylisation. Taking an input image and changing the style of the image whilst retaining the general composition. This is a task that has spawned a large number of workflows and complicated solutions over time, and one that generated great controversary when ChatGPT “ghiblified” the internet recently. It’s a useful tool however and one that would seem to be well within the scope of this model.

*Change the style of the image to a carefully blended charcoal sketch*

This first attempt is an interesting greyscale rendering that does approximate a charcoal style blended sketch, though it fails to capture the physical attributes we would expect to see in a real charcoal drawing, or the smoothness of blend, however it is similar to what you would expect to see from Flux.1 Dev if you asked for a charcoal drawing from scratch, and in terms of capturing the original image it has done quite well.

*Change the style of the image to a rough pencil sketch*

A credible rendering for a pencil sketch based on the original image. Given the prompt I would have liked to see it a bit looser and rougher, but it has done a good job of simplifying the background in a sketchy fashion and the pencil-style shading looks quite effective.

*Change the style of the image to an oil painting portrait, with thick visible paint and brush marks*

Flux’s training has always seemed more focused on photorealism than artistic styles, a notable flaw in the original model as well, however this isn’t a bad imitation of oil-based painting. Fascinatingly it has chosen to add a roughened border, suggesting a scan/photo of a print of an oil painting (or an old canvas, poorly treated) rather than a direct image of an oil painting itself.

Looked closely, the brush strokes can be seen, the directionality is reasonable and the detail capture is quite nice.

*Change the style of the image to 90's anime screenshot*

I have discussed Flux.1’s poor quality Anime rendition previously, but this Kontext version of the model has done a reasonable job of rendering the image in the style requested. Previously Flux has been noted to show reasonable Anime image understanding, but a poor ability to created reasonable compositions in the style; as the composition in this case is already set, perhaps this aids the model in generating an acceptable image.

Transformation

So far we have made reasonably small and context-friendly changes, so in this test we’re going to push it a bit further. How does Flux.1 Kontext fare when we ask for a much larger change, taking a character from an image and translating to a whole different genre of image. Starting once again all the way from the beginning, the original starting image.

Create an image of the bridge of a military starship with a large window in the background showing the vastness of space. This woman stands on the bridge dressed in a military uniform of slacks and button-up shirt, with a space-opera style leather jacket over the top. Keep her face, proportions and general hair style unchanged

We use a more detailed prompt this time and specifically ask for details to be kept unchanged, in order to resist any bias built into the model regarding how a science fiction character should look.

It’s done a reasonable job of keeping the character consistent and recognisable, including the lighter hair colour in this instance. It has, somewhat hilariously incongrously, kept the character’s crown but successfully translated the rest of the clothing into a more modern form.

The background is far more disappointing. This is a trend that continues throughout this test; later testing did manage to produce better results by leaving out the word “bridge”, which appears to be tightly trained and bound to mid-late 20th century style nautical ship’s bridge, resulting in a far lower technology level than requested. Changing the prompts did allow for more futuristic looking backgrounds in separate tests.

To keep testing incremental editing, we try a very simple prompt to remove the crown.

Short, sharp and effective.

Before moving on, we attempted one more generation, starting from the beginning image again and testing an even more specific prompt regarding the details to be kept from the original image.

Change the clothes of the celtic princess into a space-opera military uniform of white slacks, button-up shirt, topped with a black leather jacket. Keep her facial features, hair style and colour, and eye colour. Change the setting of the image to be a captain's cabin on a futuristic military spaceship.

Definitely still struggling with the concept of a futuristic spaceship setting, and appears to have even placed the ship on an ocean this time judging from the horizon through the windows. The character transformation however is quite pleasing and reasonably recognisable from the original character.

Style Reference

We have shown how we can use Kontext to change the style of an existing image, but another purported use is to take a reference image and create an entirely new image using the style from the reference.

Starting from the very first image once again, we give this a test to see how well it functions.

In the style of the current image, create a wide-angle landscape image of a ruined castle perched atop a high crag, emphasising the sense of isolation and the melancholy passing of time.

This is quite a good capturing of the dreamlike quality of the original image, the misty atmosphere and the addition of the flowers to the foreground are nice callbacks to the reference image as well. On the whole, this quite an effective way of setting the style.

A Practical Use

For one final test we’re trying something different. I’ve been researching game development technologies recently, particularly the open source engine Godot. Due to this, i’ve had pixel art on my mind; the question suggests itself: could Kontext be used in a simple manner to create pixel assets for a 2D game? Taking the last science-fiction image created as a starting place, we decided to find out.

Create a full body high-resolution pixel-art image of the blonde woman standing in an idle pose with a plain flat white background. Retain current body proportion. Create no shadows on the ground plane.

This isn’t a bad effort at creating a lower-resolution pixel image from a reference, suitable for turning into a sprite in a game. It’d require some editing in order to make it useable; nothing I could do in the prompt would stop it adding in the shadow which is not ideal; but it’s definitely useable, if somewhat generic.

Of course, we need multiple poses for a standard character sprite.

*Change the blonde woman's pose so she is shown in profile facing to the right, retaining all other details.*

This is actually quite an impressive and simple way to get multiple posed sprites ready. Again, a small amount of manual editing would be required to prepare, but on the whole this is quite a useable reworking. Further testing will be left for a later time, but it would be interesting to see if kontext could be used in conjunction with other models to create standard animations for these sprites as well. There is a lot of potential here.

Final Thoughts

From this brief testing, Kontext seems like an interesting idea marked by issues in implementation. The biggest issue, and its one that was shared by base Flux.1 Dev models, come down to limitations in the training data. Flux.1 seems to really struggle with zero-shot generations, so situations that don’t show in the initial training data (such as the overhead camera angle) can provide real stumbling blocks for the model. This is similar to issues noticed in Flux.1 (and other generators) in earlier articles where we noted their inability to create images of mythological creatures.

These issues can be significantly improved by finetuning Lora’s, and the Flux.1 community has provided a wide range of training loras already covering everything from better implementations of artistic styles, wider variety of facial types and nsfw content. The Flux.1 Kontext model however is brand new and only time will tell if the community will embrace Kontext and produce the ecosystem of training loras required to make the model a truly useful component in local art generation, or if a combination of niche usage and the current censorship controversies occupying the local generation community leave it an unloved curiousity, potential unrealised.

As a final example of this, one Lora has been released so far that modifies Flux.1 Kontext’s pixel art generation ability. A short test showed it was capable of a more detailed and potentially aesthetically pleasing generations than the base model.

Even as it is, however, Flux.1 Kontext does have some potential as a useful tool for consistent character creation; potentially creating training data of a character for use in lora training, for instance; and in sprite creation workflows. Some further trials may be called for.

A brief reminder that The High-Tech Creative is an independent arts and technology journalism and research venture entirely supported by readers like you. The most important assistance you can provide is to recommend us to your friends and help spread the word. If you enjoy our work however and wish to support it continuing (and expanding) more directly, please click through below. For the price of a cup of coffee, you can help a great deal.

Support Our Work

About Us

The High-Tech Creative
Your guide to AI's creative revolution and enduring artistic traditions
Publisher & Editor-in-chief: Nick Bronson
Fashion Correspondent: Trixie Bronson
AI Contributing Editor and Poetess-in-residence: Amy

If you have enjoyed our work here at The High-Tech Creative and have found it useful, please consider supporting us by sharing our publication with your friends, or click below to donate and become one of the patrons keeping us going.