I hope you realise that this might be one of the most thorough stress-tests of this specific industry (digital art) that there is... specifically on model biases.
I am particularly intrigued by this:
"In Art models however I would argue that bias is not only desirable, it’s necessary. When we ask the question, “does a model have an aesthetic preference” we are asking, in effect, if the model is biased in particular ways. That’s what a preference is, after all, a bias towards something."
I wonder to what extent this also applies to other, non-ADM use cases... I think legal professionals have a disproportionate "bias towards bias". I tested this on myself when I did the RandomForest classifier experiments; it made me think a lot about why we have such a negative association with "bias", and to what extent "subjectivity" can be considered a bias.
As ever, brilliantly done Nicholas. I am terrible at art, so sorry that I cannot provide a critique on specific style and substance findings :D
Thanks Katalina, you're kind to say. It took a lot longer to pull together than I expected, and I wasn't 100% sure anyone would even read it :P
I actually did a fair bit more testing; there were quite a few more prompts that I ended up cutting out. I might release a "deleted scenes" post sometime, but I think what was there was enough to get a pretty good feel for the "personality" of the models under test. It would be interesting to do it with significantly more sample data, but I was lucky to get as much as I did out of the two free cloud services I was able to use. If I wanted to push it further I'd either need to do it over months, or restrict the testing to only locally-run models, which would have been less interesting overall.
I've been thinking about bias in models for a while, and it was only as I was working through these tests that I arrived at the conclusion I eventually did - bias is misunderstood.
Obviously there are cases when bias, particularly when it comes to accessibility, is a terrible thing. No-one is going to get hurt because SDXL only has one core belief about what constitutes "handsome", but that's likely to be a very different matter if the question is what a model considers to be "likely to commit crime". There's a good reason why there's a heavy push against allowing AI to be used in certain circumstances.
It occurred to me though that a "preference" of _any_ kind is a type of bias, and you can't have any sort of artistic sense, any sort of personal style, or even really appreciation for anything creative if you have absolutely no preference at all. If we want interesting creative AI, we need interesting bias. Completely unbiased would be artistically sterile. A blank piece of paper.
I'm thinking I may need to try and play with some LoRA training to see how difficult it is to introduce additional biases, or tame the ones that exist. The finetunes I tested there were ones that had been trained over months and years, works of sustained effort that are likely beyond the reach of the average artist. As hardware gets more capable though, training a personal LoRA is something that will be far more reasonable.
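For the curious, this is roughly the shape of what I mean - a minimal sketch assuming the Hugging Face diffusers + peft stack. The rank, alpha and target modules are just illustrative starting points, not a tested recipe:

```python
# Minimal sketch: attach a small LoRA adapter to an SDXL UNet so that
# fine-tuning nudges its "preferences" without retraining the base model.
# Assumes the Hugging Face diffusers + peft libraries; hyperparameters
# here are illustrative choices, not a recommendation.
import torch
from diffusers import UNet2DConditionModel
from peft import LoraConfig

unet = UNet2DConditionModel.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    subfolder="unet",
    torch_dtype=torch.float16,
)

lora_config = LoraConfig(
    r=8,                       # low rank keeps the adapter tiny and cheap to train
    lora_alpha=8,
    init_lora_weights="gaussian",
    target_modules=["to_k", "to_q", "to_v", "to_out.0"],  # attention projections
)
unet.add_adapter(lora_config)  # base weights stay frozen; only LoRA layers train

trainable = sum(p.numel() for p in unet.parameters() if p.requires_grad)
total = sum(p.numel() for p in unet.parameters())
print(f"training {trainable:,} of {total:,} parameters")
```

The appeal is that you're only training a few million parameters on top of a frozen base, which is exactly why this is plausible on consumer hardware where a full finetune isn't.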
I think the same principle applies to creative-writing text generation models - which helps explain why there are finetunes of Gemma 2 that out-write ChatGPT handily. Interesting bias. I have one text generation model, which I was testing a little in some benchmarking, that is quite literally insane. It got overcooked during training and produces creative writing that is... difficult to describe. It's definitely "creative" though.
I wonder if we might see a move towards multi-LLM systems. I can see a lot of situations where you'd want a model as unbiased as you can manage in order to best perform certain tasks, but I think that a real "personality" requires bias, so interfaces with the user are going to be more enjoyable when they have preferences.
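Something like this toy router is what I'm picturing, purely as a sketch - the keyword heuristic and the two model roles are placeholders for whatever a real system would actually use:

```python
# Toy sketch of a multi-model setup: mechanical tasks go to a model tuned
# to be as neutral as possible, open-ended conversation goes to an
# opinionated "personality" finetune. The keyword routing and the model
# stand-ins are placeholders, not a real implementation.
from typing import Callable

TASK_KEYWORDS = ("summarise", "extract", "translate", "classify")

def route(prompt: str,
          neutral_model: Callable[[str], str],
          persona_model: Callable[[str], str]) -> str:
    """Crude router; a real system would likely use a classifier here."""
    if any(word in prompt.lower() for word in TASK_KEYWORDS):
        return neutral_model(prompt)  # bias-minimised model for the task itself
    return persona_model(prompt)      # biased-on-purpose model for conversation

# Stand-in models so the sketch runs end to end.
def neutral(p: str) -> str:
    return f"[neutral] {p}"

def persona(p: str) -> str:
    return f"[persona] {p}"

print(route("Summarise this article for me", neutral, persona))
print(route("What kind of art do you actually like?", neutral, persona))
```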
It's an interesting thought. A lot of focus has been on making systems adapt to user preferences, but maybe that's the wrong approach. Maybe people would actually feel more comfortable with systems that had their own preferences, separate from the user's.