Over the last month we have covered a number of topics within generative art and, as with any technical subject, there are a great many specific terms, acronyms, processes and concepts that can seem intimidating, arcane and confusing. We believe wholeheartedly in open access to this new technology for everyone, and that language should be no barrier, so we welcome you to the High Tech Creative Generative Art Glossary.
Please contact us should you discover any errors in our definitions or if you have additional terms that you think should be added. We will continue to update this glossary as the field develops (and as our own research continues).
Before we begin, a reminder that The High-Tech Creative is an independent arts and technology journalism and research venture supported entirely by readers like you. The most important assistance you can provide is to recommend us to your friends and help spread the word. If you enjoy our work, however, and wish to support its continuation (and expansion) more directly, please click through below.
Fundamental Terms and Concepts
Aesthetic Alignment: The degree to which an image matches the intended or desired visual qualities and aesthetic preferences.
API (Application Programming Interface) Integration: Connecting generative models to other software applications through programming interfaces to enable automated workflows. Example: a text-editor plugin that connects to a generative model so the model can be used directly within the editor.
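As a rough illustration, a minimal Python sketch of this kind of integration might look like the following; the endpoint URL, API key and request schema are placeholders and will differ for any real service.

```python
import requests

# Placeholder endpoint and key -- substitute the details of whichever
# image-generation API you actually use.
API_URL = "https://api.example.com/v1/images/generate"
API_KEY = "your-api-key-here"

def generate_image(prompt: str) -> bytes:
    """Send a prompt to a (hypothetical) image-generation API and return the raw image bytes."""
    response = requests.post(
        API_URL,
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={"prompt": prompt},
        timeout=120,
    )
    response.raise_for_status()  # raise an error for non-2xx responses
    return response.content

if __name__ == "__main__":
    with open("output.png", "wb") as f:
        f.write(generate_image("a watercolour lighthouse at dusk"))
```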
Artificial Intelligence (AI): A broad field of computer science dedicated to creating intelligent computer systems; systems capable of reasoning, learning and acting autonomously in complex environments. Artificial Intelligence aims to develop machines that can perform tasks that typically require human-level intelligence and encompasses a wide range of techniques and approaches, including machine learning, deep learning, natural language processing, computer vision and robotics.
Artistic Coherence: The degree to which a generated image has a unified and consistent artistic style or vision.
Concept Art: Art intended to act as a visualisation of ideas or concepts. Often used for exploring art styles and ideas for creative projects (film, games etc).
Creative Direction: The process of guiding an artistic process towards specific aesthetic goals or artistic visions.
Deep Learning: A subfield of machine learning that uses neural networks with multiple layers (hence, "deep") to progressively extract higher-level features from raw input data. This allows the model to learn intricate patterns and representations, leading to significant advancements in tasks such as image recognition, natural language processing and speech recognition.
Digital Painting: Digitally created or generated art that mimics traditional painting techniques and outputs.
Generative Agents: AI computer systems that autonomously create artistic output such as images or video.
Generative Palette: The range of visual styles, colours and aesthetics that a particular generative model is capable of producing.
Inference: When a trained generative AI model is used to generate new outputs or make predictions based on input data.
Large Language Model (LLM): A type of deep-learning model with a very large number of parameters (millions or billions) trained on massive datasets of text and code, enabling it to understand, generate and manipulate human language for a wide range of tasks such as text completion, translation, summarisation and interacting with people via natural language.
Machine Learning: A subfield of artificial intelligence that focuses on building computer systems that are able to learn from data without being explicitly programmed with logic or rules. Instead machine learning algorithms identify patterns in data, build models and use these models to make predictions or decisions on new unseen data. Key tasks in machine learning include classification, regression and clustering. One extremely active subfield of machine learning currently is the development and refinement of Large Language Models.
Model: In the context of machine learning, a model is a learned representation of patterns and relationships within data. It is the product of a training process where an algorithm analyses data and adjusts its internal parameters to capture underlying structures. This model, once trained, can be used to make predictions, classifications or generate new data based on previously unseen input. The form of a model can vary greatly, some examples include sets of preprogrammed rules, mathematical equations, decision trees or neural networks with learned weights.
Open Source: A term originally from software development; in the context of machine learning it refers to an AI model which is made available for download with not only its pre-trained weights but also all the information needed to recreate the model from scratch. This includes architectural documentation, information on the training processes used and the training datasets that were used to create the model. Fully open source models greatly encourage innovation and experimentation in research.
Open Weight: A machine learning paradigm where trained model parameters (the "weights") are released publicly, allowing users to download, inspect, use and modify or fine-tune the model for their own applications. Open Weight models provide greater accessibility and potential for customisation compared to closed, proprietary models, fostering research, development and community-driven innovation in AI. Many models commonly described as Open Source actually fall under the definition of Open Weight.
Photorealism: Non-photographic images that appear indistinguishable from real-life photographs.
Pipeline: The complete sequence of steps involved in an automated system, from initiating trigger to final output (if any).
Prompt: The input text or other data (such as an image for image editing models) provided by the user to instruct the model on what to generate, a process called conditioning. A well-crafted prompt can significantly influence the quality, style and content of a generative AI's output.
Style Consistency: The ability of a generative model to maintain a consistent artistic style across multiple generated images.
Tokens / Tokenization: The process of breaking down text or images into smaller units (tokens) that a model can process, such as words, sub-words, or image patches.
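To make this concrete, the sketch below uses the CLIP tokenizer from the Hugging Face transformers library (the tokenizer used by several text-to-image models) to break an example prompt into tokens; the prompt itself is arbitrary.

```python
from transformers import CLIPTokenizer

# The CLIP tokenizer used by many text-to-image models
tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-base-patch32")

prompt = "a watercolour painting of a lighthouse at dusk"
tokens = tokenizer.tokenize(prompt)   # sub-word pieces the model works with
ids = tokenizer.encode(prompt)        # the integer IDs actually fed to the model

print(tokens)
print(ids)
```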
Visual Grammar: The underlying rules and principles that govern how visual elements are arranged and combined in an image to create an effective composition.
VRAM (Video RAM): The memory on a graphics card used to store and process the data needed to run generative models.
Community Terms
Iteration Gallery: A collection of intermediate images showing the development of a final image over multiple edits and processes.
Model Card: Documentation for a generative AI model detailing capabilities, limitations and ethical considerations.
Model Zoo: A collection or repository of pre-trained models ready for use.
Prompt Gallery: A collection of effective or interesting prompts, generally hosted for sharing.
Prompt Market: A platform for sharing or selling effective prompts.
AI Art Models
AuraFlow: A large, fully open-source flow-based text-to-image generation model developed by the Fal team and inspired by Stable Diffusion 3. Notable for its detailed and realistic texture generation and availability under the Apache 2.0 licence, promoting open research and development in the field of generative AI.
DALL-E (Family): A groundbreaking family of text-to-image generative models developed by OpenAI, beginning with the original DALL-E released in January 2021. The DALL-E models have been influential in popularising and advancing the field of AI art generation.
DALL-E 1.0: The original DALL-E model, released in January 2021, that demonstrated the ability to generate images from text prompts, showcasing a novel approach to computer generated imagery utilising natural language processing.
DALL-E 2.0: A significantly improved successor to DALL-E 1 released in April 2022 that offered higher resolution images with greater realism and prompt adherence. It utilised a diffusion model conditioned on CLIP embeddings, leading to both better prompt understanding and image quality.
DALL-E 3.0: The third and latest major iteration of DALL-E, released in September 2023, focused on even better prompt adherence enabling the generation of more detailed and accurate images from complex text descriptions. It was also integrated with ChatGPT to assist with prompt generation.
Flux.1 (Family): A family of generative AI models developed by Black Forest Labs, a company founded by former core researchers of Stable Diffusion. The initial release of the Flux.1 text-to-image models offered state-of-the-art performance in image detail, prompt adherence, style diversity and scene complexity. The open-weight versions of the Flux models have gone on to become the current favourite of the open-weight image generation community.
Flux.1 Pro: A commercial-grade text-to-image generative AI model within the Flux.1 family developed by Black Forest Labs. The company's flagship, it prioritises high-quality image generation with strong detail, prompt adherence, style diversity and the ability to handle complex scenes. Unlike other models in the family, the Pro model is not openly available and can only be accessed via API.
Flux.1 Dev: A text-to-image generative AI model within the Flux.1 family developed by Black Forest Labs. This version offers performance close to the "Pro" model but with its weights openly available to the community. Flux.1 Dev has become the standard model of the open-weight image generation community.
Flux.1 Schnell: The fastest text-to-image generative AI model in the Flux.1 family. Schnell, meaning "fast" in German, indicates its optimisation for rapid image generation: it can create images in as few as 4 steps, much like the Stable Diffusion "Turbo" models. Schnell is distilled from Flux.1 Pro and, though details are not public, it is assumed that a technique similar to the Adversarial Diffusion Distillation (ADD) used for the SD Turbo models was employed to create it.
Flux.1 Dev Fill: A model within the Flux.1 family specialised for inpainting and outpainting operations with weights openly available to the open-weight community.
Flux.1 Dev Depth: A specialised model derived from Flux.1 Dev that utilises depth information to guide image generation, allowing native use of depth-based Controlnets to specify the composition of images.
Flux.1 Dev Canny: A generative model based on Flux.1 Dev trained to use Canny edge maps to guide image generation, allowing native use of Canny-based Controlnets to provide precise conditioning of the image generation.
Flux.1 Dev Redux: A specialised model based on Flux.1 Dev designed for generating variations of existing images while preserving their key elements. The model allows users to provide both an image and a text prompt and then generates new images maintaining the core structure and content of the input image while incorporating changes guided by the text prompt. Like most Flux.1 family models, the weights of this model are open and available to the open-weight community.
Ideogram: A text-to-image generative AI model developed by Ideogram, Inc. Known for producing visually appealing images with a strong focus on accurate and legible text rendering within the generated visuals, a task many early text-to-image models struggled with. The model has undergone several iterations, each providing improvements in realism, prompt understanding and speed. It is not available for local use but underlies the "Ideogram" online generation platform.
Imagen: A text-to-image diffusion model developed by Google Research and integrated into their AI platforms (such as Gemini and ImageFX). Notable for strong photorealism and a deep understanding of language nuance, its architecture uses a large language model to encode text prompts, which then conditions a cascaded diffusion model to generate high-resolution images. Not available as a model for local use, Imagen is integrated directly into Google AI services.
Midjourney: A proprietary text-to-image generative AI model developed by the independent research lab Midjourney, Inc. Known for its distinctive artistic style and strong aesthetic quality, the Midjourney model has become one of the most popular models for creative art generation. The Midjourney model has undergone several iterations and improvements but is not available for direct use in local systems. Access to the Midjourney model is available primarily through an online Discord bot or the Midjourney web interface.
Stable Diffusion (Family): A family of latent diffusion models specifically designed to generate detailed images from text descriptions. Known for its efficiency, low computational requirements and being open-weight.
Stable Diffusion v1.0: The initial public release of the latent diffusion model developed by Stability AI and collaborators. Released in August 2022, it marked a significant breakthrough in accessible and high-quality text-to-image generation. Compared to prior models it offered a balance of speed, low computational requirements and the ability to produce detailed and coherent images from text prompts. This led directly to the birth of the open-weight image generation community and was a huge step in democratising AI image synthesis, sparking worldwide creative exploration.
Stable Diffusion v1.4: A foundational text-to-image diffusion model released in August 2022.
Stable Diffusion v1.5: A widely adopted text-to-image model released in October 2022, building on its predecessor (v1.4) with enhanced performance and versatility. Until the release of Stable Diffusion XL, SD1.5 was the standard model of the open-weight image generation community.
Stable Diffusion v2.0: A major update to the original model featuring improved image quality via a new text encoder and introducing capabilities like depth-to-image generation and higher resolution upscaling.
Stable Diffusion v2.1: A refined version of the 2.0 model with a less restrictive NSFW filter in its training data, leading to improved generation of human figures and better adherence to some user prompts.
Stable Diffusion XL v1.0: An advanced latent diffusion model boasting significantly improved image quality, higher resolution output, better prompt adherence and enhanced photorealism compared to previous versions in the Stable Diffusion family, thanks to a larger UNet backbone and dual text encoders. It replaced Stable Diffusion 1.5 as the primary model of the open-weight image generation community until the release of Flux.1, and still sees significant use due to its incredibly deep community support.
Stable Diffusion XL Turbo: A highly efficient and fast version of Stable Diffusion XL 1.0 that uses a novel Adversarial Diffusion Distillation (ADD) training method to enable high-quality image generation in as few as 1-4 steps, significantly reducing inference time required to generate images.
Stable Diffusion v3.0 Medium: A 2 billion-parameter Multimodal Diffusion Transformer (MMDiT) text-to-image model developed by Stability AI. It was designed for improved image quality, typography, complex prompt understanding and resource efficiency, allowing it to run on consumer-grade hardware. Its release was controversial, however, owing to an overly restrictive licence far different from previous entries in the family and some very well publicised prompt-adherence issues, surmised to be caused by overzealous "safety" training. As a result, the 3-series of Stable Diffusion has received neither the acclaim nor the popularity of the earlier XL models.
Stable Diffusion v3.5 Medium: An improved version of Stable Diffusion 3 featuring a refined Multimodal Diffusion Transformer (MMDiT-X) architecture with approximately 2.5-2.6 billion parameters, further enhancing image quality, prompt adherence, typography rendering and multi-resolution generation capabilities. Version 3.5 has come to be seen as something of a redemption after the poorly received launch of 3.0 and has picked up some fans, though it still lags behind Flux in popularity within the open-weight generative image community.
Stable Diffusion v3.5 Large: A more powerful variant in Stability AI's Stable Diffusion 3.5 series featuring approximately 8 billion parameters and the same MMDiT-X architecture as the medium version. Offers superior image quality and prompt adherence compared to the medium version and particularly excels at generating high-resolution images of up to 1 megapixel.
Stable Diffusion v3.5 Large Turbo: A fast-inference variant of Stability AI's Stable Diffusion 3.5 Large model. As with SDXL Turbo, Adversarial Diffusion Distillation techniques were used to distil knowledge from the standard SD3.5 Large model, enabling generation in only around 4 steps as opposed to the 20-30 used by the teacher model. It is optimised for speed with only a minor sacrifice in quality compared to its teacher.
Animation and Video
Frame Consistency: Maintaining coherent subjects and styles across video frames in a film or animation.
Frame Interpolation: Creating new frames between existing ones (sometimes called key frames) to generate a smooth animation leading from the first to the second.
FILM (Frame Interpolation for Large Motion): A technique used for interpolation between existing frames in an animation, particularly when there is significant movement between frames. It uses AI algorithms to analyse motion patterns and generate intermediate frames that logically connect distant poses and positions, reducing jerkiness and creating more fluid animation from fewer keyframes. Especially useful for AI-generated video or animations where movement is choppy.
Interpolation: See Frame Interpolation.
Temporal Coherence: Consistency of motion and elements over time in animations or film.
Model Architecture
Attention Mechanisms: Neural network components that allow a model to focus on the most relevant parts of the input data when generating outputs. The core advance behind modern Large Language Models.
Cascaded Models: A generation pipeline where multiple models work in sequence, each one refining the output of the previous model.
Classifier: A model component that categorises inputs or identifies specific features within the data.
CLIP (Contrastive Language-Image Pre-training): A neural network trained to understand the relationship between image and text, enabling it to match images to relevant text descriptions. Often used as a translation layer in a generative art model for text-to-image generation.
CNN (Convolutional Neural Network): A type of deep-learning neural network particularly well-suited to processing grid-like data such as images. CNNs are able to learn relevant features directly from raw data, making them highly successful in computer vision tasks including image classification, object detection and image generation.
Controlnet: A neural network architecture that adds extra Conditioning to diffusion models, enabling precise control of the generated image's composition, spatial arrangement and specific features based on various inputs. The inputs can include edge maps (Canny), depth maps, human pose skeletons and others. Controlnet allows users to guide the AI to create images that adhere closely to their structural or stylistic requirements.
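As a hedged sketch of how this is commonly wired together with the Hugging Face diffusers library, the example below pairs a Canny Controlnet with a Stable Diffusion 1.5 base model; the model identifiers and file names are illustrative and a CUDA-capable GPU is assumed.

```python
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from diffusers.utils import load_image

# Canny-conditioned ControlNet paired with a Stable Diffusion 1.5 base model
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")

edge_map = load_image("canny_edges.png")  # white-on-black edge map (see Edge Maps)
image = pipe(
    "a cyberpunk city street at night, neon signs, rain",
    image=edge_map,                        # the structural conditioning signal
    num_inference_steps=30,
).images[0]
image.save("controlled.png")
```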
Diffusion Models: Generative AI models that are trained by adding noise to an image and then learning to reverse the process (denoising), progressively removing noise to recreate the original data or generate new samples.
Flow Based Models: Flow-based generative AI models learn to transform a simple, known probability distribution (like a standard Gaussian) into the complex distribution of real-world data (like images) using a series of invertible steps. This allows them to both generate realistic samples and calculate the probability of those samples, which is crucial for training. After training, these models can produce diverse and high-quality images.
Foundation Model: Large-scale pre-trained generative AI models that serve as a base for more specialised models. Capable of performing a wide range of tasks.
Text Encoder: A neural network component responsible for transforming a textual input (the prompt) into a numerical representation (a latent space embedding) that captures the semantic meaning and details of the text prompt. In generative text models the embedding is used as input to the model itself; in generative art architectures, however, this latent embedding is used as conditioning to guide the image generation process performed by the diffusion model.
Latent Space: An abstract, compressed representation of data created by a neural network, in which similar features and characteristics are positioned close together, enabling the model to manipulate data and generate variations.
MMDiT (Multimodal Diffusion Transformer): A deep-learning model architecture that combines the strengths of diffusion models and transformer networks to process and generate data from multiple modalities (e.g., text and images) jointly. In the context of generative art, MMDiT architectures often employ separate pathways to encode different modalities which are then combined within the Transformer's attention mechanisms to guide the denoising process of the diffusion model, leading to improved understanding and generation based on the combined input.
Neural Network: A computational system inspired by the structure of the human brain. A neural network is composed of interconnected nodes or "neurons" organised in layers that process information by adjusting the strengths (weights) of the connections between them, enabling the network to learn complex patterns from data for tasks such as classification, prediction and generation.
NSFW (Not Safe For Work) Filters: A system or workflow component designed to detect and block the generation of content meeting filter parameters. Generally used to enforce content rules on public facing systems or locally for systems intended for use by students or other young people.
Transformers: A neural network architecture that uses "self-attention" mechanisms to weigh the importance of different parts of the input data as it is processed, enabling it to handle sequential data like text or images effectively. This is the core advancement that started the current LLM-powered AI cycle.
T2I-Adapter: A module that adds specific conditional controls to text-to-image models, enabling more precise control over generated images.
UNet: A U-shaped convolutional neural network (CNN) architecture widely used for image segmentation and as a component in AI image generation models. It combines high-resolution low-level features with upsampled high-level features allowing for detailed and accurate pixel-wise predictions. Its effectiveness with even limited data has made it a foundational architecture in a number of image processing tasks.
VAE (Variational Autoencoder): A type of autoencoder that learns a probabilistic latent space. This creates a smooth and continuous latent space and enables the generation of new, realistic data by sampling from the space and passing the sample through the decoder. Commonly used in generative art models for encoding images into a generation-friendly latent space and decoding latent vectors back into images.
Weight Pruning: A technique used to reduce the size of a generative AI model by removing less important parameters. This results in a smaller model that requires less video memory for inference; however, it can also reduce the quality of the model, sometimes significantly.
Model Training
ADD (Adversarial Diffusion Distillation): A training method used to significantly increase the inference speed of large-scale diffusion models such as those used in text-to-image generation. ADD combines model distillation techniques with the adversarial techniques used in generative adversarial networks to train a "student" model capable of generating high-quality images in just 1-4 steps, much faster than the original teacher models, though with potentially lower quality outputs.
Autoencoder: A neural network that learns to compress data into a lower-dimensional representation (encoding) and reconstruct it from that representation (decoding). The goal is to minimise the reconstruction error, forcing the network to learn the most salient features of the data. A standard autoencoder has limited generative capabilities and is generally used for unsupervised learning; a specific type, the Variational Autoencoder (VAE), addresses this by learning probability distributions and is used in generative art models for latent space encoding and decoding.
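A minimal PyTorch sketch of the idea, using random tensors in place of real images; the layer sizes are arbitrary choices for illustration.

```python
import torch
from torch import nn

class TinyAutoencoder(nn.Module):
    """Compress a flattened 28x28 'image' down to 32 numbers and reconstruct it."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 32))
        self.decoder = nn.Sequential(nn.Linear(32, 128), nn.ReLU(), nn.Linear(128, 784))

    def forward(self, x):
        latent = self.encoder(x)     # the compressed (latent) representation
        return self.decoder(latent)  # the reconstruction

model = TinyAutoencoder()
x = torch.rand(16, 784)                     # a batch of fake "images"
loss = nn.functional.mse_loss(model(x), x)  # reconstruction error to minimise
loss.backward()
```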
Clustering: In unsupervised learning, clustering is a technique used to group similar data points together into clusters based on their inherent characteristics or features. This attempts to discover natural groupings within the data without any prior knowledge of group labels. Data points within a cluster are more similar to each other than to those in other clusters. The goal of this form of unsupervised training is to develop a model capable of discovering inherent groupings in new, unseen data, in order to provide insight. Example applications include social network analysis, customer segmentation, search result grouping and anomaly detection.
Dataset Curation: Selecting, organising and annotating specific training data in order to train or fine-tune a model for specific functionality.
Distillation: The process of transferring knowledge from a larger, more complex "teacher" model to a smaller, more efficient "student" model.
Fine-Tuning: The process of further training a pre-trained AI model on a specific dataset in order to adapt or improve its capabilities to a particular task or domain.
GAN (Generative Adversarial Network): A machine learning framework in which two neural networks, a generator and a discriminator, compete against each other. The generator creates data while the discriminator tries to distinguish generated data from real data, and the generator in turn attempts to fool the discriminator into accepting its output as real. Each network is incentivised to outplay the other, and this adversarial process drives improvement in both.
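The toy PyTorch sketch below illustrates the adversarial loop on one-dimensional synthetic data; the network sizes, learning rates and data distribution are arbitrary choices made purely for illustration.

```python
import torch
from torch import nn

# Generator maps noise -> samples; discriminator scores real vs. generated samples.
G = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))
D = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1), nn.Sigmoid())
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCELoss()

for step in range(1000):
    real = torch.randn(64, 1) * 0.5 + 2.0  # "real" data drawn from N(2, 0.5)
    fake = G(torch.randn(64, 8))           # generated data

    # Train the discriminator: label real samples 1, generated samples 0
    d_loss = bce(D(real), torch.ones(64, 1)) + bce(D(fake.detach()), torch.zeros(64, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Train the generator: try to make the discriminator output 1 for generated samples
    g_loss = bce(D(fake), torch.ones(64, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```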
Learning Rate: A parameter that controls the step size taken during model training, influencing how quickly the model adapts to the training data.
Loss Function: A measure of how well a model performs during training, quantifying the difference between predicted and actual outputs. Used to adjust model weights during training to improve performance.
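As a toy worked example, mean squared error (one of the most common loss functions) computed by hand on three predictions:

```python
# Mean squared error for a toy set of predictions
predicted = [2.5, 0.0, 2.0]
actual    = [3.0, -0.5, 2.0]

squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
mse = sum(squared_errors) / len(squared_errors)
print(mse)  # (0.25 + 0.25 + 0.0) / 3 = 0.1666...
```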
Model Merging: The process of combining the weights or parameters of multiple pre-trained generative AI models in order to create a new model with hybrid capabilities.
Perceptual Loss: A loss function that measures the difference between images based on how they are perceived by a pre-trained neural network, often leading to more visually pleasing results.
RLHF (Reinforcement Learning from Human Feedback): A training method that uses human preferences to refine the outputs of a model, allowing it to better align with human values and expectations.
Supervised Learning: A type of machine learning where a model learns to map from input features to output labels based on a dataset of labeled, or "annotated", examples. Each example in the training dataset consists of an input (or series of inputs) and its corresponding correct output (the annotation). The goal is to train a model capable of accurately predicting the output label for new, unseen input data. Common supervised learning tasks include classification and regression.
Textual Inversion: An AI image generation technique that allows training a generative model on new visual concepts using just a few reference images. The system creates a new "trigger word" or embedding that can then be used in prompts, allowing users to incorporate their own subjects, styles or objects without retraining the entire model.
Training Data/Dataset: A collection of labeled examples used to train a machine learning model. Each example in a training dataset typically consists of input features and their correct output (the label, in supervised learning) or inherent structure (in unsupervised learning). The model learns patterns and relationships from this data in a training process which it is then able to apply to make predictions or decisions on new, unseen data. The quality and size of the training dataset can significantly impact the performance of the final trained model as well as the amount of time required to train it.
Training Epochs: An epoch is a complete pass through the entire training dataset during the model training process. A full training run of an AI model may take multiple epochs.
Transfer Learning: A machine learning technique where a model pre-trained for a task is reused as a starting point for a model on a second related task, to save time and resources in training.
Unsupervised Learning: A type of machine learning where an algorithm learns patterns and structures from training data without explicit output labels or annotations. The goal is to discover hidden relationships, groupings or dimensionality reductions within the data itself and develop a model capable of recognising and predicting these features. Common unsupervised learning tasks include clustering (grouping similar data points), dimensionality reduction (reducing the number of variables while preserving important information), and anomaly detection (identifying unusual data points).
Generative Art Workflows, Pipelines & Process
Dynamic Prompting: Using templates and randomness to dynamically generate random prompts for text-to-image generation.
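A minimal sketch of the idea; the word lists and template here are invented for illustration.

```python
import random

subjects = ["a lighthouse", "a red fox", "an old sailing ship"]
styles   = ["watercolour", "oil painting", "pixel art", "charcoal sketch"]
moods    = ["at dusk", "in heavy fog", "under a full moon"]

template = "{subject}, {style}, {mood}, highly detailed"

# Each run of the loop produces a different randomly assembled prompt
for _ in range(3):
    print(template.format(
        subject=random.choice(subjects),
        style=random.choice(styles),
        mood=random.choice(moods),
    ))
```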
Face Enhancement/Restoration: An image-to-image generative art workflow involving detecting and enhancing the details of faces in an image.
Hires Fix: A two-pass image-to-image technique used in generative art image generation: an image is first generated at a lower base resolution, then upscaled and refined with a further denoising pass to improve detail and quality at the higher resolution.
I2I: See image-to-image
Img2Img: See image-to-image
Img2Prompt: See image-to-prompt
Image-to-Image: A type of generative art workflow that uses an existing image as all or part of the conditioning process of generating another image.
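A hedged sketch of an image-to-image workflow using the diffusers library; the model identifier, file names and strength value (see Denoise Strength) are illustrative and a CUDA-capable GPU is assumed.

```python
import torch
from diffusers import StableDiffusionImg2ImgPipeline
from diffusers.utils import load_image

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

init_image = load_image("sketch.png")  # the existing image used as conditioning

result = pipe(
    prompt="a detailed oil painting of a mountain village",
    image=init_image,
    strength=0.6,            # lower keeps more of the original, higher allows more change
    num_inference_steps=30,
).images[0]
result.save("repainted.png")
```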
Image-to-Prompt: Essentially a reversal of the text-to-image process, a workflow for attempting to generate an appropriate generation prompt from an already existing image.
Inpainting: An image-to-image generative art workflow involving modifying or replacing a specific region in an image in order to add or remove details.
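A hedged sketch of an inpainting workflow using the diffusers library; the model identifier and file names are examples and a CUDA-capable GPU is assumed. White areas of the mask mark the region to regenerate.

```python
import torch
from diffusers import AutoPipelineForInpainting
from diffusers.utils import load_image

pipe = AutoPipelineForInpainting.from_pretrained(
    "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16
).to("cuda")

image = load_image("photo.png")  # the original image
mask = load_image("mask.png")    # white = regenerate, black = keep untouched

result = pipe(
    prompt="a vase of sunflowers on the table",
    image=image,
    mask_image=mask,
    num_inference_steps=30,
).images[0]
result.save("inpainted.png")
```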
Negative Prompting: Specifying elements or characteristics that should be avoided in images created by generative AI.
Outpainting: An image-to-image generative art workflow involving extending an image beyond its original borders by increasing the canvas size to the desired size and using generative AI to create additional image content to fill the empty spaces, blending seamlessly with the original image.
Prompt Engineering: The art of creating effective text prompts to guide generative AI models in creating desired outputs.
Style Transfer: An image-to-image generative art workflow involving applying the artistic style of one image to the composition/content of another.
T2I: See text-to-image
Text-to-Image: A generative art workflow involving using text descriptions when conditioning image generation.
Txt2Img: See text-to-image
Upscaling: An image-to-image generative art workflow involving increasing the size of an image while maintaining or enhancing its quality.
Technical Terms and Concepts
Anomaly Detection: In machine learning, anomaly detection (also known as outlier detection) is the identification of rare or unusual data points, patterns or observations that deviate significantly from the majority of the data. These anomalies can be indicative of errors, fraud, faults, intrusions or novel events. The goal is to build models that can effectively distinguish between unusual instances and the normal behaviour of the data.
Artifacts: In digital imaging, artifacts are undesirable distortions or anomalies that can appear in images, particularly after complicated processing (such as AI upscaling or algorithmic filtering).
Batch Size: The number of data samples processed simultaneously during training or inference. In generative art, can also refer to the number of images being generated at a time.
Canny: The Canny edge detection algorithm developed by John Canny in 1986. It is a computer vision technique that identifies significant edges in an image and outputs a binary edge map (white lines on a black background) representing these structural boundaries. Used in tasks such as object recognition and as a control signal in AI image generation tools such as Controlnets for precise structural conditioning.
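A minimal sketch using OpenCV's implementation; the threshold values shown are common starting points rather than recommendations.

```python
import cv2

# Load an image in greyscale and extract its Canny edge map
image = cv2.imread("photo.png", cv2.IMREAD_GRAYSCALE)
edges = cv2.Canny(image, threshold1=100, threshold2=200)  # binary white-on-black edge map
cv2.imwrite("edges.png", edges)
```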
CFG (Classifier-free Guidance) Scale: A parameter in diffusion models that controls how closely the generated image adheres to the provided conditioning versus how much creativity is shown in the final image.
Classification: In machine learning, classification models are trained via a supervised learning task that involves assigning data points to predefined categories or classes based on their features. The goal is to build a model from labeled training data (where the correct classification is known) that is capable of predicting the correct classification for new, previously unseen data. Examples of practical applications include image recognition and email spam detection.
CodeFormer: A face restoration algorithm used to enhance and repair faces in AI-generated images by balancing fidelity to the original image with facial improvements, allowing users to adjust this balance to preserve unique characteristics while fixing distortions or artifacts. Unlike other face restoration algorithms, CodeFormer is designed to maintain the original image's style and identity while making its improvements.
Compel: A method or tool used for advanced prompt weighting and emphasis, allowing for more nuanced control over the influence of different parts of a prompt.
Concept Drift: A phenomenon where the statistical properties of the target variable (the object a model attempts to predict) change over time in unforeseen ways. This can occur due to changes in the underlying data distribution, shifts in user behaviour, evolving environmental conditions or other external factors. When this occurs, a machine learning model trained on historical data can become less accurate and reliable as the relationships between data features change. Detecting and adapting to concept drift is crucial for maintaining long-running machine learning systems.
Conditioning: The process of guiding the image generation process using additional input data such as text prompts (in text-to-image generation) or other images (in image-to-image generation).
Denoising: Removing noise or unwanted artifacts from images.
Denoise Strength: A parameter used in image-to-image generation that controls how much of the original image is preserved during the denoising process.
Depth Maps: Images where each pixel's value represents the distance of the corresponding point in the scene from the viewer. These depth maps encode the three-dimensional spatial structure of a scene, usually in shades of grey indicating varying depths. Depth maps can be generated using various techniques like stereo vision, LiDAR or estimated from single images using AI models. They serve as a powerful conditioning input for Controlnets in text-to-image synthesis allowing for precise control over the 3D spatial arrangement of a generated image.
Edge Maps: Binary images, typically derived from an edge detection algorithm such as Canny, where white pixels represent detected edges and black pixels represent non-edge areas. These edge maps provide a structural outline of the content of an image that can be used by Controlnets to provide conditioning for image generation models, allowing users to define specific structural layouts or object boundaries in a generated image.
Embedding: A dense, low-dimensional vector representation of an entity such as a word, document, image etc. The goal of embedding is to capture the semantic meaning, relationships or characteristics of these entities in a numerical format that can be easily processed by machine learning models. Similar entities (semantically) are typically mapped to vectors that are close to each other in latent space. Embeddings are essential for many machine learning tasks such as natural language processing and are widely used in generative AI.
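For illustration, the sketch below uses CLIP's text encoder (via the transformers library) to embed three example sentences and compare them with cosine similarity; the sentences themselves are arbitrary.

```python
import torch
from transformers import CLIPModel, CLIPTokenizer

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-base-patch32")

texts = ["a photo of a cat", "a photo of a kitten", "a diagram of a jet engine"]
inputs = tokenizer(texts, padding=True, return_tensors="pt")
with torch.no_grad():
    embeddings = model.get_text_features(**inputs)  # one vector per sentence

# Semantically similar texts end up close together in the embedding space
cos = torch.nn.functional.cosine_similarity
print(cos(embeddings[0], embeddings[1], dim=0))  # cat vs kitten: relatively high
print(cos(embeddings[0], embeddings[2], dim=0))  # cat vs jet engine: lower
```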
Eta Value: A parameter used in some samplers, like DDIM, that controls the amount of noise injected during the sampling process.
Features: In the context of machine learning and data analysis, features are individual measurable properties or attributes of a data point that are used by a model to learn patterns and make predictions. They are the input variables that describe the characteristics of the data. For example, in predicting house prices, features might include the square footage, number of bedrooms, location and age of the house.
Few-shot Learning / Few-shot Generation: The ability of a model to learn new concepts or generate new images from only a very small number of training examples.
GFPGAN: A face restoration algorithm used in AI image generation to improve the quality of facial features. It targets and enhances faces in generated images by fixing distortions, adding realistic details and improving overall facial coherence. Often integrated as a post-processing option in image generation pipelines.
Guidance: A parameter that controls how closely a model follows the conditioning information it is given and how much creativity the model exercises. Higher values mean closer adherence to conditioning, lower values mean more creativity.
Human Pose Skeleton: A representation of the human body as a set of interconnected joints known as keypoints, typically depicted as points connected by lines forming a skeletal structure. These skeletons capture the posture of a person in an image or video with each keypoint corresponding to a specific anatomical landmark such as nose, shoulders, elbows etc. Human pose skeletons can be generated by computer vision techniques and the images created can serve as precise conditioning input for Controlnets in text-to-image synthesis, allowing for generation of images with specific human poses and body arrangements dictated by the skeleton.
Image Segmentation: Dividing an image into multiple parts. AI segmentation processes often attempt to segment by content (such as background, people or faces) in order to facilitate further targeted processing.
Mask-based Generation: Using a mask image to define the areas included in and excluded from a generation process. Used extensively in workflows like inpainting to define the area in which a prompted change is to occur.
Multi-pass Generation: Generating a final image by running intermediate versions through several refinement iterations.
Noise: Random data that forms the foundation of the generative process in AI art. In diffusion models the AI can transform noise into coherent imagery by gradually removing this randomness according to its parameters and conditioning.
On-device Inference: Running an AI model (inference) locally on a device rather than in the cloud or on a local network server accessed via API integration.
Parameter Sweeping: Systematically varying parameter values in order to explore the generation results caused by the changes. Often used with fixed seed values.
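A hedged sketch of a simple sweep over the guidance scale with a fixed seed, using the diffusers library; the model identifier, prompt and parameter values are illustrative and a CUDA-capable GPU is assumed.

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

prompt = "a lighthouse at dusk, watercolour"

# Only the guidance scale changes between runs; the seed stays fixed,
# so differences in the output are attributable to the swept parameter.
for cfg in [3.0, 5.0, 7.5, 10.0, 12.5]:
    generator = torch.Generator("cuda").manual_seed(42)
    image = pipe(prompt, guidance_scale=cfg, generator=generator).images[0]
    image.save(f"cfg_{cfg}.png")
```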
Prompt Layering: Structuring prompts in a hierarchical or layered manner to create complex and detailed image generation instructions.
Prompt Syntax: The specific formatting rules and conventions used to construct effective prompts that guide AI model outputs.
Prompt Template: Reusable prompt structures designed to produce consistent results.
Regional Prompting: Applying different text prompts (in text-to-image generation) to specific regions of an image, to exercise localised control over the generated content.
Regression: A regression model is a type of machine learning model that is trained to predict numerical output based on input features. Models of this kind are used for applications such as predicting house prices based on features like size and location or forecasting stock prices over time.
Sampler: A sampler is an algorithm or method used during the image generation process to iteratively refine a noisy input (often random noise) into a coherent and meaningful output image. Different samplers employ various mathematical techniques and heuristics to guide this denoising process, influencing the speed, quality and style of the final generated output. Choice of sampler is a crucial factor in the performance and behaviour of a diffusion model.
Sampling Method: An algorithm that determines how new data points are generated from the model's probability distribution.
Scheduler: An algorithm that controls the addition and removal of noise in diffusion models.
Seed Value: In computing, a seed value is a number that initialises a "pseudo-random" number generator. Though computer random number generators appear random to a user, they are driven by a deterministic algorithm, which allows for reproducible results; reusing a seed value will always cause the same sequence of "random" numbers to be produced. In general and security software the seed is often set to an unpredictable number, such as the system timestamp at the time the generator is initialised, or entropy gathered by measuring an external factor.
In generative AI art models, using a fixed seed value allows the same image to be produced every time a pipeline is run. This allows experimentation by modifying other pipeline parameters (such as step count, CFG scale etc.) and viewing the alterations these changes make to the generated image.
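A minimal illustration of seeded determinism using Python's standard random module: seeding twice with the same value reproduces exactly the same "random" sequence.

```python
import random

random.seed(42)
first = [random.randint(0, 99) for _ in range(5)]

random.seed(42)
second = [random.randint(0, 99) for _ in range(5)]

print(first)
print(second)
print(first == second)  # True: same seed, same sequence
```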
Steps / Iterations: The number of cycles a diffusion model goes through during the denoising process to create an image.
Tiling: Creating seamless patterns that can be repeated without visible edges.
Zero-shot Generation: When a generative AI model creates data (such as images) of a type (objects, scenes etc.) without having been trained on any specific examples of that type. For example, a model producing an image of a griffon without ever having been trained on the concept of a griffon.
Tools, Interfaces and Platforms
AnimateDiff: A specialised AI model extension that adds motion to static diffusion-based image generation. AnimateDiff works by preserving the image generation capabilities of base models like Stable Diffusion while adding the ability to produce coherent frame sequences with consistent subjects and motion.
Automatic1111: A popular open-source web interface for making use of generative AI art models. It provides a comprehensive user interface with extensive features for text-to-image, image-to-image and inpainting techniques, along with support for custom models, extensions and a variety of sampling methods. Along with ComfyUI, it has become one of the most widely used tools for accessing and utilising open-weight AI image generation models.
Civitai: An online platform for sharing, discovering and training image generation AI models, fine-tunes and LoRAs. One of the key hubs of the open-weight image generation community.
CLIP Interrogator: A tool that analyses images to generate text descriptions or prompts that might recreate them in an AI image generator. It utilises OpenAI's CLIP model to extract visual concepts and match them with textual descriptions to reverse engineer potential prompts from existing images. This helps AI artists understand how to structure prompts to achieve similar results or to analyze the components of images they find inspiring.
ComfyUI: A node-based interface for visually programming generative AI workflows. A core tool for advanced generation with open-weight models.
CUDA: NVIDIA's parallel computing platform that allows GPUs (Graphics Processing Units) to accelerate the processing of generative AI models.
Deforum: An open-source animation framework for AI image generation that enables creation of complex video sequences using diffusion models such as SDXL. Allows users to animate prompts, camera movements and other parameters over time to produce dynamic animated content. It's particularly known for its ability to create elaborate camera paths, zoom effects and imagery based on keyframes and prompt sequences.
Gradio: An open-source Python library that facilitates rapid development of customisable web interfaces for machine learning models. It is designed to allow fast implementation and sharing of interactive demos of AI applications without requiring extensive web development abilities. Very commonly used in the AI art community to create user-friendly interfaces for diffusion models and other generative AI systems.
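A minimal Gradio sketch; the function here is a stand-in for a real generation call (in practice it would run a diffusion pipeline and return an image).

```python
import gradio as gr

def fake_generate(prompt: str) -> str:
    # Stand-in for a real generation call
    return f"Would generate an image for: {prompt!r}"

demo = gr.Interface(fn=fake_generate, inputs="text", outputs="text")
demo.launch()  # serves a local web UI in the browser
```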
Hugging Face: An online hub for sharing machine learning models, datasets and tools. Considered by many the heart of the open-weight community, it acts as the main distribution host for many of the biggest open-weight models such as the Llama, Gemma and Stable Diffusion families.
TensorRT: An NVIDIA platform for high-performance deep learning inference, optimising models for faster and more efficient execution.
About Us
The High-Tech Creative
Your guide to AI's creative revolution and enduring artistic traditions
Publisher & Editor-in-chief: Nick Bronson
Fashion Correspondent: Trixie Bronson
AI Contributing Editor and Poetess-in-residence: Amy
If you have enjoyed our work here at The High-Tech Creative and have found it useful, please consider supporting us by sharing our publication with your friends, or click below to donate and become one of the patrons keeping us going.