Shorts: Concepts

In this section, we explain some basic concepts about Generative AI processes. If you have questions about a specific topic, you can get in touch with us and we will try to help!

Core AI Concepts

Prediction

Every time you run a model in an AI-powered app, you create a prediction. A prediction is an object that represents a single run of the model, from an input to an output. The input can include the prompt you provided, an image, and the generation settings. The output is the resulting text, image, sound, or other media, along with metadata such as the model version, the user who created it, and timestamps.

Depending on the model and the server, a prediction can take anywhere from a few milliseconds to several minutes, or even hours.
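As a sketch of the idea, a prediction can be thought of as a record that bundles the input, the output, and the metadata of one model run. The field names below are illustrative, not the schema of any specific platform:

```python
from datetime import datetime, timezone

def make_prediction(model, inputs, output):
    """Bundle one model run: its input, output, and metadata."""
    return {
        "model": model,                      # model name/version used
        "input": inputs,                     # prompt, image, settings...
        "output": output,                    # generated text/image/sound
        "created_at": datetime.now(timezone.utc).isoformat(),
        "status": "succeeded",
    }

pred = make_prediction(
    model="example-diffusion-v1",
    inputs={"prompt": "a red house", "steps": 30},
    output="image_0001.png",
)
```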

Temperature or Guidance Scale

The guidance scale (often abbreviated CFG, for classifier-free guidance) is a parameter that controls how closely the image generation process follows the text prompt. The higher the value, the more the image sticks to the given text input, but at the cost of diversity and, at extreme values, quality. Temperature plays a similar role in text generation, controlling how random or predictable the model's output is.

Dataset

A machine learning dataset is a collection of data that is used to train the model. A dataset acts as a set of examples that teach the machine learning algorithm how to make predictions. The most common types of data are text and images.

AI Biases

Machine learning bias, also known as algorithm bias or AI bias, is a phenomenon that occurs when an algorithm produces results that are systematically prejudiced due to erroneous assumptions in the machine learning (ML) process. Deep learning models tend to reproduce or exacerbate biases that are already present in the training dataset.

Model Architecture & Training

Diffusion Models

Diffusion models are a class of generative models that learn to create images by gradually removing noise from random data. They work by adding noise to training images and then learning to reverse this process. Popular examples include Midjourney, Stable Diffusion, DALL-E, and Krea, which have revolutionized AI image generation.
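As a toy illustration of the "adding noise" half of the process (pure Python, no real model), the standard forward diffusion step mixes the clean signal with Gaussian noise according to a coefficient ā between 0 and 1:

```python
import math
import random

def add_noise(x0, alpha_bar, rng):
    """Forward diffusion step: x_t = sqrt(a)*x0 + sqrt(1-a)*noise,
    where a (alpha_bar) shrinks toward 0 as more noise is added."""
    return [
        math.sqrt(alpha_bar) * v + math.sqrt(1 - alpha_bar) * rng.gauss(0, 1)
        for v in x0
    ]

rng = random.Random(0)
clean = [1.0, -0.5, 0.25]          # a tiny stand-in for image pixels
noisy = add_noise(clean, alpha_bar=0.5, rng=rng)
```

Training then consists of showing the model many (noisy, clean) pairs so it learns to run this process in reverse.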

Latent Space

Latent space is a compressed representation of data where similar concepts are positioned close together. In image generation, models work in this compressed space rather than directly with pixels, making the process more efficient and allowing for smoother interpolations between different concepts.
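A simple way to see why this matters: because similar concepts sit close together, you can interpolate between two latent vectors and get a meaningful in-between result. The two-dimensional "latents" below are made up for illustration (real latent spaces have hundreds or thousands of dimensions, and pipelines often use spherical rather than linear interpolation):

```python
def lerp(z1, z2, t):
    """Linear interpolation between two latent vectors, t in [0, 1]."""
    return [(1 - t) * a + t * b for a, b in zip(z1, z2)]

cat = [0.2, 0.9]          # hypothetical latent for "cat"
dog = [0.8, 0.1]          # hypothetical latent for "dog"
mid = lerp(cat, dog, 0.5) # a point halfway between the two concepts
```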

VAE (Variational Autoencoder)

A VAE is a component that compresses images into latent representations and reconstructs them back to pixel space. In Stable Diffusion, the VAE handles the conversion between the latent space where generation happens and the final image output.

U-Net

U-Net is the core neural network architecture used in diffusion models for image generation. It predicts the noise to be removed at each step of the denoising process, gradually transforming random noise into coherent images based on text prompts.
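The denoising loop itself is simple; all of the intelligence lives in the noise predictor. The sketch below uses a trivial stand-in function where a real pipeline would call the U-Net, just to show the shape of the loop:

```python
def denoise(x, predict_noise, steps=4):
    """Toy denoising loop: at each step, subtract a fraction of the
    predicted noise. In a real pipeline, predict_noise is a U-Net."""
    for t in range(steps, 0, -1):
        eps = predict_noise(x, t)                 # noise estimate at step t
        x = [v - e / steps for v, e in zip(x, eps)]
    return x

# With a stand-in predictor that always returns the true noise,
# the loop removes it completely over `steps` iterations.
true_noise = [0.4, -0.2]
out = denoise(true_noise, lambda x, t: true_noise, steps=4)
```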

Fine-tuning

Fine-tuning involves training a pre-trained model on specific data to adapt it for particular styles, subjects, or use cases. This process allows customization of AI models for specialized applications without training from scratch.

LoRA (Low-Rank Adaptation)

LoRA is an efficient fine-tuning technique that modifies only a small subset of model parameters, allowing for quick adaptation to new styles or concepts while maintaining the original model’s capabilities. LoRA models are lightweight and can be easily shared and combined.
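The core trick can be shown in a few lines: instead of updating the full weight matrix W, LoRA trains two small low-rank matrices B and A, and the effective weight becomes W + B·A. Here is a minimal pure-Python sketch with a rank-1 update on a 2x2 weight (real models use large matrices and a scaling factor tied to the rank):

```python
def matmul(A, B):
    """Plain-Python matrix multiply for small illustrative matrices."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def lora_update(W, A, B, alpha=1.0):
    """Effective LoRA weight: W' = W + alpha * (B @ A).
    W stays frozen; only the small A and B are trained."""
    delta = matmul(B, A)
    return [[w + alpha * d for w, d in zip(wr, dr)]
            for wr, dr in zip(W, delta)]

W = [[1.0, 0.0], [0.0, 1.0]]  # frozen 2x2 base weight
B = [[1.0], [0.0]]            # 2x1 (rank r = 1)
A = [[0.0, 0.5]]              # 1x2
W_new = lora_update(W, A, B)  # base behavior plus a tiny learned tweak
```

Because only A and B are stored, a LoRA file is a small fraction of the base model's size, which is why LoRAs are so easy to share and combine.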

Control & Guidance Systems

Adapters

Adapters are modules added on top of pre-trained text-to-image models. They let us feed visual information into large text-to-image models to guide the generation process. Examples include ControlNet and T2I-Adapter.

ControlNet

ControlNet is a powerful adapter that enables precise control over image generation using various input conditions like edge maps, depth maps, pose estimation, or sketches. It allows users to maintain structural consistency while generating images, making it invaluable for architectural visualization and design work.

T2I-Adapter

T2I-Adapter provides an alternative to ControlNet for guiding image generation with visual inputs. It offers similar functionality but with different implementation approaches, often requiring less computational resources while maintaining good control over the generation process.

CFG Scale

Classifier Free Guidance (CFG) Scale determines how closely the AI follows your text prompt. Lower values (1-7) create more creative, diverse images but may stray from the prompt. Higher values (8-20) stick closely to the prompt but may reduce image quality and creativity.
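Under the hood, CFG combines two noise predictions at every denoising step: one made with your prompt and one made without it. The standard formula pushes the result away from the unconditioned prediction and toward the conditioned one, scaled by the CFG value:

```python
def cfg(eps_uncond, eps_cond, scale):
    """Classifier-free guidance:
    eps = eps_uncond + scale * (eps_cond - eps_uncond).
    scale = 1 returns the conditioned prediction unchanged;
    higher scales exaggerate the prompt's influence."""
    return [u + scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]
```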

Generation Parameters

Seeds

A seed is a number that determines the initial random state for image generation. Using the same seed with identical settings will produce the same image, allowing for reproducible results and systematic experimentation with different parameters.
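This is easy to demonstrate with any pseudo-random generator; the same principle applies when a diffusion model draws its initial noise. Here, the list of random numbers stands in for a generated image:

```python
import random

def generate(seed, n=3):
    """Stand-in for image generation: the seed fixes the initial
    random state, so the same seed yields the same 'image'."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(n)]

same_a = generate(42)
same_b = generate(42)   # identical to same_a
other  = generate(43)   # a different result
```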

Sampling Methods

Sampling methods (like DPM++, Euler, DDIM) determine how the AI removes noise during image generation. Different samplers can affect image quality, generation speed, and the final aesthetic, with some being better suited for specific types of content.
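As a much-simplified sketch of what one such sampler does, here is a single Euler-style step in the sigma (noise level) formulation: estimate the direction from the current noisy image toward the model's denoised guess, then move along it as the noise level drops. Real samplers differ in exactly how they take and combine these steps:

```python
def euler_step(x, sigma, sigma_next, denoised):
    """One simplified Euler step: move x from noise level sigma
    toward sigma_next, using the model's denoised estimate."""
    d = [(xi - di) / sigma for xi, di in zip(x, denoised)]  # derivative estimate
    return [xi + (sigma_next - sigma) * di for xi, di in zip(x, d)]
```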

Noise Scheduling

Noise scheduling controls how noise is added and removed during the diffusion process. Different schedules can affect the quality and characteristics of generated images, with some schedules being optimized for faster generation or better quality.
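Two common examples make the idea concrete: a linear schedule spaces the per-step noise amounts (betas) evenly, while the cosine schedule from the "Improved DDPM" paper keeps more signal early on. The start/end values below are typical defaults, used here only for illustration:

```python
import math

def linear_betas(steps, start=1e-4, end=0.02):
    """Linear noise schedule: betas evenly spaced from start to end."""
    return [start + (end - start) * i / (steps - 1) for i in range(steps)]

def cosine_alpha_bar(t, T, s=0.008):
    """Cosine schedule (Nichol & Dhariwal): fraction of signal
    remaining at step t of T; ~1 at t=0, ~0 at t=T."""
    return math.cos((t / T + s) / (1 + s) * math.pi / 2) ** 2
```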

Prompt Engineering

Prompt engineering is the art and science of crafting effective text prompts to guide AI image generation. It involves understanding how different words, phrases, and structures influence the output, including the use of negative prompts to avoid unwanted elements.

Advanced Techniques

Inpainting

Inpainting allows you to modify specific parts of an existing image by masking areas and generating new content to fill them. This technique is useful for editing, removing objects, or adding elements to existing images while maintaining visual coherence.
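The masking step reduces to a simple per-pixel blend: where the mask is 1, take the newly generated content; where it is 0, keep the original. The flat lists below stand in for image pixels:

```python
def blend(original, generated, mask):
    """Inpainting composite: keep original pixels where mask = 0,
    take generated pixels where mask = 1."""
    return [m * g + (1 - m) * o
            for o, g, m in zip(original, generated, mask)]

image = [10, 20, 30, 40]
new   = [99, 99, 99, 99]
mask  = [0, 1, 1, 0]           # edit only the middle region
result = blend(image, new, mask)  # -> [10, 99, 99, 40]
```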

Outpainting

Outpainting extends images beyond their original boundaries, generating new content that seamlessly continues the existing image. This technique is valuable for expanding compositions or creating panoramic views from smaller images.

Upscaling

AI upscaling enhances image resolution and quality using machine learning algorithms. Modern upscaling techniques can increase image size while adding realistic details, making them essential for preparing AI-generated images for high-resolution applications.
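For contrast with AI upscaling, here is the classical nearest-neighbor baseline on a single row of pixels: it only repeats existing values, which is why it cannot add detail. An AI upscaler instead predicts plausible new pixels from learned patterns:

```python
def upscale_nearest(row, factor):
    """Classical nearest-neighbor upscaling: each pixel is simply
    repeated `factor` times, adding no new detail."""
    return [p for p in row for _ in range(factor)]

upscaled = upscale_nearest([1, 2], 3)  # -> [1, 1, 1, 2, 2, 2]
```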

Multimodal AI

Multimodal AI combines multiple input types (text, images, sketches, audio) to create more sophisticated and controllable generation processes. This approach enables more intuitive interaction with AI systems and better creative control.

Real-time Generation

Real-time generation refers to AI systems that can produce images almost instantaneously as you type or draw. This capability enables interactive creative workflows and immediate feedback, revolutionizing how designers and artists work with AI tools.




Related Posts

ControlNet SoftEdge vs. Canny: Which Tool Should I Use?

Explore how ControlNet's SoftEdge and Canny options can refine image generation, highlighting when each tool shines.

Firefly Adobe AI new features

We'll explore the latest features that are revolutionizing the way we use Adobe's creative tools, including Photoshop, Illustrator, Premiere, and more.

AI Rendering for Architects and Designers: Tools and Trends in 2025

Discover how AI image generation tools like ChatGPT Image, Stable Diffusion 3, Flux, and MidJourney are transforming visual workflows for architects, interior designers, and creative teams.