
🖌️ Generate Images with Stable Diffusion XL on Mac M1/M2 (January 2026)

In this tutorial, we'll guide you through setting up and using Stable Diffusion XL, a powerful text-to-image model from Stability AI, on your Mac M1 or M2 device.

Daily Neural Digest Academy · January 5, 2026 · 9 min read · 1,696 words

🖌️ When Silicon Dreams: Running Stable Diffusion XL on Apple Silicon in 2026

The dream of local AI image generation has always felt tantalizingly close—yet perpetually out of reach for anyone without a rack of NVIDIA GPUs cooling in a basement server room. For years, Mac users watched from the sidelines as their Windows and Linux counterparts generated breathtaking visuals with tools like Stable Diffusion, while Apple Silicon owners were left with cloud APIs or clunky CPU-bound workarounds that took hours to produce a single blurry image.

That era is officially over. As of January 2026, the landscape has shifted dramatically. With Apple's Metal Performance Shaders (MPS) backend maturing into a first-class citizen for machine learning workloads, and the open-source ecosystem rallying behind ARM-native optimizations, running a state-of-the-art text-to-image model like Stable Diffusion XL locally on a Mac M1 or M2 is not just possible—it's practical, performant, and surprisingly elegant.

This isn't a hack. It's a legitimate workflow for creative professionals, researchers, and anyone who values privacy, latency, and the satisfaction of watching their own silicon dream up pixel-perfect worlds. Let's dive into how this works, why it matters, and exactly how you can set it up today.

The Architecture of Imagination: Understanding Stable Diffusion XL on MPS

Before we get our hands dirty with code, it's worth understanding what makes Stable Diffusion XL (SDXL) such a fascinating beast—and why running it on Apple Silicon is a genuinely impressive engineering achievement.

Stable Diffusion XL, developed by Stability AI, represents a significant leap over its predecessor. Where the original Stable Diffusion was trained to produce 512x512 images, SDXL ups the ante with a base resolution of 1024x1024, a larger UNet backbone, a second text encoder, and an optional refiner model that polishes the base model's output. The result is images with dramatically better composition, detail, and prompt adherence. This isn't just a bigger model; it's a fundamentally more capable one.

The challenge, of course, is that bigger models require more compute. On a typical desktop GPU, SDXL can generate a single image in 10-15 seconds. On a Mac M1 or M2, we rely on the unified memory architecture and PyTorch's MPS backend, which is built on Apple's Metal Performance Shaders and exposed as the mps device, to run tensor operations on the GPU. The key insight here is that the CPU and GPU share one memory pool, so on a 16GB or 32GB machine the GPU can address a large share of system RAM rather than a fixed, separate block of VRAM. For large models, that is a real advantage over discrete GPUs with small VRAM.

However, there's a catch: SDXL was originally designed for CUDA-optimized hardware, and the MPS backend is still catching up in terms of operator coverage and performance. This means we need to be strategic about our approach, using half-precision (float16) tensors and carefully managing memory to avoid out-of-memory errors. The good news? With the right setup, the results are genuinely impressive.
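To make the float16 point concrete, here is a back-of-the-envelope sketch of how much memory the weights alone occupy at each precision. The parameter count is an assumption: roughly 2.6 billion for the SDXL UNet, the figure given in the SDXL technical report. The text encoders, the VAE, and the activations come on top of this.

```python
def weight_memory_gib(num_params: int, bytes_per_param: int) -> float:
    """Memory needed to hold the weights alone, in GiB."""
    return num_params * bytes_per_param / 1024**3

# Assumption: approximate UNet size; not read from the actual checkpoint.
UNET_PARAMS = 2_600_000_000

fp32_gib = weight_memory_gib(UNET_PARAMS, 4)  # float32 uses 4 bytes per weight
fp16_gib = weight_memory_gib(UNET_PARAMS, 2)  # float16 uses 2 bytes per weight

print(f"float32: {fp32_gib:.1f} GiB, float16: {fp16_gib:.1f} GiB")
```

Halving the bytes per weight halves the footprint, which is often the difference between fitting comfortably on a 16GB machine and not fitting at all.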

Forging the Toolkit: Prerequisites and Project Scaffolding

Every great creative endeavor begins with the right tools. For our journey into local AI image generation, we need a carefully curated stack of software that plays nicely with Apple Silicon's unique architecture.

First and foremost, you'll need Python 3.10 or later. While newer versions of Python exist, 3.10 offers the best compatibility with the current generation of machine learning libraries. The recommended installation method is via Homebrew, which ensures you get an ARM-native build:

brew install python@3.10

Next comes the critical component: PyTorch with MPS backend support. MPS support first shipped in PyTorch 1.12, but in practice you want a current 2.x release, where operator coverage is far better. The installation command is deceptively simple:

pip install torch

No extra index URL is needed on a Mac. The cu102 index that appears in some guides serves CUDA 10.2 builds for Linux and Windows; the standard macOS arm64 wheel from PyPI already includes the mps device.

We'll also need the Diffusers library, which provides the SDXL pipeline, along with Transformers (for the text encoders), Accelerate, and Safetensors. Gradio is only needed if you want to build a simple web interface for the image generator. These are installed via pip:

pip install diffusers transformers accelerate safetensors
pip install gradio

With our dependencies in place, we can scaffold the project. Create a new directory and a requirements.txt file to ensure reproducibility:

mkdir stable_diffusion_xl
cd stable_diffusion_xl

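The requirements.txt should list pip packages only; the Python version is determined by the interpreter you create the environment with, and exact version pins can be added once you have a working setup:

torch
diffusers
transformers
accelerate
safetensors
gradio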

This might seem like overkill for a simple project, but in the world of AI, dependency management is the difference between a smooth workflow and a weekend of debugging. Trust me on this.
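One way to see what actually got installed is a small check built on the standard library's importlib.metadata. This is a minimal sketch; the package list is an assumption and should mirror your own requirements.txt (it includes diffusers, the library that ships the SDXL pipeline class).

```python
from importlib import metadata

# Assumption: the packages this project depends on; adjust to your requirements.txt.
PACKAGES = ["torch", "diffusers", "transformers", "gradio"]

def installed_versions(packages):
    """Map each package name to its installed version, or None if it is missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions

for name, version in installed_versions(PACKAGES).items():
    print(f"{name}: {version or 'NOT INSTALLED'}")
```

Recording the printed versions next to your project is a cheap way to make a working setup reproducible later.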

Breathing Life into the Model: Core Implementation and the MPS Dance

Now we arrive at the heart of the operation: the code that bridges the gap between text and image, between human imagination and silicon computation. Create a file named diffusion.py and prepare to witness something remarkable.

The core of our implementation is an ImageGenerator class that encapsulates the model loading, prompt processing, and image generation pipeline. Here's the complete implementation:

import torch
from diffusers import StableDiffusionXLPipeline

class ImageGenerator:
    def __init__(self):
        # SDXL base checkpoint published by Stability AI on the Hugging Face Hub
        self.model_id = "stabilityai/stable-diffusion-xl-base-1.0"
        # Load half-precision weights and move the whole pipeline to the Apple GPU
        self.pipe = StableDiffusionXLPipeline.from_pretrained(
            self.model_id,
            torch_dtype=torch.float16,
            variant="fp16",
            use_safetensors=True,
        ).to("mps")

    def generate_image(self, prompt):
        # The pipeline tokenizes and encodes the prompt internally
        result = self.pipe(
            prompt=prompt,
            num_inference_steps=50,
            guidance_scale=7.5,
            width=640,
            height=640,
        )
        # .images is a list of PIL images; return the first one
        return result.images[0]

def main():
    generator = ImageGenerator()
    prompt = "Astronaut riding a horse on Mars"
    generated_image = generator.generate_image(prompt)
    generated_image.save("generated_image.png")

if __name__ == "__main__":
    main()

Let's unpack what's happening here. StableDiffusionXLPipeline, which comes from the Diffusers library rather than Transformers, is the heavy lifter: it bundles SDXL's two text encoders and their tokenizers, the UNet, and the VAE that collectively turn a prompt into an image. There is no separate processor class; tokenization and encoding of the prompt happen inside the pipeline call. Note also that the pipeline returns a list of images, so we take the first element before saving.

The critical line is .to("mps"), which moves the entire pipeline to PyTorch's MPS backend, built on Apple's Metal Performance Shaders. This is where the magic happens: the matrix multiplications and attention computations that dominate inference run on the M1 or M2 GPU, and unified memory means the weights do not have to be copied into separate video memory.

The generate_image method takes a prompt and runs the pipeline with a fixed set of parameters that control the generation process. The num_inference_steps parameter (50 in this case) determines how many denoising steps the model performs; more steps generally produce higher quality images at the cost of longer generation times. The guidance_scale (7.5) controls how closely the model adheres to your prompt; higher values produce more literal interpretations, while lower values allow for more creative divergence.
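What guidance_scale does numerically can be sketched with the standard classifier-free guidance formula: at every step the model makes one noise prediction without the prompt and one with it, and the scale decides how far to push from the first toward (and past) the second. The two-element lists below are toy values; the real pipeline applies the same arithmetic to latent-sized tensors.

```python
def apply_guidance(uncond, cond, guidance_scale):
    """Classifier-free guidance: uncond + scale * (cond - uncond), element-wise."""
    return [u + guidance_scale * (c - u) for u, c in zip(uncond, cond)]

uncond = [0.0, 1.0]  # toy noise prediction without the prompt
cond = [1.0, 3.0]    # toy noise prediction with the prompt

print(apply_guidance(uncond, cond, 1.0))  # [1.0, 3.0], identical to cond
print(apply_guidance(uncond, cond, 7.5))  # [7.5, 16.0], pushed well past cond
```

A scale of 1.0 simply reproduces the prompt-conditioned prediction, while larger values exaggerate the difference the prompt makes, which is why high settings read as more literal.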

One important note: I've set the image dimensions to 640x640 rather than the full 1024x1024 that SDXL supports natively. This is a deliberate trade-off to ensure compatibility with the MPS backend and to avoid out-of-memory errors on systems with 16GB of unified memory. If you have a 32GB or 64GB Mac, feel free to increase these values.
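To see what the resolution trade-off means for the model, it helps to look at the latent grid the UNet actually works on. The sketch below assumes the usual Stable Diffusion layout, in which the VAE shrinks each spatial dimension by a factor of 8 and the latents have 4 channels; the helper names are made up for illustration.

```python
VAE_SCALE_FACTOR = 8  # each spatial dimension is downsampled by 8
LATENT_CHANNELS = 4

def latent_shape(width, height):
    """Shape (channels, height, width) of the latent for a given image size."""
    if width % VAE_SCALE_FACTOR or height % VAE_SCALE_FACTOR:
        raise ValueError("width and height must be multiples of 8")
    return (LATENT_CHANNELS, height // VAE_SCALE_FACTOR, width // VAE_SCALE_FACTOR)

def relative_cost(width, height, ref=1024):
    """Number of latent positions relative to a ref x ref image."""
    return (width * height) / (ref * ref)

for size in (1024, 640, 320):
    print(size, latent_shape(size, size), f"{relative_cost(size, size):.2%}")
```

At 640x640 the UNet processes about 39% of the positions it would at 1024x1024, which is where most of the memory and time savings come from.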

From Prompt to Pixel: Running the Generator and Troubleshooting

With our implementation complete, it's time to witness the fruits of our labor. Run the script with a simple command:

python diffusion.py

The first run will trigger a model download—SDXL is approximately 7GB, so ensure you have a stable internet connection and sufficient disk space. The download happens once; subsequent runs will load the model from cache.
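Before kicking off the download, you can check that the cache volume has room for it. This is a sketch under a few assumptions: it follows the default Hugging Face cache layout (HF_HUB_CACHE, then HF_HOME, then ~/.cache/huggingface/hub) and ignores less common overrides such as XDG_CACHE_HOME; the 7GB figure is the approximate size quoted above.

```python
import os
import shutil
from pathlib import Path

MODEL_SIZE_GB = 7  # approximate download size quoted in the text

def hub_cache_dir():
    """Best guess at where downloaded models are cached (default layout assumed)."""
    if "HF_HUB_CACHE" in os.environ:
        return Path(os.environ["HF_HUB_CACHE"])
    hf_home = os.environ.get("HF_HOME", str(Path.home() / ".cache" / "huggingface"))
    return Path(hf_home) / "hub"

def free_gb(path):
    """Free space in GB on the volume holding path (or its nearest existing parent)."""
    path = Path(path)
    while not path.exists():
        path = path.parent
    return shutil.disk_usage(path).free / 1e9

cache = hub_cache_dir()
available = free_gb(cache)
status = "enough" if available > MODEL_SIZE_GB else "NOT enough"
print(f"{cache}: {available:.1f} GB free, {status} for the download")
```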

During generation, you'll see a progress bar that advances with each inference step. On an M1 Max with 32GB of unified memory, a 50-step generation at 640x640 takes approximately 45-60 seconds. On an M2 Ultra, you can expect times closer to 25-35 seconds. That is slower than the 10-15 seconds quoted earlier for a typical desktop GPU, but still practical for local work, and the M1 Max figure comes from a chip that ships in laptops (the M2 Ultra is desktop-only).
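To compare your own machine against these figures, a small timing wrapper is enough. The sketch below uses a stand-in function so it runs without the model; in practice you would pass generator.generate_image instead, and discard the first measurement, since the first call usually includes one-time warm-up work.

```python
import time

def time_generation(generate_fn, prompt, num_steps):
    """Return (result, total seconds, seconds per inference step)."""
    start = time.perf_counter()
    result = generate_fn(prompt)
    elapsed = time.perf_counter() - start
    return result, elapsed, elapsed / num_steps

# Stand-in for generator.generate_image so the sketch runs without SDXL.
def fake_generate(prompt):
    time.sleep(0.05)
    return f"image for: {prompt}"

image, total, per_step = time_generation(fake_generate, "Astronaut riding a horse on Mars", 50)
print(f"total {total:.2f}s, {per_step * 1000:.1f} ms/step")
```

Seconds per step is the more useful number to track, because it stays comparable when you change num_inference_steps.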

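If you encounter errors related to GPU memory, the most effective solution is to reduce the image dimensions. Modify the pipeline call in generate_image to use 320x320:

result = self.pipe(
    prompt=prompt,
    num_inference_steps=50,
    guidance_scale=7.5,
    width=320,
    height=320,
)

This cuts the pixel count to a quarter of the 640x640 setting, and activation memory shrinks accordingly, although the model weights still occupy the same space. Expect a noticeable quality drop this far below SDXL's native 1024x1024. You can also experiment with reducing num_inference_steps to 30 or 20; this shortens generation time rather than lowering peak memory, and it will impact image quality.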

Another common issue is the dreaded "MPS backend not available" error. This typically occurs when PyTorch was installed without MPS support. Verify your installation with:

import torch
print(torch.backends.mps.is_available())

If this returns False, you may need to reinstall PyTorch or check that your macOS version is 12.3 or later (the minimum for MPS support).
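When the one-line check returns False, it helps to know which requirement failed. The following sketch walks through the likely causes in order. torch.backends.mps.is_built() and is_available() are real PyTorch calls; the helper names and messages are illustrative, and the arm64 check reflects this tutorial's focus on Apple Silicon.

```python
import importlib.util
import platform

def parse_version(text):
    """Turn '12.3' or '14.2.1' into a tuple of ints for comparison."""
    return tuple(int(part) for part in text.split(".") if part.isdigit())

def diagnose_mps():
    """Return a short message naming the first unmet requirement, if any."""
    if platform.system() != "Darwin":
        return "Not running on macOS, so MPS cannot be used."
    if platform.machine() != "arm64":
        return "Python is not running natively on Apple Silicon; check for an x86_64 build."
    if parse_version(platform.mac_ver()[0]) < (12, 3):
        return "macOS is older than 12.3; update the operating system."
    if importlib.util.find_spec("torch") is None:
        return "PyTorch is not installed in this environment."
    import torch
    if not torch.backends.mps.is_built():
        return "This PyTorch build lacks MPS support; reinstall the macOS arm64 wheel."
    if not torch.backends.mps.is_available():
        return "PyTorch was built with MPS, but no usable MPS device was found."
    return "MPS is available."

print(diagnose_mps())
```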

Beyond the Basics: Advanced Techniques and the Road Ahead

Congratulations—you've successfully generated an image using Stable Diffusion XL on your Mac. But this is just the beginning. The true power of local AI image generation lies in customization and experimentation.

One of the most exciting directions is fine-tuning the model on custom datasets, adapting SDXL to specific artistic styles, domains, or even your own artwork. Two popular techniques are DreamBooth, which teaches the model a new subject or style from as few as 10-20 images, and LoRA, which trains small low-rank adapter matrices instead of the full weights. The Diffusers repository ships example training scripts for both with SDXL; the transformers.Trainer API targets Transformers models and does not apply here. Imagine generating images in your own illustration style, or creating a model that understands your company's brand guidelines.
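Why LoRA is so much lighter than full fine-tuning comes down to simple arithmetic: instead of updating a full weight matrix, it trains two thin matrices whose product has the same shape. The layer size and rank below are illustrative values chosen for the example, not read from SDXL's configuration.

```python
def full_params(d_out, d_in):
    """Trainable weights when the whole d_out x d_in matrix is fine-tuned."""
    return d_out * d_in

def lora_params(d_out, d_in, rank):
    """LoRA trains B (d_out x rank) and A (rank x d_in); the update is B @ A."""
    return rank * (d_out + d_in)

d_out, d_in, rank = 1280, 1280, 8  # illustrative sizes only

full = full_params(d_out, d_in)
lora = lora_params(d_out, d_in, rank)
print(f"full: {full:,}  LoRA: {lora:,}  ratio: {lora / full:.2%}")
```

For this layer the adapter holds 1.25% of the weights of the full matrix, which is why LoRA checkpoints are small and training fits on modest hardware.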

Another avenue is exploring alternative models. While SDXL is our focus here, the same infrastructure can support other open-weight text-to-image models available through Diffusers, such as Stable Diffusion 1.5 or Stable Diffusion 3, though each has its own pipeline class and may have different hardware requirements. Proprietary systems like DALL-E 3, Imagen, and Midjourney are offered only as hosted services and cannot be run locally. The open-source model ecosystem is evolving rapidly, and staying current with new releases can dramatically expand your creative capabilities.

For those interested in building production-ready applications, integrating Gradio to create a web interface is a natural next step. A simple UI with a text input, parameter sliders, and an image output can transform this script into a tool that designers, marketers, and content creators can use without touching a terminal. This is particularly valuable for teams that rely on rapid prototyping.
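A minimal sketch of such an interface might look like the following. It assumes a callable generate_fn(prompt, steps, guidance_scale) that returns a PIL image; that signature is hypothetical, so the generator code would need a thin wrapper to match it. The slider ranges are arbitrary starting points.

```python
def build_demo(generate_fn):
    """Wrap generate_fn(prompt, steps, guidance_scale) -> PIL image in a Gradio UI."""
    import gradio as gr  # imported lazily so this module loads without Gradio installed

    return gr.Interface(
        fn=generate_fn,
        inputs=[
            gr.Textbox(label="Prompt"),
            gr.Slider(minimum=10, maximum=50, value=30, step=1, label="Inference steps"),
            gr.Slider(minimum=1.0, maximum=15.0, value=7.5, step=0.5, label="Guidance scale"),
        ],
        outputs=gr.Image(type="pil", label="Result"),
        title="SDXL on Apple Silicon",
    )

# Usage, with my_generate as a hypothetical wrapper around the pipeline:
# demo = build_demo(lambda prompt, steps, scale: my_generate(prompt, int(steps), scale))
# demo.launch()
```

Calling launch() starts a local web server and prints the address to open in a browser.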

The broader implications are profound. As Apple Silicon continues to mature and the MPS backend gains feature parity with CUDA, we're witnessing the democratization of AI compute. Creatives no longer need to rely on cloud services that charge per image or compromise on privacy. Researchers can iterate rapidly without waiting for cluster time. And hobbyists can explore the frontiers of generative AI from the comfort of their laptops.

The image you just generated—an astronaut riding a horse on Mars—is more than a novelty. It's a proof point that the future of creative computing is local, private, and accessible. The silicon in your Mac is dreaming, and it's dreaming in your language.

Welcome to the new frontier.

