
SDXL vs SD 1.5: Which Stable Diffusion Model to Use

Most tutorials say SDXL is always better. That's wrong. The choice depends on your GPU, workflow speed, and whether the massive ecosystem of SD 1.5 custom models matters more than raw quality.

8 min read · Intermediate

Here’s something the SDXL hype articles won’t tell you: if you spent months training a custom LoRA on SD 1.5 for your brand’s product photos, it’s worthless on SDXL. Load it anyway, and you won’t get an error – just broken output. The architectures are incompatible. Your training time? Gone.

That’s the choice nobody frames honestly.

The Problem: Speed vs Ecosystem vs Quality

You’re generating images locally. SDXL produces objectively better detail – the research paper shows users overwhelmingly prefer it to SD 1.5 in blind tests (as of July 2023). SDXL takes 2-3x longer per image on the same hardware. It breaks compatibility with thousands of custom models. It chokes on GPUs under 12GB.

SD 1.5 is older, lower resolution, worse at prompt adherence. Also faster. Has a library of thousands of fine-tuned checkpoints that took years to build. Actually runs on 8GB cards without swapping to system RAM.

Standard advice: “SDXL if you want quality, SD 1.5 if you need speed.” Incomplete.

What Actually Changed Under the Hood

SDXL isn’t just “SD 1.5 but bigger.” The technical paper (July 2023) describes a fundamentally different architecture. The UNet is roughly 3x larger – 2.6 billion parameters vs 860 million. Two text encoders instead of one: OpenCLIP ViT-bigG and the original CLIP ViT-L working in parallel.
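Those headline numbers are easy to sanity-check with back-of-the-envelope arithmetic. Nothing below is measured – the constants are just the rounded figures quoted above:

```python
# Rounded UNet parameter counts quoted from the SDXL technical report.
SD15_UNET_PARAMS = 860_000_000       # ~860M
SDXL_UNET_PARAMS = 2_600_000_000     # ~2.6B

ratio = SDXL_UNET_PARAMS / SD15_UNET_PARAMS
print(f"SDXL UNet is ~{ratio:.1f}x larger")  # ~3.0x
```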

That’s why SDXL handles complex multi-subject prompts better. “A red cube on top of a blue sphere” actually works most of the time. SD 1.5 merges those concepts into abstract nonsense.

Native resolution: 512×512 → 1024×1024. That’s 4x more pixels per generation. SDXL was also trained with size and crop conditioning – it learned how framing affects composition instead of assuming every subject is centered. SD 1.5, trained on naive crops, randomly decapitates subjects.
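Both models’ VAEs compress images 8x per side before the UNet sees them, so pixel count translates directly into latent size and per-step compute. A quick sketch of the scaling (the 8x factor and 4 latent channels are standard for both model families):

```python
VAE_DOWNSCALE = 8     # both SD 1.5 and SDXL compress images 8x per side
LATENT_CHANNELS = 4   # latent channels for both models

def latent_shape(width, height):
    """Shape of the latent tensor the UNet actually denoises."""
    return (LATENT_CHANNELS, height // VAE_DOWNSCALE, width // VAE_DOWNSCALE)

print(latent_shape(512, 512))    # (4, 64, 64)   – SD 1.5 native
print(latent_shape(1024, 1024))  # (4, 128, 128) – SDXL native, 4x the area
```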

Those extra parameters need somewhere to live. SDXL officially requires 8GB VRAM minimum (as of 2023 release). Community testing? 10GB is the real floor for reasonable speed. 16GB is where it stops fighting you.

The 8GB VRAM Trap Nobody Mentions

Stability AI says SDXL works on 8GB cards. Technically true. Actually? You’ll hate it.

RTX 3070 (8GB): SDXL requires the --medvram flag in Automatic1111. Offloads parts of the model to system RAM during generation. One 1024×1024 image: 10-15 minutes. SD 1.5 on the same card? 30 seconds.

It doesn’t crash – just becomes unusable for iteration. You can’t experiment when each attempt costs 10 minutes. GitHub community reports show RTX 3060, 3070, even some 3080 10GB users switching back to SD 1.5 after local testing.

12GB (RTX 3060 12GB, RTX 4070)? SDXL becomes viable but still slower. 24GB (RTX 3090, 4090): where SDXL actually feels responsive.

SD 1.5: runs comfortably on 6GB. Fast on 8GB. Not about old hardware – about whether your workflow needs 20 iterations to nail a prompt or just 3.
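The iteration argument is plain arithmetic. Using the ballpark timings above – roughly 30 seconds per SD 1.5 image and 10 minutes per SDXL image on an 8GB card; community figures from this article, not benchmarks:

```python
def session_minutes(seconds_per_image, iterations):
    """Wall-clock minutes for a prompt-tuning session."""
    return seconds_per_image * iterations / 60

# 20 attempts to nail a prompt, per the timings above:
print(session_minutes(30, 20))   # SD 1.5: 10.0 minutes
print(session_minutes(600, 20))  # SDXL on an 8GB card: 200.0 minutes
```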

LoRA Compatibility: The Invisible Wall

SD 1.5 has thousands of LoRAs on Civitai – character faces, art styles, product aesthetics, pose libraries. Community built them over years.

None work on SDXL.

Not a conversion problem. Architectural incompatibility. SDXL’s dual text encoders plus different UNet structure mean an SD 1.5 LoRA loads without error – output is unpredictable. Sometimes subtly wrong. Sometimes completely broken. Community docs on Civitai: even attempting to convert embeddings between versions produces unreliable results.
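Because a mismatched LoRA loads silently, it’s worth guarding before load. One heuristic – this relies on an assumption about kohya-style tensor naming, where SDXL LoRAs ship weights for the second text encoder under lora_te2_ prefixes that SD 1.5 LoRAs can’t have, so treat it as best-effort rather than authoritative:

```python
def guess_lora_family(state_dict_keys):
    """Best-effort guess at a LoRA's base family from its tensor names.

    Assumes kohya-style prefixes: SDXL LoRAs include 'lora_te1_'/'lora_te2_'
    keys (two text encoders); SD 1.5 LoRAs use plain 'lora_te_'.
    'lora_unet_' alone is ambiguous, so the te2 check runs first.
    """
    keys = list(state_dict_keys)
    if any(k.startswith(("lora_te1_", "lora_te2_")) for k in keys):
        return "sdxl"
    if any(k.startswith(("lora_te_", "lora_unet_")) for k in keys):
        return "sd15"
    return "unknown"
```

Reading just the key names is cheap (a safetensors header, no tensor data), and refusing to load a LoRA whose family doesn’t match the checkpoint beats debugging broken output.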

Your workflow depends on specific LoRAs? You trained one on your company’s product catalog? Using a battle-tested character LoRA for client work? Starting over with SDXL. Months of training time, not a simple upgrade path.

SDXL LoRA library is growing. Still 2+ years behind. Many niche styles and subjects that exist for SD 1.5 don’t have SDXL equivalents yet (as of early 2025).

When SD 1.5 Custom Checkpoints Win on Quality

Base SDXL beats base SD 1.5. Not the real comparison.

Real comparison: SDXL base vs SD 1.5 fine-tuned checkpoints. Dreamshaper. Epic Realism. Realistic Vision. These models were trained on curated datasets for months – manual quality filtering, specific aesthetic goals.

Community testing: custom SD 1.5 checkpoints outperform base SDXL on hand anatomy. SDXL improved hands over base SD 1.5, but fine-tuned SD 1.5 models spent years solving that exact problem with better training data.

For photorealism? High-end SD 1.5 checkpoints compete with SDXL. Trained on higher quality photo datasets than SDXL’s base training. You lose resolution (512px native vs 1024px), but you can upscale. You don’t lose the photographic coherence those checkpoints deliver out of the box.

Comparing SDXL to base SD 1.5? SDXL wins. Comparing SDXL to the best SD 1.5 checkpoint for your specific use case (portrait realism, anime, architectural visualization)? Less obvious.

The Refiner Problem

SDXL has an optional refiner model – a second pass that adds fine detail to the base output. Base and refiner together form a 6.6B parameter model ensemble (as of the July 2023 release). Sounds great. Situational.

Refiner was trained on high-resolution generic data. Stock photorealism? Helps. Subject-specific work – fine-tuned SDXL on a character or product? Refiner often makes things worse. Doesn’t know your custom subject. GitHub issue #620 on kohya-ss/sd-scripts: the refiner “doesn’t understand the subject, which often makes using the refiner worse with subject generation.”

Refiner adds processing time. On lower-end hardware it can double generation time. It also requires keeping both models in VRAM simultaneously – which pushes 12GB cards to their limit.

Many workflows skip it entirely. Not a free quality boost – tradeoff that works in specific scenarios.
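If you do use the refiner, diffusers’ SDXL pipelines expose the handoff via denoising_end on the base pass and denoising_start on the refiner (img2img) pass – parameter names as of recent diffusers versions. A sketch of the two-stage pass, written against injected pipeline objects so the control flow is testable without downloading weights:

```python
def base_then_refine(base_pipe, refiner_pipe, prompt, switch=0.8, steps=40):
    """Run the base model over the first `switch` fraction of the denoising
    schedule, then hand its latents to the refiner for the remainder."""
    latents = base_pipe(
        prompt=prompt,
        num_inference_steps=steps,
        denoising_end=switch,
        output_type="latent",  # keep latents; the refiner decodes at the end
    ).images
    return refiner_pipe(
        prompt=prompt,
        num_inference_steps=steps,
        denoising_start=switch,
        image=latents,
    ).images[0]
```

For subject-specific fine-tunes, the simpler option discussed above is to skip the refiner entirely.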

Speed at Different Resolutions

SDXL is slower than SD 1.5 at low resolutions. Not just in absolute terms – relative to quality.

Generating 512×768 images (common for portrait crops)? You’re fighting SDXL’s native 1024×1024 training. Community docs: SDXL is 35% slower at low resolutions compared to SD 1.5. Worse quality too – SDXL at 512px produces artifacts because the model expects 1024px inputs.

SD 1.5 at 512×512: in its sweet spot. Fast, clean, no artifacts. SDXL at 512px: slower plus composition problems (multiple subjects, weird crops) because you’re outside its training distribution.

At 1024px and above? SDXL pulls ahead. Benchmark comparisons show it 30%+ faster than SD 1.5 at 1500px and beyond. Your workflow lives in the 512-768px range? SD 1.5 wins on both speed and output quality.

Think of it this way: SDXL is a 1024px specialist. Force it to work at 512px and you’re paying the parameter overhead for worse results than a model designed for that resolution.

Pro tip: Stuck on an 8GB card but want SDXL quality? Generate the composition with SD 1.5 at 512px. Then use SDXL in img2img mode at 1024px for the final render. Avoids SDXL’s slow text-to-image pass while leveraging its detail refinement. Requires both models loaded, but the speed gain is real.
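That pro tip collapses into a few lines. A hypothetical sketch assuming diffusers-style pipelines – an SD 1.5 text-to-image pipeline and an SDXL img2img pipeline, both supplied by the caller; the strength default is an assumption, not a tested recipe:

```python
def hybrid_render(sd15_pipe, sdxl_img2img, prompt, strength=0.5):
    """Compose cheaply with SD 1.5 at 512px, then let SDXL repaint detail
    at 1024px. Lower `strength` preserves the SD 1.5 composition; higher
    gives SDXL more freedom."""
    draft = sd15_pipe(prompt=prompt, width=512, height=512).images[0]
    upscaled = draft.resize((1024, 1024))  # naive resize; a real upscaler helps
    return sdxl_img2img(prompt=prompt, image=upscaled,
                        strength=strength).images[0]
```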

Decision Framework

Choose SDXL if:

  • 16GB+ VRAM (24GB ideal)
  • Generating at 1024px or higher
  • Prompt adherence matters more than ecosystem access – complex multi-subject scenes, specific compositions, text rendering
  • Starting fresh without legacy LoRAs or custom checkpoints
  • Can afford 2-3x longer generation times for better base quality

Choose SD 1.5 if:

  • 8-12GB VRAM and need fast iteration
  • Generating below 1024px resolution
  • Depend on specific custom checkpoints or trained LoRAs that don’t exist for SDXL yet (as of early 2025)
  • Workflow requires testing 50+ prompts per session – speed matters more than per-image quality
  • Subject-specific work (character consistency, product shots) where fine-tuned SD 1.5 checkpoints outperform base SDXL

Hybrid: SD 1.5 for experimentation and layout, SDXL for final renders. Many production workflows do this – prototype fast on SD 1.5, upscale or refine with SDXL once the composition is locked.
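The framework reduces to a few thresholds. A deliberately simplified helper encoding the lists above – the cutoffs are this article’s rules of thumb, nothing more; real choices also depend on which fine-tunes exist for your subject:

```python
def pick_model(vram_gb, target_px, needs_sd15_loras=False,
               prompts_per_session=10):
    """Encode the decision framework above as rough rules of thumb."""
    if needs_sd15_loras:
        return "sd15"    # LoRAs don't transfer between model families
    if vram_gb < 12:
        return "sd15"    # SDXL runs, but iteration speed dies
    if target_px < 1024 or prompts_per_session >= 50:
        return "sd15"    # below SDXL's native res, or heavy iteration
    if vram_gb >= 16:
        return "sdxl"
    return "hybrid"      # 12-16GB at 1024px: prototype on 1.5, finish on XL

print(pick_model(24, 1024))  # sdxl
print(pick_model(8, 1024))   # sd15
print(pick_model(12, 1024))  # hybrid
```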

What the FID Scores Don’t Tell You

SDXL’s research paper mentions something weird: despite drastically better human preference scores, SDXL has worse FID (Fréchet Inception Distance) scores than SD 1.5.

FID measures how “realistic” images are by comparing them to a reference dataset. Lower is better. SDXL’s FID is higher (worse) than SD 1.5 and SD 2.1.

Why? FID doesn’t correlate with visual aesthetics. The paper cites research showing “COCO zero-shot FID is negatively correlated with visual aesthetics.” The metric academic papers use to compare models is inversely related to what humans actually prefer.

You’ll see benchmark tables claiming SD 1.5 is “better” based on FID. Ignore them. Human preference testing is the real measure – SDXL wins that clearly (as documented in the July 2023 paper). Raw numbers don’t always map to workflow value.

What to Actually Do

Test both on your specific use case. Download SD 1.5 and SDXL base. Same prompt, same seed, same settings. 10 images with each. Compare them.

Then check your GPU usage. SDXL maxing out VRAM and swapping to RAM? Generation time killing your iteration speed? Can you find a custom SD 1.5 checkpoint that gets closer to SDXL quality at your target resolution?

Answer isn’t universal. Hardware-dependent, workflow-dependent, goal-dependent. SDXL is objectively more capable as a base model. SD 1.5’s library and speed still make it the better choice for specific scenarios in 2025.

Frequently Asked Questions

Can I use SD 1.5 LoRAs with SDXL models?

No. Architecturally incompatible. You must train separate LoRAs for each.

Is 8GB VRAM enough for SDXL?

Technically yes with --medvram. Actually? 10-15 minute generation times as the model swaps to system RAM. SD 1.5 generates the same resolution image (upscaled from 512px) in under a minute on the same hardware. A single iteration session burns through your patience fast. For usable iteration speed, 12GB is the real minimum.

Does SDXL always produce better image quality than SD 1.5?

Base SDXL beats base SD 1.5. Fine-tuned SD 1.5 models like Dreamshaper or Realistic Vision – trained on curated datasets for specific aesthetics – often outperform base SDXL on anatomy, photorealism, and style consistency (as of early 2025). SDXL wins on prompt adherence and native resolution. SD 1.5’s collection includes specialist models that excel in narrow domains. Quality depends on your specific use case and whether an equivalent fine-tune exists for SDXL. Remember that refiner problem from earlier? It often makes custom fine-tunes worse, not better.