You’re using Stable Diffusion wrong. Most people are.
I spent two weeks generating thousands of images – adjusting prompts, tweaking CFG scales, rolling seeds like a casino addict – trying to get one portrait where the hands didn’t look like alien appendages. When I finally got a face I liked, the lighting was off. When the lighting worked, there was a phantom limb growing out of the shoulder.
Then someone told me to stop.
The problem wasn’t my prompts. It was that I was regenerating entire 512×512 canvases when I only needed to fix 64×64 pixels. Inpainting lets you edit specific regions of an image without torching the parts that already work. It’s the difference between repainting your entire house because one wall has a scuff versus just touching up the scuff.
Why “Just Generate More Images” Doesn’t Scale
Here’s what the typical workflow looks like: you generate 50 images in txt2img. You find one that’s 80% there – good composition, decent lighting, but the subject’s hand has six fingers and the background has a floating eyeball.
Most tutorials tell you to add “extra fingers, deformed hands” to your negative prompt and generate another batch. Which works maybe 30% of the time. The other 70%? You’ve now got a different face, different pose, different everything – because you reset the entire image generation process.
Inpainting solves this by letting you mask the broken hand, tell the model “regenerate just this area,” and keep everything else untouched. According to Hugging Face’s official model card, the inpainting model was specifically trained with 440k additional steps on masked regions to handle exactly this scenario.
How Inpainting Actually Works (Without the Math)
Standard Stable Diffusion generates images by starting with random noise and gradually denoising it based on your prompt. Inpainting does the same thing – but only inside the area you mark with a mask.
The technical detail that matters: the inpainting model’s UNet has 5 extra input channels (4 for the encoded masked image, 1 for the mask itself). This means it can see what was originally there and blend the new generation seamlessly.
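If you script with the diffusers library instead of a WebUI, you can see those extra channels for yourself. A minimal sketch, assuming diffusers and torch are installed; the model id is an assumption, so point it at whichever mirror of the sd-v1-5 inpainting weights you actually have:

```python
import torch
from diffusers import StableDiffusionInpaintPipeline

# Assumption: swap in whatever sd-v1-5-inpainting mirror you have access to.
pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting",
    torch_dtype=torch.float16,
)

# A plain SD 1.5 UNet takes 4 latent channels; the inpainting UNet takes 9:
# 4 (noisy latents) + 4 (VAE-encoded masked image) + 1 (downscaled mask).
print(pipe.unet.config.in_channels)  # -> 9
```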
Here’s where it gets interesting. If you use a regular checkpoint (not an inpainting-specific one), it can still work – the model just substitutes unmasked areas at each diffusion step with the original latents plus noise. But per this technical breakdown, regular models fail catastrophically at one task: removing objects with an empty prompt. They don’t know what “nothing” looks like, so you get smears and artifacts instead of clean background.
Dedicated inpainting models handle this because they were trained on synthetic masks with 25% of training steps using fully masked images.
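For the curious, here's roughly what that substitution trick looks like in code. This is a conceptual sketch of blending latents at each step, not AUTOMATIC1111's actual implementation; `unet`, `scheduler`, `cond`, and `original_latents` are stand-ins for objects you'd already have inside a diffusion loop:

```python
import torch

def inpaint_step(latents, original_latents, mask, unet, scheduler, t, cond):
    # Standard denoising step over the whole latent (mask = 1 where we regenerate).
    noise_pred = unet(latents, t, encoder_hidden_states=cond).sample
    latents = scheduler.step(noise_pred, t, latents).prev_sample

    # Re-noise the *original* image's latents to (roughly) the current timestep...
    noise = torch.randn_like(original_latents)
    noised_original = scheduler.add_noise(original_latents, noise, t)

    # ...and keep them wherever the mask is 0, so only the masked region
    # actually gets regenerated.
    return mask * latents + (1 - mask) * noised_original
```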
The Workflow That Actually Works
I’m assuming you’re using AUTOMATIC1111 WebUI – the most common interface as of early 2026. ComfyUI and Forge work similarly but with different layouts.
Step 1: Get an image worth fixing
Generate your base image in txt2img. Don’t obsess over perfection. If 60% of it works, that’s enough. Click the “Send to inpaint” button below the image (it’s a tiny icon that looks like a paintbrush).
Step 2: Mask the problem area
In the Inpaint tab under img2img, use the brush tool to paint over what you want to change. Paint generously – a bit of margin around the defect helps the model blend better. The mask will show as a colored overlay.
Pro tip from experience: if you’re fixing a hand, mask the entire hand plus part of the wrist. Masking just the fingers creates weird discontinuities.
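If you're scripting instead of painting by hand, a mask is just a black-and-white image: white means "regenerate", black means "keep". A rough PIL sketch of "paint generously" (the coordinates are made up for illustration):

```python
from PIL import Image, ImageDraw, ImageFilter

image = Image.open("portrait.png").convert("RGB")

mask = Image.new("L", image.size, 0)          # start fully black: keep everything
draw = ImageDraw.Draw(mask)
draw.ellipse((300, 380, 420, 500), fill=255)  # white blob over the hand plus wrist

# Dilate the white region with a max filter so the mask has some margin
# around the defect, which helps blending.
mask = mask.filter(ImageFilter.MaxFilter(15))
mask.save("mask.png")
```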
Step 3: Adjust your prompt
You can reuse the original prompt, or modify it to focus on the masked area. If you’re fixing a hand, adding “elegant hand, five fingers, natural pose” at the start of your prompt helps guide the model.
Want to replace something? Change the prompt. Masked a coffee cup? Write “glass of wine” and the model will swap it.
Step 4: Configure the three settings that matter
Most inpainting settings are noise. These three are not:
- Denoising strength: Start at 0.75. Lower = stays closer to original. Higher = more creative liberty. According to community testing, 0 changes nothing, 1 gives you something completely unrelated.
- Inpaint area: “Only masked” crops the masked region and uses full resolution on it – critical for tiny faces. “Whole picture” uses the full image context – better for backgrounds.
- Masked content: “Original” initializes with what was there (best for tweaks). “Latent noise” starts from scratch (best for removing things).
Generate 4-8 variations (set batch size to 4 and run one or two batches). Pick the best. If nothing works, adjust denoising ±0.1 and try again.
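If you'd rather drive this from Python, the same knobs exist in the diffusers inpainting pipeline. A sketch, assuming a CUDA GPU and the image/mask files from earlier; the model id is an assumption, and `padding_mask_crop` (the rough equivalent of "only masked") only exists in recent diffusers releases, so drop it if your version doesn't have it:

```python
import torch
from diffusers import StableDiffusionInpaintPipeline
from PIL import Image

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16
).to("cuda")

image = Image.open("portrait.png").convert("RGB").resize((512, 512))
mask = Image.open("mask.png").convert("L").resize((512, 512))

results = pipe(
    prompt="elegant hand, five fingers, natural pose, elegant woman in a crimson dress",
    negative_prompt="extra fingers, deformed hands",
    image=image,
    mask_image=mask,
    strength=0.75,             # denoising strength
    num_images_per_prompt=4,   # 4 variations in one go
    padding_mask_crop=32,      # crop to the mask plus 32 px, roughly "only masked"
).images

for i, img in enumerate(results):
    img.save(f"inpaint_variant_{i}.png")
```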
Step 5: Iterate if needed
Inpainting is iterative. You might fix the hand, then notice the wrist looks off, then fix the wrist. That’s normal. Just send the output back to inpaint and mask the new problem area.
But there’s a trap here – more on that in a second.
The Settings They Don’t Explain (And When They Break)
Let’s talk about “Only masked” versus “Whole picture.”
The Stable Diffusion v1.5 base model was trained at 512×512. If you generate a full-body portrait at that resolution, the face might only occupy 80×80 pixels. The model can’t generate facial features with that few pixels – it’s why faces far from the camera look like melted wax.
“Only masked” fixes this by cropping out just the masked area, scaling it up to 512×512 for generation, then scaling it back down and compositing it into the original. Suddenly the model has enough resolution to render a proper face.
The downside? You lose global context. If you mask half the background with “only masked,” the model doesn’t see the rest of the image, so colors and lighting might not match. That’s when you switch to “whole picture” – the model sees everything, keeps it consistent, but small details suffer.
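If you want to see the mechanism rather than take it on faith, here's a hand-rolled approximation of "Only masked" using PIL and an inpainting pipeline. It skips the padding and edge handling the WebUI does, so treat it as a sketch, not a drop-in replacement:

```python
from PIL import Image

def only_masked_inpaint(image, mask, pipe, prompt, size=512):
    # Crop to the bounding box of the white (masked) region.
    box = mask.getbbox()
    crop_img = image.crop(box).resize((size, size), Image.LANCZOS)
    crop_mask = mask.crop(box).resize((size, size), Image.LANCZOS)

    # Generate at full model resolution on just that crop.
    patch = pipe(prompt=prompt, image=crop_img, mask_image=crop_mask,
                 strength=0.75).images[0]

    # Scale the patch back down and composite it into the original.
    patch = patch.resize((box[2] - box[0], box[3] - box[1]), Image.LANCZOS)
    out = image.copy()
    out.paste(patch, box[:2])
    return out
```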
Here’s something nobody mentions: your image dimensions must be divisible by 8 or you’ll hit this error:
RuntimeError: Sizes of tensors must match except in dimension 1.
Expected size 1 but got size 2 for tensor number 1 in the list.
This isn’t a bug. It’s because the VAE downsamples by a factor of 8. A 513×513 image breaks the math. Crop it to 512×512 or 520×520 and it works.
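A two-line guard avoids this whole class of errors: snap the dimensions down to the nearest multiple of 8 before inpainting. A sketch with PIL; the filename is hypothetical:

```python
from PIL import Image

def snap_to_multiple_of_8(img: Image.Image) -> Image.Image:
    # The VAE downsamples by 8, so both dimensions must be divisible by 8.
    w, h = img.size
    return img.resize((w - w % 8, h - h % 8))

img = Image.open("513x513.png")          # hypothetical awkwardly sized image
print(snap_to_multiple_of_8(img).size)   # (512, 512)
```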
Another one: if you’re using the dedicated inpainting checkpoint (sd-v1-5-inpainting.ckpt), you might hit a “Negative Guidance minimum sigma” error. The fix: go to Settings > Optimizations, set that value to 0. This is documented in community guides but never in official docs.
What Happens When You Inpaint the Same Image 10 Times
I tried an experiment: I inpainted the Mona Lisa, then inpainted the result, then inpainted that result, 15 times in a row. By iteration 8, her face was unrecognizable. By iteration 12, the image was abstract noise.
Turns out this is a known phenomenon. A 2024 research paper calls it recursive inpainting collapse – similar to model collapse when you train AI on AI-generated data. Each pass subtly shifts the distribution. Stack enough shifts and you get garbage.
The practical takeaway: if you’re iterating on the same image, export intermediate versions. Don’t chain 10 inpainting passes without saving checkpoints, or you might end up with something you can’t recover.
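If you're scripting multiple passes, bake the checkpointing in. A sketch that reuses the inpainting pipeline from the earlier example; the mask files and prompts are placeholders:

```python
from PIL import Image

passes = [
    ("mask_hand.png",   "elegant hand, five fingers, natural pose"),
    ("mask_sleeve.png", "crimson dress sleeve, soft even lighting"),
]

image = Image.open("portrait_base.png").convert("RGB")
image.save("pass_00.png")                       # always keep the starting point

for i, (mask_path, prompt) in enumerate(passes, start=1):
    mask = Image.open(mask_path).convert("L")
    image = pipe(prompt=prompt, image=image, mask_image=mask,
                 strength=0.6).images[0]
    image.save(f"pass_{i:02d}.png")             # checkpoint after every pass
```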
Real Example: Fixing a Portrait in 3 Passes
I generated a portrait with this prompt:
“Elegant woman in a crimson dress, standing in a grand library, warm lighting, oil painting style, detailed, 4k”
First result: gorgeous lighting, great composition, but her left hand had seven fingers fused together like a claw.
Pass 1 (Fix the hand): Masked the hand, denoising 0.75, “only masked,” masked content set to “original.” Prompt: same as original plus “elegant hand, five fingers, natural pose” at the start. Result: clean hand, but now there was a weird shadow artifact on her dress sleeve.
Pass 2 (Fix the shadow): Masked the sleeve shadow, denoising 0.6 (lower because I just wanted a tweak), “whole picture,” masked content “original.” Prompt: unchanged. Result: shadow gone, but I noticed the background bookshelf looked blurry.
Pass 3 (Sharpen background): Masked the bookshelf, denoising 0.5, “whole picture,” masked content “original.” Prompt: added “sharp focus, detailed books” to the original. Result: crisp background, portrait done.
Total time: maybe 8 minutes. If I’d tried to get this right in txt2img, I’d still be on seed 4000.
When Inpainting Fails (And What to Do Instead)
Inpainting isn’t magic. Sometimes it just won’t cooperate.
Problem: The inpainted area looks pasted on, with mismatched lighting or color
Fix: Increase denoising slightly so the model takes more liberty. Or switch from “only masked” to “whole picture” so it sees the surrounding context.
Problem: Denoising at 1.0 gives me random garbage
Fix: You’re asking the model to ignore the original entirely. Try 0.85-0.95 instead. If you truly want something completely different, use “latent noise” or “latent nothing” for masked content – that initializes the area with noise instead of the original image.
Problem: SDXL inpainting at strength=1.0 looks noisy and degraded
Fix: This is a known limitation. According to the official SDXL inpainting model card, the autoencoding is lossy at strength=1.0. Use 0.99 instead. But beware: at 0.99, strong original colors dominate – one user found that a blue car stayed blue even when prompted for pink. The workaround is external editing (Photoshop/GIMP) to roughly paint the new color, then inpaint at lower strength to refine.
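In diffusers terms, that fix looks like the sketch below, using the official SDXL inpainting checkpoint from the model card above; the image files and guidance scale are illustrative:

```python
import torch
from diffusers import AutoPipelineForInpainting
from PIL import Image

pipe = AutoPipelineForInpainting.from_pretrained(
    "diffusers/stable-diffusion-xl-1.0-inpainting-0.1",
    torch_dtype=torch.float16,
).to("cuda")

image = Image.open("car.png").convert("RGB").resize((1024, 1024))
mask = Image.open("car_mask.png").convert("L").resize((1024, 1024))

result = pipe(
    prompt="a pink sports car",
    image=image,
    mask_image=mask,
    strength=0.99,     # not 1.0: the autoencoding is lossy at full strength
    guidance_scale=8.0,
).images[0]
result.save("car_pink.png")
```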
Problem: Inpainting returns the original image unchanged
Check your denoising strength (must be > 0). Check that your mask is actually visible. Check your image dimensions – if they’re not divisible by 8, resize.
Should You Use a Dedicated Inpainting Model?
Short answer: for object removal, yes. For tweaks, maybe not.
Regular checkpoints can inpaint just fine if you’re fixing details. But if you want to cleanly erase something – like removing a person from a photo – the dedicated model performs way better. It was trained to understand “empty prompt = fill with contextual background.” Regular models don’t get that and smear colors randomly.
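In script form, object removal with the dedicated model is literally an empty prompt over the masked region. A sketch reusing the SD 1.5 inpainting pipeline from the earlier examples; the photo and mask files are placeholders:

```python
from PIL import Image

photo = Image.open("street_photo.png").convert("RGB").resize((512, 512))
person_mask = Image.open("person_mask.png").convert("L").resize((512, 512))

removed = pipe(
    prompt="",              # dedicated models saw fully masked training steps,
    image=photo,            # so an empty prompt means "plausible background"
    mask_image=person_mask,
    strength=1.0,
).images[0]
removed.save("person_removed.png")
```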
For SDXL, as of early 2026, community members are merging base models with the official SDXL inpainting checkpoint to create hybrid versions like Juggernaut XL Inpaint. These reduce the visible seam you get when outpainting (extending an image beyond its borders). If you do outpainting, grab one of those.
One Setting Almost Everyone Gets Wrong
Here’s a small thing that matters more than it should: mask blur.
Most people leave it at the default (4). But if your inpainted area has hard edges that don’t blend, try setting mask blur to 8-12. It feathers the mask boundary so the transition is gradual. Conversely, if your inpaint is bleeding into areas it shouldn’t, lower mask blur to 0-2 for a sharper boundary.
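Under the hood, mask blur is just feathering the mask edge before compositing. If you composite yourself (as in the "Only masked" sketch earlier), a Gaussian blur on the mask does the same job; the radii here are illustrative, not a direct mapping to the slider:

```python
from PIL import Image, ImageFilter

image = Image.open("original.png").convert("RGB")
patch = Image.open("inpainted.png").convert("RGB")   # same size as the original
mask = Image.open("mask.png").convert("L")

feathered = mask.filter(ImageFilter.GaussianBlur(radius=10))   # gradual transition
blended = Image.composite(patch, image, feathered)             # patch where mask is white
blended.save("blended.png")
```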
Nobody explains this because it’s a subtle slider. But it’s the difference between “looks Photoshopped” and “looks smooth.”
What’s Next? Try This Tonight
Here’s your homework: don’t generate a perfect image from scratch. Generate something 70% right, then fix it with inpainting.
Pick one image you already made that has a flaw – a bad hand, a weird background object, whatever. Send it to inpaint. Mask the flaw. Set denoising to 0.75, inpaint area to “only masked” if it’s small or “whole picture” if it’s large, and masked content to “original.” Generate 4 variations. Pick one.
That’s it. Once you feel the difference between “regenerate 100 images hoping for luck” and “fix the one thing that’s broken,” you won’t go back.
And if you want to go deeper – combine inpainting with ControlNet. ControlNet lets you preserve pose, composition, or style while inpainting. But that’s a whole other article.
Does inpainting work with all Stable Diffusion models?
Yes, any SD checkpoint can inpaint – it’s a feature of the workflow, not the model. However, dedicated inpainting models (like sd-v1-5-inpainting.ckpt or SDXL inpainting checkpoints) perform better, especially for object removal or complex edits. Regular models work fine for small fixes like adjusting a face or hand. If you’re using a custom model (a fine-tune, or a base checkpoint with LoRAs applied), inpainting still works – you’re just not getting the extra 440k inpainting-specific training steps.
Why does my inpainted area look blurry or low-quality?
Two common causes. First: denoising strength is too low (under 0.4), so the model barely changes anything and you get a smeared blend of old and new. Bump it to 0.6-0.8. Second: you’re using “whole picture” mode on a very small masked area, so the model is generating at low effective resolution. Switch to “only masked” and set padding to 32-64 pixels – it’ll upscale just that region, generate at full resolution, then composite it back. Also: if you’re using SDXL inpainting at strength=1.0, the autoencoder degrades quality. Drop to 0.99 or lower.
Can I use inpainting to change the style of part of an image?
Absolutely. Mask the area, then change your prompt to describe the new style – “watercolor painting” instead of “oil painting,” for example. Keep denoising around 0.7-0.9 so the model has enough freedom to reinterpret the region. You can even mix styles: photorealistic subject on a painted background by inpainting the background with a style-specific prompt. The trick is using “whole picture” mode so lighting and composition stay consistent, and setting masked content to “original” so the model has context for what it’s transforming. If the style shift is too abrupt, try a second pass at lower denoising (0.5-0.6) to blend the boundary.