Every AI art tutorial hands you the same formula: Subject + Style + Lighting + Mood + Composition. Type it in, get a sort-of-okay image, walk away confused why your output looks nothing like the demo. The problem isn’t the formula – it’s that nobody explains how the model actually reads it.
Not all slots carry equal weight. Some words actively sabotage your output. And on at least one major platform, your prompt is silently rewritten before it ever reaches the image model. Those are the parts worth understanding.
The 6-slot formula – and why slot order is the actual trick
Six slots. The sequence is the trick, not the slots themselves:
- Subject – who or what (one elderly fisherman, a single tulip)
- Medium – the format (oil painting, 35mm photo, vector illustration)
- Style anchor – movement, era, or named aesthetic (Art Nouveau, 1970s editorial, risograph)
- Environment – where (kitchen counter, foggy harbor, blank studio)
- Lighting – direction and quality (rim-lit, overcast, hard side light)
- Composition – framing (close-up, wide shot, overhead flat lay)
This matches what Midjourney’s official Prompt Basics page recommends (as of mid-2025), with one addition: a Style anchor slot the official guide leaves out. As for order – words at the start of a Midjourney prompt carry more weight than words at the end. So if you want a watercolor-first look, medium and style go before subject. Photographic realism? Subject comes first.
That single insight is why the same formula adapts to any style. Move the slot you care about most to the front.
The same subject, three styles
Same subject. Slots reordered. Three different outputs:
# Photographic - subject leads
a photo of an elderly fisherman, weathered face, harbor at dawn, soft overcast light, medium close-up, 35mm
# Illustrated - medium and style lead
flat vector illustration, 1970s travel poster style, elderly fisherman by a harbor, two-tone palette, centered composition
# Painterly - medium and era lead
oil painting, Dutch Golden Age, elderly fisherman mending a net, candlelit interior, chiaroscuro lighting, three-quarter portrait
The subject (elderly fisherman) is identical across all three. Only the front of the prompt shifts.
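The reordering above can be sketched in a few lines of Python. This is only an illustration: `SLOT_ORDER`, `build_prompt`, and the `lead` parameter are names made up for this example, not part of any Midjourney tooling. The slot values are taken from the painterly prompt above.

```python
from __future__ import annotations

# Default order follows the six-slot list; all names here are illustrative.
SLOT_ORDER = ["subject", "medium", "style", "environment", "lighting", "composition"]


def build_prompt(slots: dict[str, str], lead: list[str] | None = None) -> str:
    """Join slot values with commas, moving the `lead` slots to the front."""
    lead = lead or []
    unknown = [name for name in lead if name not in SLOT_ORDER]
    if unknown:
        raise ValueError(f"unknown slot(s): {unknown}")
    # Lead slots first, then the remaining slots in their default order.
    order = lead + [name for name in SLOT_ORDER if name not in lead]
    # Skip slots that are missing or empty.
    return ", ".join(slots[name] for name in order if slots.get(name))


slots = {
    "subject": "elderly fisherman mending a net",
    "medium": "oil painting",
    "style": "Dutch Golden Age",
    "environment": "candlelit interior",
    "lighting": "chiaroscuro lighting",
    "composition": "three-quarter portrait",
}

print(build_prompt(slots))                           # subject leads
print(build_prompt(slots, lead=["medium", "style"])) # medium and era lead
```

The second call reproduces the painterly prompt; the first keeps the same words but puts the subject in the high-weight position.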
One pattern that catches people off guard: per community testing (Kiki and Mozart, April 2025), including “photorealistic” in a Midjourney prompt may produce paintings instead of photos – photos are already realistic by default, so the word confuses rather than guides. Starting with “a photo of…” is the more reliable move.
Token weights – the hidden layer
Models don’t read words. They read tokens. And token weight – not word count – determines what actually shows up.
Turns out you can see exactly what Midjourney does with your prompt: the /shorten command (as of mid-2025) breaks your text into tokens and assigns each a weight between 0.00 and 1.00. Tokens at 0.00 are being ignored. That lovely descriptor you spent five minutes choosing? It might be carrying zero weight.
Quick audit: Run any prompt longer than ~20 words through /shorten before generating. If a key word scores 0.00, either move it forward, cut surrounding words, or fuse it to a neighbor with an underscore. Joining words with underscores – desaturated_cold_tones instead of desaturated cold tones – fuses them into a single token with higher combined weight. Three separate tokens scoring 0.00, 0.04, 0.00 become one token that actually registers.
Think of it like a painting where the foreground is sharp and the background blurs out. Words at the front of your prompt are in focus. Everything crowded near the end competes for the model’s attention – and some of it loses entirely.
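If you audit prompts often, the check can be sketched as a small helper. It assumes you copy the token weights out of the /shorten result by hand; nothing here talks to Midjourney. The function name, the 0.05 floor, and the weights for "fisherman" and "harbor" are assumptions made up for illustration. The 0.00, 0.04, 0.00 values are the ones from the example above.

```python
from __future__ import annotations


def audit_weights(weights: dict[str, float], key_terms: list[str], floor: float = 0.05) -> list[str]:
    """Return the key terms that are missing or weighted below `floor`."""
    flagged = []
    for term in key_terms:
        weight = weights.get(term.lower())
        if weight is None or weight < floor:
            flagged.append(term)
    return flagged


# Weights typed in by hand from a /shorten result (illustrative values only).
weights = {
    "fisherman": 0.62,    # made-up value
    "harbor": 0.21,       # made-up value
    "desaturated": 0.00,
    "cold": 0.04,
    "tones": 0.00,
}

print(audit_weights(weights, ["fisherman", "desaturated", "cold", "tones"]))
```

Anything the helper returns is a candidate for moving forward, fusing with an underscore, or cutting.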
Four pitfalls that ruin good prompts
Contradicting yourself. Pairing “photorealistic” with “abstract,” or “minimalist” with “highly detailed,” forces the model to pick one – and you don’t get to choose which. Pick a lane.
Over-stuffing the negative side. In Midjourney, --no is equivalent to a -0.5 weight (per Midjourney’s official multi-prompt docs). Stacking five --no terms doesn’t make the exclusion stronger – it just burns prompt space.
Naming living artists. Ask for “the style of [living artist name]” and most AI tools will refuse or quietly modify your request. OpenAI’s image systems add explicit refusals for living artist style requests. The fix: describe the style by movement + medium + era instead. “In the style of a 2010s indie graphic novel” gets you most of what you wanted, with none of the friction – per the Neolemon AI art style guide (January 2026).
Prompt length creep. More on the sweet spot in the FAQ below.
What the formula actually gets you
70-80% of your mental image on the first try. That’s the author’s observation after iterating across styles – not a published stat. The remaining 20-30% comes from running variants, swapping one slot, regenerating. Anyone selling “perfect on the first try” is showing you the 50th attempt.
There’s also a diffusion-model quirk that trips up a lot of first outputs: according to Microsoft’s AI art prompting guide, diffusion models often smooth out fine textures – skin pores, fabric grain – because the denoising stage mistakes them for noise. Output looks too polished, too glossy. Adding “film grain,” “skin texture detail,” or “35mm grain” to your prompt pushes back against it.
Two cases where the formula gives you false confidence
1. You’re using DALL-E 3 inside ChatGPT. Your slot order barely matters. DALL-E 3 applies a prompt transformation step – a large language model rewrites your original text before it reaches the image model, and in Azure OpenAI deployments you cannot disable it. The image model sees the rewrite, not your formula. Docs say it improves safety and quality, but research from UC Berkeley found that automatic LLM-based prompt revision actually cuts DALL-E 3’s image quality – measurably. If you need slot-level control, Midjourney or a Stable Diffusion-based tool is the better call: what you write is what the model sees.
2. You’re trying to match a reference image’s exact look. Text prompts hit a ceiling at “close enough.” Midjourney V7 (default since June 17, 2025) has a better answer: Omni Reference (--oref) for consistent characters or objects, and an improved Style Reference (--sref) for applying a moodboard aesthetic across prompts. In practice, a 12-word prompt plus an --sref URL tends to outperform a 200-word text description for aesthetic precision.
FAQ
Does this formula work in Stable Diffusion and Leonardo too?
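Yes, with a syntax difference. Stable Diffusion front ends such as AUTOMATIC1111 use (word:1.3) for emphasis instead of Midjourney’s :: weighting, and negative prompts usually go in a separate field rather than after a --no flag. The conceptual structure – slot order, negative prompts, token priority – transfers directly.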
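A minimal sketch of the (word:1.3) emphasis form, assuming a front end that accepts that syntax. The helper name `to_sd_prompt` and the weights are hypothetical; check your tool's documentation for the range it accepts.

```python
from __future__ import annotations


def to_sd_prompt(parts: list[tuple[str, float]]) -> str:
    """Render (text, weight) pairs using the (text:weight) emphasis syntax.

    A weight of exactly 1.0 is treated as neutral and left unwrapped.
    """
    rendered = []
    for text, weight in parts:
        if weight == 1.0:
            rendered.append(text)
        else:
            rendered.append(f"({text}:{weight:g})")
    return ", ".join(rendered)


# Illustrative weights: push the medium up, pull the environment down.
parts = [
    ("oil painting", 1.3),
    ("elderly fisherman", 1.0),
    ("candlelit interior", 0.8),
]

print(to_sd_prompt(parts))
```

The slot you would move to the front in Midjourney is the one you would give a weight above 1.0 here.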
How many words should a prompt be for the best results?
Aim for 30 to 75 words, per community-tested guidance. Below 30, the model fills too many blanks with defaults – you get a generic result. Above 75, later tokens start dropping to 0.00 weight and get ignored. If you must go longer, run the prompt through /shorten (Midjourney) first and check whether your most important descriptors survived with usable weight.
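The 30-to-75 guideline is easy to check before you generate. The sketch below is a plain word count, not a tokenizer, so treat the result as a rough signal. The function names are made up for this example, and it assumes parameters such as --seed come at the end of the prompt.

```python
from __future__ import annotations


def count_words(prompt: str) -> int:
    """Count whitespace-separated words, stopping at the first --parameter."""
    count = 0
    for token in prompt.split():
        if token.startswith("--"):
            break  # everything from here on is a parameter, not description
        count += 1
    return count


def length_verdict(prompt: str, low: int = 30, high: int = 75) -> tuple[int, str]:
    """Return the word count and 'short', 'ok', or 'long' against the range."""
    n = count_words(prompt)
    if n < low:
        return n, "short"
    if n > high:
        return n, "long"
    return n, "ok"


print(length_verdict("elderly fisherman, harbor at dawn --seed 1234"))
```

A "long" verdict is the cue to run /shorten and see which descriptors survived.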
Why do my outputs look different every time I run the same prompt?
Diffusion models start from random noise, and the seed changes with every generation unless you lock it. Common misconception: people assume the same prompt guarantees similar results. It doesn’t – you’re comparing a slightly different random starting point each time. In Midjourney, --seed [number] locks the starting noise. Tweak one slot, keep the seed, regenerate – now you can actually see what your change did instead of attributing every difference to the edit you just made.
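The keep-the-seed, change-one-slot routine can be sketched as below. Only the --seed parameter comes from Midjourney. The function name, the slot names, and the seed value 1234 are placeholders for illustration.

```python
from __future__ import annotations


def seeded_variants(parts: dict[str, str], slot: str, options: list[str], seed: int) -> list[str]:
    """Build one prompt per option, changing only `slot` and reusing one seed."""
    if slot not in parts:
        raise KeyError(f"unknown slot: {slot}")
    prompts = []
    for option in options:
        # Overwriting an existing key keeps the original slot order.
        variant = {**parts, slot: option}
        prompts.append(", ".join(variant.values()) + f" --seed {seed}")
    return prompts


parts = {
    "subject": "a photo of an elderly fisherman",
    "environment": "harbor at dawn",
    "lighting": "soft overcast light",
}

for prompt in seeded_variants(parts, "lighting", ["soft overcast light", "hard side light"], seed=1234):
    print(prompt)
```

Because both prompts share a seed, the lighting change is the only deliberate difference between the two generations.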
Next step: Take any prompt you’ve used recently. Paste it into Midjourney and run /shorten before generating. Look at the token weights. Anything at 0.00 – rewrite it, move it forward, or delete it. That single audit will probably improve your next image more than any new formula will.