Ask ChatGPT a simple question and you get three paragraphs, two disclaimers, and a conclusion restating everything you just read. Ask with a 20-word prompt that includes “answer in under 50 words” and you get a direct answer that actually respects your time.
The difference? You controlled the output before it started rambling.
Most Tutorials Miss the Output Cap
GPT-4o: 128K context window, but output capped at 16,384 tokens. GPT-3.5? Only 4,096 output tokens despite a 16K context. The ChatGPT interface itself? Around 8,000 tokens per reply – even though the API allows much higher caps (as of January 2025, per OpenAI’s technical documentation).
You ask for a “detailed analysis.” The model has to fit the answer into that ceiling while sounding helpful. Result: filler, hedging, repetition.
Everyone complains about verbosity. The usual advice – "be brief" in your prompt, custom instructions – helps, but it ignores the hard output ceiling that causes mid-response cutoffs.
Custom Instructions: Set Once, Apply Forever
Available on all plans – Free, Plus, Pro – across Web, Desktop, iOS, Android (introduced mid-2023, rolled out to all users later that year). They apply to every chat automatically, so you stop repeating preferences.
Setup:
- ChatGPT Settings → Personalization (or Customize ChatGPT)
- Toggle “Enable customization” ON
- Second field (“How would you like ChatGPT to respond?”) – paste this:
Be direct and concise. Get to the point; minimize tokens.
Don't elaborate unless requested.
Don't be redundant or repetitive.
NEVER mention that you're an AI.
Avoid phrases like "sorry," "apologies," "I don't have access to," or "as a large language model."
If you don't know, say "I don't know" and stop.
For vague prompts, ask clarifying questions instead of guessing.
1,500-character limit per field. You won’t hit it with the above, but keep it tight if you add role context.
Pro tip: Add verbosity levels: “Adopt verbosity based on user settings. V=0 (minimal) to V=5 (maximum). If not specified, assume V=2.” Then include “V=1” in prompts where you want ultra-short answers.
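The V-scale is a prompting convention, not an API feature, so it's just string assembly. A minimal sketch of a helper that prefixes the level (the function name and defaults are illustrative, not part of any SDK):

```python
def with_verbosity(prompt: str, v: int = 2) -> str:
    """Prefix a prompt with a V=0..5 verbosity level; V=2 is the assumed default."""
    if not 0 <= v <= 5:
        raise ValueError("verbosity level must be between 0 and 5")
    return f"V={v}: {prompt}"

with_verbosity("List three causes of inflation", v=1)
# -> "V=1: List three causes of inflation"
```

This only works if your custom instructions define the scale first; on its own, "V=1:" means nothing to the model.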
It doesn’t work perfectly. GPT-4o sometimes ignores brevity rules set in custom instructions and reverts to long-form anyway. When that happens, issue a mid-chat correction: “You’re getting wordy. Switch to compression mode for all future replies in this chat.”
Prompt-Level Controls
Custom instructions set the baseline. Individual prompts fine-tune on the fly.
Specify exact length. “In 3 sentences” beats “briefly.” ChatGPT interprets concrete constraints better than vague qualifiers. “Explain X in 2 paragraphs” or “List 5 bullet points” – the model has a target it can hit.
Frontload the constraint. Put length limits at the start: “Answer in under 50 words: [your question]” works better than “[your question]. Keep it under 50 words.”
Use roles to imply brevity. “You’re a busy CEO with 10 seconds. Explain the SMART framework.” The role primes compression without explicit “be concise.”
- “Explain photosynthesis” → ~300 words. No constraint; the model defaults to thorough.
- “Explain photosynthesis briefly” → ~150 words. Qualifier interpreted loosely.
- “Explain photosynthesis in 3 sentences” → ~50 words. Concrete target, easy to hit.
- “You’re a busy executive. Explain photosynthesis in 10 seconds” → ~40 words. Role + time limit = double pressure to compress.
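If you send a lot of prompts programmatically, the frontloading and role techniques above are easy to bake into a template. A hypothetical builder (name and signature are illustrative):

```python
def compress_prompt(question: str, word_limit: int = 50, role: str = "") -> str:
    """Frontload a concrete word limit, optionally priming with a role."""
    prefix = f"You're a {role}. " if role else ""
    return f"{prefix}Answer in under {word_limit} words: {question}"

compress_prompt("Explain photosynthesis", 50, role="busy executive")
# -> "You're a busy executive. Answer in under 50 words: Explain photosynthesis"
```

The constraint lands first, where the model weighs it most, and the role does the rest implicitly.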
When Responses Cut Off Mid-Sentence
Happens when output hits the token ceiling. Two options:
Type “continue” immediately. ChatGPT picks up where it stopped – works for factual content but often loses thread on creative tasks.
Better: include this at the end of your prompt: “Don’t summarize or cut short. If you run out of tokens, stop and wait for me to say ‘Go’ before continuing.” Prevents the model from rushing to a premature conclusion when it senses the limit.
Code vs. Prose: The Asymmetric Output Cap
One user gave ChatGPT 319 lines of HTML and got back 95 lines – the model claimed that was the “entire page” despite discarding roughly 70% of the input. Ask a conceptual question? 600 words when 50 would work.
ChatGPT applies aggressive truncation to code to avoid token waste. Pads prose responses to sound helpful. If you’re editing large files, split them into sections and process one at a time. For prose? Enforce output caps explicitly (“max 100 words”) or the model defaults to verbose.
Verbosity as Cost Management
Longer responses fill the context window faster. Because each follow-up resends the entire conversation history to the model, every verbose reply makes each subsequent message more expensive in tokens – and pushes users to start new chats sooner, which resets conversation memory and reduces server load.
10-turn conversation with 500-word replies hits token limits faster than 20 turns with 100-word replies, but the former costs less to serve because it ends sooner. Verbose output isn’t just a UX annoyance – it’s cost management.
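The compounding effect is easy to see with back-of-envelope arithmetic. A sketch (the per-message token count is a simplifying assumption – real messages vary in length):

```python
def total_tokens_sent(turns: int, tokens_per_message: int) -> int:
    """Cumulative tokens sent to the model when every turn resends the full history."""
    total = 0
    history = 0
    for _ in range(turns):
        history += tokens_per_message  # user message joins the history
        total += history               # the whole history rides along with the request
        history += tokens_per_message  # assistant reply joins the history
    return total

total_tokens_sent(1, 100)   # -> 100
total_tokens_sent(2, 100)   # -> 400
total_tokens_sent(3, 100)   # -> 900
```

Total tokens grow quadratically with turn count, which is why reply length compounds instead of adding up linearly.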
API Users: Hidden Settings
Using the OpenAI API directly? You have controls ChatGPT UI users don’t.
```json
{
  "model": "gpt-4o",
  "messages": [{"role": "user", "content": "Your prompt"}],
  "max_tokens": 150,
  "temperature": 0.5
}
```
max_tokens hard-caps output length. Set it to 150 and the response stops at 150 tokens regardless of content.
temperature (range 0.0-2.0) controls randomness. Lower values (0.3-0.5) produce terser, more deterministic output. Higher values (0.8-1.0): more creative but wordier. The ChatGPT UI doesn’t expose this – you’re stuck with the model’s default (1.0 in the API, per OpenAI’s documentation).
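The same payload, sketched in Python against the official openai SDK (v1.x). The live call needs an `OPENAI_API_KEY`, so this helper only assembles the keyword arguments; the wrapper function itself is illustrative:

```python
def build_request(prompt: str, max_tokens: int = 150, temperature: float = 0.5) -> dict:
    """Assemble kwargs for client.chat.completions.create()."""
    return {
        "model": "gpt-4o",
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,    # hard cap on output length
        "temperature": temperature,  # lower = terser, more deterministic
    }

params = build_request("Explain photosynthesis", max_tokens=150)
# With a configured client (from openai import OpenAI; client = OpenAI()):
#   client.chat.completions.create(**params)
```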
UI gives you simplicity. API gives you precision.
Custom Instructions vs. Per-Prompt Control
Custom instructions save time if you want consistent behavior across all chats. Set once, every new conversation follows those rules. Works for general use.
Per-prompt controls? Flexibility when you need different output lengths. A research query might need V=5 (verbose), quick fact-check needs V=0 (minimal).
Best move: combine both. Use custom instructions to eliminate apologetic filler and AI self-references, then adjust verbosity per-prompt when the task demands it.
When Custom Instructions Stop Working
Sometimes they fail. Mid-2024 user reports showed GPT-4o ignoring brevity rules, reverting to long-form even with explicit “be concise” directives.
Three fixes:
Issue a mid-chat reset. Type: “You’re ignoring my brevity instruction. From this point forward, all responses must be under 100 words unless I specify otherwise.” Works 70% of the time.
Start a new chat. Custom instructions reapply cleanly at the start. If a thread has drifted into verbosity, reset.
Use a verbosity scale. Instead of binary (brief/verbose), give the model a 0-5 scale and call it explicitly. “V=1: List three causes of inflation.” This overrides custom instruction drift.
FAQ
Can I set different custom instructions for different chats?
No. They apply globally. Need different behavior for different projects? Use ChatGPT Projects (Plus/Pro) – project-specific instructions + reference files.
Why does ChatGPT still apologize even after I told it not to?
The model’s RLHF training heavily weighted politeness and hedging (per Google researchers’ October 2024 paper on LLM verbosity). Custom instructions push against that, but they’re not a hard override – suggestions the model weighs against base behavior. Apologies persist? Add a second line: “Refusals, disclaimers, and hedging waste tokens. Skip them.” Sometimes redundancy makes the instruction stick.
Does lowering output length hurt answer quality?
Sometimes. 30-word answer to a complex question? Superficial. But most of the time, verbosity is padding. Test: ask the same question with “answer in 50 words” and again with no constraint. Short version covers what you needed? Long version was filler. Feels incomplete? Raise the cap to 100-150 words and retest. Remember that V=0 to V=5 scale? Use it.