The number one mistake people make with this topic: they think they’re training ChatGPT. They’re not. When you upload a PDF to a Custom GPT or paste a doc into custom instructions, ChatGPT’s underlying weights don’t change at all. You’re stuffing context into the model’s prompt at runtime – that’s retrieval, not training.
Real training (fine-tuning) means modifying the model’s parameters via OpenAI’s fine-tune API, and almost nobody who searches “how to train ChatGPT on your own data” actually wants that. Once you understand which one you need, the decision gets simple.
The key takeaway, upfront
If your data is documents, FAQs, or knowledge that changes – use a Custom GPT (which is RAG under the hood). If you need consistent style, format, or tone the model can’t hold via prompts – use fine-tuning. If you don’t know which, it’s almost always RAG. Databricks puts it directly: RAG is the better choice when information changes frequently; fine-tuning is better when the model needs to learn a new style or follow domain conventions.
What ChatGPT actually does with your data
According to OpenAI’s official documentation, machine learning models consist of large sets of numbers – weights or parameters – along with code that interprets them. These models don’t store copies of training data. As a model learns, its parameter values adjust to reflect patterns it has identified. That’s training in the technical sense.
What a Custom GPT does is completely different. Your uploaded files sit in a vector store. When someone asks a question, relevant chunks get retrieved and pasted into the system prompt before the model responds. The model itself never changes. This matters because it explains every limitation you’ll hit later – file caps, retrieval misses, hallucinations on data that is in the file but didn’t get retrieved.
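To make that concrete, here is a minimal sketch of the same retrieval pattern done by hand with the OpenAI Python SDK. The chunks and the question are made up, and a Custom GPT handles all of this for you behind the scenes, but the mechanics are identical: embed, find the closest chunk, paste it into the prompt.

```python
# Minimal RAG sketch: embed chunks, retrieve the closest one, stuff it into the prompt.
# Assumes the openai Python SDK and an OPENAI_API_KEY in the environment.
# The chunks and question are illustrative placeholders.
from openai import OpenAI
import numpy as np

client = OpenAI()

chunks = [
    "Refunds are available within 30 days of purchase.",
    "To cancel a subscription, go to Settings > Billing > Cancel.",
]
question = "How do I end my subscription?"

# Embed the knowledge chunks and the question in one call.
emb = client.embeddings.create(
    model="text-embedding-3-small",
    input=chunks + [question],
)
vectors = np.array([item.embedding for item in emb.data])
chunk_vecs, q_vec = vectors[:-1], vectors[-1]

# Retrieve the chunk most similar to the question (cosine similarity).
scores = chunk_vecs @ q_vec / (
    np.linalg.norm(chunk_vecs, axis=1) * np.linalg.norm(q_vec)
)
best_chunk = chunks[int(scores.argmax())]

# Paste the retrieved chunk into the prompt. The model's weights never change.
answer = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": f"Answer using only this context:\n{best_chunk}"},
        {"role": "user", "content": question},
    ],
)
print(answer.choices[0].message.content)
```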
RAG (Custom GPT) vs Fine-tuning: the honest comparison
With numbers, as of November 2025:
| Factor | Custom GPT (RAG) | Fine-tuning GPT-4o |
|---|---|---|
| What changes | Nothing – data injected at query time | Model weights updated permanently |
| Data limits | 20 files, 512MB each, 2M tokens each | Min ~10 examples; 200-300 for production |
| Cost | ChatGPT Plus ($20/mo) | $25 per 1M training tokens + higher inference |
| Update speed | Re-upload a file, done | Re-train (hours, $$) |
| Best for | Knowledge, docs, FAQs | Style, format, tone, structured outputs |
| Risk | Retrieval misses | Catastrophic forgetting |
The file limits come from OpenAI's File Uploads FAQ (verified November 2025). The inference cost trap is the one almost nobody flags upfront: fine-tuning GPT-4o training runs $25 per 1M tokens, and inference on a fine-tuned model is $3.75/1M input tokens and $15/1M output tokens – roughly 50% more than base GPT-4o, every single call, forever. Build a cost model before you commit.
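Here is what "build a cost model" can look like in practice. A rough sketch only: the base GPT-4o rates below are inferred from the "roughly 50% more" figure above, and the monthly volumes are placeholders you should swap for your own.

```python
# Back-of-envelope inference cost model (prices as of November 2025).
# Base GPT-4o rates are inferred from the "roughly 50% more" comparison;
# call volumes are placeholders, not recommendations.
FT_INPUT, FT_OUTPUT = 3.75, 15.00      # fine-tuned GPT-4o, $ per 1M tokens
BASE_INPUT, BASE_OUTPUT = 2.50, 10.00  # base GPT-4o (inferred), $ per 1M tokens

calls_per_month = 100_000
input_tokens_per_call = 1_500   # system prompt + user message
output_tokens_per_call = 400

def monthly_cost(in_rate: float, out_rate: float) -> float:
    in_millions = calls_per_month * input_tokens_per_call / 1e6
    out_millions = calls_per_month * output_tokens_per_call / 1e6
    return in_millions * in_rate + out_millions * out_rate

print(f"base GPT-4o:       ${monthly_cost(BASE_INPUT, BASE_OUTPUT):,.0f}/mo")
print(f"fine-tuned GPT-4o: ${monthly_cost(FT_INPUT, FT_OUTPUT):,.0f}/mo")
```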
For 90% of “train ChatGPT on my data” use cases, the Custom GPT wins. Here’s how to make one that actually works.
The Custom GPT walkthrough that actually matters
Skipping the click-here-click-there steps (every other tutorial covers those). What determines whether your Custom GPT is useful or useless: how you prepare and chunk the knowledge files.
- Pick the 20 files that matter. As of November 2025, OpenAI caps Custom GPTs at up to 20 knowledge files for the lifetime of that GPT – and that limit is hard. You can’t pay your way out of it. Consolidate aggressively. One well-structured 400-page PDF beats 20 scattered Word docs.
- Strip noise before uploading. Headers, footers, page numbers, table-of-contents bloat – all of it pollutes retrieval. Plain text or Markdown retrieves more reliably than PDF, because PDF text extraction is messy and the retriever doesn’t see what you see.
- Stay under the per-file ceilings. Hard limit: 512MB per file and 2 million tokens per text/document file (as of November 2025). CSVs are stricter – approximately 50MB depending on row size. Hit any of these and the upload silently fails or the file gets truncated. (A quick pre-upload check follows this list.)
- Write retrieval-friendly content. The retriever uses semantic similarity. If your FAQ asks “How do I cancel?” the doc better contain words like “cancel,” “cancellation,” “end subscription.” Write the way users ask, not the way lawyers wrote your policy.
- Test by trying to break it. Ask edge questions. Paraphrased questions. Questions whose answers are buried mid-document. If retrieval fails on these, your chunks are too long or your filenames are doing too much work.
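Before uploading, it's worth checking each text file against those ceilings. A quick pre-upload check, assuming tiktoken for an approximate token count (the exact tokenizer the retrieval pipeline uses isn't documented, so treat the number as a rough estimate):

```python
# Pre-upload check against the per-file ceilings (as of November 2025).
# Works for text/Markdown files; the encoding choice approximates the
# GPT-4o-family tokenizer, not necessarily what retrieval actually uses.
import os
import tiktoken

MAX_BYTES = 512 * 1024 * 1024      # 512MB per file
MAX_TOKENS = 2_000_000             # 2M tokens per text/document file

enc = tiktoken.get_encoding("o200k_base")

def check(path: str) -> None:
    size = os.path.getsize(path)
    with open(path, encoding="utf-8", errors="ignore") as f:
        tokens = len(enc.encode(f.read()))
    ok = size <= MAX_BYTES and tokens <= MAX_TOKENS
    print(f"{path}: {size:,} bytes, ~{tokens:,} tokens -> {'OK' if ok else 'TOO BIG'}")

check("refund-policy-2026.md")   # placeholder filename
```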
Pro tip: Name your knowledge files with the topic in the filename itself (e.g., refund-policy-2026.md, not doc_final_v3.pdf). The retriever weighs filename matches heavily, and a clean filename can rescue retrieval when chunk content is ambiguous.
One thing worth sitting with: retrieval-augmented generation sounds like it should be reliable – you gave it the document, it should find the answer. In practice, the failure mode is subtler. The model retrieves the wrong chunk because your question and the answer share no overlapping vocabulary. That’s not a ChatGPT bug. That’s a document structure problem. Which is why step 4 above – writing the way users ask – matters more than any other prep step.
When fine-tuning is actually right
Fine-tune when the issue is behavior, not knowledge. Three real signals:
- You need outputs in a strict JSON schema and prompting keeps drifting.
- You have 200+ input/output pairs demonstrating a writing style or format that’s hard to describe in words.
- You’re paying for huge system prompts on every call and want to push that style into the weights to shrink prompt cost.
The numbers (as of November 2025): OpenAI requires at least 10 examples to start fine-tuning – but for production chatbots, 200 to 300 examples covering the full range of user questions is the practical floor. Default training runs 4 epochs. That multiplies your token count by 4 for billing – a 50,000-token dataset costs 200,000 tokens of compute, not 50,000.
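The billing math from that paragraph, spelled out with the article's own example numbers:

```python
# Fine-tuning bill estimate (as of November 2025): billed tokens = dataset tokens x epochs.
dataset_tokens = 50_000      # total tokens across all training examples
epochs = 4                   # OpenAI's default
price_per_million = 25.00    # GPT-4o fine-tuning, $ per 1M training tokens

billed_tokens = dataset_tokens * epochs           # 200,000
cost = billed_tokens / 1e6 * price_per_million    # $5.00
print(f"{billed_tokens:,} billed tokens -> ${cost:.2f}")
```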
The success cases are real but narrow. A fine-tuned GPT-4o powering Cosine’s Genie scored 43.8% on the SWE-bench Verified benchmark (announced by OpenAI in 2024) – a coding agent built on thousands of training examples, not someone uploading their company handbook. Fine-tuned models also give you full ownership of business data including all inputs and outputs, per OpenAI’s fine-tuning announcement, which matters in some compliance contexts.
Sample training example for the fine-tune API (shown pretty-printed here for readability; in the actual JSONL file each example must sit on a single line):
{"messages": [
{"role": "system", "content": "You are a support agent for Acme Corp."},
{"role": "user", "content": "My order hasn't arrived."},
{"role": "assistant", "content": "I'm sorry to hear that. Could you share your order number so I can check the status?"}
]}
Build hundreds of these. Submit via the API. Wait. Pay. Then ask whether it actually outperforms a well-prompted base model – because frequently it doesn’t. That’s not cynicism; it’s the honest question OpenAI’s own docs encourage you to test before scaling.
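For reference, "submit via the API" looks roughly like this with the OpenAI Python SDK. The filename and the model snapshot are placeholders; check the fine-tuning guide for which snapshots currently accept fine-tunes.

```python
# Sketch of submitting a fine-tune job with the OpenAI Python SDK.
# The training filename and model snapshot below are assumptions.
from openai import OpenAI

client = OpenAI()

# Upload the JSONL training file (one example per line).
training_file = client.files.create(
    file=open("acme_support_train.jsonl", "rb"),
    purpose="fine-tune",
)

# Kick off the job; training takes a while, so poll or watch the dashboard.
job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-4o-2024-08-06",
)
print(job.id, job.status)

# Later: check progress and, once finished, grab the fine-tuned model id.
print(client.fine_tuning.jobs.retrieve(job.id).status)
```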
Here’s what’s genuinely unclear in the current docs: there’s no public benchmark showing how many training examples are needed before a fine-tuned GPT-4o reliably beats GPT-4o + a detailed system prompt on general Q&A tasks. The 200-300 example recommendation comes from practitioners, not from a controlled study. If you’re deciding between spending a week building a great prompt versus spending money on fine-tuning, that uncertainty is worth factoring in.
Edge cases the other tutorials skip
You can’t edit a knowledge file in place. Delete it, re-upload it – that’s the only path. Now factor in the rolling upload cap: 80 files every 3 hours (as of November 2025). A busy session where you’re iterating on five files – deleting and re-uploading each one multiple times – can push you against that ceiling faster than expected. The rate limit doesn’t distinguish between first uploads and replacements.
Catastrophic forgetting on fine-tunes. Fine-tuned models often lose general capabilities present in pre-training – a finance-focused fine-tuned model may stop handling everyday conversational tasks well. If your prompt mixes domain Q&A with casual chitchat, fine-tuning can silently degrade the chitchat quality. RAG doesn’t have this failure mode because the underlying model never changes.
Storage caps stack across your account. Each end-user is capped at 25GB and each organization at 100GB (as of November 2025 – verify current limits in your account settings). Build five Custom GPTs with overlapping knowledge files and your quota drains faster than you’d think. Delete old GPTs if you’re not using them.
Free plan: use GPTs, can’t build them. Users on a free plan can access GPTs built by others, but building a Custom GPT requires a paid plan.
FAQ
Does training a Custom GPT mean OpenAI uses my data to train their public models?
No – knowledge files in your Custom GPT aren’t used to train OpenAI’s foundation models. Standard data controls still apply to your chat history, so check OpenAI’s data controls page if you’re handling sensitive material.
I have 500 documents to load. The 20-file limit kills me. What now?
Two options. The quick fix: concatenate related docs into a small number of large Markdown files – you have 2M tokens per file to work with, which is a lot of text. The real fix: move off Custom GPTs entirely and build directly on the Assistants API or a vector store via the OpenAI API, where file count limits are far higher. Most people who hit the 20-file wall need the API tier anyway. The Custom GPT builder is a convenience wrapper, not a production data platform.
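If you take the concatenation route, the merge step can be as simple as the sketch below. It assumes your source docs are already cleaned to Markdown and uses a crude characters-per-token heuristic rather than a real token count; the paths are placeholders.

```python
# Sketch: merge many small docs into a few large Markdown knowledge files,
# staying under the 2M-token-per-file ceiling. Paths are placeholders and the
# characters-per-token ratio is a rough heuristic, not an exact count.
from pathlib import Path

SOURCE_DIR = Path("docs")          # your cleaned .md source files
OUT_PREFIX = "knowledge-part"
TOKEN_LIMIT = 2_000_000
CHARS_PER_TOKEN = 4                # rough average for English text

buffer, size_tokens, part = [], 0, 1
for path in sorted(SOURCE_DIR.glob("*.md")):
    text = path.read_text(encoding="utf-8")
    est_tokens = len(text) // CHARS_PER_TOKEN
    if size_tokens + est_tokens > TOKEN_LIMIT and buffer:
        Path(f"{OUT_PREFIX}-{part}.md").write_text("\n\n".join(buffer), encoding="utf-8")
        buffer, size_tokens, part = [], 0, part + 1
    # Keep a heading with the source filename so retrieval can still find topics.
    buffer.append(f"# {path.stem}\n\n{text}")
    size_tokens += est_tokens

if buffer:
    Path(f"{OUT_PREFIX}-{part}.md").write_text("\n\n".join(buffer), encoding="utf-8")
```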
Should I fine-tune and use RAG together?
Yes – production systems do this. Fine-tune for tone, format, or domain language; add a RAG layer for knowledge that changes. You get behavioral consistency from the weights and current information from retrieval. One specific thing to model before committing: you’ll be paying fine-tuned inference rates ($3.75/$15 per million tokens, as of November 2025) on every retrieval call. If your RAG layer is chatty – lots of output tokens per response – that inference premium compounds fast. Run the numbers against your expected call volume first, not after.
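In code, the hybrid is just a normal chat call: your RAG layer supplies the context, and the model id points at the fine-tune. The model id and the retrieved chunk below are placeholders, not real values.

```python
# Hybrid sketch: fine-tuned model for behavior, retrieved context for knowledge.
# The model id format is illustrative; use the id returned by your own fine-tune job.
from openai import OpenAI

client = OpenAI()

retrieved_chunk = "Refunds are available within 30 days of purchase."  # from your RAG layer

response = client.chat.completions.create(
    model="ft:gpt-4o-2024-08-06:acme::abc123",   # placeholder fine-tuned model id
    messages=[
        {"role": "system",
         "content": f"You are a support agent for Acme Corp. Context:\n{retrieved_chunk}"},
        {"role": "user", "content": "Can I still get a refund after three weeks?"},
    ],
)
print(response.choices[0].message.content)
```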
Your next step
Open ChatGPT, go to Explore GPTs → Create, and build one Custom GPT with exactly three carefully prepared knowledge files. Test it with five questions you know the answers to. If retrieval misses on any of them, your file structure – not the model – is the problem to fix first. Don’t even consider fine-tuning until you’ve spent real time on RAG and hit a wall it can’t solve.