You’ll have a Python script that sends requests to ChatGPT’s API, processes responses, and handles errors without burning through your budget. Plus you’ll know when not to use ChatGPT for coding – something most tutorials skip.
What we’re building: a Python program that takes a coding problem as input, asks ChatGPT to solve it, tests the generated code automatically, and re-prompts if the first attempt fails. Real integration, not “hello world.”
The Cost Trap Nobody Mentions
The generous free trial that every tutorial from 2023-2024 references? Gone. OpenAI discontinued free credits in mid-2025. The current free tier: 3 requests per minute with GPT-3.5 Turbo. One request every 20 seconds. Enough to confirm your API key works, not much else.
GPT-4o (the model you actually want) costs $2.50 per million input tokens and $10 per million output tokens as of February 2026, according to OpenAI’s pricing page. A typical code generation request: 500 input tokens (your prompt) + 1,000 output tokens (generated code) = roughly $0.01 per request. You’re debugging in a loop and hit 100 requests in an afternoon? That adds up.
| Model | Input (per 1M tokens) | Output (per 1M tokens) | Best For |
|---|---|---|---|
| GPT-3.5 Turbo | $0.50 | $1.50 | Simple, fast responses |
| GPT-4o | $2.50 | $10.00 | Complex code, reasoning |
| GPT-4o Mini | $0.15 | $0.60 | Budget-friendly alternative |
Minimum buy-in: $5. Experimenting? Start with GPT-4o Mini – roughly 16× cheaper than GPT-4o and it handles most Python tasks fine.
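To get a feel for what a debugging session will actually cost, you can sketch the math in a few lines. The `PRICES` dict and `estimate_cost` helper below are illustrative names, with the per-million-token prices taken from the table above – check OpenAI's pricing page before trusting these numbers, because they drift:

```python
# Dollars per million tokens (input, output), from the table above.
# These prices change over time - verify against OpenAI's pricing page.
PRICES = {
    "gpt-3.5-turbo": (0.50, 1.50),
    "gpt-4o": (2.50, 10.00),
    "gpt-4o-mini": (0.15, 0.60),
}

def estimate_cost(model, input_tokens, output_tokens):
    """Return the estimated dollar cost of a single request."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000

# A typical code-generation request: 500 tokens in, 1,000 tokens out
print(f"GPT-4o:      ${estimate_cost('gpt-4o', 500, 1_000):.4f} per request")
print(f"GPT-4o Mini: ${estimate_cost('gpt-4o-mini', 500, 1_000):.4f} per request")
print(f"100 GPT-4o requests: ${100 * estimate_cost('gpt-4o', 500, 1_000):.2f}")
```

Running numbers like these before an afternoon of loop-debugging is how you decide whether Mini is good enough for the task.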
Setup: The Annoying Parts First
Python 3.9 or newer. The official OpenAI library dropped support for 3.8 and earlier in 2025. Check your version:
```bash
python --version
```
Install the library (swap in pip3 if pip points at a different Python on your system):

```bash
pip install openai
```
Create an OpenAI account at platform.openai.com, add a payment method (yes, even for the “free” tier you need a card on file), then generate an API key from the dashboard under “API keys.” Copy it immediately – you can’t view it again later.
Store the key as an environment variable. Never hardcode it. Mac/Linux:
```bash
export OPENAI_API_KEY='your-key-here'
```
Windows (Command Prompt):
```cmd
set OPENAI_API_KEY=your-key-here
```
The library reads this automatically. Hardcode your key and accidentally commit it to GitHub? Bots will find it in minutes and rack up charges on your account. This happens constantly.
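Because a missing key surfaces as a cryptic authentication error deep inside the first request, it's worth failing fast instead. A minimal sketch – `require_api_key` is a hypothetical helper, not part of the OpenAI library:

```python
import os
import sys

def require_api_key():
    """Exit with a clear message if OPENAI_API_KEY isn't set; return it otherwise."""
    key = os.environ.get("OPENAI_API_KEY")
    if not key:
        sys.exit("OPENAI_API_KEY is not set. Export it before running this script.")
    return key

# Call this once at startup, before constructing the client:
# client = OpenAI(api_key=require_api_key())
```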
Your First Working Request
```python
from openai import OpenAI

client = OpenAI()  # API key is read from the environment variable automatically

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": "You are a Python expert who writes clean, well-commented code."},
        {"role": "user", "content": "Write a function to reverse a string in Python"}
    ],
    temperature=0.2
)

print(response.choices[0].message.content)
```
Run it. You should get a function definition back in seconds.
A few things here. The messages array has two roles: system sets the AI’s behavior (the instruction manual), user is your actual request. temperature controls randomness – 0 means deterministic output, 1 means maximum creativity. For code generation, keep it low (0.2-0.3) for consistent, predictable results.
Old tutorials use openai.ChatCompletion.create(). Pre-1.0 API. Doesn’t work anymore. New version (v1.x+) uses the client pattern above.
When ChatGPT Writes Bad Code
ChatGPT generates functionally correct Python code 66% of the time, per a 2024 ACM study that tested 4,066 programs. That’s based on older, well-documented problems. And there’s more.
Performance on coding tasks introduced after mid-2021 (its training cutoff) drops from over 50% to near 10% for hard-level problems. Asking it to use a library released in 2024 or solve a problem structure it hasn’t seen? Expect wrong answers.
Pro tip: Always test generated code before using it. Ask ChatGPT to include test cases in its response, then run them. If it passes, great. If not, feed the error message back and ask it to fix the issue.
ChatGPT also hallucinates – it’ll suggest Python packages that don’t exist, or use functions with incorrect signatures. If a generated import statement fails, that’s your first red flag.
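You can catch hallucinated packages cheaply, before running anything: parse the generated code's import statements and check each top-level module against what's actually installed. A sketch using only the standard library – `missing_imports` is a name introduced here for illustration:

```python
import ast
import importlib.util

def missing_imports(code):
    """Return top-level module names imported by `code` that aren't installed."""
    tree = ast.parse(code)
    modules = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            modules.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            modules.add(node.module.split(".")[0])
    # find_spec returns None when a top-level module can't be located
    return sorted(m for m in modules if importlib.util.find_spec(m) is None)

# A hallucinated package shows up immediately:
generated = "import json\nimport totally_fake_helper_lib\n"
print(missing_imports(generated))  # ['totally_fake_helper_lib']
```

This won't catch wrong function signatures inside a real package, but it turns "pip can't find that" from a runtime surprise into a pre-flight check.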
Ever notice how it nails the easy stuff but chokes on edge cases? Empty inputs, None values, negative numbers – these trip it up constantly. You can work around this: be explicit in your prompt (“Handle empty strings”, “Validate inputs before processing”). But you’re still testing everything it gives you.
Building the Auto-Tester
Here’s a script that generates code, tests it, and re-prompts if it fails:
```python
from openai import OpenAI

client = OpenAI()

def ask_chatgpt(prompt, temperature=0.2):
    """Send a prompt to ChatGPT and return the response text."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "You are a Python expert. Write only code, no explanations."},
            {"role": "user", "content": prompt}
        ],
        temperature=temperature
    )
    return response.choices[0].message.content

def strip_fences(code):
    """Drop Markdown code fences the model sometimes adds despite instructions."""
    fence = "`" * 3
    return "\n".join(
        line for line in code.splitlines() if not line.strip().startswith(fence)
    )

def test_code(code):
    """Run generated code; return (True, None) on success, (False, error) otherwise."""
    try:
        exec(strip_fences(code))
        return True, None
    except Exception as e:
        return False, str(e)

# Main workflow: generate, test, re-prompt once if the first attempt fails
problem = "Write a function that finds the longest palindrome in a string"
code = ask_chatgpt(problem)
print("Generated code:")
print(code)

success, error = test_code(code)
if not success:
    print(f"\nError detected: {error}")
    print("Asking ChatGPT to fix it...\n")
    fix_prompt = f"This code has an error:\n{code}\n\nError message: {error}\n\nFix it."
    fixed_code = ask_chatgpt(fix_prompt)
    print("Fixed code:")
    print(fixed_code)
    success, error = test_code(fixed_code)
    if success:
        print("\n✓ Fixed code runs successfully")
    else:
        print(f"\n✗ Still failing: {error}")
else:
    print("\n✓ Code runs successfully")
```
This is closer to how you’d actually use ChatGPT in a real project. Generate, test, fix, repeat. You could extend this with unit tests, code style checks, or save successful solutions to a file.
One catch: the API doesn’t execute code itself. It only returns text. Want ChatGPT to test code as it writes? You need the web interface with Code Interpreter (ChatGPT Plus, $20/month). API and web version are different tools with different capabilities.
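One refinement worth making before you extend this: `exec()` runs generated code inside your own process, so an accidental `while True:` loop hangs your script and a stray `os.remove()` runs with your permissions. Running the code in a separate process with a timeout is safer. The `run_in_subprocess` helper below is a sketch of that idea, not a real sandbox – malicious code can still do damage, but at least infinite loops and crashes stay contained:

```python
import os
import subprocess
import sys
import tempfile

def run_in_subprocess(code, timeout=10):
    """Execute code in a separate Python process; return (ok, combined output)."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    try:
        result = subprocess.run(
            [sys.executable, path],
            capture_output=True, text=True, timeout=timeout,
        )
        return result.returncode == 0, result.stdout + result.stderr
    except subprocess.TimeoutExpired:
        return False, f"Timed out after {timeout}s (possible infinite loop)"
    finally:
        os.unlink(path)  # clean up the temp file either way

ok, output = run_in_subprocess("print('hello from generated code')")
print(ok, output.strip())
```

Swap this in for `test_code` in the script above and a runaway generation costs you ten seconds instead of a killed terminal.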
Error Patterns to Watch For
ChatGPT tends to fail in predictable ways:
- Off-by-one errors in loops – especially with slicing and range()
- Incorrect assumptions about input types – it might assume a string when you pass a list
- Missing edge case handling – empty inputs, None values, negative numbers
- Overcomplicated solutions – sometimes it writes 20 lines when 5 would work
Spot these patterns? Refine your prompt. “Handle empty strings,” “Use a list comprehension,” “Validate inputs before processing.”
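A cheap habit that catches most of these patterns: before trusting a generated function, run it against the exact edge cases listed above. A minimal sketch – `reverse_string` here is just a stand-in for whatever function ChatGPT produced:

```python
# Stand-in for a ChatGPT-generated function under review.
def reverse_string(s):
    return s[::-1]

# The edge cases that routinely trip up generated code:
edge_cases = [
    ("", ""),        # empty input
    ("a", "a"),      # single element
    ("abc", "cba"),  # the normal case it probably got right
]

for given, expected in edge_cases:
    actual = reverse_string(given)
    status = "ok" if actual == expected else f"FAIL (got {actual!r})"
    print(f"reverse_string({given!r}) -> {status}")
```

Five lines of checks like this surface the off-by-one and empty-input failures far faster than reading the generated code line by line.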
Prompt Engineering (Without the Hype)
Output quality depends entirely on your prompt. Small changes produce very different code – researchers tested this with GPT-4 and found that the same problem described in slightly different words generated entirely different implementations.
What actually works:
- Be specific about constraints. “Write a function” is vague. “Write a function that takes a list of integers, returns the sum, and raises ValueError if the list is empty” is better.
- Specify the Python version. “Use Python 3.10+ syntax” avoids outdated patterns.
- Request documentation. “Include docstrings and type hints” saves you work later.
- Ask for examples. “Provide 3 test cases” forces ChatGPT to think through edge cases.
For debugging, paste the error traceback directly into your prompt. ChatGPT is good at interpreting error messages – better than Stack Overflow search in many cases.
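Rather than copy-pasting tracebacks by hand, you can assemble the debugging prompt programmatically with the standard `traceback` module. `build_debug_prompt` is a hypothetical helper sketching the pattern:

```python
import traceback

def build_debug_prompt(code, exc):
    """Combine failing code and its full traceback into one debugging prompt."""
    tb = "".join(traceback.format_exception(type(exc), exc, exc.__traceback__))
    return (
        "This Python code raises an error.\n\n"
        f"Code:\n{code}\n\n"
        f"Traceback:\n{tb}\n"
        "Explain the cause and provide a corrected version."
    )

code = "result = int('not a number')"
try:
    exec(code)
except Exception as e:
    prompt = build_debug_prompt(code, e)
    print(prompt)
```

Sending the full traceback rather than a paraphrase is the single biggest lever here – the model does much better with the exact error line and message than with "it crashes on bad input."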
What ChatGPT Actually Can’t Do
It cannot access your codebase. Need it to refactor an existing 10,000-line project? You’re out of luck unless you paste relevant snippets (within the token limit). No context beyond what you provide in the current conversation.
No live data. No database queries, no API calls, no real-time information. Everything it knows comes from training data with a cutoff date (currently June 2024 for most models).
It cannot make judgment calls about business logic. Doesn’t know what your stakeholders want, what “good enough” means for your project, or whether a 10ms optimization matters. That’s your job.
And it cannot verify its own output is correct. API returns text. Period. You must test everything.
Cost Control Strategies
Track your usage in the OpenAI dashboard under “Usage.” Set a monthly spending limit under “Billing → Limits” to avoid surprises.
Keep prompts concise to reduce input tokens. Sending a 5,000-token system prompt with every request? Lean on prompt caching: OpenAI automatically discounts repeated prompt prefixes on recent models, and Anthropic's Claude API offers explicit caching with even steeper discounts. According to API cost analyses, caching can cut repeated prompt costs by up to 90%.
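For ballpark budgeting, a common rule of thumb is that one token is roughly four characters of English text – an assumption, not an exact count (use OpenAI's tiktoken library when you need precision). A sketch of how to spot a bloated system prompt before it costs you:

```python
# Heuristic: ~4 characters per token for English text. Rough by design;
# use the tiktoken library for exact counts.
def approx_tokens(text):
    """Approximate token count (chars / 4, rounded up)."""
    return -(-len(text) // 4)  # ceiling division

system_prompt = "You are a Python expert. " * 200  # a bloated system prompt
tokens = approx_tokens(system_prompt)
cost = tokens * 2.50 / 1_000_000  # GPT-4o input price from the table above
print(f"~{tokens} tokens, ~${cost:.4f} per request just for the system prompt")
```

A few thousandths of a dollar per request sounds trivial until you multiply by a hundred-request debugging loop – which is exactly when trimming the prompt starts to pay off.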
Use GPT-4o Mini for most tasks. Save GPT-4o for complex problems where the cheaper model fails. You can always escalate to a better model if needed – start cheap.
Batch requests if you’re processing many similar tasks. The Batch API offers 50% off but returns results within 24 hours instead of real-time. Good for non-urgent work.
When NOT to Use ChatGPT for Python
If you’re learning Python for the first time, don’t rely on ChatGPT to write everything. You won’t learn the language. Use it to explain concepts or debug errors, but write the code yourself.
If your code handles sensitive data, don’t send it to the API. OpenAI’s terms state they don’t use API data for training (as of 2023), but you’re still sending your code to a third party. If that violates your company’s security policy, don’t do it.
If you need code that’s legally defensible, watch out. ChatGPT pulls from a massive corpus of training data, some of which may be copyrighted. There’s ongoing litigation around this. Building a commercial product? Have a lawyer review your AI-generated code policy.
If the problem requires deep domain expertise – statistical analysis with specific assumptions, medical algorithms, financial calculations – ChatGPT will give you wrong answers. It doesn’t know what it doesn’t know.
Your Next Steps
Take the code examples from this tutorial and adapt them to a real problem you’re trying to solve. Start small: automate a tedious task, generate boilerplate code, or build a script that drafts unit tests for your functions.
Set a spending limit in the OpenAI dashboard before you start experimenting. $5 goes far if you’re using GPT-4o Mini.
Track what works and what doesn’t. Keep a log of prompts that generated good code versus garbage. You’ll start noticing patterns – certain phrasings work better, certain problem types fail consistently.
Read the official API documentation for advanced features like function calling, streaming responses, and fine-tuning. There’s more you can do once the basics click.
ChatGPT is a tool, not a replacement. It writes code faster than you can Google for solutions, but it doesn’t understand why the code works. That part’s still on you.
Frequently Asked Questions
Is the OpenAI API free for Python developers?
No. Free trial credits were discontinued in mid-2025. The free tier exists but is limited to 3 requests per minute with GPT-3.5 Turbo only – basically useless for real development. You’ll need to add a payment method and spend at least $5 to enable full access.
Can ChatGPT debug my Python code if I paste the error message?
Yes, and it’s effective. Paste your code and the full traceback into your prompt, then ask it to identify and fix the issue. It handles common errors (syntax mistakes, type mismatches, import problems) well. Complex logic bugs or issues specific to your codebase are hit-or-miss. One debugging session I ran burned through 47 requests before we found the real issue – a timezone conversion bug that ChatGPT kept misdiagnosing as a datetime parsing problem. Always test the suggested fix before trusting it.
What’s the difference between using ChatGPT’s web interface versus the API for Python coding?
The web interface (ChatGPT Plus, $20/month) includes Code Interpreter, which can execute Python code in a sandbox and see the results. The API cannot – it only generates code as text. For learning or quick experimentation, the web interface is better. For integrating AI into your own applications or automating workflows, you need the API. Different tools, different trade-offs. Web: conversational, can test code. API: programmable, blind to execution results. If you’re building a product that needs AI code generation, you’re using the API. If you’re debugging a one-off script at 2am, the web interface is faster.