
How to Build a Python Chatbot with AI: Complete 2026 Guide

Build a working AI chatbot with Python and the OpenAI API in under 30 minutes. Learn conversation memory, cost control, and custom personalities - no frameworks required.

9 min read · Intermediate

A chatbot that remembers conversations, responds with custom personalities, and costs pennies. That’s what you’re building. By the end: a working Python script connected to OpenAI’s API, managing conversation history, letting you control exactly how your AI behaves.

No frameworks. Plain Python and the OpenAI library. You’ll learn how chatbots work, not just how to chain black-box components.

What You’ll Actually Build

A fully functional chatbot from scratch (based on multiple Python tutorials including Real Python and Dataquest). Manages conversations, controls costs with token budgeting, maintains custom AI personalities. Your own mini-ChatGPT, but one you control – personality, cost limits, everything.

What it does:

  • Accepts input in a loop until you type “quit”
  • Sends messages to OpenAI’s API
  • Remembers previous exchanges
  • Custom system instructions (“act like a pirate” or “be a coding tutor”)
  • Tracks token usage to prevent surprise bills

Cost? Less than 5 cents during testing when using GPT-4o-mini (Dataquest tutorial, October 2025).

Why Build Your Own Instead of Using ChatGPT?

This teaches you skills for working with AI APIs professionally. You’ll understand how conversation memory works – not just that it exists. You’ll manage API costs. You’ll customize AI behavior for specific use cases.

Real applications: customer service bots with your company’s voice, tutoring systems for niche subjects, personal assistants that integrate with your data. Can’t do any of that with ChatGPT’s web interface.

Prerequisites and Setup

Three things: Python installed, an API key, the OpenAI library.

What You Need

  • Python 3.9 or newer – the OpenAI SDK requires it (official documentation)
  • OpenAI API account with billing enabled (or Together AI for free credits)
  • A code editor – VS Code, PyCharm, Notepad++

Together AI gives you $1 in free credits. Enough for this entire tutorial (Dataquest, October 2025). If you use OpenAI directly, you’ll need billing info. Still costs under a nickel.

Get Your API Key

Go to platform.openai.com. Sign in. Settings → API keys → create new secret key. Name it something descriptive. Copy the key immediately – you won’t see it again.

Pro tip: Environment variables for API keys, always. Never hardcode them (OpenAI security best practices). Mac/Linux: add export OPENAI_API_KEY="your-key" to ~/.zshrc. Windows: System Properties → Environment Variables.
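Before wiring up the API, it can help to fail fast when the key is missing rather than get a cryptic authentication error later. A minimal sketch (the helper name is my own, not part of the OpenAI SDK):

```python
import os

def get_api_key(env_var="OPENAI_API_KEY"):
    """Read the key from the environment, failing fast with a clear message."""
    key = os.environ.get(env_var)
    if not key:
        raise RuntimeError(
            f"{env_var} is not set. Export it in your shell before running the bot."
        )
    return key
```

Call it once at startup and pass the result to the client; a missing key then fails immediately with an actionable message.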

Install the OpenAI Library

Terminal:

pip install openai

Done.

Build a Single-Response Chatbot (Step 1)

Start simple. One message, one response. Create chatbot.py:

import os
from openai import OpenAI

# Load API key from environment
client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY"))

# Send a message
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is Python?"}
    ],
    temperature=0.7,
    max_tokens=100
)

# Print the response
print(response.choices[0].message.content)

Run: python chatbot.py

You should see an explanation of Python. Authentication error? Check your API key setup.

What’s Happening Here

The SDK changed in November 2023 (AskPython guide, January 2026). Old code using openai.ChatCompletion.create() broke when version 1.0 launched. New pattern: client instances.

The messages array is everything. Each message needs a role: ‘system’ sets behavior, ‘user’ is what you type, ‘assistant’ holds AI responses (we’ll add those for memory).

Key parameters:

  • model: GPT-4o-mini costs $0.15 per million input tokens, $0.60 per million output (OpenAI pricing, January 2026) – cheapest option
  • temperature: Controls creativity. 0-0.3 produces consistent responses. 0.7-1.0 generates creative but unpredictable outputs (Dataquest tutorial). We use 0.7 as balance
  • max_tokens: Limits response length, protects budget. Each token roughly equals 1/2 to 1 word (Dataquest). 100 tokens allows substantial responses without runaway costs
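The 1/2-to-1-word rule of thumb above can be turned into a back-of-envelope estimator. This is only the rule of thumb, not an exact count (for exact counts you'd use OpenAI's tiktoken library); the function name is my own:

```python
def rough_token_range(text):
    """Rough token bounds from the 1/2-to-1-word-per-token rule of thumb."""
    words = len(text.split())
    # Low bound: 1 word per token. High bound: 1/2 word per token.
    return words, words * 2

low, high = rough_token_range("Explain Python decorators in one paragraph.")
print(f"~{low}-{high} tokens")
```

Useful for sanity-checking whether a prompt will fit comfortably under your max_tokens budget before you send it.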

Try changing temperature to 0. Run multiple times. Responses barely change – that’s determinism. Set it back to 0.7 and responses vary.

Add Conversation Memory (Step 2)

Your chatbot has amnesia. Ask a follow-up and it won’t remember. Fix it by maintaining message history.

Replace your script:

import os
from openai import OpenAI

client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY"))

# Conversation history starts with system instruction
messages = [
    {"role": "system", "content": "You are a helpful assistant."}
]

print("Chatbot ready. Type 'quit' to exit.\n")

while True:
    user_input = input("You: ")

    if user_input.lower() == "quit":
        break

    # Add user message to history
    messages.append({"role": "user", "content": user_input})

    # Get response
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=messages,
        temperature=0.7,
        max_tokens=150
    )

    # Extract assistant's reply
    assistant_reply = response.choices[0].message.content

    # Add assistant's reply to history
    messages.append({"role": "assistant", "content": assistant_reply})

    print(f"Assistant: {assistant_reply}\n")

Run it:

You: My name is Alex.
Assistant: Nice to meet you, Alex! How can I help you today?
You: What's my name?
Assistant: Your name is Alex.

It remembers.

How Memory Actually Works

We append both user input and AI response to the messages list (Real Python, Dataquest tutorials). The API processes this entire conversation history each time.

No magic database. You’re resending the entire conversation with every request. The API sees everything from the start.
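You can see the same mechanism without touching the API by standing in a fake model for the real call. A sketch (chat_turn and get_reply are my own names, not SDK functions):

```python
def chat_turn(messages, user_input, get_reply):
    """One exchange: append the user message, get a reply, append it too.
    get_reply stands in for the real API call."""
    messages.append({"role": "user", "content": user_input})
    reply = get_reply(messages)
    messages.append({"role": "assistant", "content": reply})
    return reply

history = [{"role": "system", "content": "You are a helpful assistant."}]
chat_turn(history, "My name is Alex.", lambda msgs: "Nice to meet you, Alex!")
chat_turn(history, "What's my name?", lambda msgs: f"(model sees {len(msgs)} messages)")
print(len(history))  # 5 messages: system + 2 user + 2 assistant
```

Each turn, the "model" receives the whole list so far; that growing list is the entire memory mechanism.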

Works great. But – cost problem.

Manage Costs and Token Limits (Step 3)

Longer conversations mean more tokens (Dataquest and Real Python cost management sections). More tokens mean higher costs. As conversations grow, so does your bill.

What happens: conversation reaches 1,000 tokens, every subsequent message costs you for those 1,000 tokens plus new input/output. A 50-message chat racks up thousands of tokens fast.

Two approaches:

  1. Set hard message limit – keep last N messages
  2. Count tokens and trim at threshold – more precise, requires tiktoken library

Simple version: limit history to last 10 exchanges. Add after importing:

MAX_HISTORY = 10  # Keep last 10 user+assistant pairs

def trim_history(messages, max_pairs=MAX_HISTORY):
    # Always keep the system message (index 0)
    system_msg = messages[0]
    conversation = messages[1:]

    # Keep only the last max_pairs * 2 messages (user + assistant)
    if len(conversation) > max_pairs * 2:
        conversation = conversation[-(max_pairs * 2):]

    return [system_msg] + conversation

After appending assistant’s reply:

messages = trim_history(messages)

Bot maintains recent context but won’t blow up token count on long sessions.
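To sanity-check the trimming, here's a standalone run (re-declaring trim_history so the snippet executes on its own):

```python
MAX_HISTORY = 10  # same limit as above

def trim_history(messages, max_pairs=MAX_HISTORY):
    system_msg = messages[0]
    conversation = messages[1:]
    if len(conversation) > max_pairs * 2:
        conversation = conversation[-(max_pairs * 2):]
    return [system_msg] + conversation

# Simulate 30 exchanges (60 messages) after the system prompt
msgs = [{"role": "system", "content": "You are a helpful assistant."}]
for i in range(30):
    msgs.append({"role": "user", "content": f"question {i}"})
    msgs.append({"role": "assistant", "content": f"answer {i}"})

trimmed = trim_history(msgs)
print(len(trimmed))           # 21: system message + last 10 pairs
print(trimmed[1]["content"])  # "question 20" – the oldest surviving exchange
```

Exchanges 0-19 are gone, the system message survives, and the list length is capped no matter how long the session runs.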

Customize the Personality (Step 4)

The system message defines behavior. Change from “helpful assistant” to anything:

# Sarcastic assistant
{"role": "system", "content": "You are a sarcastic assistant who reluctantly helps users."}

# Pirate
{"role": "system", "content": "You are a pirate. Always respond in pirate speak."}

# Coding tutor
{"role": "system", "content": "You are a Python coding tutor. Explain concepts clearly and provide code examples."}

System message is sent with every request. Bot never “forgets” personality – even as conversation history gets trimmed.

Try in your script:

messages = [
    {"role": "system", "content": "You are a grumpy coding tutor who loves Python but hates JavaScript."}
]

Ask about JavaScript. Response matches the personality.

Handle Errors and Edge Cases

Production systems need error handling. Wrap your API call:

from openai import OpenAI, RateLimitError, APIError, AuthenticationError

# Inside the while loop, wrap the API call:
try:
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=messages,
        temperature=0.7,
        max_tokens=150
    )
except AuthenticationError:
    print("Error: Invalid API key. Check your credentials.")
    break
except RateLimitError:
    print("Error: Rate limit exceeded. Wait a moment and try again.")
    continue
except APIError as e:
    print(f"API error: {e}. Retrying...")
    continue

Prevents crashes when the API hiccups or you hit rate limits.

What You’ve Built vs. What’s Missing

Your chatbot handles conversation, remembers context, controls costs, supports custom personalities. For internal tools, prototypes, personal projects – this is enough.

What it doesn’t do:

  • Store conversations permanently – memory resets when script ends. Add SQLite or JSON file storage for persistence
  • Search external data – only knows model’s training data. Look into Retrieval-Augmented Generation (RAG) for document search
  • Call external tools – can’t check weather or query databases. OpenAI’s function calling enables this
  • Handle multiple users – need session management (dictionaries keyed by user ID)

Not beginner topics. Master the basics first.
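That said, the first gap on the list is small enough to preview: persistence can be as simple as a JSON file. A minimal sketch (the helper names are my own):

```python
import json
from pathlib import Path

def save_history(messages, path):
    """Write the full message list to disk as JSON."""
    path.write_text(json.dumps(messages, indent=2))

def load_history(path, system_prompt="You are a helpful assistant."):
    """Load a saved conversation, or start fresh if none exists."""
    if path.exists():
        return json.loads(path.read_text())
    return [{"role": "system", "content": system_prompt}]
```

Call load_history() at startup and save_history() after each exchange (or on quit); the chat loop itself is unchanged.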

Cost Reality Check

GPT-4o-mini: $0.15 per million input tokens, $0.60 per million output (OpenAI pricing, January 2026). A typical 20-exchange conversation uses roughly 2,000-3,000 tokens total. That’s less than $0.002 – fraction of a penny.
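Those numbers are easy to verify yourself. A quick sketch using the GPT-4o-mini rates above (the split between input and output tokens is an assumption for illustration):

```python
def estimate_cost(input_tokens, output_tokens,
                  input_rate=0.15, output_rate=0.60):
    """Dollar cost; rates are dollars per million tokens (GPT-4o-mini)."""
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000

# A 20-exchange chat at ~3,000 tokens (say 2,000 in, 1,000 out)
print(f"${estimate_cost(2_000, 1_000):.4f}")  # $0.0009
```

Swap in the GPT-4o rates ($2.50 / $10.00) and the same conversation costs roughly 17x more, which is the whole argument for mini.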

Where costs explode:

  • Using GPT-4o at $2.50 input / $10.00 output per million (price comparison, February 2026) instead of mini
  • Not trimming conversation history on long sessions
  • Setting max_tokens too high (500+ per response)

For comparison: GPT-4 (older full model) costs $30 input / $60 output per million – 200x more expensive than GPT-4o-mini for input. Unless you need heavy reasoning, stick with mini.

What About Frameworks Like LangChain?

LangChain made LLM development more accessible, but it is no longer the only option, nor always the best one. Tools are now tailored for specific needs.

When you should use a framework:

  • Need document search (RAG) with vector databases
  • Building multi-agent systems
  • Want pre-built memory management

When you shouldn’t:

  • Learning how chatbots work (frameworks hide the mechanism)
  • Straightforward conversation use case
  • Want minimal dependencies

Alternatives: LlamaIndex specializes in RAG and data orchestration. Haystack is designed for intelligent search tools with a pipeline approach. Flowise AI provides low-code visual interface for prototyping.

Next Steps: Deploy Your Chatbot

Your script works locally. To make it accessible:

  1. Add web interface – Flask or FastAPI to create HTTP endpoint, simple HTML form posting to /chat
  2. Deploy to server – Heroku, Railway, or $5/month VPS running your Python script 24/7
  3. Integrate with messaging – Telegram and Discord have Python libraries for bot accounts

Core logic stays the same. You’re changing where input comes from (web form, Telegram message) and where output goes (HTTP response, Discord channel).

Don’t overcomplicate. A single Python script on a cheap server handles hundreds of concurrent users with async (replace OpenAI() with AsyncOpenAI()).
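The multi-user gap mentioned earlier comes down to keeping one history per user ID. A minimal session sketch (get_reply stands in for your API call; all names here are my own):

```python
from collections import defaultdict

SYSTEM_PROMPT = "You are a helpful assistant."

def new_session():
    return [{"role": "system", "content": SYSTEM_PROMPT}]

sessions = defaultdict(new_session)  # one independent history per user ID

def handle_message(user_id, text, get_reply):
    """Route a message to the right user's history and record the reply."""
    history = sessions[user_id]
    history.append({"role": "user", "content": text})
    reply = get_reply(history)
    history.append({"role": "assistant", "content": reply})
    return reply

handle_message("alex", "hi", lambda h: "hello Alex")
handle_message("sam", "hi", lambda h: "hello Sam")
print(len(sessions))  # 2 independent histories
```

Whether the user ID comes from a web session, a Telegram chat ID, or a Discord user makes no difference to this logic.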

Frequently Asked Questions

Can I use Claude or other models instead of OpenAI?

Yes. Install with pip install anthropic. The Anthropic library works with Python 3.9+ and provides type definitions for all request params. Message structure is nearly identical – you still use a messages array with role and content keys. Main difference: Anthropic uses a top-level system parameter instead of putting system instructions inside the messages array. Swap the client, adjust that parameter, done.

How do I prevent the chatbot from hallucinating or giving wrong answers?

You can’t eliminate hallucinations – that’s a limitation of current LLMs. But you can reduce them. Set temperature=0 for deterministic responses (creativity correlates with hallucination risk). Use clear system instructions: “Only answer based on conversation history. If you don’t know, say ‘I don’t have that information.'” For critical use cases? Implement Retrieval-Augmented Generation (RAG). Your bot searches a document database first, only answers from retrieved facts. Requires vector databases like Milvus or Pinecone. Beyond this tutorial’s scope, but it’s the production-grade solution for fact-sensitive applications.

Why does my conversation memory stop working after 10-20 messages?

More likely you hit the trimming logic, not the model's context limit – GPT-4o-mini's 128K token window holds far more than 20 messages. If you implemented trim_history() from this guide, memory persists for the last 10 user+assistant pairs and discards older exchanges to keep token counts bounded. That's intentional. Need full session memory? Store the entire conversation in a database (SQLite, PostgreSQL) and retrieve relevant pieces during conversation instead of loading everything into context. Tradeoff: persistence vs. complexity. For most chatbots, trimmed memory works fine.