Software Agents Before AI: How We Built Autonomy 30 Years Ago

Everyone's building AI agents in 2025 - but intelligent agents existed decades before ChatGPT. Here's what developers in the '90s knew that today's builders are rediscovering.

Everyone’s talking about AI agents in 2025 like they just dropped from the future. OpenAI releases Operator. Anthropic ships Claude with computer use. Google announces Gemini agents. The narrative: this changes everything.

Here’s what nobody’s saying out loud: intelligent agents existed 30 years before ChatGPT.

The problem this solves? Understanding what’s genuinely new versus what we forgot. Because if you’re building agents in 2025 without knowing why the first wave failed in 2001, you’re about to repeat expensive mistakes.

The Agent That Launched a Thousand Memes (and Failed Anyway)

Let’s start with the most famous pre-AI agent failure: Microsoft Clippy.

Launched in 1997, Clippy wasn’t some simple script. According to research on early intelligent agents, it used Bayesian algorithms to predict user intent and offer contextual help. When you started typing “Dear Sir,” Clippy popped up: “It looks like you’re writing a letter. Would you like help?”

Sophisticated tech. Terrible execution.

The failure wasn’t the algorithm – it was the interruption timing. Clippy couldn’t learn from individual user preferences. It offered elementary advice when you needed advanced help. It interrupted during flow states. Users didn’t hate agents; they hated this agent’s inability to read the room.

Sound familiar? Modern AI agents face the same challenge: autonomy without context-awareness is worse than no agent at all.

What “Agent” Meant Before LLMs

Here’s the taxonomy researchers used in 1991, decades before GPT existed.

Brustoloni’s classification split software agents into three types:

  • Regulation agents – Reactive systems that respond to each input without planning or learning. Think: thermostats, simple reflex programs.
  • Planning agents – Systems that solve problems using predefined methods: case-based reasoning, operations research algorithms, or rule-based logic.
  • Adaptive agents – Programs that learn from experience and adjust behavior over time.

Notice something? These categories still describe 2025 agents. A customer service chatbot that follows a script? Regulation agent. A research assistant that chains tool calls? Planning agent. An LLM fine-tuned on user feedback? Adaptive agent.
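To make the mapping concrete, here's a minimal sketch of the three types in Python. The class names, rules, and update logic are my own illustrations, not from Brustoloni's paper:

```python
class RegulationAgent:
    """Reacts to each input with a fixed mapping -- no planning, no memory."""
    def act(self, temp: float) -> str:
        return "heat_on" if temp < 20.0 else "heat_off"

class PlanningAgent:
    """Chains predefined rules to reach a goal before acting."""
    RULES = {"draft": "review", "review": "approve", "approve": "publish"}
    def plan(self, state: str, goal: str) -> list[str]:
        steps = []
        while state != goal:
            state = self.RULES[state]   # follow the hardcoded transition
            steps.append(state)
        return steps

class AdaptiveAgent:
    """Adjusts behavior from feedback -- here, a running estimate of reward."""
    def __init__(self, lr: float = 0.5):
        self.value = 0.0
        self.lr = lr
    def learn(self, reward: float) -> float:
        # Move the estimate toward each new observation.
        self.value += self.lr * (reward - self.value)
        return self.value
```

The regulation agent never improves, the planning agent only follows its rule table, and only the adaptive agent changes behavior based on what it experienced.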

The concepts aren’t new. The execution quality is.

Pro tip: When evaluating modern agent frameworks, ask which type you’re actually building. Most “AI agents” are glorified planning agents with LLM reasoning bolted on. True adaptive agents – systems that improve from interaction data without retraining – are still rare in production.

The Forgotten Boom: Agent Webs (1998-2003)

Most AI agent tutorials start with ELIZA (1966) and jump straight to GPT-3 (2020). They skip the most relevant chapter: the agent web era.

In the late 1990s, researchers thought autonomous agents would transform e-commerce. Systems emerged that could:

  • Crawl the web for price comparisons
  • Book travel based on user preferences
  • Filter email and schedule meetings
  • Negotiate with other agents for resources

The vision looked a lot like 2025’s AgentOps pitch.

What happened? According to recent analysis of intelligent automation history, these first-generation agents collapsed because they were “limited by computational power, data availability, and, crucially, a lack of sophisticated orchestration.”

They were isolated. They couldn’t collaborate. They lacked the ability to adapt to complex scenarios.

Now compare that to the problems enterprises face deploying AI agents today: orchestration across multiple agents, managing tool dependencies, preventing cascading failures. We’re debugging the same architecture gaps, just with better models underneath.

Building Agents in the ’90s: What It Actually Looked Like

Let’s get practical. How did developers build autonomous agents before LLMs existed?

Expert systems (1970s-1980s) combined a knowledge base with an inference engine. MYCIN, developed at Stanford, diagnosed bacterial infections and recommended antibiotics. It worked – expert systems captured human expertise through symbolic decision logic – but couldn’t handle anything outside their narrow domain.
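The core mechanism was forward chaining: fire any rule whose premises are satisfied, add its conclusion to the known facts, and repeat until nothing new can be derived. Here's a toy sketch in that spirit; the rules and fact names are invented for illustration, not MYCIN's actual knowledge base:

```python
# Toy forward-chaining inference engine in the spirit of 1970s expert systems.
RULES = [
    ({"gram_negative", "rod_shaped"}, "likely_e_coli"),
    ({"likely_e_coli", "urinary_site"}, "suggest_antibiotic_X"),
]

def infer(facts: set[str]) -> set[str]:
    """Repeatedly fire rules whose premises hold until a fixpoint is reached."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in RULES:
            if premises <= derived and conclusion not in derived:
                derived.add(conclusion)
                changed = True
    return derived
```

Note the brittleness: a fact outside the rule base ("gram_positive", say) derives nothing. That's the narrow-domain limit in four lines of logic.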

Reinforcement learning agents (1988+) changed the game. Sutton and Barto’s temporal difference learning enabled agents to act in an environment, receive feedback, and adjust behavior to maximize reward. This wasn’t hardcoded logic; it was learned strategy.
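The TD(0) update at the heart of this is compact: nudge the value of the current state toward the reward plus the discounted value of the next state. A minimal sketch on a toy five-state corridor (the environment and parameters are my own, not from Sutton and Barto's text):

```python
def td0(episodes: int = 500, alpha: float = 0.1, gamma: float = 0.9) -> list[float]:
    """TD(0) value learning on a 5-state corridor; state 4 is terminal."""
    V = [0.0] * 5
    for _ in range(episodes):
        s = 0
        while s != 4:
            s_next = s + 1                      # deterministic step right
            reward = 1.0 if s_next == 4 else 0.0
            # Core TD update: move V(s) toward the bootstrapped target.
            V[s] += alpha * (reward + gamma * V[s_next] - V[s])
            s = s_next
    return V
```

No rule says "state 3 is worth more than state 0" -- the agent discovers that from feedback, with values decaying by the discount factor as you move away from the reward.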

The catch? Training was slow. Environments had to be simulated. Real-world deployment was nearly impossible until computing power caught up decades later.

Web crawlers and email filters (1990s) were the first agents most people actually used. Google’s PageRank bot was an agent: it perceived the web’s link structure, made decisions about page importance, and acted by indexing content. Your spam filter? Also an agent – it learned patterns from labeled data and took action (delete, quarantine, allow) without human intervention.

These weren’t dumb scripts. They exhibited autonomy, learned from data, and pursued goals. They just didn’t reason in natural language.

What Actually Changed in 2025

So if agents are old, what’s the fuss about?

Three things shifted:

1. Natural language reasoning replaced rigid planning.
Old agents: IF condition X THEN action Y.
New agents: “Here’s the goal, here are the tools, figure it out.”

LLMs can interpret ambiguous requests, plan multi-step workflows, and adapt when the first approach fails – all in the same conversation. That’s the real unlock.

2. Tool use became standardized.
Pre-2024, connecting agents to external APIs required custom integration code for every tool. Anthropic’s Model Context Protocol (released late 2024) standardized how LLMs connect to tools. Google’s Agent2Agent and other protocols followed. Now agents can discover and use tools dynamically, not just call hardcoded functions.

3. Orchestration frameworks matured.
The 2000s agent web failed because coordinating multiple agents was manual, brittle work. Today’s frameworks (LangGraph, AutoGen, Microsoft Agent Framework) handle state management, handoffs between agents, and fallback logic out of the box.

That’s what’s new. Not the concept of autonomy. Not the perception-action loop. The reasoning layer and the infrastructure to deploy it reliably.

Lessons from Old Agents That Still Apply

Here’s what developers in the ’90s learned the hard way – and what still matters in 2025:

Not all software is an autonomous agent. A payroll program isn’t an agent even though it processes input and produces output. Why? It doesn’t operate continuously, and its output doesn’t affect what it senses next. Real agents run in a loop: perceive, decide, act, repeat. If your “AI agent” is just an LLM API call wrapped in a function, it’s not an agent – it’s a tool.
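The distinguishing structure is the loop itself, which fits in a few lines. This is a toy sketch -- the "environment" is just a list, and the policy is invented for illustration:

```python
def run_agent(env: list[int], max_steps: int = 100) -> int:
    """Perceive-decide-act loop: find negative readings and zero them out."""
    for step in range(max_steps):
        # Perceive: read the current state of the world.
        faults = [i for i, v in enumerate(env) if v < 0]
        # Decide: stop when the goal is reached.
        if not faults:
            return step
        # Act: the action changes what the agent will sense next iteration.
        env[faults[0]] = 0
    return max_steps
```

The key property: each action alters the next perception. A payroll batch job has no such feedback path, which is why it fails the agent test.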

Agent autonomy is a spectrum, not a binary. Early research defined levels from simple reflex (thermostat) to goal-based planning (chess AI) to learning agents (reinforcement learning systems). Modern AI agents sit somewhere in the middle: they plan and use tools, but true long-term learning without retraining is still rare. Know where your agent sits on that spectrum.

Interruption design matters more than capability. Clippy had the tech. It failed because it interrupted poorly. Your 2025 agent can be GPT-5-powered, but if it pings users at the wrong moment or takes actions without appropriate confirmation gates, adoption will tank. The best agent is often the quietest one.

When NOT to Use an Agent

This is the part most tutorials skip.

Agents add complexity. If your workflow is predictable and deterministic, a traditional script or automation tool (Zapier, n8n without agentic logic) is faster, cheaper, and more reliable.

Don’t use an agent if:

  • The task has a fixed, known sequence of steps every time
  • You can’t tolerate any errors or hallucinations
  • Latency matters more than adaptability (agents add reasoning overhead)
  • You’re automating high-risk actions without human-in-the-loop safeguards

Use an agent when the problem involves ambiguity, requires multi-step reasoning, or changes context frequently. Otherwise, you’re over-engineering.

Common Pitfalls (That Were Already Documented in 1998)

Agent loops. When an agent’s action triggers a condition that makes it take the same action again, forever. Old email filtering agents would sometimes auto-reply to auto-replies. Modern LLM agents loop when stop conditions aren’t explicit. Same bug, different decade.
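The classic defense works for both eras: a step budget plus a check for repeated actions. A hedged sketch, with invented names and a deliberately simple "same action twice means loop" heuristic (real systems would compare action-plus-arguments or use smarter detection):

```python
def guarded_run(next_action, max_steps: int = 20) -> list:
    """Run next_action() until it signals done, repeats itself, or hits the budget."""
    seen = set()
    history = []
    for _ in range(max_steps):
        action = next_action(history)
        if action is None:              # explicit stop condition from the agent
            return history
        if action in seen:              # same action twice -> likely a loop
            raise RuntimeError(f"loop detected on {action!r}")
        seen.add(action)
        history.append(action)
    raise RuntimeError("step budget exhausted")
```

An auto-reply agent calling `guarded_run` would be killed on its second identical reply instead of mail-bombing another auto-responder forever.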

Tool call explosions. An agent with access to 50 tools but no prioritization logic will waste tokens trying everything. The 1990s solution? Limit the action space and rank tools by relevance. Still the right answer in 2025.
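One cheap way to limit the action space is to score tools against the task and only expose the top few to the agent. The tool set and keyword-overlap scoring below are illustrative stand-ins (production systems typically use embedding similarity instead):

```python
# Hypothetical tool registry: each tool maps to keywords it's relevant for.
TOOLS = {
    "web_search": {"find", "lookup", "news", "web"},
    "calculator": {"sum", "compute", "math", "number"},
    "calendar":   {"schedule", "meeting", "date", "time"},
    "email":      {"send", "message", "reply", "inbox"},
}

def shortlist(task: str, k: int = 2) -> list[str]:
    """Rank tools by keyword overlap with the task; expose only the top k."""
    words = set(task.lower().split())
    scored = sorted(TOOLS, key=lambda t: len(TOOLS[t] & words), reverse=True)
    return scored[:k]
```

An agent handed two relevant tools instead of fifty burns far fewer tokens on trial-and-error calls.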

The “just one more agent” trap. Multi-agent systems sound elegant but debugging N agents is exponentially harder than debugging one. Start with a single capable agent. Only split into multiple when coordination overhead is justified by specialization gains.

Performance: Then vs. Now

How good were pre-LLM agents?

DeepMind’s AlphaCode (2022) reached roughly the median level of competitive programmers by generating many candidate solutions with a domain-specific model and filtering aggressively. Not bad for a narrow system.

IBM Watson (which beat human Jeopardy! champions in 2011) did it using statistical NLP and knowledge retrieval – no LLM involved.

Reinforcement learning agents mastered Go, StarCraft, and Dota 2 through self-play, outperforming world champions.

These were insanely capable within their domains. But they couldn’t generalize. Watson couldn’t book you a flight. AlphaCode couldn’t summarize a PDF.

LLM-powered agents trade some task-specific performance for generality. A GPT-4 agent won’t beat a purpose-built RL system at Go, but it can play Go, then book your travel, then debug your code – all in the same session. That flexibility is the trade-off.

What We’re Still Getting Wrong

The hype cycle is repeating.

In 2000, people thought agents would replace apps. They didn’t. In 2025, people think agents will replace apps and employees. History suggests the outcome will be subtler: augmentation, not replacement.

According to IBM’s 2025 analysis of agent expectations versus reality, “agents can do a few things well, but that doesn’t mean you can agentize any flow that pops into your head.” Early pilots are failing when companies give agents too much autonomy too fast.

The lesson from the ’90s: start narrow, prove reliability, then expand scope. Agents earn autonomy through testing, not hype.

FAQ

Were software agents before AI actually “intelligent”?

Depends on your definition. They exhibited autonomy, pursued goals, and adapted to feedback – all hallmarks of intelligence. But they couldn’t reason in natural language or generalize across domains. ELIZA seemed intelligent (the “ELIZA effect”) but was pure pattern matching. Expert systems like MYCIN made expert-level decisions within narrow domains. Intelligence is a spectrum; early agents sat on the lower end.

Why did the 1990s agent boom fail if the concepts were sound?

Three reasons: computational limits (training RL agents took forever), lack of orchestration tools (coordinating multiple agents was manual, brittle work), and overpromising (agents were pitched as general-purpose when they were domain-specific). The infrastructure wasn’t ready. Today’s cloud GPUs, standardized protocols like MCP, and mature orchestration frameworks solve those blockers – which is why the 2025 wave has better odds.

What’s the single biggest difference between 1990s agents and 2025 AI agents?

Natural language reasoning. Old agents could act and learn, but planning required hardcoded logic or narrow domain knowledge. LLMs can interpret ambiguous goals, devise multi-step plans, and adapt when initial approaches fail – all from a text prompt. That reasoning layer is what makes modern agents feel “general purpose” even when they’re not.