RubyLLM Tutorial: One Gem for Every AI Provider (2026)

RubyLLM unifies OpenAI, Claude, Gemini, Bedrock and more behind one Ruby API. Hands-on setup, agents, structured output, and the gotchas competitors skip.

Taylor Kim · 2026-06-25 · 8 min read · Beginner

Hot take: a unified AI client is not automatically better than a vendor SDK. It’s better only when the provider differences you’re abstracting are the differences you actually want to ignore. RubyLLM understands this – and it’s currently lighting up Ruby Twitter and Hacker News because it hit a sweet spot that LangChain.rb missed.

If you’ve been writing Ruby and watching Python devs swap Claude, GPT, and Gemini with one config line, this one’s for you.

The scenario: you already wrote provider-specific code, and now you regret it

You picked ruby-openai six months ago. It worked. Then your team wanted Claude for long-context summarization. You wrote a second adapter. Then Gemini came out cheaper for vision. Third adapter. Now your codebase has three response shapes, three error hierarchies, and three places where streaming behaves differently.

This is the exact pain RubyLLM targets. It’s an open source Ruby gem – maintained by Carmine Paolino – that puts one consistent interface in front of OpenAI, Anthropic, Gemini, Bedrock, DeepSeek, Mistral, Ollama, OpenRouter, Perplexity, GPUStack, xAI, VertexAI, and any OpenAI-compatible API. Chat, multimodal inputs, image generation, embeddings, audio transcription, tools, agents, structured output, streaming, Rails integration: one gem, three runtime dependencies (Faraday, Zeitwerk, Marcel).

What’s actually new (and why the Ruby community is talking)

As of early 2026, the gem sits at roughly 3,700 GitHub stars and is running in production at Chat with Work. Recent releases are what pushed it from “nice library” to “default choice” for a lot of teams.

Three changes matter most for new adopters:

Concurrent tool execution (released 2025, v1.16). When a model returns multiple tool calls in one response, RubyLLM used to run them one at a time. Now they run in parallel. Enable via config.tool_concurrency = :threads (or :fibers) – cuts real latency for I/O-bound tools like HTTP calls, database lookups, or chained LLM requests.
Universal proxy support. Every native provider accepts a custom api_base configuration. Route traffic through Azure, LiteLLM, or a local mirror without writing a custom adapter. (Source: dudarik.com community review, 2025.)
Normalized token accounting. Turns out, OpenAI, OpenRouter, Bedrock, and Gemini all disagree on what a “prompt token” even means in their raw responses. RubyLLM’s releases now normalize streaming and non-streaming usage across all four, separating cache reads/writes from standard input tokens before any cost calculation happens.

That last point is the one nobody else is writing about. If you’ve ever tried building a unified cost dashboard across two providers, you already know the headache. The normalization handles it silently.
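To make the problem concrete, here is a minimal sketch of what that kind of normalization involves. It is not RubyLLM's implementation: `NormalizedUsage` and `normalize_usage` are hypothetical names, and the two payload shapes are assumptions based on the usage blocks OpenAI-style APIs (where `prompt_tokens` includes cached tokens) and Anthropic-style APIs (where `input_tokens` excludes them) return.

```ruby
# Hypothetical illustration only; not RubyLLM's internal code.
NormalizedUsage = Struct.new(:input, :cache_read, :cache_write, :output, keyword_init: true)

def normalize_usage(style, usage)
  case style
  when :openai_style
    # prompt_tokens already includes cached tokens, so subtract them out.
    cached = usage.dig('prompt_tokens_details', 'cached_tokens').to_i
    NormalizedUsage.new(
      input: usage.fetch('prompt_tokens', 0) - cached,
      cache_read: cached,
      cache_write: 0,
      output: usage.fetch('completion_tokens', 0)
    )
  when :anthropic_style
    # input_tokens excludes cache reads and writes; they arrive as separate fields.
    NormalizedUsage.new(
      input: usage.fetch('input_tokens', 0),
      cache_read: usage['cache_read_input_tokens'].to_i,
      cache_write: usage['cache_creation_input_tokens'].to_i,
      output: usage.fetch('output_tokens', 0)
    )
  else
    raise ArgumentError, "unknown usage style: #{style}"
  end
end

openai = normalize_usage(:openai_style, {
  'prompt_tokens' => 1200,
  'completion_tokens' => 80,
  'prompt_tokens_details' => { 'cached_tokens' => 1000 }
})

anthropic = normalize_usage(:anthropic_style, {
  'input_tokens' => 200,
  'output_tokens' => 80,
  'cache_read_input_tokens' => 1000,
  'cache_creation_input_tokens' => 0
})
```

Both calls describe the same request (200 fresh input tokens, 1,000 read from cache), yet the raw "prompt" numbers differ by a factor of six. Pricing the raw figures at the standard input rate would overcharge one provider and undercount the other.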

Setup in two minutes (skip the bloat)

Most tutorials walk you through bundle add ruby_llm and the Rails generator. Here’s the part they skip: the multi-provider config that actually exercises the abstraction.

# config/initializers/ruby_llm.rb
require 'ruby_llm'

RubyLLM.configure do |config|
  config.openai_api_key = ENV['OPENAI_API_KEY']
  config.anthropic_api_key = ENV['ANTHROPIC_API_KEY']
  config.gemini_api_key = ENV['GEMINI_API_KEY']

  # Route OpenAI traffic through Azure or a LiteLLM proxy:
  # config.openai_api_base = ENV['OPENAI_BASE']

  config.tool_concurrency = :threads # parallel tool calls
end

Now switch models like you’d flip a config flag:

chat = RubyLLM.chat(model: 'gpt-4o')
chat.ask 'Summarize the Ruby 3.4 release notes'

# Same code, different brain:
claude = RubyLLM.chat(model: 'claude-3-5-sonnet-20241022')
claude.ask 'Now critique that summary'

That’s the whole pitch in a handful of lines. The same chat.ask accepts files too – per the GitHub README, you can pass images, video, audio, PDFs, and code files: chat.ask "What's in this image?", with: "ruby_conf.jpg". Multiple files? Pass an array.

Building something real: a currency-converting agent

Every other tutorial shows you the Weather tool from the README. Here’s a different angle – a small currency agent that calls a real exchange-rate API inside a chat loop.

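require 'json'
require 'faraday'

class CurrencyConvert < RubyLLM::Tool
  description 'Convert an amount between two ISO currency codes'
  param :amount, type: :number, desc: 'Amount to convert'
  param :from, desc: 'Source ISO 4217 code, e.g. EUR'
  param :to, desc: 'Target ISO 4217 code, e.g. JPY'

  def execute(amount:, from:, to:)
    # Returns the parsed JSON body; the converted amounts sit under "rates".
    url = "https://api.frankfurter.app/latest?amount=#{amount}&from=#{from}&to=#{to}"
    JSON.parse(Faraday.get(url).body)
  end
end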

class FinanceAssistant < RubyLLM::Agent
  model 'gpt-4o'
  instructions 'Use the tool for every conversion. Never guess rates.'
  tools CurrencyConvert
end

FinanceAssistant.new.ask 'How much is 350 EUR in JPY and SGD?'

Two tool calls. With tool_concurrency = :threads set earlier, they run in parallel – a free latency win you don’t get from most provider SDKs by default.
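The latency effect is easy to reproduce without the gem. The sketch below uses plain Ruby threads and says nothing about how RubyLLM schedules tools internally; `slow_lookup` is a hypothetical stand-in for an I/O-bound tool, with `sleep` simulating the network wait.

```ruby
# Hypothetical stand-in for an I/O-bound tool call (sleep simulates network wait).
def slow_lookup(currency)
  sleep 0.2
  "rate for #{currency}"
end

# Runs the block and returns [result, seconds elapsed].
def elapsed
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  result = yield
  [result, Process.clock_gettime(Process::CLOCK_MONOTONIC) - start]
end

currencies = %w[JPY SGD]

# One at a time: total time is roughly the sum of the waits.
sequential, sequential_time = elapsed { currencies.map { |c| slow_lookup(c) } }

# One thread per call: the waits overlap, so total time is roughly the longest wait.
threaded, threaded_time = elapsed do
  currencies.map { |c| Thread.new { slow_lookup(c) } }.map(&:value)
end

puts format('sequential: %.2fs, threaded: %.2fs', sequential_time, threaded_time)
```

The results are identical and arrive in the same order. Only the wall-clock time changes, which is why the win applies to I/O-bound tools and not to CPU-bound ones.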

One thing worth knowing about empty tool results: don’t return nil or an empty string when a tool produces nothing. According to the GitHub releases, RubyLLM now sends a small (no output) placeholder internally when a tool returns no content – this prevents provider-invalid empty content errors on Anthropic, Bedrock, and Gemini. Return an empty hash; the gem handles the rest per provider.
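In practice that advice amounts to a small guard on whatever `execute` is about to return. A minimal sketch, with `safe_tool_result` as a hypothetical helper name rather than anything the gem provides:

```ruby
# Hypothetical helper: turn "nothing to report" into an empty hash.
def safe_tool_result(value)
  return {} if value.nil?
  # Covers empty strings, arrays, and hashes alike.
  return {} if value.respond_to?(:empty?) && value.empty?

  value
end

safe_tool_result(nil)             # empty hash instead of nil
safe_tool_result('')              # empty hash instead of an empty string
safe_tool_result('rates' => {})   # non-empty results pass through unchanged
```

Calling it on the last line of `execute` keeps the "nothing found" case explicit in your own code instead of relying on per-provider handling.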

The gotchas competitors won’t tell you about

Five things that will bite you in production. None of these are in the README.

1. xAI caching can silently fail

A Hacker News commenter who shipped it to production put it plainly: caches don’t always work with xAI, because it only supports the completions API and thought signatures come back wrong. If you picked Grok specifically for prompt caching cost savings, test before you commit.

2. Fibers need the right web server – default Puma won’t do it

:fibers mode requires a Fiber-compatible runtime context. Falcon works. The Async gem works. Plain Puma without the async adapter? Silent underperformance. The safe default: use :threads unless you’ve wired async I/O end-to-end and confirmed your server supports it. (Per dudarik.com, 2025.)

3. “Unified API” hides uneven feature support

The interface looks identical across providers. The capabilities aren’t. Extended thinking, image generation, and audio transcription are provider-dependent – check the model registry before assuming a feature works on your chosen provider. The docs note this, but the unified API surface makes it easy to miss.

4. Cost helpers return nil when pricing data is stale

Per the rubyllm.com models docs: if pricing data is incomplete for tokens that were actually consumed, cost and cost.total return nil. Your billing dashboard won’t crash – it’ll just quietly underreport. Wrap cost lookups with a nil guard or fire an alert when you see it.
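A nil guard can be as small as the sketch below. `Cost` and `Reply` are hypothetical stand-ins so the example runs without the gem; the only behavior taken from above is that `cost` and `cost.total` may be nil.

```ruby
# Hypothetical stand-ins for a response object that exposes cost.total.
Cost = Struct.new(:total)
Reply = Struct.new(:cost)

def record_cost(reply, totals)
  total = reply.cost&.total
  if total.nil?
    # Count the gap so a dashboard can alert instead of quietly underreporting.
    totals[:unpriced] += 1
  else
    totals[:spend] += total
  end
  totals
end

totals = { spend: 0.0, unpriced: 0 }
replies = [Reply.new(Cost.new(0.012)), Reply.new(Cost.new(nil)), Reply.new(nil)]
replies.each { |reply| record_cost(reply, totals) }

puts "spend=#{totals[:spend]} unpriced_requests=#{totals[:unpriced]}"
```

Tracking unpriced requests separately is the design choice that matters here: treating nil as zero produces a total that looks plausible and is wrong.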

5. Read-only gem directories break model refresh

Common in Docker and Heroku-style deploys. If the gem directory is read-only, RubyLLM.models.save_to_json silently fails. The fix (v1.9.0+): set config.model_registry_file to a writable path – documented at rubyllm.com/models.
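One way to choose that path at boot is to probe a few candidate directories and take the first writable one. This is a sketch under assumptions: `writable_registry_path`, the candidate directory, and the file name are all hypothetical, and the config line is left as a comment so the snippet runs without the gem.

```ruby
require 'tmpdir'
require 'fileutils'

# Hypothetical helper: return a file path inside the first directory we can write to.
def writable_registry_path(candidates, filename: 'ruby_llm_models.json')
  dir = candidates.find do |candidate|
    FileUtils.mkdir_p(candidate)
    File.writable?(candidate)
  rescue SystemCallError
    false # e.g. read-only filesystem or missing permissions
  end
  dir && File.join(dir, filename)
end

path = writable_registry_path([File.join(Dir.tmpdir, 'ruby_llm')])

# In the initializer, using the setting named above:
#   config.model_registry_file = path
puts path
```

On Rails, a directory under the app's tmp folder is a natural first candidate; on ephemeral filesystems the file will not survive a restart, so plan to refresh the registry at boot.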

When to pick something else

If your app is a heavy RAG pipeline with vector search baked in, RubyLLM won’t help – there’s no built-in vector store. You’d wire your own (pgvector, Pinecone, etc.). For that shape of project, langchainrb’s broader orchestration layer may be a better fit.

Single-provider OpenAI-only product? The official openai-ruby SDK will track new API features faster than any abstraction layer can.

Multi-provider, agents, file inputs, Rails persistence – that’s RubyLLM’s shape. As of early 2026, there isn’t a close second in the Ruby ecosystem for that specific combination.

Where to go next

Don’t read more tutorials. Pick one switch you’ve been delaying – Claude for long context, Gemini for vision, Ollama for local inference – and port one endpoint to RubyLLM. Migration is usually under an hour. After that you’ll have a real opinion on whether the abstraction earns its keep for your codebase.

Start with the Getting Started guide, skim the latest releases for anything new since this article, and if you’re on Rails, the model registry docs are where the production-relevant detail actually lives.

FAQ

Does RubyLLM work without Rails?

Yes. The core gem is plain Ruby – Rails integration is opt-in through acts_as_chat helpers. A standalone script with require 'ruby_llm' works fine.

How is RubyLLM different from langchainrb?

RubyLLM: a focused, opinionated client for talking to LLMs across providers, with agents and tools built in. LangChain.rb: broader orchestration with vector stores, document loaders, and retrievers included. If you’re building a RAG pipeline from scratch with chunking and retrieval, langchainrb has more pieces assembled for you. If you just want clean multi-provider chat with agents and don’t need the orchestration overhead, RubyLLM is the leaner option. Most teams find out which one they need after an hour of use – start with RubyLLM if you’re unsure, because stripping langchainrb down to just the chat layer is frustrating.

Can I use it with Azure OpenAI or a local LiteLLM proxy?

Set config.openai_api_base to your endpoint URL and use your Azure or proxy key as the openai_api_key. Same pattern for LM Studio, self-hosted gateways, or any OpenAI-compatible API – no custom adapter needed. One catch: Azure’s API versioning occasionally lags behind OpenAI’s, so test new model features against your specific Azure deployment, not just the main API.