
Best AI Tools for Rust: A Working Workflow That Compiles

The best AI tools for Rust programming aren't the ones with the flashiest demos - they're the ones that respect rust-analyzer. Here's the workflow.

8 min read · Advanced

You ask an AI to write a Rust function that streams JSON over a Tokio channel. It produces 40 lines that look gorgeous. You run cargo build. Eleven errors. Three of them about lifetimes you didn’t ask for. One reference to tokio::sync::mpsc with an API that hasn’t existed since 2023. This is the typical first encounter with the best AI tools for Rust programming – they’re confident, they’re fast, and they’re wrong in specific, Rust-shaped ways.

The end state we’re walking toward looks different. Same prompt, but the code compiles on the first try, uses actual current sqlx 0.8 syntax, and the agent fixes its own borrow-checker mistakes before you even see them. That’s achievable today – but it takes a specific stack and a specific loop, not a single “best” tool.

The problem nobody benchmarks: Rust ages AI badly

Most language toolchains are forgiving. Python’s requests library has looked the same for a decade. Rust isn’t like that. The async ecosystem shifts from release to release. sqlx went from 0.6 to 0.8 with macro changes. axum reworked its extractor traits in 0.6, then changed its body and server APIs again in 0.7 with the move to hyper 1.0. An LLM trained 18 months ago will produce code that looked correct in its training data and is now subtly broken.

This is the real Rust problem – not the one tutorial sites advertise. Per the 2025 Rust Foundation Annual Report, Rust adoption grew from 1.05% to 1.47% in systems programming while 78% of Rust developers actively use AI coding assistants. The mismatch between rapid ecosystem churn and slow training cycles is why most AI Rust output needs a second pass.

The second problem: rust-analyzer already knows everything the AI is guessing at. Borrow checker errors, trait bounds, unresolved imports – surfaced in your editor as inlay hints and diagnostics. Most AI tools never read them.

Why the standard “top 10 Rust AI tools” lists fail you

You’ve seen the lists. Copilot at $10/month. Tabnine supporting 80+ languages. CodeWhisperer’s documentation focus. They’re not wrong, they’re just irrelevant to the question of which one survives contact with a real Rust codebase.

Breadth is the structural problem with generic tools. Tabnine does support 80+ languages and plugs into VS Code, JetBrains, Visual Studio, and Eclipse – but that’s the issue. Training data dominated by JavaScript and Python turns Rust idioms (owned vs borrowed, ? error propagation, lifetime elision) into statistical noise. The model hasn’t seen enough Rust to have strong opinions about it.

Three traits separate tools that move results on Rust from the ones that look impressive in demos: reading rust-analyzer’s output, iterating against cargo check in a loop, and exposing the model choice so you can swap when one stalls. Most “top 10” lists don’t even ask those questions.

The stack that actually works

Here’s the setup I keep coming back to after several months of testing. It’s three layers, and the layering matters more than any individual piece.

Layer 1 – Editor with rust-analyzer always on. VS Code works. Zed is the interesting one – it’s written from scratch in Rust, so its LLM panel can see LSP errors directly rather than treating the editor and the AI as separate contexts. RustRover with JetBrains’ Junie agent is heavier to set up but handles large workspaces better than the others in my experience. Pick based on workspace size; don’t overthink it.

Layer 2 – An agentic terminal tool that runs cargo check. This is the non-obvious piece – and honestly the one that matters most. Claude Code (terminal-based, documented in Anthropic’s official docs) will plan, execute, and re-run the compiler without prompting. Aider is an open-source tool that does the same thing, built around Git commits, with a broader model menu. Either one lets the model be wrong, see the compiler’s actual response, and self-correct. That loop is why “generates Rust” becomes “generates Rust that compiles.”
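To make the shape of that loop concrete, here is a minimal sketch in plain Rust. It is not how Claude Code or Aider are implemented; `iterate_until_clean`, `check`, and `revise` are hypothetical names, with `check` standing in for a compiler run and `revise` standing in for the model’s next attempt.

```rust
/// Repeatedly checks a draft and asks for a revision until no errors remain.
///
/// `check` stands in for running the compiler and returns the remaining
/// error messages. `revise` stands in for the model producing a new draft
/// from the old draft plus those errors. Both are hypothetical hooks.
///
/// Returns the clean draft and the number of revisions it took, or the
/// last set of errors if `max_rounds` revisions were not enough.
pub fn iterate_until_clean<C, R>(
    mut draft: String,
    max_rounds: usize,
    mut check: C,
    mut revise: R,
) -> Result<(String, usize), Vec<String>>
where
    C: FnMut(&str) -> Vec<String>,
    R: FnMut(&str, &[String]) -> String,
{
    let mut errors = check(&draft);
    let mut rounds = 0;
    while !errors.is_empty() {
        if rounds == max_rounds {
            // Give up and hand the unresolved diagnostics back to the human.
            return Err(errors);
        }
        draft = revise(&draft, &errors);
        rounds += 1;
        errors = check(&draft);
    }
    Ok((draft, rounds))
}
```

The `max_rounds` cap is the important design choice: without it, a model that keeps producing variants of the same mistake never stops.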

Layer 3 – A reasoning model with large context. At time of writing, Gemini’s Pro tier (specifically Gemini 2.5 Pro – though Google’s naming will likely shift again) offers a 1-million-token context window built for agentic work across files. Claude Sonnet (current generation as of early 2026) is the community default for borrow checker debugging. The point isn’t picking one permanently; it’s having both wired up so you can switch when one stalls on a problem.

Pro tip: Add a CLAUDE.md or AGENTS.md at your project root that says “Run cargo check --message-format=json after every edit. Do not respond until it returns zero errors.” This single instruction changes output quality more than any model upgrade.
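For a sense of what “zero errors” means mechanically, here is a rough sketch of a checker that shells out to cargo and counts error-level diagnostics. It assumes cargo’s line-delimited, compact JSON output and uses plain substring matching instead of a real JSON parser, so treat the result as zero versus non-zero, not as an exact count (the compiler’s closing summary may be counted too). The function names are made up for illustration.

```rust
use std::process::Command;

/// Counts lines in `cargo check --message-format=json` output that look like
/// error-level compiler messages. Substring matching is a shortcut that
/// assumes compact JSON; a real tool would parse each line properly.
pub fn count_errors(cargo_json: &str) -> usize {
    cargo_json
        .lines()
        .filter(|line| line.contains(r#""reason":"compiler-message""#))
        .filter(|line| line.contains(r#""level":"error""#))
        .count()
}

/// Runs `cargo check` in the current directory and returns the error count.
/// Fails with an I/O error if cargo cannot be started.
pub fn run_cargo_check() -> std::io::Result<usize> {
    let output = Command::new("cargo")
        .args(["check", "--message-format=json"])
        .output()?;
    Ok(count_errors(&String::from_utf8_lossy(&output.stdout)))
}
```

An agent following the instruction above is effectively looping until something like `run_cargo_check()` returns `Ok(0)`.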

A real example: porting a sync function to async

Here’s a concrete moment from last week. I had a synchronous file parser that needed to become a streaming async one. Pure Copilot autocomplete gave up after the first impl Stream. Here’s what the agentic loop did instead.

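    // Starting point: synchronous, blocking
    use std::fs::File;
    use std::io::{BufRead, BufReader};
    use std::path::Path;

    // LogEntry, ParseError, and parse_line are defined elsewhere in the crate.
    // ParseError needs a From<std::io::Error> impl for the `?` below.
    fn parse_logs(path: &Path) -> Result<Vec<LogEntry>, ParseError> {
        let file = File::open(path)?;
        BufReader::new(file)
            .lines()
            .filter_map(Result::ok) // silently skips lines that fail to read
            .map(parse_line)
            .collect()
    }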

I told Claude Code: “Convert this to async, return a Stream of LogEntry, backpressure-aware, use tokio.” First draft used tokio_stream::wrappers::LinesStream. cargo check failed – the wrapper API had changed and the closure captured a non-Send reference. Without prompting, the agent re-read the error, queried the current tokio-stream docs through its tools, switched to tokio::io::AsyncBufReadExt with a manual async_stream::stream! macro, and re-ran the check. Clean compile in three iterations. Total time: maybe 90 seconds.

What made this work wasn’t the model. It was the loop – the model was allowed to be wrong, see the compiler’s response, and fix itself. Exactly the workflow a human Rust developer uses.
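The final async code depends on tokio and async-stream, so it isn’t reproduced here. The backpressure idea itself can be sketched with the standard library alone: a bounded channel blocks the producer once the consumer falls behind. This is a std-threads analogue, not the code the agent wrote; the `LogEntry` fields and the "LEVEL message" line format are assumptions made up for the example.

```rust
use std::io::{BufRead, BufReader, Read};
use std::sync::mpsc::{sync_channel, Receiver};
use std::thread;

/// Hypothetical log record; the real LogEntry type is not shown in the article.
#[derive(Debug, PartialEq)]
pub struct LogEntry {
    pub level: String,
    pub message: String,
}

/// Parses an assumed "LEVEL message" format; returns None for malformed lines.
pub fn parse_line(line: &str) -> Option<LogEntry> {
    let (level, message) = line.split_once(' ')?;
    Some(LogEntry {
        level: level.to_string(),
        message: message.to_string(),
    })
}

/// Streams parsed entries through a bounded channel. When `capacity` entries
/// are waiting, `send` blocks the reader thread until the consumer catches up.
pub fn stream_logs<R>(reader: R, capacity: usize) -> Receiver<LogEntry>
where
    R: Read + Send + 'static,
{
    let (tx, rx) = sync_channel(capacity);
    thread::spawn(move || {
        for line in BufReader::new(reader).lines() {
            // Stop on the first I/O error instead of skipping it.
            let Ok(line) = line else { break };
            if let Some(entry) = parse_line(&line) {
                // A send error means the receiver was dropped; stop reading.
                if tx.send(entry).is_err() {
                    break;
                }
            }
        }
    });
    rx
}
```

Iterating over the returned `Receiver` yields entries as they are parsed and ends when the input is exhausted. The async version has the same shape, with a bounded tokio channel or a stream in place of the thread.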

The pricing trap nobody mentions

The catch is that same loop has a cost. Claude Code and other agentic tools bill per token at API rates, and a borrow checker fight can get expensive – I’ve watched a single stubborn lifetime error eat $2-5 in API credits before I killed the session. Set a hard budget cap in your config before you start. If you’re using Aider, the --cache-prompts flag matters more for Rust than for any other language because the codebase context (lots of trait definitions) gets re-sent constantly.
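A back-of-the-envelope sketch shows why the loop adds up when the full context is re-sent every round. The per-million-token rates and token counts in the usage note are placeholders, not quoted prices; plug in your provider’s current numbers. `Rates`, `cost_per_round`, `session_cost`, and `max_rounds_within` are hypothetical names.

```rust
/// Hypothetical per-million-token prices in USD. Check your provider's
/// pricing page; nothing here is a quoted rate.
pub struct Rates {
    pub input_per_mtok: f64,
    pub output_per_mtok: f64,
}

/// Cost of one check-and-fix round, assuming the whole context is re-sent
/// as input each time (no prompt caching).
pub fn cost_per_round(context_tokens: u64, output_tokens: u64, rates: &Rates) -> f64 {
    context_tokens as f64 / 1e6 * rates.input_per_mtok
        + output_tokens as f64 / 1e6 * rates.output_per_mtok
}

/// Total cost of a session with `rounds` identical rounds.
pub fn session_cost(rounds: u32, context_tokens: u64, output_tokens: u64, rates: &Rates) -> f64 {
    rounds as f64 * cost_per_round(context_tokens, output_tokens, rates)
}

/// How many whole rounds fit under a budget cap. If a round costs nothing,
/// the division yields infinity and the cast saturates at u32::MAX.
pub fn max_rounds_within(budget_usd: f64, context_tokens: u64, output_tokens: u64, rates: &Rates) -> u32 {
    (budget_usd / cost_per_round(context_tokens, output_tokens, rates)).floor() as u32
}
```

With placeholder rates of $3 input and $15 output per million tokens, 20 rounds at 50,000 context tokens and 1,000 output tokens each come to about $3.30, and a $5 cap allows 30 such rounds. Almost all of that is re-sent input, which is the part prompt caching targets.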

For predictable costs, GitHub Copilot’s flat rate – $10/month for individuals, $19/month for businesses as of mid-2025 (check github.com/features/copilot for current pricing) – trades the full agent loop for budget certainty. Worth it for solo hobby work; not the right call for serious refactoring.

One unconventional option: a Rust-specific model

RustCoder is domain-tuned on Rust specifically – built on the Qwen-2.5 Coder base model and used in a January 2025 coding camp run by the OpenAtom Foundation and Tsinghua University, where over 1,000 students worked through it (per the CNCF announcement). It bundles Rust by Example, The Rust Programming Language book, and algorithm resources, and runs inside your IDE.

It won’t beat a frontier model on raw reasoning. But for learning – or for an air-gapped environment where shipping code to Anthropic or Google isn’t acceptable – a self-hosted, domain-tuned model is a genuine option that no “top 10” list will surface.

FAQ

Is GitHub Copilot good enough for Rust on its own?

For autocomplete inside a function body, yes. For multi-file refactors or async conversions, no – it doesn’t run the compiler.

Why does rust-analyzer matter so much for AI workflows?

Because it’s the only part of your toolchain that knows the truth about your code right now – inferred types, traits in scope, exactly where the borrow checker is unhappy. Picture this: an AI suggests let x = &data; spawn(move || use_x(x)). rust-analyzer immediately surfaces the compiler’s complaint that data does not live long enough, because the spawned closure has to be 'static. An AI agent that reads that diagnostic fixes it by cloning, moving ownership into the closure, or restructuring. An AI that doesn’t will confidently produce three more variants of the same broken pattern. The diagnostic stream is the difference between a tool that iterates toward a solution and one that iterates in circles.
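Here is a small sketch of that pattern and two common fixes, using std::thread::spawn in place of an async spawn. The function names are invented for the example, and the rejected version is left as a comment because it does not compile.

```rust
use std::sync::Arc;
use std::thread;

// The rejected pattern, shown as a comment because it does not compile:
//
//     let data = vec![1, 2, 3];
//     let x = &data;
//     thread::spawn(move || use_x(x));
//
// The closure must be 'static, but `x` borrows a local, so the compiler
// reports that `data` does not live long enough.

/// Fix 1: move ownership of the data into the spawned closure.
pub fn sum_owned(data: Vec<u32>) -> u32 {
    thread::spawn(move || data.iter().sum::<u32>())
        .join()
        .expect("worker thread panicked")
}

/// Fix 2: share the data through an Arc when the caller still needs it.
pub fn sum_shared(data: &Arc<Vec<u32>>) -> u32 {
    // Cloning the Arc copies a pointer and bumps a counter, not the Vec.
    let worker_copy = Arc::clone(data);
    thread::spawn(move || worker_copy.iter().sum::<u32>())
        .join()
        .expect("worker thread panicked")
}
```

Which fix is right depends on whether the caller needs the data afterwards. That is a decision the diagnostic alone can’t make, but it narrows the options to ones that compile.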

Which model is currently best for Rust specifically?

As of early 2026, Claude Sonnet (current generation) is the default choice I’ve seen most in Rust community threads for borrow checker reasoning; Gemini Pro works better when you need large-context project understanding across many files. Honest caveat: this will change. Benchmark your own use case every few months rather than trusting any single recommendation – including this one.

Next step: Install Aider or Claude Code today, add the cargo check instruction to your project’s agent config, and re-run a refactor task you previously gave up on. You’ll know within one session whether this workflow fits your brain.