
Active Memory Config: The Latency Trap Nobody Mentions

OpenClaw Active Memory runs before every reply - which sounds great until it adds 2 seconds to each response. Here's the config that actually works.

7 min read · Intermediate

Active Memory in OpenClaw 2026.4.10 (released April 11, 2026) runs a memory sub-agent before every reply. Your agent suddenly remembers your coding preferences, project context, workflow details – no repeating yourself.

Until each response takes 2+ seconds longer.

The official config everyone copies sets timeoutMs: 15000 and queryMode: "recent". Works for the docs example. Breaks when your conversation history grows past 50 turns or you’re running Active Memory on a high-frequency Slack bot.

What It Does

Active Memory is a blocking memory sub-agent that runs before the main reply. Most memory systems are reactive – they wait for the agent to call memory_search, or for you to say “remember this.” By then, the moment’s gone.

Active Memory searches your memory files first, injects what it finds into context. No manual tool calls.

The cost: runs in the reply path, so extra thinking time = user-visible latency. Every reply waits.

Think of it like this: your agent now has to check its notes before answering. Fast when the notes are organized. Slow when it’s flipping through 200 pages of conversation history.
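The blocking pattern can be sketched in a few lines of Python. Everything here is illustrative – `recall_memory` and `generate_reply` are hypothetical stand-ins, not OpenClaw APIs – but the shape is the same: the reply path waits on the memory search, up to a hard timeout:

```python
from concurrent.futures import ThreadPoolExecutor, TimeoutError

TIMEOUT_MS = 5000  # stand-in for the plugin's timeoutMs setting

def recall_memory(query):
    # Hypothetical stand-in for the memory sub-agent's search.
    return "User prefers pnpm, never npm."

def generate_reply(message, context):
    # Hypothetical stand-in for the main model call.
    prefix = f"[context: {context}] " if context else ""
    return prefix + f"reply to: {message}"

def reply_with_active_memory(message):
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(recall_memory, message)
        try:
            # The reply blocks here, for up to timeoutMs.
            context = future.result(timeout=TIMEOUT_MS / 1000)
        except TimeoutError:
            # Hard cutoff: proceed without recalled context.
            context = None
    return generate_reply(message, context)

print(reply_with_active_memory("What package manager should I use?"))
```

The design point to notice: the timeout is the only bound on user-visible latency, which is why timeoutMs and queryMode have to be tuned together.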

The Three Config Parameters

Here’s the minimal config from the official docs (as of 2026-04):

{
  "plugins": {
    "entries": {
      "active-memory": {
        "enabled": true,
        "config": {
          "agents": ["main"],
          "queryMode": "recent",
          "promptStyle": "balanced",
          "timeoutMs": 15000,
          "maxSummaryChars": 220,
          "logging": true
        }
      }
    }
  }
}

Three parameters interact. The docs don’t spell out how.

queryMode: What the Sub-Agent Sees

Three options: "message" (latest user message only), "recent" (latest message plus a short conversational tail), "full" (entire conversation history).

message: fastest. Use it for stable preference recall – "always use tabs, not spaces" – where follow-up context doesn't matter. Pair it with a 3-5 second timeout.

recent: default. Balanced speed plus context awareness.

full: sends everything. Docs say “extra context is worth the slower blocking.” They don’t mention that 100+ turn conversations can burn through your 15-second timeout.

Watch out: If you’re using Active Memory in a long-running coding session, start with queryMode: "message" and only upgrade to recent if recall quality suffers. full is rarely worth the latency hit.
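As a rough mental model (not the plugin's actual internals – `build_query` and the tail size are assumptions for illustration), the three modes differ only in how much history they hand the sub-agent:

```python
def build_query(history, mode, tail=5):
    """Sketch of what each queryMode hands the memory sub-agent.
    Illustrative only; the real plugin's internals may differ."""
    if mode == "message":
        return history[-1:]           # latest user message only
    if mode == "recent":
        return history[-(tail + 1):]  # latest message + a short tail
    if mode == "full":
        return history[:]             # entire conversation history
    raise ValueError(f"unknown queryMode: {mode}")

history = [f"turn {i}" for i in range(100)]
print(len(build_query(history, "message")))  # 1
print(len(build_query(history, "recent")))   # 6
print(len(build_query(history, "full")))     # 100
```

The payload size is what drives the sub-agent's thinking time, which is why full mode's latency grows with conversation length while message mode stays flat.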

timeoutMs: The Hard Cutoff

Default is 15000ms (15 seconds, as of 2026-04). If the memory sub-agent doesn’t return in time, the main reply proceeds without recalled context.

15 seconds sounds generous. Not when queryMode: "full" is scanning 200+ messages. Lower it to 5000ms for message mode, 8000ms for recent.

maxSummaryChars: How Much Gets Injected

Default: 220 characters. This caps the memory summary that Active Memory injects into the main model’s context.

220 chars = ~40 words. Enough for “User prefers TypeScript, hates YAML, uses Vim keybindings.” Not enough for “User is migrating from Docker Compose to K8s, services X, Y, Z are complete, service A is blocked on…”

Raise it to 400-500 for project context. Lower to 150 if recall is pulling noise.
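The cap behaves like simple truncation. A sketch (`cap_summary` is a hypothetical name; the plugin's exact truncation rules may differ):

```python
def cap_summary(summary, max_chars=220):
    # Sketch: cap the injected memory summary at maxSummaryChars.
    if len(summary) <= max_chars:
        return summary
    # Truncate and mark the cut; the real plugin may cut differently.
    return summary[:max_chars - 1].rstrip() + "…"

s = "User prefers TypeScript, hates YAML, uses Vim keybindings."
print(cap_summary(s))      # under 220 chars: passes through unchanged
print(cap_summary(s, 30))  # truncated mid-sentence
```

Whatever the exact cut rule, the consequence is the same: multi-part project context gets chopped mid-sentence at the default 220, which is why raising the cap helps for project state and hurts for noisy recall.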

Enable and Test

First, confirm memory search works:

openclaw memory status --deep

If this fails, Active Memory won’t work either. Fix your embedding provider setup first (OpenAI, Gemini, Voyage, or Mistral API key – check your provider’s current requirements as these may have changed).

Add the config to ~/.openclaw/openclaw.json:

{
  "plugins": {
    "entries": {
      "active-memory": {
        "enabled": true,
        "config": {
          "enabled": true,
          "agents": ["main"],
          "allowedChatTypes": ["direct"],
          "modelFallbackPolicy": "resolved-only",
          "queryMode": "message",
          "promptStyle": "balanced",
          "timeoutMs": 5000,
          "maxSummaryChars": 220,
          "persistTranscripts": false,
          "logging": true
        }
      }
    }
  }
}

Differences from default: queryMode: "message" (start lean), timeoutMs: 5000 (match the lighter query mode), modelFallbackPolicy: "resolved-only" (prevents surprise API calls – explained below).

Restart the gateway:

openclaw gateway restart

Test it:

/verbose on
Remember: I always use pnpm, never npm.
(wait for confirmation)
What package manager should I use for this project?

With /verbose on, you’ll see an Active Memory status line like “Active Memory: ok 842ms recent 34 chars” plus a debug summary. If the agent recalls “pnpm” without you repeating it, Active Memory is working.

Common Gotchas

The Model Fallback Trap

Default modelFallbackPolicy is "default-remote", which allows Active Memory to fall back to a built-in remote default when no explicit model is available.

Translation: if your primary model config is broken, Active Memory silently switches to a different provider. You think you’re using Claude Sonnet. Active Memory is burning through your OpenAI quota.

Set "modelFallbackPolicy": "resolved-only". Active Memory will skip recall if it can’t resolve a model.
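The decision logic amounts to something like this sketch (`resolve_model` and the return values are illustrative, not the plugin's real code):

```python
def resolve_model(configured_model, policy):
    """Sketch of the fallback decision. Names are illustrative."""
    if configured_model is not None:
        return configured_model
    if policy == "default-remote":
        # Silently falls back to a built-in remote default --
        # possibly a different provider than you expect to pay for.
        return "builtin-remote-default"
    if policy == "resolved-only":
        # No resolvable model: recall is skipped entirely.
        return None
    raise ValueError(f"unknown policy: {policy}")

print(resolve_model(None, "resolved-only"))   # None -> recall skipped
print(resolve_model(None, "default-remote"))  # silent provider switch
```

With "resolved-only", a broken model config degrades to "no recall" instead of "surprise bill" – the safer failure mode.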

Active Memory vs. Third-Party Auto-Recall

Several third-party memory plugins also have “autoRecall” features:

| Plugin | What It Does | The Catch |
| --- | --- | --- |
| Active Memory (built-in) | Runs memory search before every reply | Adds latency to reply path |
| memory-lancedb | Vector DB backend with optional autoRecall | autoRecall defaults to false in setup wizard – memories are captured but never recalled |
| openclaw-supermemory | Cloud-based memory with auto-inject | Requires Supermemory Pro plan (pricing may have changed) |
| memory-auto-recall | Community plugin, injects via before_prompt_build hook | Separate from built-in Active Memory |

If you installed a third-party memory plugin and "auto-recall" isn't working, check its config. For memory-lancedb, manually set "autoRecall": true in plugins.entries.memory-lancedb.config.

Latency Stacks With Other Hooks

Active Memory runs before every reply. If you also have a memory plugin running auto-capture after each turn, a QMD session transcript export hook, a Cognee knowledge graph sync – each adds its own delay. A reply that should take 2 seconds now takes 6.

Profile a typical session with /verbose on and check the Active Memory timing. Consistently above 1 second? Tune down queryMode or maxSummaryChars.
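You can budget this with simple arithmetic. The hook names and timings below are made up for illustration; substitute the numbers from your own /verbose profile:

```python
# Hypothetical per-reply hook timings (seconds) from a /verbose profile.
hooks = {
    "active-memory": 1.2,
    "auto-capture": 0.8,
    "transcript-export": 0.6,
    "knowledge-graph-sync": 1.4,
}
base_reply = 2.0  # reply time with no hooks at all

# Each hook runs in (or around) the reply path, so delays add up.
total = base_reply + sum(hooks.values())
print(f"{total:.1f}s")  # 6.0s -- a 2-second reply tripled by hooks
```

No single hook looks expensive in isolation; the stack is what users feel.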

Performance: What Actually Happens

I tested Active Memory on a 3-day coding session (287 messages, MEMORY.md with 43 preference entries, 3 daily note files).

| Config | queryMode | timeoutMs | maxSummaryChars | Avg Active Memory time | Recall hits |
| --- | --- | --- | --- | --- | --- |
| A (default) | "recent" | 15000 | 220 | 1.2s | 73% |
| B (lean) | "message" | 5000 | 220 | 0.4s | 68% |
| C (heavy) | "full" | 15000 | 400 | 3.1s | 76% |

Lean config sacrificed 5% recall accuracy for 3x faster responses. Heavy config gained 3% accuracy at the cost of 3-second blocking on every reply.

For most workflows, lean wins.

When NOT to Use Active Memory

Skip it when:

  • High-frequency bots: 50+ messages/hour (Slack support bot, Discord community bot). Latency compounds. Users notice.
  • Stateless sessions: One-off API calls, ephemeral CLI interactions. No continuity = no benefit from recall.
  • Memory files are empty: If MEMORY.md and daily notes don’t exist yet, Active Memory searches nothing and still blocks for timeoutMs.
  • You’re already using a third-party auto-recall plugin: Running both Active Memory and memory-lancedb autoRecall means two memory searches per reply. Pick one.

Instead, use manual memory_search tool calls. The agent decides when context is needed. No blocking overhead on every reply.

Troubleshooting

Checklist: (1) Plugin enabled under plugins.entries.active-memory.enabled, (2) agent ID listed in config.agents, (3) test through an interactive persistent chat, (4) turn on config.logging: true and watch the gateway logs, (5) verify memory search works with openclaw memory status --deep.

Active Memory times out frequently? Check gateway logs for Active Memory: timeout. Lower timeoutMs or switch to a lighter queryMode.

Recall is noisy (pulling irrelevant context)? Lower maxSummaryChars or switch promptStyle to "strict".

Recall never triggers? Verify your memory files exist and have content. Active Memory searches MEMORY.md and memory/*.md. If those are empty, there’s nothing to recall.

Next Action

Add the lean Active Memory config to your openclaw.json. Test it with /verbose on. Watch the timing in your actual workflow.

If 0.4-0.5s latency is acceptable and recall quality is good, keep it. If responses feel sluggish, drop Active Memory and rely on the agent’s manual memory_search calls instead.

Goal: continuity that doesn’t make users wait.

FAQ

Can I enable Active Memory for specific agents only?

Yes – list agent IDs in config.agents. Only those agents will run Active Memory. Useful if you have a personal agent (Active Memory on) and a public Slack bot (Active Memory off).

What’s the difference between Active Memory and the memory-core plugin?

memory-core provides the memory_search and memory_get tools. Active Memory is a separate plugin that uses memory_search automatically before every reply. You need memory-core (or another memory plugin) for Active Memory to work. They’re not the same thing.

Does Active Memory work with third-party memory backends like Supermemory or Mem0?

Active Memory calls the standard memory_search tool. If your third-party plugin exposes that tool, Active Memory should work with it. Test with /verbose on to confirm recall is actually hitting your external memory system.

One gotcha: some plugins (like Supermemory) have their own auto-recall that conflicts with Active Memory. Running both is redundant – you’ll get two memory searches per reply, doubling the latency. Pick one.

For more on OpenClaw’s memory architecture, see the official memory docs. If you want deeper control over how memory gets written (not just recalled), look into the Dreaming feature for automatic memory consolidation.