
OpenClaw Lobster YAML: The Approval-First Workflow Nobody Taught You

Most Lobster tutorials skip the resume token gotcha. Here's the approval-first workflow design pattern that actually prevents 50-email disasters.

9 min read · Intermediate

Every Lobster tutorial tells you workflows are “deterministic” and “safe.” What they don’t tell you: the safety comes from what Lobster can’t do, not what it can.

No loops. No branching. No arbitrary code execution inside the workflow itself. To an engineer used to GitHub Actions or n8n, this looks like a limitation. It’s actually the entire point.

Why Constraints Beat Features in Production Workflows

Here’s the uncomfortable truth about AI-driven automation: complex workflows today require many back-and-forth tool calls, and each call costs tokens while the LLM has to orchestrate every step. That’s expensive. Worse, it’s unpredictable.

Lobster flips this. OpenClaw runs Lobster workflows in-process using an embedded runner – no external CLI subprocess is spawned; the workflow engine executes inside the gateway process and returns a JSON envelope directly. One tool call. One structured result. The LLM plans once, then Lobster executes deterministically.

But the real genius is the approval primitive. Most automation tools treat approvals as a checkbox feature. Lobster treats them as a hard stop. The workflow halts completely until human approval is granted, ensuring that irreversible actions (like sending 50 emails or modifying production databases) are never executed blindly. Not a prompt hint. A literal pause.

The Approval-First Pattern (That Actually Prevents Disasters)

Think about the classic inbox triage workflow. You want to categorize emails, draft replies, and send them. Every tutorial shows you the same 4-step YAML:

name: inbox-triage
args:
  tag:
    default: "family"
steps:
  - id: collect
    command: inbox list --json
  - id: categorize
    command: inbox categorize --json
    stdin: $collect.stdout
  - id: approve
    command: inbox apply --approve
    stdin: $categorize.stdout
    approval: required
  - id: execute
    command: inbox apply --execute
    stdin: $categorize.stdout
    condition: $approve.approved

Here’s what the tutorials skip: step 3 doesn’t just ask for approval. It stops execution. If the pipeline pauses for approval, the tool returns a resumeToken so you can continue later. The entire runtime serializes its state, hands you a token, and waits. No re-running steps 1 and 2. No token waste.

This is the approval-first pattern: side effects come after the gate, never before.

Pro tip: Place approval gates immediately before any step with side effects (send email, post comment, delete file). If your workflow triggers external APIs, the approval step should preview the exact payload using approve --preview-from-stdin --limit N to show what will be sent.
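Sketched as YAML, that placement looks like this. (The inbox draft/send subcommands here are illustrative stand-ins, not real commands; the gate step uses the preview flags from the tip above.)

```yaml
steps:
  - id: draft
    command: inbox draft-replies --json               # no side effects yet
  - id: gate
    command: approve --preview-from-stdin --limit 5   # show exactly what will be sent
    stdin: $draft.stdout
    approval: required                                # hard stop here
  - id: send
    command: inbox send --json                        # side effect, only after the gate
    stdin: $draft.stdout
    condition: $gate.approved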

Resume Tokens: The Mechanic Nobody Explains

When a workflow halts for approval, Lobster returns a JSON envelope with status: "needs_approval" and a resumeToken. To continue:

{
 "action": "resume",
 "token": "",
 "approve": true
}

Simple enough. But here’s the gotcha: as of January 2026, resume tokens are compact. Lobster stores the workflow’s resume state under its state dir and hands back only a small token key. Older tutorials show full-state tokens; those are outdated. The new tokens are opaque keys. You can’t inspect them. You can’t modify them. You just pass them back.

Why does this matter? If you’re building a multi-agent system where one agent starts a workflow and another approves it hours later, the state persistence happens server-side. The token is just an address. This wasn’t always true – the change landed in early 2026.
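A toy model of that architecture in plain Python makes the point (this is a stand-in for Lobster’s state dir, not its real code): the state never leaves the server, and the token is just a key into it.

```python
import secrets

STATE_DIR = {}  # stand-in for Lobster's server-side state dir

def pause_for_approval(workflow_state: dict) -> str:
    """Persist the paused workflow server-side; return a compact opaque key."""
    token = secrets.token_urlsafe(16)
    STATE_DIR[token] = workflow_state
    return token

def resume(token: str, approve: bool) -> dict:
    """Any agent holding the token can resume; no state travels with it."""
    state = STATE_DIR.pop(token)  # unknown or reused tokens fail loudly
    state["approved"] = approve
    return state

# Agent A pauses at the approval gate; agent B resumes hours later.
tok = pause_for_approval({"step": "execute", "pending_emails": 50})
result = resume(tok, approve=True)
```

Because `STATE_DIR` is server-side, the second agent needs nothing but the token string — which is exactly why the compact tokens work across processes and hours-long gaps.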

Timeout and Output Traps (The Silent Killers)

Here’s a workflow that looks fine but will fail in production:

steps:
  - id: fetch
    run: curl https://api.example.com/large-dataset --json
  - id: process
    run: llm_task.invoke --prompt "Summarize this data"
    stdin: $fetch.stdout

Two problems. First: default timeout is 20000ms (20 seconds) and maxStdoutBytes is 512000 bytes. That API call might take 30 seconds. The JSON response might be 1MB. Both will hit limits and fail silently (well, not silently – you’ll get "lobster timed out" or "lobster output exceeded maxStdoutBytes" in the error envelope).

Fix it by raising limits explicitly:

{
 "action": "run",
 "pipeline": "path/to/workflow.lobster",
 "timeoutMs": 60000,
 "maxStdoutBytes": 2048000
}

Second problem: llm_task.invoke is not a shell executable. llm.invoke and llm_task.invoke are Lobster pipeline stages, so they go under pipeline:; run: is only for real binaries on your shell’s PATH. Mixing them up is the #1 error in community support threads. If you see bash: llm_task.invoke: command not found, you used run: where you needed pipeline:.
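Put together, a corrected sketch of the fetch-and-summarize workflow looks like this (URL and prompt are placeholders):

```yaml
steps:
  - id: fetch
    run: curl https://api.example.com/large-dataset            # real binary → run:
  - id: process
    pipeline: llm_task.invoke --prompt "Summarize this data"   # pipeline stage → pipeline:
    stdin: $fetch.stdout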

Argument Injection: The ${arg} Footgun

Lobster supports argument substitution with ${arg}. Looks convenient:

args:
  query:
    default: "hello world"
steps:
  - id: search
    run: grep "${query}" /data/logs.txt

This breaks the moment query contains a quote, a $, a backtick, or a newline. For anything that may contain those characters, prefer env vars: every resolved workflow arg is exposed as LOBSTER_ARG_<NAME> (uppercased, with non-alphanumeric characters mapped to _). Safer version:

steps:
  - id: search
    run: grep "$LOBSTER_ARG_QUERY" /data/logs.txt

The env var approach is shell-safe. The ${arg} approach is raw string replacement. Choose wrong and you’ll spend an hour debugging why your workflow fails on user input.
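The difference is easy to demonstrate in plain Python (an illustration of the two substitution modes, not Lobster’s internals):

```python
import shlex

malicious = 'hello" /etc/passwd "'          # user input containing quotes

# ${query}-style raw text substitution: the embedded quote splits the word.
raw_cmd = 'grep "%s" /data/logs.txt' % malicious
raw_argv = shlex.split(raw_cmd)
print(raw_argv)   # ['grep', 'hello', '/etc/passwd', '', '/data/logs.txt']
# grep now also searches /etc/passwd – an attacker-chosen file.

# "$LOBSTER_ARG_QUERY"-style: the shell expands the variable as one word,
# so argv keeps its intended shape no matter what the input contains.
env_argv = ["grep", malicious, "/data/logs.txt"]
print(env_argv)   # ['grep', 'hello" /etc/passwd "', '/data/logs.txt']
```

Same input, two very different argument vectors — that single extra argv element is the whole footgun.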

The Loop Controversy (And Why It Matters)

Original Lobster design: no loops. The pipeline cannot branch, loop, or execute arbitrary code. This constraint is the source of Lobster’s determinism. But in February 2026, a developer needed a code → review → test loop for autonomous agents. Lobster didn’t have loops. Instead of building a wrapper script, he added loop support to Lobster itself. The sub-lobster PR is 129 lines of implementation + 186 lines of tests. It took less time than any workaround would have.

The PR (#20) sparked debate. Maintainers worried that loops undermine Lobster’s core value: visibility. “The primary upside of Lobster is visibility and approval gating. Having an invisible sub loop running – especially compared to a Ralph loop – concerns me,” one maintainer wrote.

As of April 2026, the PR is merged but the tension remains: do you want a workflow engine that’s simple to audit or powerful enough to handle complex logic? Lobster chose the former. Loops exist, but they’re sub-workflows – still visible, still auditable.

| Feature | Lobster | GitHub Actions | n8n |
| --- | --- | --- | --- |
| Approval gates (hard stop) | ✅ Built-in primitive | ❌ Manual via environments | ✅ Via webhook + wait node |
| Resume token | ✅ Native | ❌ Re-run from start | ✅ Execution pauses |
| Loops | ⚠ Sub-workflows only (as of Feb 2026) | ✅ matrix, for-each | ✅ Loop node |
| Branching | ✅ condition: field | ✅ if: expressions | ✅ Switch/IF nodes |
| Timeout control | ✅ Per-workflow (default 20s) | ✅ Per-job (default 6h) | ✅ Per-node |
| JSON piping | ✅ stdin: $step.json | ❌ Needs jq in script | ✅ Native JSON handling |
| AI agent integration | ✅ Designed for OpenClaw | ❌ External trigger only | ✅ Via API calls |

Real Production Workflow: GitHub PR Monitor

Here’s a workflow that actually runs in production (from the Lobster repo itself):

name: github.pr.monitor
args:
  repo:
  pr:
steps:
  - id: fetch
    run: gh pr view ${pr} --repo ${repo} --json author,state,title,updatedAt
  - id: check-changes
    pipeline: >
      llm_task.invoke
      --prompt "Compare this PR state with the last known state and list what changed"
      --input $fetch.json
    stdin: $last_state.json
  - id: notify
    run: openclaw.invoke --tool message --action send --args-json '{"text":"PR updated"}'
    condition: $check-changes.changed

Notice: gh pr view is a real CLI tool (GitHub CLI), so it uses run:. llm_task.invoke is a Lobster pipeline stage, so it uses pipeline:. The notify step only fires if condition: $check-changes.changed is true. This is deterministic branching – not an LLM deciding whether to send the message.

The workflow runs every 30 minutes via cron. State persists between runs. No tokens wasted on “should I check this PR again?” logic. The LLM only handles the diff comparison.

When Lobster Is the Wrong Choice

Lobster shines when you need deterministic, auditable, resumable workflows with human checkpoints. It’s terrible when you need:

  • Complex control flow. If your workflow logic requires nested if/else or dynamic branching based on runtime data, GitHub Actions or n8n will be less painful.
  • Stateful iteration. Lobster’s loop support (as of Feb 2026) is limited to sub-workflows. If you need to iterate over 1,000 items with shared state, you’ll fight the design.
  • Third-party service orchestration. n8n has 400+ native integrations. Lobster has… whatever CLI tools you install. If you’re chaining Airtable → Slack → Notion, n8n is the obvious choice.

But if you’re building AI agent workflows where the agent plans, Lobster executes, and humans approve before anything ships to production? Nothing else comes close.

The Pattern That Actually Works

After reading the docs, the GitHub issues, and the community implementations, here’s the pattern that shows up in production systems:

  1. Agent plans the workflow. LLM decides what needs to happen (“fetch emails, categorize, draft replies”).
  2. Lobster executes deterministically. Each step is a shell command or pipeline stage. No LLM in the loop.
  3. Approval gate before side effects. Human reviews the plan before anything ships.
  4. Resume if approved. Workflow continues from the exact step where it paused. No re-execution.
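The four steps above can be sketched from the agent’s side. Here call_lobster is a stand-in for however your agent invokes the gateway tool (not a real OpenClaw API), and the envelope fields match the examples earlier in this article:

```python
# Agent-side sketch of plan → execute → approve → resume.
# call_lobster is a stand-in for the gateway tool invocation; ask_human is
# whatever surfaces the preview to a person and returns True/False.
def run_with_approval(pipeline_path, call_lobster, ask_human):
    envelope = call_lobster({"action": "run", "pipeline": pipeline_path})
    if envelope.get("status") == "needs_approval":
        ok = ask_human(envelope)          # human reviews before any side effect
        envelope = call_lobster({         # resume from the exact paused step
            "action": "resume",
            "token": envelope["resumeToken"],
            "approve": ok,
        })
    return envelope
```

Note what the agent never does: re-run earlier steps, inspect the token, or decide on its own to proceed. It just ferries the opaque token back with a human’s verdict attached.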

This is the opposite of “let the AI do everything.” It’s “let the AI plan, let Lobster execute, let humans approve.” The constraint – no loops, no arbitrary code in YAML – is what makes it safe enough to trust.

Can Lobster workflows call other Lobster workflows?

Yes. Use the lobster: field instead of run: or command:. It takes a path to a .lobster file to run as a sub-workflow, resolved relative to the parent workflow, and is mutually exclusive with command:. As of February 2026, you can also loop sub-workflows with loop: { maxIterations: N, condition: "shell command" }.
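A sketch of a looped sub-workflow using that syntax (the file name and condition command are illustrative; check PR #20 for the exact semantics):

```yaml
steps:
  - id: review-cycle
    lobster: ./code-review-test.lobster         # sub-workflow, resolved relative to this file
    loop:
      maxIterations: 5                          # hard cap on iterations
      condition: "test -f .needs-another-pass"  # shell command gating the next pass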

What happens if I forget to set timeoutMs and my workflow hangs?

The embedded runner enforces a 20-second default. The workflow fails with lobster timed out, returns an error envelope, and doesn’t consume more resources. You won’t get a stuck process – you’ll get an error message telling you exactly which limit you hit. The fix: increase timeoutMs, or split a long pipeline into smaller steps.

How do I debug “lobster returned invalid JSON” errors?

This error means the pipeline must run in tool mode and print only JSON. The common cause: your shell command mixed log lines into stdout alongside the JSON output. Either redirect logs to stderr (echo "log" >&2) or use --json flags on CLI tools to guarantee clean output. Run the command manually in a terminal first to verify it emits valid JSON.
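The failure is easy to reproduce in plain Python (nothing Lobster-specific):

```python
# One stray log line on stdout turns the whole stream into invalid JSON.
import json

clean = '{"emails": 3}'
dirty = 'fetching inbox...\n{"emails": 3}'   # log line mixed into stdout

json.loads(clean)        # parses fine
try:
    json.loads(dirty)
except json.JSONDecodeError as e:
    print("invalid JSON:", e)   # the same failure Lobster's envelope reports
```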

Now build a workflow. Pick something real – inbox triage, PR monitoring, daily report generation. Write the YAML. Add an approval gate before the side effect. Set explicit timeout and output limits. Test the resume token flow. Then tell your AI agent to invoke it. That’s when Lobster clicks: it’s not a replacement for scripting. It’s the layer between “AI decides what to do” and “production systems execute it safely.”