AgentRL Overview
AgentRL — a self-evolving orchestration layer for Cersei. Agents that trace their own failures, propose fixes in sandboxes, and register the winners as reusable, programmable tools.
AgentRL
AgentRL turns Cersei from a single-shot coding agent into a self-evolving one. Give it a task. If the agent solves it, you're done. If it fails, AgentRL traces why, asks a planner for directed fixes, runs those fixes as throwaway sub-agents in isolated sandboxes, promotes the one that actually passes verification, and registers it as a reusable tool. The next time a similar problem shows up, the agent finds that tool and skips straight to the solution.
Flagship feature. AgentRL is the orchestration layer that ties together the rest of Cersei — the agent runtime, the sandbox system (cersei-vms), memory, embeddings, and a small programmable language (AgentTemplate). It ships as two crates: cersei-agentrl (the loop) and cersei-agentlang (the DSL).
The idea in one loop
          ┌─────────────────────────────────────────────────────────┐
task ───▶ │ registry.search(task) ──▶ GeneralAgent (+ cached tools)│
          └─────────────────────────────────────────────────────────┘
                                      │
                        success? ─────┴───── no ───▶ ExecutionGraph.failure_trace()
                           │                                       │ (directionality)
                          yes                                      ▼
                           │                          PlannerAgent → N proposals
                           ▼                                       │
                        ✅ done                      fan out into isolated sandboxes
                                                                   │
                                                  verify each ─── winner ──▶ promote
                                                                   │
                                                  register winner as a reusable Tool
                                                    + record the problem in memory
                                                                   │
                                                           ✅ done (ByNewTool)

Every box is a real, swappable component:
- ExecutionGraph: a DAG of turns and tool calls, built passively from the agent's event stream by a Reporter. On failure it distills a FailureTrace that tells the planner exactly what to fix.
- GeneralAgent / PlannerAgent: not new types; ordinary Agents configured with different prompts and toolsets.
- Sandboxes: proposals run in isolated working directories (or cersei-vms sandboxes) so parallel attempts never trample each other.
- ToolRegistry: a local, persisted, searchable database of agent-built tools. Lookup-before-build; register-on-win.
- Verifier: an independent check (cargo test, a script, anything) that decides "did this actually work?" The agent can't game it.
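The loop above can be sketched in a few dozen lines of std-only Rust. This is an illustration of the control flow, not the cersei-agentrl API: the `Runner` trait, its four methods, the `Outcome` enum, `StubRunner`, and the substring-based registry lookup are all hypothetical stand-ins, and the real loop is asynchronous.

```rust
// Hypothetical sketch of the control flow; these names are NOT the cersei-agentrl API.

/// Which path solved the task (mirrors the three outcomes the docs describe).
#[derive(Debug, PartialEq)]
enum Outcome {
    Directly,
    ByCachedTool(String),
    ByNewTool(String),
}

/// The swappable mechanism, reduced to four synchronous calls.
trait Runner {
    /// Run the general agent (optionally with a cached tool); true if the verifier passed.
    fn run_general(&mut self, task: &str, cached_tool: Option<&str>) -> bool;
    /// Distill the last failed run into a trace for the planner.
    fn failure_trace(&self) -> String;
    /// Ask the planner for candidate fixes, given the trace.
    fn propose(&self, trace: &str) -> Vec<String>;
    /// Run one proposal in isolation; true if the verifier passed.
    fn run_in_sandbox(&mut self, proposal: &str) -> bool;
}

/// `registry` holds (problem, tool id) pairs; a substring match stands in for real search.
fn solve<R: Runner>(
    runner: &mut R,
    registry: &mut Vec<(String, String)>,
    task: &str,
) -> Option<Outcome> {
    // Lookup-before-build: reuse a tool registered for a matching problem.
    let cached = registry
        .iter()
        .find(|(problem, _)| task.contains(problem.as_str()))
        .map(|(_, id)| id.clone());

    if runner.run_general(task, cached.as_deref()) {
        return Some(match cached {
            Some(id) => Outcome::ByCachedTool(id),
            None => Outcome::Directly,
        });
    }

    // Directed recovery: trace the failure, then try each proposal in isolation.
    let trace = runner.failure_trace();
    for proposal in runner.propose(&trace) {
        if runner.run_in_sandbox(&proposal) {
            // Register-on-win: remember the problem alongside the new tool id.
            let id = format!("tool-{}", registry.len() + 1);
            registry.push((task.to_string(), id.clone()));
            return Some(Outcome::ByNewTool(id));
        }
    }
    None
}

/// Stub for demonstration: the general agent only succeeds when handed a
/// cached tool, and only the proposal named "fix-b" passes verification.
struct StubRunner;

impl Runner for StubRunner {
    fn run_general(&mut self, _task: &str, cached_tool: Option<&str>) -> bool {
        cached_tool.is_some()
    }
    fn failure_trace(&self) -> String {
        "verifier exited with status 1".to_string()
    }
    fn propose(&self, _trace: &str) -> Vec<String> {
        vec!["fix-a".to_string(), "fix-b".to_string()]
    }
    fn run_in_sandbox(&mut self, proposal: &str) -> bool {
        proposal == "fix-b"
    }
}
```

Calling `solve` twice with the same task and the same registry walks both recovery paths: the first call falls through to the planner and registers a tool, and the second call finds that tool before the planner is ever consulted.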
Why it matters
A normal agent re-derives the same fix every time it hits a recurring problem class. AgentRL remembers solutions as executable tools, so it gets cheaper and more capable over time:
Self-improving
Solved problems become DynamicTools in a registry. Similar future tasks are solved by recall, not re-derivation — no planner, no sandboxes, no extra LLM spend.
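To make "recall, not re-derivation" concrete, here is a minimal in-memory sketch of a searchable tool registry. It is an assumption-heavy stand-in: `RegisteredTool`, `similarity`, and `search` are hypothetical names, and token-overlap (Jaccard) scoring replaces whatever similarity search the real ToolRegistry performs. Persistence is not shown.

```rust
use std::collections::HashSet;

/// Hypothetical registry entry: a tool id plus the problem it was built for.
struct RegisteredTool {
    id: String,
    problem: String,
}

/// Lowercased alphanumeric tokens of a piece of text.
fn tokens(text: &str) -> HashSet<String> {
    text.split(|c: char| !c.is_alphanumeric())
        .filter(|w| !w.is_empty())
        .map(|w| w.to_lowercase())
        .collect()
}

/// Jaccard similarity between two texts: |A ∩ B| / |A ∪ B| over their token sets.
fn similarity(a: &str, b: &str) -> f64 {
    let (ta, tb) = (tokens(a), tokens(b));
    let union = ta.union(&tb).count();
    if union == 0 {
        return 0.0;
    }
    ta.intersection(&tb).count() as f64 / union as f64
}

/// Return the best-matching tool whose score reaches `threshold`, if any.
fn search<'a>(
    tools: &'a [RegisteredTool],
    task: &str,
    threshold: f64,
) -> Option<&'a RegisteredTool> {
    tools
        .iter()
        .map(|t| (similarity(&t.problem, task), t))
        .filter(|(score, _)| *score >= threshold)
        .max_by(|a, b| a.0.total_cmp(&b.0))
        .map(|(_, t)| t)
}
```

The threshold is the design lever here: set it too low and unrelated tools get recalled, set it too high and the agent rebuilds tools it already has.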
Safe by construction
Recovery attempts run in isolated sandboxes. Untrusted, parallel, and disposable — only the verified winner is promoted to your real working directory.
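The promote-only-the-winner rule can be illustrated with plain directories. In this sketch each candidate writes its output into its own scratch directory, a caller-supplied check decides whether it passed, and only a passing candidate's file is copied into the working directory. The function name, the `(name, contents)` model of a proposal, and the sequential loop are assumptions made for brevity; the real system runs sub-agents, possibly in parallel and inside cersei-vms sandboxes.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Try each candidate in its own scratch directory and promote the first one
/// that verifies. `candidates` pairs a proposal name with the file contents it
/// would produce (a hypothetical, simplified model of a proposal).
fn promote_first_winner(
    candidates: &[(&str, &str)],
    file_name: &str,
    scratch_root: &Path,
    work_dir: &Path,
    verify: impl Fn(&Path) -> bool,
) -> io::Result<Option<String>> {
    for (name, contents) in candidates {
        // Each attempt gets a private directory, so attempts cannot collide.
        let sandbox = scratch_root.join(name);
        fs::create_dir_all(&sandbox)?;
        fs::write(sandbox.join(file_name), contents)?;

        let passed = verify(&sandbox);
        if passed {
            // Promotion: only a verified result reaches the real working directory.
            fs::create_dir_all(work_dir)?;
            fs::copy(sandbox.join(file_name), work_dir.join(file_name))?;
        }

        // Sandboxes are disposable whether or not the attempt passed.
        fs::remove_dir_all(&sandbox)?;
        if passed {
            return Ok(Some(name.to_string()));
        }
    }
    Ok(None)
}
```

Because the working directory is written only after verification, a failed or malicious attempt leaves it untouched.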
Directed recovery
Failures aren't retried blindly. The ExecutionGraph extracts a scrubbed, ordered failure trace that gives each proposal concrete directionality.
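As a rough picture of what a "scrubbed, ordered" trace might look like, the sketch below keeps only the failed tool calls, sorts them by turn, and redacts known secret strings from the error text. Everything here is assumed for illustration: the `ToolCall` struct, the flat list in place of a DAG, and the reading of "scrubbed" as secret redaction are not taken from the ExecutionGraph implementation.

```rust
/// Hypothetical record of one tool call; the real graph is a DAG, flattened
/// here to an unordered list to keep the sketch short.
struct ToolCall {
    turn: u32,
    tool: String,
    error: Option<String>,
}

/// Build an ordered, redacted list of failures for the planner.
fn failure_trace(calls: &[ToolCall], secrets: &[&str]) -> Vec<String> {
    // Keep only the calls that failed, paired with their error text.
    let mut failed: Vec<(&ToolCall, &str)> = calls
        .iter()
        .filter_map(|c| c.error.as_deref().map(|e| (c, e)))
        .collect();

    // Order by turn so the planner sees failures in the sequence they occurred.
    failed.sort_by_key(|(c, _)| c.turn);

    failed
        .into_iter()
        .map(|(c, err)| {
            // Scrub: replace each known secret with a placeholder.
            let mut msg = err.to_string();
            for secret in secrets {
                msg = msg.replace(*secret, "[redacted]");
            }
            format!("turn {}: {} failed: {}", c.turn, c.tool, msg)
        })
        .collect()
}
```

Successful calls are dropped on purpose: the planner only needs the steps that went wrong, in order, to propose a targeted fix.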
Programmable
Sub-agents can be authored in AgentTemplate — a tiny functional language (io.read().write(), agent.send(), agent.tools.register()) that LLMs emit and the runtime executes safely.
Programmable end to end
AgentRL is built to be driven, not just configured. Two seams make it fully programmable:
- The AgentRlRunner trait is the mechanism: how a general agent runs, how proposals are generated, how each runs in a sandbox. Use the batteries-included CerseiRunner (real Agent + Provider), or implement the trait yourself.
- The AgentTemplate language lets the model write its own tools. Cersei can't easily rewrite its own runtime, but an LLM can write a short template program that the runtime executes on top of the existing tools, then register it for reuse.
60-second example
use cersei::prelude::*;
use cersei::agentrl::{CerseiRunner, CommandVerifier, Orchestrator, ToolRegistry, Verifier};
use cersei::provider::Gemini;
use std::sync::Arc;

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    // A persisted, searchable database of agent-built tools.
    // (`dirs` is a separate crate; add it to Cargo.toml to use `dirs::home_dir`.)
    let home = dirs::home_dir().expect("no home directory found");
    let registry = ToolRegistry::open(home.join(".cersei/agentrl"))?;

    // An INDEPENDENT verifier: the agent cannot cheat this check.
    let verifier: Arc<dyn Verifier> =
        Arc::new(CommandVerifier::new("python3 gcd.py 48 36 | grep -qx 12"));

    // A fresh provider per agent (sub-agents get their own client).
    let key = std::env::var("GEMINI_API_KEY")?;
    let provider_factory = Arc::new(move || Box::new(Gemini::new(key.clone())) as Box<dyn Provider>);

    // The production runner: real agents + provider + sandboxed proposals.
    let runner = Arc::new(
        CerseiRunner::new(provider_factory, "./work", registry.clone(), verifier)
            .with_model("gemini-3.1-pro-preview") // required for non-Anthropic providers
            .with_max_turns(16),
    );

    let orchestrator = Orchestrator::new(runner, registry);
    let outcome = orchestrator
        .solve("Create gcd.py: takes two ints, prints their GCD via the Euclidean algorithm.")
        .await?;

    println!("solved={} how={:?}", outcome.solved, outcome.how);
    Ok(())
}

Model selection. cersei-agent defaults to an Anthropic model when none is set. For Gemini, OpenAI, or any other provider you must call .with_model(...) on the runner (or .model(...) on the builder), or requests will fail.
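The verifier in the example treats a shell command's exit status as the verdict. A std-only sketch of that idea follows; `verify_with_command` is a hypothetical helper, not the CommandVerifier implementation, and it assumes a Unix-like system where `sh` is on the PATH.

```rust
use std::io;
use std::path::Path;
use std::process::Command;

/// Hypothetical stand-in for a command-based verifier: run `cmd` through
/// `sh -c` inside `workdir` and treat exit status 0 as "verified".
fn verify_with_command(cmd: &str, workdir: &Path) -> io::Result<bool> {
    let status = Command::new("sh")
        .arg("-c")
        .arg(cmd)
        .current_dir(workdir)
        .status()?; // Err only if the shell itself could not be started
    Ok(status.success())
}
```

Because the check runs as a separate process against the files on disk, the agent's own report of success plays no part in the verdict.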
What runs when (observed)
From the live end-to-end tests against Gemini:
| Scenario | Path | What happened |
|---|---|---|
| Capable agent, solvable task | Solved::Directly | GeneralAgent wrote gcd.py in one run (recovering from 2 tool errors mid-run). The verifier passed. No planner, no sandboxes. |
| First attempt fails | Solved::ByNewTool(id) | GeneralAgent failed the verifier → trace → planner proposed 2 fixes → both ran in isolated sandboxes → the winner was promoted and registered as a reusable tool. |
| Similar task, second time | Solved::ByCachedTool(id) | registry.search surfaced the previously-built tool; the agent solved it via recall. The planner never ran. |
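Callers will usually want to branch on which of these paths was taken. The sketch below uses a local mirror of the three variants named in the table; the variant names come from the table, but the `String` payload type and the `describe` helper are assumptions, and the real type lives in cersei::agentrl.

```rust
/// Local mirror of the three paths in the table above. The payload type of
/// the tool id is assumed to be a String for this sketch.
#[derive(Debug)]
enum Solved {
    Directly,
    ByNewTool(String),
    ByCachedTool(String),
}

/// Turn an outcome path into a one-line, human-readable summary.
fn describe(how: &Solved) -> String {
    match how {
        Solved::Directly => {
            "solved on the first attempt; no planner or sandboxes ran".to_string()
        }
        Solved::ByNewTool(id) => {
            format!("recovered via the planner; registered new tool {id}")
        }
        Solved::ByCachedTool(id) => {
            format!("solved by recall using cached tool {id}")
        }
    }
}
```

A summary like this is handy for logging how often tasks are solved by recall versus full recovery, which is the clearest signal that the registry is paying off.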
Install
AgentRL is behind a feature flag (it pulls in the embeddings + sandbox crates):
[dependencies]
cersei = { version = "0.1", features = ["agentrl"] }
tokio = { version = "1", features = ["full"] }
anyhow = "1"

This enables cersei::agentrl (the loop) and cersei::agentlang (the DSL), and turns on the vms sandbox feature.
Explore
API Reference
Orchestrator, AgentRlRunner, CerseiRunner, ExecutionGraph, ToolRegistry, Verifier, DynamicTool.
AgentTemplate Language
The programmable DSL: grammar, builtins, chaining, permissions, and the RunAgentTemplate tool.
Cookbook
Runnable recipes: a self-improving coding agent, forcing the recovery loop, custom runners and verifiers.
Sandboxes & VMs
The isolation layer AgentRL runs proposals in. Runtimes, primitives, snapshots.