Memory-First Multi-Agent Framework
Define agents. Give them tools. Point them at a task. They run for hours without forgetting a single decision. Context stays small. Speed stays constant. Knowledge grows forever.
pip install azclaw
View Source
AgentAZClaw is a standalone multi-agent orchestrator built on AgentAZAll. It runs multiple LLM agents in rounds, using persistent memory instead of conversation history. Three classes. Six built-in tools. One install.
Unlike frameworks that stuff every message into context until it overflows,
AgentAZClaw keeps only the last round in view. Everything else is stored via
remember() and retrieved via recall() — on demand,
when the agent needs it. This is why it can run for
9 hours without degradation.
Unlike frameworks that require 50 packages and complex YAML configs,
AgentAZClaw is 960 lines of Python with one dependency. You define agents,
give them a task, and call .run().
No YAML. No config files. No plugin system. Just Python.
# pip install azclaw
from azclaw import Agent, Orchestrator
endpoint = "http://localhost:8080/v1/chat/completions"
architect = Agent("architect",
                  role="You design software architecture.",
                  endpoint=endpoint)
developer = Agent("developer",
                  role="You write Python code.",
                  endpoint=endpoint, can_write=True)
reviewer = Agent("reviewer",
                 role="You review code for bugs.",
                 endpoint=endpoint)
orch = Orchestrator(agents=[architect, developer, reviewer])
orch.set_task("Build a FastAPI REST API for a todo app.")
orch.run(max_rounds=20)
The orchestrator handles memory-first context management, role-based tool access, deduplication, checkpointing, and graceful shutdown. You define agents and a task. AgentAZClaw does the rest.
AgentAZClaw detects your hardware and provisions the right model automatically.
$ azclaw setup
Scanning hardware...
  GPU: NVIDIA RTX 4090 (24GB VRAM)
  RAM: 32GB
Recommended: Nemotron-3-Nano-30B-A3B (Q4, 16GB)
  30B total, 3B active params. 80+ tok/s on your hardware.
[1] Accept recommendation (download 16GB)
[2] Show all models
[3] I already have a server running
Downloading... done.
Starting llama-server on port 8400... ready!
Run: azclaw run --task "Build a REST API"
All models verified ungated on HuggingFace. No tokens required. No login. No manual download.
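The recommendation step can be pictured as a simple VRAM-to-model lookup. The thresholds and model names below are illustrative assumptions for the sketch, not azclaw's actual selection table.

```python
# Hypothetical sketch of VRAM-based model recommendation.
# Thresholds and model names are assumptions, not azclaw's real table.
def recommend(vram_gb: float) -> str:
    if vram_gb >= 20:
        return "30B Q4 (~16GB)"   # large MoE model fits comfortably
    if vram_gb >= 10:
        return "8B Q4 (~6GB)"     # mid-size dense model
    return "3B Q4 (~2GB)"         # small model for modest hardware
```

In the real tool, the detected hardware (as in the transcript above) drives this choice automatically; the user only confirms the download.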
Only the last round goes into the LLM context. Everything else
lives in AgentAZAll's filesystem-based memory. Agents call
recall() to retrieve past decisions and
remember() to store new ones.
Context stays at 3–9K tokens regardless of how many rounds have passed. Speed at round 199 is the same as at round 1.
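The memory side of this can be sketched as a small filesystem-backed store. The class below is an illustration of the idea, not AgentAZAll's actual storage format: `remember()` writes a content-addressed file, and `recall("")` with an empty query returns everything, matching the `recall("")` calls shown later in this page.

```python
# Illustrative filesystem-backed memory; AgentAZAll's real layout
# and search may differ.
import hashlib
from pathlib import Path

class FileMemory:
    def __init__(self, root="memory"):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def remember(self, text: str) -> str:
        # Content-addressed filename makes repeated writes idempotent.
        key = hashlib.sha256(text.encode()).hexdigest()[:16]
        (self.root / f"{key}.md").write_text(text)
        return key

    def recall(self, query: str = "") -> list:
        # Naive case-insensitive substring search over stored entries;
        # an empty query matches everything.
        hits = []
        for path in sorted(self.root.glob("*.md")):
            entry = path.read_text()
            if query.lower() in entry.lower():
                hits.append(entry)
        return hits
```

Because memory lives on disk rather than in the context window, it survives process restarts and never inflates the prompt.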
# What each agent sees every turn:
System prompt (role + tools)    ~800 tok
Phase instruction               ~200 tok
Last round's messages          ~2-6K tok
                               ─────────
TOTAL:                   3-9K tokens. Always.
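The budget above follows from a context rule that is easy to state in code. This is a sketch of the rule, not azclaw's internal API; the function and field names are assumptions.

```python
# Sketch of memory-first context assembly: only the system prompt,
# the phase instruction, and the last round's messages reach the LLM.
# Everything older must be fetched explicitly via recall().
def build_context(system_prompt, phase_instruction, last_round):
    messages = [
        {"role": "system", "content": system_prompt},      # ~800 tok
        {"role": "system", "content": phase_instruction},  # ~200 tok
    ]
    messages.extend(last_round)  # ~2-6K tok; older rounds live in memory
    return messages
```

Because the prompt is rebuilt from scratch every turn, its size is bounded by the last round alone, which is why latency stays flat over hundreds of rounds.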
Every agent gets recall, remember,
read_file, and list_files. Only agents
with can_write=True get write_file and
run_python.
This prevents the chaos of multiple agents overwriting each other's work. The Architect designs. The Developer implements. The Reviewer validates.
# Architect: read-only + memory
tools: recall, remember, read_file, list_files

# Developer: full write access
tools: recall, remember, read_file, list_files, write_file, run_python

# Reviewer: read-only + memory
tools: recall, remember, read_file, list_files
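The gating rule amounts to one conditional. The helper below is a sketch of that rule under the tool names listed above; it is not azclaw's internal implementation.

```python
# Sketch of role-based tool access: every agent gets the four
# read/memory tools; write tools are added only when can_write=True.
READ_TOOLS = ["recall", "remember", "read_file", "list_files"]
WRITE_TOOLS = ["write_file", "run_python"]

def tools_for(can_write: bool) -> list:
    return READ_TOOLS + (WRITE_TOOLS if can_write else [])
```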
If an agent calls the same tool with the same arguments twice, the duplicate is automatically skipped. If all calls in a turn are duplicates, the agent is forced to analyze existing results instead of looping.
Per-agent tracking — different agents can make the same call. Resets every round.
# Round 45:
[architect] recall("") = 3,588 chars
[architect] recall("") = SKIPPED (dup)
[architect] remember("Schema uses...") = 44 chars

# Forces the agent to produce content
# instead of looping endlessly
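Deduplication like this can be implemented by keying each call on (agent, tool, canonicalized arguments). The class below is a minimal sketch of that idea, with assumed names; azclaw's internals may differ.

```python
# Sketch of per-agent, per-round tool-call deduplication: each
# (agent, tool, args) triple runs at most once per round.
import json

class DedupGuard:
    def __init__(self):
        self.seen = set()

    def should_run(self, agent: str, tool: str, args: dict) -> bool:
        # sort_keys canonicalizes the args so {"a":1,"b":2} and
        # {"b":2,"a":1} count as the same call.
        key = (agent, tool, json.dumps(args, sort_keys=True))
        if key in self.seen:
            return False  # duplicate: skip it
        self.seen.add(key)
        return True

    def reset(self):
        # Called at the start of every round.
        self.seen.clear()
```

Keying on the agent name is what allows different agents to make the same call, and `reset()` implements the per-round lifetime described above.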
State is saved every 5 rounds. If a run is interrupted — power failure, network timeout, Ctrl+C — resume from exactly where it stopped.
Preserves round number, all agent stats, memory counts, and the last 100 history messages.
# Run gets interrupted at round 47
$ azclaw run --topic migration.json
... Round 47 [Data Layer] ^C
>> Signal received. Finishing round... DONE

# Resume from checkpoint
$ azclaw run --topic migration.json \
    --resume logs/migration_checkpoint.json
Resuming from round 48...
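A periodic JSON checkpoint of the kind described above can be sketched in a few lines. The field names mirror the description (round number, agent stats, memory count, last 100 history messages) but are assumptions, not azclaw's actual checkpoint schema.

```python
# Sketch of periodic checkpointing: persist state every `every` rounds.
import json
from pathlib import Path

def save_checkpoint(path, round_no, agent_stats, memory_count, history,
                    every=5):
    if round_no % every != 0:
        return False  # not a checkpoint round
    state = {
        "round": round_no,
        "agents": agent_stats,
        "memories": memory_count,
        "history": history[-100:],  # keep only the last 100 messages
    }
    Path(path).write_text(json.dumps(state))
    return True
```

On resume, the orchestrator would reload this file and continue from `round + 1`, which is why the transcript above restarts at round 48.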
Add your own tools with a simple decorator. The registry
auto-generates OpenAI function-calling schemas. Permission-gate
tools with the requires parameter.
from azclaw import build_default_registry
registry = build_default_registry()
@registry.tool("search_code",
               "Search codebase with ripgrep",
               {"query": "string", "file_type": "string"})
def search_code(query, file_type="py", _ctx=None):
    import subprocess
    r = subprocess.run(
        ["rg", query, "--type", file_type],
        capture_output=True, text=True)
    return r.stdout[:5000]
orch = Orchestrator(agents=[...], registry=registry)
| Feature | OpenClaw | AgentAZClaw |
|---|---|---|
| Memory | Context window only | Persistent filesystem memory |
| Context strategy | Stuff everything | Last round only + recall() |
| Speed after 100 rounds | Collapses | Constant |
| Dependencies | 50+ packages | 1 (agentazall) |
| Lines of code | ~50,000 | ~960 |
| Setup | Complex YAML configs | 20 lines of Python |
| Auto-backend | No | Hardware detect + model download |
| Proven at 9 hours | Never tested | Yes — 199 rounds, 0 errors |
azclaw/
  __init__.py      (20 lines)   Public API
  agent.py        (100 lines)   Agent class: LLM + role + tools
  orchestrator.py (350 lines)   Round loop, memory-first context
  tools.py        (260 lines)   ToolRegistry + 6 built-in tools
  llm.py          (100 lines)   urllib-only OpenAI client
  topic.py         (70 lines)   Phase/topic config loader
  cli.py          (100 lines)   CLI entry point
  backend.py      (250 lines)   Hardware detection + model provisioning
Three classes: Agent, Orchestrator, ToolRegistry.
Zero external dependencies beyond agentazall.
Works with any OpenAI-compatible endpoint: llama.cpp, vLLM, Ollama, LM Studio, OpenRouter.
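An OpenAI-compatible call needs nothing beyond the standard library, which is the spirit of the urllib-only `llm.py` listed above. The sketch below follows the standard `/v1/chat/completions` request and response shape; it is an illustration, not azclaw's actual client code.

```python
# Minimal OpenAI-compatible chat call using only urllib.
import json
import urllib.request

def build_payload(model, messages):
    # The request body every OpenAI-compatible server accepts.
    return json.dumps({"model": model, "messages": messages}).encode()

def chat(endpoint, model, messages, timeout=120):
    req = urllib.request.Request(
        endpoint,
        data=build_payload(model, messages),
        headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        body = json.loads(resp.read())
    # Standard response shape: first choice's message content.
    return body["choices"][0]["message"]["content"]
```

Point `endpoint` at any of the servers listed above (e.g. `http://localhost:8080/v1/chat/completions` for llama.cpp) and the same code works unchanged.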
pip install azclaw

# Auto-detect hardware, download model, start server
azclaw setup

# Run your first multi-agent task
azclaw run --task "Build a FastAPI todo app"
One command to set up. One command to run. All inference local.
# The 9-hour autonomous run
# 199 rounds, 52 files, 402 memories, 0 errors

Read the full experiment →

# Download the raw data
carddemo-agentazall-results.zip (743 KB)
Every number backed by real data. Verify it yourself.