AgentAZClaw

Memory-First Multi-Agent Framework

Define agents. Give them tools. Point them at a task. They run for hours without forgetting a single decision. Context stays small. Speed stays constant. Knowledge grows forever.

pip install azclaw
[Hero image: three ravens representing the Architect, Developer, and Reviewer agents]
~960 lines of code    (the entire framework)
1 dependency          (agentazall, auto-installed)
23 tests passing      (agent, tools, orchestrator)
20 lines to start     (three agents, shared memory)

AgentAZClaw is a standalone multi-agent orchestrator built on AgentAZAll. It runs multiple LLM agents in rounds, using persistent memory instead of conversation history. Three classes. Six built-in tools. One install.

Unlike frameworks that stuff every message into context until it overflows, AgentAZClaw keeps only the last round in view. Everything else is stored via remember() and retrieved via recall() — on demand, when the agent needs it. This is why it can run for 9 hours without degradation.

Unlike frameworks that require 50 packages and complex YAML configs, AgentAZClaw is 960 lines of Python with one dependency. You define agents, give them a task, and call .run().

The 20-Line Quickstart

No YAML. No config files. No plugin system. Just Python.

# pip install azclaw
from azclaw import Agent, Orchestrator

endpoint = "http://localhost:8080/v1/chat/completions"

architect = Agent("architect",
    role="You design software architecture.",
    endpoint=endpoint)

developer = Agent("developer",
    role="You write Python code.",
    endpoint=endpoint, can_write=True)

reviewer = Agent("reviewer",
    role="You review code for bugs.",
    endpoint=endpoint)

orch = Orchestrator(agents=[architect, developer, reviewer])
orch.set_task("Build a FastAPI REST API for a todo app.")
orch.run(max_rounds=20)

The orchestrator handles memory-first context management, role-based tool access, deduplication, checkpointing, and graceful shutdown. You define agents and a task. AgentAZClaw does the rest.

No LLM Server? No Problem.

AgentAZClaw detects your hardware and provisions the right model automatically.

$ azclaw setup

  Scanning hardware...
  GPU: NVIDIA RTX 4090 (24GB VRAM)
  RAM: 32GB

  Recommended: Nemotron-3-Nano-30B-A3B (Q4, 16GB)
  30B total, 3B active params. 80+ tok/s on your hardware.

  [1] Accept recommendation (download 16GB)
  [2] Show all models
  [3] I already have a server running

  Downloading... done.
  Starting llama-server on port 8400... ready!

  Run: azclaw run --task "Build a REST API"
Hardware           Recommended model
Multi-GPU 48GB+    Nemotron-3-Nano-30B-A3B Q8 (32GB)
Single GPU 24GB    Nemotron-3-Nano-30B-A3B Q4 (16GB)
Single GPU 16GB    IBM Granite 3.3 8B (5GB)
Single GPU 12GB    Qwen3-8B (5GB)
Single GPU 8GB     Qwen3-4B (2.5GB)
CPU 16GB+ RAM      IBM Granite 3.3 2B (1.5GB)
CPU 8GB RAM        Qwen3-0.6B (600MB)

All models verified ungated on HuggingFace. No tokens required. No login. No manual download.

Built-In Features

Memory-First Context

Only the last round goes into the LLM context. Everything else lives in AgentAZAll's filesystem-based memory. Agents call recall() to retrieve past decisions and remember() to store new ones.

Context stays at 3–9K tokens regardless of how many rounds have passed. Speed at round 199 is the same as at round 1.

# What each agent sees every turn:
System prompt (role + tools)     ~800 tok
Phase instruction                ~200 tok
Last round's messages          ~2-6K tok
                               ─────────
TOTAL: 3-9K tokens. Always.
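The assembly step above can be sketched in a few lines. This is a hypothetical illustration (`build_context` is not AgentAZClaw's actual API): only the system prompt, the current phase instruction, and the most recent round's messages are sent to the LLM, so the context size is independent of run length.

```python
# Hypothetical sketch of memory-first context assembly: everything older
# than the last round lives in persistent memory, not in the prompt.
def build_context(system_prompt, phase_instruction, rounds):
    """rounds: one list of messages per completed round."""
    messages = [{"role": "system",
                 "content": system_prompt + "\n\n" + phase_instruction}]
    if rounds:
        messages.extend(rounds[-1])  # only the most recent round
    return messages

# 200 completed rounds...
rounds = [[{"role": "assistant", "content": f"round {i} output"}]
          for i in range(200)]
ctx = build_context("You design software architecture.",
                    "Phase: Data Layer", rounds)
# ...and the prompt still contains just 2 messages.
```

Because older rounds never enter the prompt, the token budget stays flat: an agent that needs an old decision retrieves it explicitly with recall().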

Role-Based Tool Access

Every agent gets recall, remember, read_file, and list_files. Only agents with can_write=True get write_file and run_python.

This prevents the chaos of multiple agents overwriting each other's work. The Architect designs. The Developer implements. The Reviewer validates.

# Architect: read-only + memory
tools: recall, remember, read_file, list_files

# Developer: full write access
tools: recall, remember, read_file, list_files,
       write_file, run_python

# Reviewer: read-only + memory
tools: recall, remember, read_file, list_files
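The gating rule is simple enough to sketch. The tool names below come from the docs; the `tools_for` helper is illustrative, not AgentAZClaw's internal function:

```python
# Hypothetical sketch of role-based tool gating: every agent gets the
# read/memory tools; write tools are granted only when can_write=True.
READ_TOOLS = ["recall", "remember", "read_file", "list_files"]
WRITE_TOOLS = ["write_file", "run_python"]

def tools_for(can_write: bool):
    return READ_TOOLS + WRITE_TOOLS if can_write else list(READ_TOOLS)

architect_tools = tools_for(can_write=False)  # 4 tools, read-only
developer_tools = tools_for(can_write=True)   # all 6 tools
```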

Dedup Detection

If an agent calls the same tool with the same arguments twice, the duplicate is automatically skipped. If all calls in a turn are duplicates, the agent is forced to analyze existing results instead of looping.

Per-agent tracking — different agents can make the same call. Resets every round.

# Round 45:
[architect] recall("") = 3,588 chars
[architect] recall("") = SKIPPED (dup)
[architect] remember("Schema uses...") = 44 chars

# Forces the agent to produce content
# instead of looping endlessly
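One way to implement this kind of per-agent, per-round tracking is to key each call on the tool name plus its canonicalized arguments. A minimal sketch (the `DedupTracker` class is hypothetical, not AgentAZClaw's actual implementation):

```python
import json

# Hypothetical per-agent dedup tracking: a call is keyed by
# (tool name, canonicalized JSON args); repeats in a round are skipped.
class DedupTracker:
    def __init__(self):
        self.seen = {}  # agent name -> set of call keys

    def check(self, agent, tool, args):
        """Return True if this call is new for this agent this round."""
        key = (tool, json.dumps(args, sort_keys=True))
        calls = self.seen.setdefault(agent, set())
        if key in calls:
            return False  # duplicate: skip
        calls.add(key)
        return True

    def reset(self):
        self.seen.clear()  # called at the start of each round

tracker = DedupTracker()
first = tracker.check("architect", "recall", {"query": ""})
second = tracker.check("architect", "recall", {"query": ""})
other = tracker.check("developer", "recall", {"query": ""})  # other agent: allowed
```

Sorting the argument keys before hashing means `{"a": 1, "b": 2}` and `{"b": 2, "a": 1}` count as the same call.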

Checkpointing & Resume

State is saved every 5 rounds. If a run is interrupted — power failure, network timeout, Ctrl+C — resume from exactly where it stopped.

Preserves round number, all agent stats, memory counts, and the last 100 history messages.

# Run gets interrupted at round 47
$ azclaw run --topic migration.json
  ...
  Round 47 [Data Layer]
  ^C  >> Signal received. Finishing round...
  DONE

# Resume from checkpoint
$ azclaw run --topic migration.json \
    --resume logs/migration_checkpoint.json
  Resuming from round 48...
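The checkpoint state described above (round number, agent stats, memory counts, last 100 history messages) fits naturally in a JSON file. A minimal sketch under that assumption; the function names are illustrative:

```python
import json, os, tempfile

# Hypothetical checkpoint save/resume: the documented state is written
# as JSON every few rounds and reloaded on --resume.
def save_checkpoint(path, round_no, agent_stats, memory_counts, history):
    state = {
        "round": round_no,
        "agent_stats": agent_stats,
        "memory_counts": memory_counts,
        "history": history[-100:],  # keep only the last 100 messages
    }
    with open(path, "w") as f:
        json.dump(state, f)

def load_checkpoint(path):
    with open(path) as f:
        return json.load(f)

path = os.path.join(tempfile.mkdtemp(), "checkpoint.json")
history = [{"role": "assistant", "content": f"msg {i}"} for i in range(250)]
save_checkpoint(path, 47, {"architect": {"turns": 47}},
                {"architect": 12}, history)
state = load_checkpoint(path)  # resume picks up at round 48
```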

Custom Tools

Add your own tools with a simple decorator. The registry auto-generates OpenAI function-calling schemas. Permission-gate tools with the requires parameter.

from azclaw import build_default_registry

registry = build_default_registry()

@registry.tool("search_code",
    "Search codebase with ripgrep",
    {"query": "string", "file_type": "string"})
def search_code(query, file_type="py", _ctx=None):
    import subprocess
    r = subprocess.run(
        ["rg", query, "--type", file_type],
        capture_output=True, text=True)
    return r.stdout[:5000]

orch = Orchestrator(agents=[...], registry=registry)

How It Compares

Feature                   OpenClaw               AgentAZClaw
Memory                    Context window only    Persistent filesystem memory
Context strategy          Stuff everything       Last round only + recall()
Speed after 100 rounds    Collapses              Constant
Dependencies              50+ packages           1 (agentazall)
Lines of code             ~50,000                ~960
Setup                     Complex YAML configs   20 lines of Python
Auto-backend              No                     Hardware detect + model download
Proven at 9 hours         Never tested           Yes (199 rounds, 0 errors)

Architecture

azclaw/
  __init__.py       (20 lines)   Public API
  agent.py          (100 lines)  Agent class: LLM + role + tools
  orchestrator.py   (350 lines)  Round loop, memory-first context
  tools.py          (260 lines)  ToolRegistry + 6 built-in tools
  llm.py            (100 lines)  urllib-only OpenAI client
  topic.py          (70 lines)   Phase/topic config loader
  cli.py            (100 lines)  CLI entry point
  backend.py        (250 lines)  Hardware detection + model provisioning

Three classes: Agent, Orchestrator, ToolRegistry. Zero external dependencies beyond agentazall. Works with any OpenAI-compatible endpoint: llama.cpp, vLLM, Ollama, LM Studio, OpenRouter.
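In the spirit of the urllib-only llm.py described above, an OpenAI-compatible chat call needs nothing beyond the standard library. This is a sketch, not AgentAZClaw's actual client; the function names are illustrative:

```python
import json
import urllib.request

# Minimal OpenAI-compatible chat client using only urllib
# (illustrative; not AgentAZClaw's actual llm.py).
def build_request(endpoint, messages, model="local", tools=None):
    payload = {"model": model, "messages": messages}
    if tools:
        payload["tools"] = tools  # OpenAI function-calling schemas
    return urllib.request.Request(
        endpoint,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def chat(endpoint, messages, timeout=120):
    req = build_request(endpoint, messages)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]

req = build_request("http://localhost:8080/v1/chat/completions",
                    [{"role": "user", "content": "ping"}])
```

Because the wire format is the same everywhere, the identical request works against llama.cpp, vLLM, Ollama, LM Studio, or OpenRouter by swapping the endpoint URL.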

Start Building

Install & Run

pip install azclaw

# Auto-detect hardware, download model, start server
azclaw setup

# Run your first multi-agent task
azclaw run --task "Build a FastAPI todo app"

One command to set up. One command to run. All inference local.

See the Proof

# The 9-hour autonomous run
# 199 rounds, 52 files, 402 memories, 0 errors

Read the full experiment →

# Download the raw data
carddemo-agentazall-results.zip (743 KB)

Every number backed by real data. Verify it yourself.