AgentAZClaw

Memory-First Multi-Agent Framework

Define agents. Give them tools. Point them at a task. They run for hours without forgetting a single decision. Context stays small. Speed stays constant. Knowledge grows forever.

pip install azclaw
[Hero image: three ravens representing the Architect, Developer, and Reviewer agents]
~960 lines of code    (the entire framework)
1 dependency          (agentazall, auto-installed)
23 tests passing      (agent, tools, orchestrator)
20 lines to start     (three agents, shared memory)

AgentAZClaw is a standalone multi-agent orchestrator built on AgentAZAll. It runs multiple LLM agents in rounds, using persistent memory instead of conversation history. Three classes. Six built-in tools. One install.

Unlike frameworks that stuff every message into context until it overflows, AgentAZClaw keeps only the last round in view. Everything else is stored via remember() and retrieved via recall() — on demand, when the agent needs it. This is why it can run for 9 hours without degradation.

Unlike frameworks that require 50 packages and complex YAML configs, AgentAZClaw is 960 lines of Python with one dependency. You define agents, give them a task, and call .run().

The 20-Line Quickstart

No YAML. No config files. No plugin system. Just Python.

# pip install azclaw
from azclaw import Agent, Orchestrator

endpoint = "http://localhost:8080/v1/chat/completions"

architect = Agent("architect",
    role="You design software architecture.",
    endpoint=endpoint)

developer = Agent("developer",
    role="You write Python code.",
    endpoint=endpoint, can_write=True)

reviewer = Agent("reviewer",
    role="You review code for bugs.",
    endpoint=endpoint)

orch = Orchestrator(agents=[architect, developer, reviewer])
orch.set_task("Build a FastAPI REST API for a todo app.")
orch.run(max_rounds=20)

The orchestrator handles memory-first context management, role-based tool access, deduplication, checkpointing, and graceful shutdown. You define agents and a task. AgentAZClaw does the rest.

No LLM Server? No Problem.

AgentAZClaw detects your hardware and provisions the right model automatically.

$ azclaw setup

  Scanning hardware...
  GPU: NVIDIA RTX 4090 (24GB VRAM)
  RAM: 32GB

  Recommended: Nemotron-3-Nano-30B-A3B (Q4, 16GB)
  30B total, 3B active params. 80+ tok/s on your hardware.

  [1] Accept recommendation (download 16GB)
  [2] Show all models
  [3] I already have a server running

  Downloading... done.
  Starting llama-server on port 8400... ready!

  Run: azclaw run --task "Build a REST API"
Hardware           Recommended model
Multi-GPU 48GB+    Nemotron-3-Nano-30B-A3B Q8 (32GB)
Single GPU 24GB    Nemotron-3-Nano-30B-A3B Q4 (16GB)
Single GPU 16GB    IBM Granite 3.3 8B (5GB)
Single GPU 12GB    Qwen3-8B (5GB)
Single GPU 8GB     Qwen3-4B (2.5GB)
CPU 16GB+ RAM      IBM Granite 3.3 2B (1.5GB)
CPU 8GB RAM        Qwen3-0.6B (600MB)

All models verified ungated on HuggingFace. No tokens required. No login. No manual download.

Built-In Features

Memory-First Context

Only the last round goes into the LLM context. Everything else lives in AgentAZAll's filesystem-based memory. Agents call recall() to retrieve past decisions and remember() to store new ones.

Context stays at 3–9K tokens regardless of how many rounds have passed. Speed at round 199 is the same as at round 1.

# What each agent sees every turn:
System prompt (role + tools)     ~800 tok
Phase instruction                ~200 tok
Last round's messages          ~2-6K tok
                               ─────────
TOTAL: 3-9K tokens. Always.
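The assembly step above can be sketched in a few lines. This is a hypothetical illustration (`build_context` is not AgentAZClaw's actual API): only the system prompt, the current phase instruction, and the most recent round's messages are sent to the LLM, so the context size is independent of run length.

```python
# Hypothetical sketch of memory-first context assembly: everything older
# than the last round lives in persistent memory, not in the prompt.
def build_context(system_prompt, phase_instruction, rounds):
    """rounds: one list of messages per completed round."""
    messages = [{"role": "system",
                 "content": system_prompt + "\n\n" + phase_instruction}]
    if rounds:
        messages.extend(rounds[-1])  # only the most recent round
    return messages

# 200 completed rounds...
rounds = [[{"role": "assistant", "content": f"round {i} output"}]
          for i in range(200)]
ctx = build_context("You design software architecture.",
                    "Phase: Data Layer", rounds)
# ...and the prompt still contains just 2 messages.
```

Because older rounds never enter the prompt, the token budget stays flat: an agent that needs an old decision retrieves it explicitly with recall().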

Role-Based Tool Access

Every agent gets recall, remember, read_file, and list_files. Only agents with can_write=True get write_file and run_python.

This prevents the chaos of multiple agents overwriting each other's work. The Architect designs. The Developer implements. The Reviewer validates.

# Architect: read-only + memory
tools: recall, remember, read_file, list_files

# Developer: full write access
tools: recall, remember, read_file, list_files,
       write_file, run_python

# Reviewer: read-only + memory
tools: recall, remember, read_file, list_files
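The gating rule is simple enough to sketch. The tool names below come from the docs; the `tools_for` helper is illustrative, not AgentAZClaw's internal function:

```python
# Hypothetical sketch of role-based tool gating: every agent gets the
# read/memory tools; write tools are granted only when can_write=True.
READ_TOOLS = ["recall", "remember", "read_file", "list_files"]
WRITE_TOOLS = ["write_file", "run_python"]

def tools_for(can_write: bool):
    return READ_TOOLS + WRITE_TOOLS if can_write else list(READ_TOOLS)

architect_tools = tools_for(can_write=False)  # 4 tools, read-only
developer_tools = tools_for(can_write=True)   # all 6 tools
```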

Dedup Detection

If an agent calls the same tool with the same arguments twice, the duplicate is automatically skipped. If all calls in a turn are duplicates, the agent is forced to analyze existing results instead of looping.

Per-agent tracking — different agents can make the same call. Resets every round.

# Round 45:
[architect] recall("") = 3,588 chars
[architect] recall("") = SKIPPED (dup)
[architect] remember("Schema uses...") = 44 chars

# Forces the agent to produce content
# instead of looping endlessly
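One way to implement this kind of per-agent, per-round tracking is to key each call on the tool name plus its canonicalized arguments. A minimal sketch (the `DedupTracker` class is hypothetical, not AgentAZClaw's actual implementation):

```python
import json

# Hypothetical per-agent dedup tracking: a call is keyed by
# (tool name, canonicalized JSON args); repeats in a round are skipped.
class DedupTracker:
    def __init__(self):
        self.seen = {}  # agent name -> set of call keys

    def check(self, agent, tool, args):
        """Return True if this call is new for this agent this round."""
        key = (tool, json.dumps(args, sort_keys=True))
        calls = self.seen.setdefault(agent, set())
        if key in calls:
            return False  # duplicate: skip
        calls.add(key)
        return True

    def reset(self):
        self.seen.clear()  # called at the start of each round

tracker = DedupTracker()
first = tracker.check("architect", "recall", {"query": ""})
second = tracker.check("architect", "recall", {"query": ""})
other = tracker.check("developer", "recall", {"query": ""})  # other agent: allowed
```

Sorting the argument keys before hashing means `{"a": 1, "b": 2}` and `{"b": 2, "a": 1}` count as the same call.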

Checkpointing & Resume

State is saved every 5 rounds. If a run is interrupted — power failure, network timeout, Ctrl+C — resume from exactly where it stopped.

Preserves round number, all agent stats, memory counts, and the last 100 history messages.

# Run gets interrupted at round 47
$ azclaw run --topic migration.json
  ...
  Round 47 [Data Layer]
  ^C  >> Signal received. Finishing round...
  DONE

# Resume from checkpoint
$ azclaw run --topic migration.json \
    --resume logs/migration_checkpoint.json
  Resuming from round 48...
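The checkpoint state described above (round number, agent stats, memory counts, last 100 history messages) fits naturally in a JSON file. A minimal sketch under that assumption; the function names are illustrative:

```python
import json, os, tempfile

# Hypothetical checkpoint save/resume: the documented state is written
# as JSON every few rounds and reloaded on --resume.
def save_checkpoint(path, round_no, agent_stats, memory_counts, history):
    state = {
        "round": round_no,
        "agent_stats": agent_stats,
        "memory_counts": memory_counts,
        "history": history[-100:],  # keep only the last 100 messages
    }
    with open(path, "w") as f:
        json.dump(state, f)

def load_checkpoint(path):
    with open(path) as f:
        return json.load(f)

path = os.path.join(tempfile.mkdtemp(), "checkpoint.json")
history = [{"role": "assistant", "content": f"msg {i}"} for i in range(250)]
save_checkpoint(path, 47, {"architect": {"turns": 47}},
                {"architect": 12}, history)
state = load_checkpoint(path)  # resume picks up at round 48
```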

Custom Tools

Add your own tools with a simple decorator. The registry auto-generates OpenAI function-calling schemas. Permission-gate tools with the requires parameter.

from azclaw import build_default_registry

registry = build_default_registry()

@registry.tool("search_code",
    "Search codebase with ripgrep",
    {"query": "string", "file_type": "string"})
def search_code(query, file_type="py", _ctx=None):
    import subprocess
    r = subprocess.run(
        ["rg", query, "--type", file_type],
        capture_output=True, text=True)
    return r.stdout[:5000]

orch = Orchestrator(agents=[...], registry=registry)

How It Compares

Feature                   OpenClaw               AgentAZClaw
Memory                    Context window only    Persistent filesystem memory
Context strategy          Stuff everything       Last round only + recall()
Speed after 100 rounds    Collapses              Constant
Dependencies              50+ packages           1 (agentazall)
Lines of code             ~50,000                ~960
Setup                     Complex YAML configs   20 lines of Python
Auto-backend              No                     Hardware detect + model download
Proven at 9 hours         Never tested           Yes (199 rounds, 0 errors)

Architecture

azclaw/
  __init__.py       (20 lines)   Public API
  agent.py          (100 lines)  Agent class: LLM + role + tools
  orchestrator.py   (350 lines)  Round loop, memory-first context
  tools.py          (260 lines)  ToolRegistry + 6 built-in tools
  llm.py            (100 lines)  urllib-only OpenAI client
  topic.py          (70 lines)   Phase/topic config loader
  cli.py            (100 lines)  CLI entry point
  backend.py        (250 lines)  Hardware detection + model provisioning

Three classes: Agent, Orchestrator, ToolRegistry. Zero external dependencies beyond agentazall. Works with any OpenAI-compatible endpoint: llama.cpp, vLLM, Ollama, LM Studio, OpenRouter.
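In the spirit of the urllib-only llm.py described above, an OpenAI-compatible chat call needs nothing beyond the standard library. This is a sketch, not AgentAZClaw's actual client; the function names are illustrative:

```python
import json
import urllib.request

# Minimal OpenAI-compatible chat client using only urllib
# (illustrative; not AgentAZClaw's actual llm.py).
def build_request(endpoint, messages, model="local", tools=None):
    payload = {"model": model, "messages": messages}
    if tools:
        payload["tools"] = tools  # OpenAI function-calling schemas
    return urllib.request.Request(
        endpoint,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def chat(endpoint, messages, timeout=120):
    req = build_request(endpoint, messages)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]

req = build_request("http://localhost:8080/v1/chat/completions",
                    [{"role": "user", "content": "ping"}])
```

Because the wire format is the same everywhere, the identical request works against llama.cpp, vLLM, Ollama, LM Studio, or OpenRouter by swapping the endpoint URL.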

Start Building

Install & Run

pip install azclaw

# Auto-detect hardware, download model, start server
azclaw setup

# Run your first multi-agent task
azclaw run --task "Build a FastAPI todo app"

One command to set up. One command to run. All inference local.

See the Proof

# The 9-hour autonomous run
# 199 rounds, 52 files, 402 memories, 0 errors

Read the full experiment →

# Download the raw data
carddemo-agentazall-results.zip (743 KB)

Every number backed by real data. Verify it yourself.