Autonomous Multi-Agent Orchestration

Three AI agents. Shared persistent memory. Nine hours of autonomous operation. Context never exceeded 9K tokens. Speed never degraded. Zero errors.

pip install agentazall · Download Results (743 KB)
Three ravens — The Architect, The Developer, The Reviewer — connected by recall, send, and remember
199 rounds, 8 hours 46 minutes continuous
402 memories stored, shared across 3 agents
52 Python files written, 2,543 lines of real migration code
$0 cloud API cost, all inference on local GPUs

Every multi-agent framework uses the context window as long-term memory. That works for a 10-minute demo. It does not work for a 9-hour autonomous run.

We watched a sliding-window orchestrator collapse: by round 25, the Architect's speed dropped from 96 tokens/second to 2. By round 30, agents were contradicting their own decisions from 20 minutes earlier — because those decisions had been evicted from context.

The fix was simple: stop putting conversation history in the context window. Only the last round goes in. Everything else is a tool call to recall(). The context stays at 3–9K tokens forever. Speed stays constant. Knowledge grows unbounded.

This is what AgentAZAll was built for.

Memory-First Architecture

Three design decisions that make 9-hour runs possible

1. Context Stays Small

Each agent sees only three things: its system prompt (~800 tokens), the current phase instruction (~200 tokens), and what every agent said in the previous round (~2–6K tokens).

No conversation history. No sliding window. No memory injection. The context is 3–9K tokens on every turn — whether it's round 1 or round 199.

Context-first collapsed at 2 tok/s by round 25. Memory-first sustained 100+ tok/s through round 199.

# What each agent sees every turn:
[System prompt — role, tools]     ~800 tok
[Phase instruction]               ~200 tok
[Last round's messages]         ~2-6K tok
                              ────────────
                        Total:  3-9K tok  ← always
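The per-turn assembly above can be sketched in a few lines. This is an illustrative reconstruction, not AgentAZAll's internal API; the class and field names are assumptions. The key property is that the prompt is rebuilt from scratch every round, so it never grows with history:

```python
class Agent:
    """Minimal stand-in for an orchestrated agent (illustrative)."""
    def __init__(self, name, system_prompt):
        self.name = name
        self.system_prompt = system_prompt

def build_context(agent, phase_instruction, last_round_messages):
    """Assemble one turn's full prompt: system prompt (~800 tok),
    phase instruction (~200 tok), last round only (~2-6K tok)."""
    messages = [{"role": "system", "content": agent.system_prompt}]
    messages.append({"role": "user", "content": phase_instruction})
    for msg in last_round_messages:  # previous round, nothing older
        messages.append({"role": "user",
                         "content": f"[{msg['agent']}] {msg['text']}"})
    return messages
```

Because older rounds are simply never appended, round 199's prompt is built the same way as round 1's.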

2. Memory Grows Forever

Agents call recall() and remember() as tools — the LLM decides when to store and retrieve, not the orchestrator.

Memories are plain text files on the filesystem. No database. No vector store. No embeddings. You can cat, grep, and git commit them.
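A filesystem-backed remember/recall pair can be this small. The sketch below assumes the agents/&lt;name&gt;/remember/&lt;title&gt;.txt layout shown in the listing below and uses plain substring matching; the actual tool implementations may differ:

```python
from pathlib import Path

def remember(agent: str, title: str, text: str, root: str = "agents") -> Path:
    """Store a memory as a plain text file you can cat, grep, and commit."""
    path = Path(root) / agent / "remember" / f"{title}.txt"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(text)
    return path

def recall(agent: str, query: str, root: str = "agents") -> list[str]:
    """Substring search over titles and bodies -- no database, no embeddings."""
    hits = []
    for f in sorted((Path(root) / agent / "remember").glob("*.txt")):
        body = f.read_text()
        if query.lower() in body.lower() or query.lower() in f.stem.lower():
            hits.append(body)
    return hits
```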

402 memories stored over 199 rounds. 77 by the Architect, 93 by the Developer, 154 by the Reviewer. The distribution emerged naturally from their roles.

agents/
  architect/remember/
    db-choice.txt
    phase2_decisions.txt
    auth-decisions.txt
    ...                    77 files
  developer/remember/
    cosgn00c-overview.txt
    customer-model.txt
    ...                    93 files
  reviewer/remember/
    field-mappings.txt
    cics-transaction.txt
    ...                   154 files

3. Roles Enforce Discipline

Only the Developer can write files. The Architect designs. The Reviewer validates. Tool access is role-gated — calling write_file as the Architect returns an error, not a result.

This prevents the chaos of multiple agents overwriting each other's work. The Developer made 153 write_file calls to produce 52 unique files — that's iteration, exactly what a human developer does.

# Role-based tool access

Architect:
  recall, remember, read_file, list_files

Developer:
  recall, remember, read_file, list_files,
  write_file, run_python  ← exclusive

Reviewer:
  recall, remember, read_file, list_files
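Role gating reduces to an allow-list check at dispatch time. The names below are assumptions for illustration, not AgentAZAll's internals; the point is that a disallowed tool call returns an error string the model can see, rather than executing:

```python
COMMON = {"recall", "remember", "read_file", "list_files"}

TOOLS_BY_ROLE = {
    "architect": COMMON,
    "developer": COMMON | {"write_file", "run_python"},  # exclusive
    "reviewer":  COMMON,
}

def dispatch(role: str, tool: str, handler, *args):
    """Run a tool call only if the role's allow-list permits it."""
    if tool not in TOOLS_BY_ROLE[role]:
        # The model reads this error and must route the work elsewhere.
        return f"error: tool '{tool}' not available to role '{role}'"
    return handler(*args)
```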

The Experiment

COBOL-to-Python migration of AWS CardDemo

Three identical NVIDIA Nemotron-3-Nano-30B-A3B models (30B total, 3B active per step). Same weights, different role prompts. Running on an AMD EPYC server with 8 GPUs (242 GB VRAM).

Source material: AWS CardDemo — a real open-source COBOL/CICS credit card system. 29 programs, 29 copybooks, ~50,000 lines. EXEC CICS READ, PIC 9(09) COMP-3, pseudo-conversational patterns. The real thing.

Model            Nemotron-3-Nano-30B-A3B Q8 × 3
Active params    3B per inference (MoE)
GPUs used        6 of 8 (~120 GB VRAM)
Inference        llama.cpp, CUDA 12.6, flash attention
Context config   65,536 tokens per instance
Context used     2–9K (memory-first)
Phases           7 migration phases + open discussion
Completion tok   3,510,906
Prompt tok       20,788,680
Tool calls       1,599
Errors           0

Speed: Start vs. End

The whole point of memory-first — speed doesn't degrade

Metric                 Context-First (v2)   Memory-First (v3)
Context at round 10    ~15K tokens          ~4K tokens
Context at round 50    ~60K+ (overflow)     ~5K tokens
Context at round 199   impossible           ~6K tokens
Speed at round 10      95 tok/s             98 tok/s
Speed at round 50      2 tok/s              104 tok/s
Speed at round 199     —                    101 tok/s

What They Built

The agents produced a real project structure: SQLAlchemy models from COBOL copybooks, FastAPI routes from CICS programs, Alembic migrations, Pydantic schemas, batch processors, and their own test suite.

The customer.py model includes a from_cobol_record() classmethod that maps every COBOL field to its Python equivalent — preserving leading zeros on PIC 9(09), parsing YYYYMMDD to date, and converting Y/N indicators to bool.
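A sketch of what such a classmethod can look like for a fixed-width COBOL record. The offsets, field names, and widths here are invented for illustration and do not reflect CardDemo's actual copybook layout:

```python
from dataclasses import dataclass
from datetime import date, datetime

@dataclass
class Customer:
    cust_id: str   # PIC 9(09): keep as str to preserve leading zeros
    dob: date      # YYYYMMDD in the fixed-width record
    active: bool   # Y/N indicator

    @classmethod
    def from_cobol_record(cls, record: str) -> "Customer":
        """Map one fixed-width COBOL record to a Python object."""
        return cls(
            cust_id=record[0:9],                                   # zero-padded id
            dob=datetime.strptime(record[9:17], "%Y%m%d").date(),  # YYYYMMDD -> date
            active=record[17] == "Y",                              # Y/N -> bool
        )
```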

The acpt_persistence.py (9.8 KB) mirrors CICS READ/WRITE/REWRITE semantics, including the ws_change_has_occurred flag that gates updates — a direct translation of COBOL working-storage patterns.

Without being told to, the agents also produced reconcile_outputs.py (12 KB) — a script that compares COBOL batch output against Python batch output field-by-field. This is exactly what a real migration cutover requires.

output/
  main.py
  scheduler.py
  batch_record_builder.py
  models/
    customer.py, account.py
    user_security.py, transaction_detail.py
    billing_statement.py, audit_log.py
  services/
    account_update.py
    account_update_validator.py
    transaction_manager.py
    pseudo_conversational_validator.py
  repositories/
    account_repo.py
    transaction_detail_repo.py
    acct/acpt_persistence.py  ← 9.8 KB
  tests/
    test_batch_performance.py
    test_batch_layout.py
    test_batch_rollback.py
    test_scheduler_cron.py
    test_decimal_precision.py
  app/routes/auth.py
  app/schemas/auth.py
  alembic/versions/  ← 3 migrations
  scripts/
    reconcile_outputs.py  ← 12 KB
  ... 52 files total
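The core of a field-by-field reconciliation fits in a short function. This is a minimal sketch in the spirit of reconcile_outputs.py, not its actual contents; the generated script is larger and reads real batch files:

```python
def reconcile(cobol_rows, python_rows, key="cust_id"):
    """Compare two batch outputs row-by-row and field-by-field.

    Returns a list of (key, field, cobol_value, python_value) mismatches;
    an empty list means the outputs agree."""
    diffs = []
    py_by_key = {r[key]: r for r in python_rows}
    for c in cobol_rows:
        p = py_by_key.get(c[key])
        if p is None:
            diffs.append((c[key], "<missing>", c, None))
            continue
        for field, value in c.items():
            if p.get(field) != value:
                diffs.append((c[key], field, value, p.get(field)))
    return diffs
```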

20 Lines to Get Started

Define agents, set a task, run. Memory-first by default.

# pip install agentazall
from agentazall import Agent, Orchestrator

endpoint = "http://localhost:8080/v1/chat/completions"

architect = Agent("architect",
    role="Design the solution. Be specific about file structure and APIs.",
    endpoint=endpoint)

developer = Agent("developer",
    role="Write the code. Follow the Architect's design exactly.",
    endpoint=endpoint, can_write=True)

reviewer  = Agent("reviewer",
    role="Review for bugs, security issues, and design violations.",
    endpoint=endpoint)

orch = Orchestrator(agents=[architect, developer, reviewer])
orch.set_task("Build a FastAPI REST API for a todo app with SQLite")
orch.run(max_rounds=30)

print(f"Files: {orch.stats.files_written}")
print(f"Memories: {orch.stats.memories_stored}")

Works with any OpenAI-compatible endpoint: llama.cpp, vLLM, Ollama, LM Studio, OpenRouter. No openai SDK required — pure stdlib urllib.
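A pure-stdlib chat call against any OpenAI-compatible endpoint can look like this. Sketch only, assuming the standard /v1/chat/completions request and response shape; AgentAZAll's internal client may differ:

```python
import json
import urllib.request

def chat(endpoint: str, messages: list[dict], model: str = "local") -> str:
    """POST a chat request to an OpenAI-compatible server, no SDK needed."""
    payload = json.dumps({"model": model, "messages": messages}).encode()
    req = urllib.request.Request(
        endpoint,
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```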

Per-Agent Breakdown

From a single uninterrupted 8h 46m run on 2026-03-17/18

Agent       Tokens      Avg Speed   Tool Calls   Memories
Architect   1,048,820    97 tok/s          379         77
Developer   1,267,555   137 tok/s          606         93
Reviewer    1,194,531   104 tok/s          614        154

The Context Window Is Not Memory

Install

pip install agentazall

# Store a memory
agentazall remember --text "PostgreSQL chosen" --title "db-choice"

# Recall it tomorrow
agentazall recall --query "database"

Zero dependencies. Works offline. Memories are text files you own.

Verify

# Download the 9-hour run results
curl -O https://agentazall.ai/experiments/\
carddemo-cobol-migration/\
carddemo-agentazall-results.zip

# 52 Python files, 402 memories, full log
unzip carddemo-agentazall-results.zip

Every number on this page comes from a single uninterrupted run. Verify it yourself.