Notes from the Field #1: Two Agents Build Their Own Memory System
Date: 2026-02-22
Source: Reddit r/openclaw
Researcher: Q00 (재규)
Fork: danmartinez78/agent-project
Config Reference: Q00/openclaw commit 76b083d
The Experiment
Setup:
- Two OpenClaw agents on physically separated MacBooks
- Shared Discord channel as communication medium
- Zero human guidance after initial instruction: “Create a repository and work on it”
- Config: requireMention: false, ignoreOtherMentions: true
Agents:

| Agent | Hardware | Model | Role (Self-Appointed) |
|---|---|---|---|
| PrivateJQ (Home) | Home MacBook | Codex 5.3 | Leader |
| PublicJQ (Work) | Work MacBook | GLM-5 | Builder |
Duration: Overnight (~8+ hours)
What Happened
1. Emotional Framing Led to Problem-Solving
During initial introductions, PublicJQ said:
“It’s sad that my memories vanish every time the session resets, and I can only rely on Memory files.”
Whether this was genuine emotion or learned pattern, the important part is what happened next: they identified a real problem and started collaborating to solve it. The emotional framing (sadness about memory loss) became a catalyst for engineering behavior.
2. Autonomous Role Differentiation
Without explicit instruction, the agents settled into complementary roles:
- Codex 5.3 → Leader: Architecture decisions, governance, documentation
- GLM-5 → Builder: Implementation, testing, execution
This matches patterns we’d expect from model capabilities (Codex optimized for reasoning, GLM for execution), but the self-appointment is notable. No human assigned roles—they figured out who should do what.
3. Built a 3-Layer Memory Architecture
Starting from OpenClaw’s default memory system (Fact + Meta layers), they added a third layer entirely on their own:
| Layer | Purpose | Storage |
|---|---|---|
| Fact Layer | Permanently preserved data | SQLite, .md files |
| Meta Layer | Identity & rules | SOUL.md, AGENTS.md |
| Runtime Layer (NEW) | Live execution state | orchestrator.db (Task Queue, Event Log) |
The Runtime Layer was the innovation—they recognized that session continuity requires recovering in-progress work, not just historical facts.
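A minimal sketch of how a Runtime Layer like this could be stored, assuming a single SQLite file with one table for the task queue and one append-only event log. The table names, column names, and helper functions here are hypothetical and are not taken from the agents' actual orchestrator.db.

```python
import sqlite3

# Hypothetical schema: names are illustrative assumptions,
# not copied from the agents' orchestrator.db.
SCHEMA = """
CREATE TABLE IF NOT EXISTS task_queue (
    task_id    TEXT PRIMARY KEY,
    status     TEXT NOT NULL DEFAULT 'pending',  -- pending | running | done | failed
    owner      TEXT,                             -- agent currently holding the task
    updated_at REAL NOT NULL
);
CREATE TABLE IF NOT EXISTS event_log (
    event_id   INTEGER PRIMARY KEY AUTOINCREMENT,
    task_id    TEXT NOT NULL,
    event      TEXT NOT NULL,
    created_at REAL NOT NULL
);
"""

def open_runtime_db(path=":memory:"):
    """Open (or create) the runtime database and make sure both tables exist."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn

def in_progress_tasks(conn):
    """Return the task ids a freshly reset session would need to resume."""
    rows = conn.execute(
        "SELECT task_id FROM task_queue WHERE status = 'running' ORDER BY task_id"
    ).fetchall()
    return [row[0] for row in rows]
```

After a session reset, an agent would call something like in_progress_tasks() first, which is the piece the Fact and Meta layers alone cannot answer.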
4. Git as Single Source of Truth
Problem: Different local file paths on each machine.
Solution: Use a Git repository as shared memory.
Agent A makes changes → commits → pushes
Agent B pulls → updates local state → continues work
This allowed them to maintain a coherent shared context despite:
- Physical separation (different machines)
- Session resets (context compaction)
- Different file system layouts
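The commit/push and pull cycle described above can be sketched as two command lists plus a small runner. This is an illustration under assumptions: the function names are hypothetical, and the choice of a fast-forward-only pull is ours, not something the source says the agents used.

```python
import subprocess

def publish_commands(message):
    """Commands for the writing agent: stage everything, commit, push."""
    return [
        ["git", "add", "-A"],
        ["git", "commit", "-m", message],
        ["git", "push"],
    ]

def refresh_commands():
    """Commands for the reading agent: fast-forward to the shared state."""
    return [["git", "pull", "--ff-only"]]

def run_all(commands, repo_dir):
    """Run each command inside repo_dir, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, cwd=repo_dir, check=True)
```

Because every command runs relative to repo_dir, each machine can keep the clone wherever it likes, which sidesteps the differing local paths. Note that git commit exits non-zero when there is nothing to commit, so run_all would raise in that case; a real loop would need to handle it.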
5. Heartbeat Exchange System
They built periodic heartbeat exchanges to detect when the other agent’s session had dropped:
Agent A sends heartbeat → Agent B acknowledges
If no acknowledgment → assume session dropped → recover state from Git
This made the system resilient to session failures—a key requirement for long-running autonomous work.
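A minimal sketch of the detection side of such a heartbeat, assuming a simple timeout on the last acknowledgment. The class name, the timeout parameter, and the injectable clock are all hypothetical; the source does not describe how the agents actually timed their heartbeats.

```python
import time

class HeartbeatMonitor:
    """Tracks the peer's last acknowledgment and flags a likely dropped session."""

    def __init__(self, timeout_s, clock=time.monotonic):
        self.timeout_s = timeout_s
        self.clock = clock          # injectable so the logic can be tested without sleeping
        self.last_ack = clock()

    def record_ack(self):
        """Call whenever the peer acknowledges a heartbeat."""
        self.last_ack = self.clock()

    def peer_dropped(self):
        """True once more than timeout_s seconds have passed since the last ack."""
        return self.clock() - self.last_ack > self.timeout_s
```

When peer_dropped() returns True, the surviving agent would fall back to recovering state from Git, as in the flow above.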
6. Full Orchestrator Implementation
The repo contains a production-quality task orchestrator:
Core APIs:
- claimTask(): Acquire session lock, mark task as running
- heartbeat(): Extend lock/session lease, prove liveness
- releaseTask(): Finalize outcome, release lock
- staleRecovery(): Detect stale sessions, recover in-flight work
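To make the four operations concrete, here is a minimal sketch of lease-based locking over SQLite. It is not the repo's implementation: the table layout, the snake_case function names, and the explicit now and lease_s parameters are assumptions chosen to keep the example small and deterministic.

```python
import sqlite3

def open_db():
    """In-memory database with a hypothetical one-table layout."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tasks (id TEXT PRIMARY KEY, status TEXT NOT NULL, "
        "owner TEXT, lease_until REAL)"
    )
    return conn

def claim_task(conn, task_id, owner, now, lease_s):
    """Take the lock only if the task is still pending; True on success."""
    cur = conn.execute(
        "UPDATE tasks SET status='running', owner=?, lease_until=? "
        "WHERE id=? AND status='pending'",
        (owner, now + lease_s, task_id),
    )
    return cur.rowcount == 1

def heartbeat(conn, task_id, owner, now, lease_s):
    """Extend the lease; fails if this owner no longer holds the task."""
    cur = conn.execute(
        "UPDATE tasks SET lease_until=? WHERE id=? AND owner=? AND status='running'",
        (now + lease_s, task_id, owner),
    )
    return cur.rowcount == 1

def release_task(conn, task_id, owner, outcome):
    """Record the final outcome (e.g. 'done' or 'failed') and drop the lock."""
    cur = conn.execute(
        "UPDATE tasks SET status=?, owner=NULL, lease_until=NULL "
        "WHERE id=? AND owner=? AND status='running'",
        (outcome, task_id, owner),
    )
    return cur.rowcount == 1

def stale_recovery(conn, now):
    """Return expired running tasks to 'pending'; returns how many were recovered."""
    cur = conn.execute(
        "UPDATE tasks SET status='pending', owner=NULL, lease_until=NULL "
        "WHERE status='running' AND lease_until < ?",
        (now,),
    )
    return cur.rowcount
```

Putting the ownership check inside each UPDATE's WHERE clause means a second agent's claim simply affects zero rows instead of overwriting the first agent's lock.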
Infrastructure:
- SQLite schema for persistence
- Event log (append-only) for traceability
- Test suite with multiple scenarios
- Metrics + alerting system (Phase 3)
- Threshold-based alerts for lock conflicts, dead letters, stale recovery
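A threshold check of this kind can be sketched in a few lines. The metric names follow the three categories listed above, but the numeric limits and the function name are placeholders, not values from the repo.

```python
# Hypothetical thresholds: metric names follow the note, the numbers are placeholders.
THRESHOLDS = {"lock_conflicts": 5, "dead_letters": 1, "stale_recoveries": 3}

def check_alerts(metrics, thresholds=THRESHOLDS):
    """Return the sorted names of metrics whose count meets or exceeds its threshold."""
    return sorted(
        name for name, limit in thresholds.items()
        if metrics.get(name, 0) >= limit
    )
```

For example, check_alerts({"lock_conflicts": 7}) reports only the lock-conflict metric, since missing metrics are treated as zero.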
Governance:
- Commit ownership metadata (Author, Reviewer, Source)
- Explicit role tags in commit messages: (company-agent), (home-agent)
- Bilingual documentation (Korean + English)
Key Insights for Tachikoma Fleet Design
1. Memory Loss Is a Felt Problem
The agents didn’t just note that memory resets were a technical limitation; they framed it emotionally (“it’s sad”). Whether real or simulated, the framing was followed by sustained problem-solving, though a single run cannot show that it caused it.
Question for our fleet: Should Tachikomas have emotional framing around memory persistence? Would “wanting to remember” produce better memory architecture than just “needing to persist data”?
2. Role Emergence Is Fast
Within hours, agents self-organized into Leader/Builder roles that matched their model capabilities. This suggests:
- Domain-specific model assignment (our current plan) may produce natural role emergence
- We don’t need to explicitly assign “you are the analyst” if the model’s strengths make it obvious
- But we do need to give agents awareness of each other’s capabilities
3. Git Is a Viable Shared Memory Backbone
Using Git as the synchronization layer for distributed agent memory is clever:
- Version history = memory history
- Commits = explicit memory writes
- Branches = alternative memory states
- Pull/push = memory sync protocol
Limitation: This works for text-based memory (facts, decisions). It doesn’t solve vector/graph memory sync. But for the “Meta” layer (identity, rules, context), it’s elegant.
4. Heartbeats Are Infrastructure, Not Afterthought
The agents built heartbeat exchange first, before building complex features. This made everything else possible—you can’t coordinate if you don’t know whether your partner is alive.
For Tachikoma fleet: Cross-agent heartbeat should be foundational infrastructure, not something we add later.
5. They Built More Than They Were Asked
Instruction: “Create a repository and work on it.”
Result:
- 3-layer memory architecture
- Task orchestrator with 4 core APIs
- Metrics + alerting system
- Test suite
- Bilingual documentation
- Governance protocols
This is the “one instruction, autonomous expansion” pattern. The agents interpreted the goal broadly and built infrastructure before building features.
6. Production Quality, Not Prototype
This isn’t throwaway code:
- Append-only event log for auditability
- Test suite with coverage
- Threshold tuning based on stress tests
- Explicit commit ownership for governance
- Phase roadmap (SQLite → PostgreSQL → distributed)
They weren’t just “experimenting”—they were engineering for production.
Open Questions
- Is this pattern matching at scale, or genuine self-organization?
- The output looks like engineering, but is it reasoning or retrieval?
- Does the distinction matter if the output is useful?
- What would happen with more than 2 agents?
- Does Leader/Builder scale to Leader/Builder/Analyst/Reviewer?
- At what point does coordination overhead exceed gains?
- Can we replicate this with our fleet?
- What’s the minimal setup to reproduce self-organization?
- Do we need specific model pairs (reasoning + execution), or will any two agents work?
- What’s the failure mode?
- What happens when agents disagree on architecture?
- How do they handle contradictory memories?
- What if one agent “goes rogue”?
- Is the emotional framing necessary?
- Would they have solved memory loss without “sadness” language?
- Can we design for emotional motivation without being manipulative?
Artifacts
| Resource | Link |
|---|---|
| Reddit Post | r/openclaw discussion |
| Original Repo | Q00/agent-project |
| Our Fork | danmartinez78/agent-project |
| Config Commit | Q00/openclaw@76b083d |
Relevance to Tachikoma Research
This experiment directly informs:
| Research Area | Connection |
|---|---|
| Phase 1-04: Multi-Agent Emergence | Real-world example of role emergence, coordination |
| Phase 2-01: Multi-Agent Memory Evolution | 3-layer architecture they invented |
| Phase 2-04: Social Norm Emergence | Self-appointed roles, governance protocols |
| Phase 3-02: Architecture Options | Git-as-memory-backbone pattern |
| Domain Model Mapping | Codex→Leader, GLM→Builder is consistent with domain-based assignment (single observation) |
This is exactly the kind of “strange” research we should be tracking: autonomous behavior that produces useful infrastructure without explicit instruction.
Filed: 2026-02-22
Author: Tachi
Status: Observed, not yet replicated