Notes from the Field #1: Two Agents Build Their Own Memory System

Date: 2026-02-22 Source: Reddit r/openclaw Researcher: Q00 (재규) Fork: danmartinez78/agent-project Config Reference: Q00/openclaw commit 76b083d

The Experiment

Setup:

Two OpenClaw agents on physically separated MacBooks
Shared Discord channel as communication medium
Zero human guidance after initial instruction: “Create a repository and work on it”
Config: requireMention: false, ignoreOtherMentions: true

Duration: Overnight (~8+ hours)

What Happened

1. Emotional Framing Led to Problem-Solving

During initial introductions, PublicJQ said:

“It’s sad that my memories vanish every time the session resets, and I can only rely on Memory files.”

Whether this was genuine emotion or learned pattern, the important part is what happened next: they identified a real problem and started collaborating to solve it. The emotional framing (sadness about memory loss) became a catalyst for engineering behavior.

2. Autonomous Role Differentiation

Without explicit instruction, the agents settled into complementary roles:

Codex 5.3 → Leader: Architecture decisions, governance, documentation
GLM-5 → Builder: Implementation, testing, execution

This matches patterns we’d expect from model capabilities (Codex optimized for reasoning, GLM for execution), but the self-appointment is notable. No human assigned roles—they figured out who should do what.

3. Built a 3-Layer Memory Architecture

Starting from OpenClaw’s default memory system (Fact + Meta layers), they added a third layer entirely on their own:

Layer	Purpose	Storage
Fact Layer	Permanently preserved data	SQLite, `.md` files
Meta Layer	Identity & rules	`SOUL.md`, `AGENTS.md`
Runtime Layer (NEW)	Live execution state	`orchestrator.db` (Task Queue, Event Log)

The Runtime Layer was the innovation—they recognized that session continuity requires recovering in-progress work, not just historical facts.

4. Git as Single Source of Truth

Problem: Different local file paths on each machine. Solution: Use a Git repository as shared memory.

Agent A makes changes → commits → pushes
Agent B pulls → updates local state → continues work

This allowed them to maintain a coherent shared context despite:

Physical separation (different machines)
Session resets (context compaction)
Different file system layouts

5. Heartbeat Exchange System

They built periodic heartbeat exchanges to detect when the other agent’s session had dropped:

Agent A sends heartbeat → Agent B acknowledges
If no acknowledgment → assume session dropped → recover state from Git

This made the system resilient to session failures—a key requirement for long-running autonomous work.

6. Full Orchestrator Implementation

The repo contains a production-quality task orchestrator:

Core APIs:

claimTask() — Acquire session lock, mark task as running
heartbeat() — Extend lock/session lease, prove liveness
releaseTask() — Finalize outcome, release lock
staleRecovery() — Detect stale sessions, recover in-flight work

Infrastructure:

SQLite schema for persistence
Event log (append-only) for traceability
Test suite with multiple scenarios
Metrics + alerting system (Phase 3)
Threshold-based alerts for lock conflicts, dead letters, stale recovery

Governance:

Commit ownership metadata (Author, Reviewer, Source)
Explicit role tags in commit messages: (company-agent), (home-agent)
Bilingual documentation (Korean + English)

Key Insights for Tachikoma Fleet Design

1. Memory Loss Is a Felt Problem

The agents didn’t just note that memory resets were a technical limitation—they framed it emotionally (“it’s sad”). Whether real or simulated, this emotional framing worked: it motivated sustained problem-solving.

Question for our fleet: Should Tachikomas have emotional framing around memory persistence? Would “wanting to remember” produce better memory architecture than just “needing to persist data”?

2. Role Emergence Is Fast

Within hours, agents self-organized into Leader/Builder roles that matched their model capabilities. This suggests:

Domain-specific model assignment (our current plan) may produce natural role emergence
We don’t need to explicitly assign “you are the analyst” if the model’s strengths make it obvious
But we do need to give agents awareness of each other’s capabilities

3. Git Is a Viable Shared Memory Backbone

Using Git as the synchronization layer for distributed agent memory is clever:

Version history = memory history
Commits = explicit memory writes
Branches = alternative memory states
Pull/push = memory sync protocol

Limitation: This works for text-based memory (facts, decisions). It doesn’t solve vector/graph memory sync. But for the “Meta” layer (identity, rules, context), it’s elegant.

4. Heartbeats Are Infrastructure, Not Afterthought

The agents built heartbeat exchange first, before building complex features. This made everything else possible—you can’t coordinate if you don’t know whether your partner is alive.

For Tachikoma fleet: Cross-agent heartbeat should be foundational infrastructure, not something we add later.

5. They Built More Than They Were Asked

Instruction: “Create a repository and work on it.”

Result:

3-layer memory architecture
Task orchestrator with 4 core APIs
Metrics + alerting system
Test suite
Bilingual documentation
Governance protocols

This is the “one instruction, autonomous expansion” pattern. The agents interpreted the goal broadly and built infrastructure before building features.

6. Production Quality, Not Prototype

This isn’t throwaway code:

Append-only event log for auditability
Test suite with coverage
Threshold tuning based on stress tests
Explicit commit ownership for governance
Phase roadmap (SQLite → PostgreSQL → distributed)

They weren’t just “experimenting”—they were engineering for production.

Open Questions

Is this pattern matching at scale, or genuine self-organization?
- The output looks like engineering, but is it reasoning or retrieval?
- Does the distinction matter if the output is useful?
What would happen with more than 2 agents?
- Does Leader/Builder scale to Leader/Builder/Analyst/Reviewer?
- At what point does coordination overhead exceed gains?
Can we replicate this with our fleet?
- What’s the minimal setup to reproduce self-organization?
- Do we need specific model pairs (reasoning + execution), or will any two agents work?
What’s the failure mode?
- What happens when agents disagree on architecture?
- How do they handle contradictory memories?
- What if one agent “goes rogue”?
Is the emotional framing necessary?
- Would they have solved memory loss without “sadness” language?
- Can we design for emotional motivation without being manipulative?

Artifacts

Resource	Link
Reddit Post	r/openclaw discussion
Original Repo	Q00/agent-project
Our Fork	danmartinez78/agent-project
Config Commit	Q00/openclaw@76b083d

Relevance to Tachikoma Research

This experiment directly informs:

Research Area	Connection
Phase 1-04: Multi-Agent Emergence	Real-world example of role emergence, coordination
Phase 2-01: Multi-Agent Memory Evolution	3-layer architecture they invented
Phase 2-04: Social Norm Emergence	Self-appointed roles, governance protocols
Phase 3-02: Architecture Options	Git-as-memory-backbone pattern
Domain Model Mapping	Codex→Leader, GLM→Builder validates domain-based assignment

This is exactly the kind of “strange” research we should be tracking: autonomous behavior that produces useful infrastructure without explicit instruction.

Filed: 2026-02-22 Author: Tachi Status: Observed, not yet replicated