Persistent Intelligence: Day-One System Context

Purpose

Provide a high-level context diagram and a concrete Day-One scenario showing how memory, data flows, and LoRA adapters fit across physical robot and avatar-in-sim workflows.

System Context Diagram

Actors and systems:

  • User (web UI)
  • Mission Agent (DIMOS) on Laptop
  • Skills (DIMOS MyUnitreeSkills)
  • Robot Interface (ROS2, Go2 SDK)
  • Memory Layer (Local/Cloud backends)
  • Data Lake (offload and analytics)
  • Avatar-in-Sim (Spark workflows)

Textual diagram (abstract):

User ── Web UI ──► Mission Agent (DIMOS)
                     │
                     ├─► Skills (DIMOS) ──► Robot Interface (ROS2) ──► Unitree Go2
                     │
                     ├─► Memory Manager ──► Vector Store (Chroma) + Blobs (images)
                     │                         ▲
                     │                         └─ Embeddings (local|cloud)
                     │
                     ├─► Telemetry Logger ──► Data Lake (artifacts, logs, traces)
                     │
                     └─► Model Backend (Cloud|Local vLLM) ──► (optional) LoRA adapters

Shutdown/Offload Path:
Robot ─► Laptop Offload ─► Data Lake ─► Avatar-in-Sim (Spark)

Day-One Scenario: "Check if the oven is on"

Assumptions:

  • No prior map; agent uses local planning + perception.
  • Web UI available for mission start and feedback.
  • Memory backend = local (Sentence-Transformers + Chroma PersistentClient).
  • LoRA = not required on day one, but planned for writing/recall later.

1) Initialization

  • User powers on robot; laptop launches Mission Agent.
  • Agent selects backend (local), creates/pins mission_id.
  • MemoryManager seeds collections: short_term_{mission_id}, long_term.
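
A minimal sketch of this seeding step, assuming a Chroma PersistentClient plus a local Sentence-Transformers model; the MemoryManager class name, model choice, paths, and mission_id here are illustrative, not the actual DIMOS API:

# Illustrative seeding sketch. chromadb + sentence-transformers are the assumed
# local backends; everything else (class name, paths, mission_id) is hypothetical.
import chromadb
from sentence_transformers import SentenceTransformer

class MemoryManager:
    def __init__(self, mission_id: str, db_path: str = "./memory_db"):
        self.mission_id = mission_id
        self.embedder = SentenceTransformer("all-MiniLM-L6-v2")   # local embeddings
        self.client = chromadb.PersistentClient(path=db_path)     # persists to disk
        # Cosine space so query distances map cleanly onto a similarity threshold.
        self.short_term = self.client.get_or_create_collection(
            name=f"short_term_{mission_id}", metadata={"hnsw:space": "cosine"})
        self.long_term = self.client.get_or_create_collection(
            name="long_term", metadata={"hnsw:space": "cosine"})

manager = MemoryManager(mission_id="mission_001")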

2) Mission Execution (Physical)

  • User enters: "Check if the oven is on".
  • Agent plan (conceptual; a rough control-loop sketch follows this list):
    • Explore the kitchen region if unknown (local planner, frontier exploration)
    • Perception checks for an oven/stove and reads indicators (lights/knobs/temperature)
    • If ambiguous, refine the pose/viewpoint, capture an image, and ask for clarification if needed
    • Report back with evidence (image + textual observation)
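
A rough control-loop sketch of that plan. All skill and helper names here (explore_frontier, read_indicators, refine_viewpoint, capture_image, report, write_observation) are placeholders, not the DIMOS MyUnitreeSkills API:

# Hypothetical mission loop for "Check if the oven is on"; names are placeholders.
def check_if_oven_is_on(agent, manager):
    while not agent.found("oven"):                     # perception-driven stop condition
        agent.explore_frontier(region_hint="kitchen")  # local planner, no prior map
    reading = agent.read_indicators(target="oven")     # lights / knobs / temperature
    if reading.ambiguous:
        agent.refine_viewpoint(target="oven")          # re-pose for a clearer view
        reading = agent.read_indicators(target="oven")
    image_id = agent.capture_image(target="oven")
    write_observation(manager,                         # see the memory-write sketch below
        text=f"Oven indicator appears {'ON' if reading.on else 'OFF'}",
        tags=["room:kitchen", "object:oven"],
        pose=agent.pose(), confidence=reading.confidence, image_id=image_id)
    return agent.report(conclusion=reading, evidence=image_id)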

  • Memory Writes (during mission; a minimal write sketch follows this list):
    • Observations as compact facts with tags + pose:
      • "Saw a stove with control knobs at (x=…, y=…)"
      • "Indicator light near oven: ON"
      • "Captured image: img_001.jpg (cabinet, stove, counter)"
    • Metadata per record: {timestamp, mission_id, pose{x,y,yaw}, tags:[room:kitchen, object:stove], confidence}
    • Images stored in VisualMemory; text embeddings kept in Chroma
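
A minimal write sketch under the same assumptions as the initialization sketch. Chroma metadata values must be scalars, so the pose and tags are flattened; write_observation itself is an illustrative helper, not a confirmed DIMOS API:

import time
import uuid

def write_observation(manager, text, tags, pose, confidence, image_id=None):
    """Store one compact fact in the mission's short-term collection."""
    metadata = {
        "timestamp": time.time(),
        "mission_id": manager.mission_id,
        # Chroma metadata values must be str/int/float/bool, so the pose is
        # flattened and the tag list is joined into a single string.
        "pose_x": pose["x"], "pose_y": pose["y"], "pose_yaw": pose["yaw"],
        "tags": ",".join(tags),
        "confidence": confidence,
    }
    if image_id:
        metadata["image_id"] = image_id   # the image blob itself lives in VisualMemory
    manager.short_term.add(
        ids=[str(uuid.uuid4())],
        documents=[text],
        embeddings=manager.embedder.encode([text]).tolist(),
        metadatas=[metadata],
    )

write_observation(manager, "Indicator light near oven: ON",
                  tags=["room:kitchen", "object:oven"],
                  pose={"x": 3.2, "y": 1.4, "yaw": 0.9}, confidence=0.8)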

  • RAG Queries (during mission; a query sketch follows this list):
    • "Have we seen an oven here before?" → returns prior kitchen context, if any
    • A similarity threshold keeps low-relevance memories out of the prompt
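
A query sketch with a similarity cutoff, assuming the cosine-space collections from the initialization sketch; the recall helper name and the 0.35 distance threshold are illustrative:

def recall(manager, question, k=5, max_distance=0.35):
    """Return memories similar enough to the question to be worth prompting with."""
    results = manager.long_term.query(
        query_embeddings=manager.embedder.encode([question]).tolist(),
        n_results=k,
        include=["documents", "metadatas", "distances"],
    )
    hits = []
    for doc, meta, dist in zip(results["documents"][0],
                               results["metadatas"][0],
                               results["distances"][0]):
        if dist <= max_distance:   # cosine distance: smaller means more similar
            hits.append({"text": doc, "meta": meta, "distance": dist})
    return hits

context = recall(manager, "Have we seen an oven here before?")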

  • User Feedback (in Web UI):
    • Approve/Correct: "This is the oven" / "That was the dishwasher"
    • Feedback is appended as corrective memories (tag: feedback/correction) and used to update tags on affected records (a correction sketch follows this list)
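
One way to apply a correction, sketched under the same assumptions: append a corrective memory so the raw history is preserved, then retag the record it refers to. The get/update-by-id calls are Chroma's; the policy itself is illustrative:

def apply_correction(manager, record_id, corrected_text, corrected_tags):
    """Append a corrective memory and retag the record it corrects."""
    original = manager.short_term.get(ids=[record_id], include=["metadatas"])
    meta = original["metadatas"][0]
    # 1) Append the correction as its own memory, reusing the original pose.
    write_observation(manager, corrected_text,
                      tags=corrected_tags + ["feedback:correction"],
                      pose={"x": meta["pose_x"], "y": meta["pose_y"], "yaw": meta["pose_yaw"]},
                      confidence=1.0)
    # 2) Update the tags on the affected record in place.
    meta["tags"] = ",".join(corrected_tags)
    manager.short_term.update(ids=[record_id], metadatas=[meta])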

  • Safety & Telemetry:
    • Per-skill timeouts; abort on planner stalls (a timeout sketch follows this list)
    • Telemetry streamed to the Data Lake (pose trace, decisions, thumbnails)
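
A per-skill timeout can be as simple as wrapping each skill call. This is a sketch only; a thread cannot actually interrupt a running skill, so a real implementation needs a cooperative cancel through ROS2/DIMOS:

from concurrent.futures import ThreadPoolExecutor, TimeoutError as SkillTimeout

_pool = ThreadPoolExecutor(max_workers=1)

def run_skill_with_timeout(skill_fn, timeout_s=30.0, **kwargs):
    """Run one skill; treat an overrun as a stall and abort the step."""
    future = _pool.submit(skill_fn, **kwargs)
    try:
        return future.result(timeout=timeout_s)
    except SkillTimeout:
        future.cancel()   # does not stop an already-running skill; see note above
        raise RuntimeError(f"{skill_fn.__name__} exceeded its {timeout_s}s budget")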

3) Mission Outcome

  • Agent conclusion:
    • "The oven appears ON (indicator lit)." with confidence and an evidence link
  • Memory Promotion Policy (a promotion sketch follows this list):
    • Promote key facts from short_term_{mission_id} to long_term (e.g., kitchen layout, appliance locations)
    • Keep ambiguous or low-value facts in short-term for the retention purge
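
A promotion pass might look like the sketch below; the 0.7 confidence cutoff is illustrative policy, and importance scoring beyond confidence is left open:

def promote(manager, min_confidence=0.7):
    """Copy high-value short-term facts into long_term; leave the rest for the purge."""
    candidates = manager.short_term.get(
        where={"confidence": {"$gte": min_confidence}},
        include=["documents", "metadatas"],
    )
    if candidates["ids"]:
        manager.long_term.add(
            ids=candidates["ids"],   # reuse ids so promoted facts stay traceable
            documents=candidates["documents"],
            embeddings=manager.embedder.encode(candidates["documents"]).tolist(),
            metadatas=candidates["metadatas"],
        )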

4) Shutdown & Offload

  • Before power down:
    • Persist Chroma (auto) and VisualMemory (thumbnails)
    • Export the mission bundle to the Data Lake (an export sketch follows this list):
      • mission.json (summary, decisions, timings)
      • memories_short_term.jsonl (text + metadata)
      • images/ (key frames, thumbnails)
      • trace.log (actions, states)
  • Robot powers off.
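
An export sketch, assuming the bundle layout above and the flattened metadata from the write sketch; the helper name and output path are illustrative, and images/ plus trace.log are handled by the telemetry logger:

import json
from pathlib import Path

def export_mission_bundle(manager, summary: dict, out_dir: str = "./offload"):
    """Write the mission bundle that the offload path ships to the Data Lake."""
    bundle = Path(out_dir) / manager.mission_id
    bundle.mkdir(parents=True, exist_ok=True)
    (bundle / "mission.json").write_text(json.dumps(summary, indent=2))
    # One JSON object per line for every short-term record (text + metadata).
    records = manager.short_term.get(include=["documents", "metadatas"])
    with open(bundle / "memories_short_term.jsonl", "w") as f:
        for rid, doc, meta in zip(records["ids"], records["documents"], records["metadatas"]):
            f.write(json.dumps({"id": rid, "text": doc, "metadata": meta}) + "\n")
    # images/ (key frames, thumbnails) and trace.log are copied separately.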

Avatar-in-Sim (Spark Workflows)

5) Avatar Resume

  • Avatar loads last mission bundle and long-term memory snapshot.
  • Same Web UI front-end, with avatar indicator (optional reduced control set).
  • If the user is idle, run background workflows (a consolidation sketch follows this list):
    • Memory consolidation: summarize short-term, deduplicate, merge corrections
    • Data labeling tasks: confirm room labels, tag objects, cluster scenes
    • Synthetic experience: plan and rehearse similar tasks in sim; generate new memory-writing examples
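
A consolidation pass could start with exact-duplicate removal before any summarization; this sketch keeps the earliest copy of each repeated fact, and merging near-duplicates by embedding distance is left out:

def deduplicate_short_term(manager):
    """Drop exact-duplicate observations, keeping the earliest copy of each."""
    records = manager.short_term.get(include=["documents", "metadatas"])
    rows = sorted(zip(records["ids"], records["documents"], records["metadatas"]),
                  key=lambda r: r[2]["timestamp"])
    seen, to_delete = set(), []
    for rid, doc, _meta in rows:
        key = doc.strip().lower()
        if key in seen:
            to_delete.append(rid)    # later exact repeat of an existing fact
        else:
            seen.add(key)
    if to_delete:
        manager.short_term.delete(ids=to_delete)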

6) LoRA Entry Points (Later Phases)

  • Memory Writing Adapter:
    • Trained on curated observation → normalized JSON pairs (an example record follows this list)
    • Used in avatar background workflows to improve future on-robot memory writing
  • Recall Adapter:
    • Trained on (question + retrieved docs) → grounded answers
    • A/B tested in the avatar environment before enabling on the robot
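
The memory-writing adapter's training pairs could map raw observation text to the normalized record format used above; the exact schema is still open, so this single example is purely illustrative:

# One hypothetical supervised pair for the memory-writing adapter (one JSONL line).
training_example = {
    "input": "I can see the stove; the light next to the left knob is glowing red",
    "output": {                      # normalized JSON the adapter should learn to emit
        "text": "Indicator light near oven: ON",
        "tags": ["room:kitchen", "object:oven"],
        "confidence": 0.8,
    },
}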

7) Data Governance & Retention

  • Collections: short_term_{mission_id}, long_term, labels, feedback
  • Purge: short-term after 30 days; long-term capped by importance/size (a purge sketch follows this list)
  • Privacy: redact faces/audio in public datasets; store raw data only locally
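
A purge sketch against the metadata above; the 30-day window comes from the policy, while the importance/size cap for long_term is left out because its scoring is still open:

import time

def purge_short_term(manager, max_age_days=30):
    """Delete short-term memories older than the retention window."""
    cutoff = time.time() - max_age_days * 24 * 3600
    manager.short_term.delete(where={"timestamp": {"$lt": cutoff}})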

Data We Store (Day One)

  • Text observations (embedded) with metadata: {mission_id, timestamp, pose, tags, confidence}
  • Key frames / thumbnails linked by image_id
  • Mission summary + action trace (for replay)
  • User feedback records (corrections, confirmations)

Open Questions

  • Kitchen localization without a prior map: how much frontier exploration vs user guidance?
  • Room labeling source of truth: human-in-the-loop vs heuristic scene classification
  • Offload trigger: on-demand vs automatic at mission end
  • Avatar autonomy budget: what background workflows are allowed when idle?