# Persistent Intelligence: Day-One System Context

## Purpose
Provide a high-level context diagram and a concrete Day-One scenario showing how memory, data flows, and LoRA adapters fit across physical robot and avatar-in-sim workflows.
## System Context Diagram
Actors and systems:

- User (web UI)
- Mission Agent (DIMOS) on laptop
- Skills (DIMOS MyUnitreeSkills)
- Robot Interface (ROS2, Go2 SDK)
- Memory Layer (local/cloud backends)
- Data Lake (offload and analytics)
- Avatar-in-Sim (Spark workflows)
Textual diagram (abstract):

```
User ── Web UI ──► Mission Agent (DIMOS)
                       │
                       ├─► Skills (DIMOS) ──► Robot Interface (ROS2) ──► Unitree Go2
                       │
                       ├─► Memory Manager ──► Vector Store (Chroma) + Blobs (images)
                       │                          ▲
                       │                          └─ Embeddings (local|cloud)
                       │
                       ├─► Telemetry Logger ──► Data Lake (artifacts, logs, traces)
                       │
                       └─► Model Backend (Cloud|Local vLLM) ──► (optional) LoRA adapters
```

Shutdown/Offload Path:

```
Robot ─► Laptop Offload ─► Data Lake ─► Avatar-in-Sim (Spark)
```
## Day-One Scenario: "Check if the oven is on"

Assumptions:

- No prior map; the agent uses local planning + perception.
- Web UI is available for mission start and feedback.
- Memory backend = local (Sentence-Transformers + Chroma PersistentClient).
- LoRA = not required on day one, but planned for memory writing/recall later.
### 1) Initialization
- User powers on the robot; the laptop launches the Mission Agent.
- Agent selects a backend (local) and creates/pins a `mission_id`.
- MemoryManager seeds collections: `short_term_{mission_id}` and `long_term`.
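A minimal initialization sketch under the day-one assumptions (local Sentence-Transformers + Chroma `PersistentClient`); the mission id, model name, and storage path are illustrative, not part of the DIMOS API:

```python
# Sketch: day-one memory initialization on the local backend.
# Assumes chromadb >= 0.4 and sentence-transformers; the mission id,
# model name, and storage path are illustrative.
import chromadb
from sentence_transformers import SentenceTransformer

MISSION_ID = "m_2024_0001"  # hypothetical; pinned by the Mission Agent in practice

embedder = SentenceTransformer("all-MiniLM-L6-v2")   # local embedding model
client = chromadb.PersistentClient(path="./memory")  # persists across missions

# Cosine space so query distances map cleanly onto a similarity threshold.
short_term = client.get_or_create_collection(
    name=f"short_term_{MISSION_ID}", metadata={"hnsw:space": "cosine"}
)
long_term = client.get_or_create_collection(
    name="long_term", metadata={"hnsw:space": "cosine"}
)
```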
### 2) Mission Execution (Physical)
- User enters: "Check if the oven is on".
- Agent plan (conceptual):
    - Explore the kitchen region if unknown (local planner, frontier exploration)
    - Run perception checks for an oven/stove; read indicators (lights, knobs, temperature)
    - If ambiguous, refine pose/viewpoint, capture an image, and ask for clarification if needed
    - Report back with evidence (image + textual observation)
- Memory Writes (during mission), as sketched in code below:
    - Observations as compact facts with tags + pose:
        - "Saw a stove with control knobs at (x=…, y=…)"
        - "Indicator light near oven: ON"
        - "Captured image: img_001.jpg (cabinet, stove, counter)"
    - Metadata per record: `{timestamp, mission_id, pose{x,y,yaw}, tags:[room:kitchen, object:stove], confidence}`
    - Images stored in VisualMemory; text embeddings kept in Chroma
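A sketch of what a single memory write could look like. Chroma metadata values must be scalars, so the pose is flattened and `tags` is stored as a comma-joined string; the helper name and record shape are assumptions, not the DIMOS MemoryManager interface:

```python
# Sketch: one memory write. Chroma metadata must be scalar-valued,
# so pose is flattened and tags are comma-joined. Helper name and
# record shape are assumptions, not the DIMOS MemoryManager API.
import time
import chromadb
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")
short_term = chromadb.PersistentClient(path="./memory").get_or_create_collection(
    "short_term_m_2024_0001"
)

def write_observation(text: str, pose: dict, tags: list[str],
                      confidence: float, image_id: str | None = None) -> None:
    """Embed a compact fact and store it with flat, scalar-only metadata."""
    meta = {
        "timestamp": time.time(),
        "mission_id": "m_2024_0001",
        "pose_x": pose["x"], "pose_y": pose["y"], "pose_yaw": pose["yaw"],
        "tags": ",".join(tags),
        "confidence": confidence,
    }
    if image_id is not None:
        meta["image_id"] = image_id  # the blob itself lives in VisualMemory
    short_term.add(
        ids=[f"obs_{int(meta['timestamp'] * 1000)}"],
        documents=[text],
        embeddings=embedder.encode([text]).tolist(),
        metadatas=[meta],
    )

write_observation("Indicator light near oven: ON",
                  pose={"x": 3.2, "y": 1.1, "yaw": 0.4},
                  tags=["room:kitchen", "object:oven"],
                  confidence=0.85, image_id="img_001")
```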
- RAG Queries (during mission), as sketched in code below:
    - "Have we seen an oven here before?" → returns prior kitchen context, if any
    - A similarity threshold keeps low-similarity memories out of the prompt
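The thresholded recall step could look like the sketch below, assuming the cosine-space collection from the initialization sketch; the `0.35` distance cutoff is an illustrative value to tune, not a project constant:

```python
# Sketch: thresholded recall before prompting. Cutoff value is illustrative.
import chromadb
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")
short_term = chromadb.PersistentClient(path="./memory").get_or_create_collection(
    "short_term_m_2024_0001"
)

results = short_term.query(
    query_embeddings=embedder.encode(["Have we seen an oven here before?"]).tolist(),
    n_results=5,
)

MAX_DISTANCE = 0.35  # cosine distance cutoff; tune empirically
context = [
    doc
    for doc, dist in zip(results["documents"][0], results["distances"][0])
    if dist <= MAX_DISTANCE  # anything farther stays out of the prompt
]
```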
- User Feedback (in Web UI), as sketched in code below:
    - Approve/Correct: "This is the oven" / "That was the dishwasher"
    - Feedback is appended as corrective memories (tag: `feedback/correction`) and used to update tags on affected records
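One plausible shape for the correction flow: append the feedback as its own memory, then rewrite the tags on the record it refers to. Record ids and tag values here are illustrative:

```python
# Sketch: append feedback as its own memory, then fix tags on the
# corrected record. Record ids and tag values are illustrative.
import time
import chromadb
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")
short_term = chromadb.PersistentClient(path="./memory").get_or_create_collection(
    "short_term_m_2024_0001"
)

def apply_correction(record_id: str, corrected_tags: list[str], note: str) -> None:
    # 1) The feedback itself becomes a new corrective memory.
    short_term.add(
        ids=[f"fb_{int(time.time() * 1000)}"],
        documents=[note],
        embeddings=embedder.encode([note]).tolist(),
        metadatas=[{"tags": "feedback,correction", "ref": record_id,
                    "timestamp": time.time()}],
    )
    # 2) Rewrite the tags on the record the user corrected.
    old_meta = short_term.get(ids=[record_id])["metadatas"][0]
    short_term.update(ids=[record_id],
                      metadatas=[{**old_meta, "tags": ",".join(corrected_tags)}])

apply_correction("obs_1700000000000",
                 ["room:kitchen", "object:dishwasher"],
                 "User correction: that was the dishwasher, not the oven")
```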
- Safety & Telemetry:
    - Per-skill timeouts; abort on planner stalls (timeout sketch below)
    - Telemetry streamed to the Data Lake (pose trace, decisions, thumbnails)
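How a per-skill timeout might be enforced, using only the standard library rather than the actual DIMOS skill runner:

```python
# Sketch: generic per-skill timeout; the real DIMOS skill runner may
# differ. Requires Python 3.9+ for cancel_futures.
from concurrent.futures import ThreadPoolExecutor, TimeoutError as SkillTimeout

def run_skill_with_timeout(skill_fn, timeout_s: float = 30.0):
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(skill_fn)
    try:
        return future.result(timeout=timeout_s)
    except SkillTimeout:
        # Planner stall: give up on this step so the agent can re-plan or
        # ask the user. The stalled thread is abandoned, not killed, so
        # real skills also need a cooperative cancel or process isolation.
        raise RuntimeError(f"Skill timed out after {timeout_s}s")
    finally:
        pool.shutdown(wait=False, cancel_futures=True)
```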
### 3) Mission Outcome
- Agent conclusion:
    - "The oven appears ON (indicator lit)", with confidence and an evidence link
- Memory Promotion Policy (sketched below):
    - Promote key facts from `short_term_{mission_id}` to `long_term` (e.g., kitchen layout, appliance locations)
    - Keep ambiguous/low-value records in short-term for the retention purge
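The promotion pass could be a metadata-filtered copy; the `0.8` confidence cutoff is illustrative:

```python
# Sketch: promote high-confidence short-term facts into long_term.
# The 0.8 cutoff is illustrative; a real policy could also weigh tags.
import chromadb

client = chromadb.PersistentClient(path="./memory")
short_term = client.get_or_create_collection("short_term_m_2024_0001")
long_term = client.get_or_create_collection("long_term")

keep = short_term.get(
    where={"confidence": {"$gte": 0.8}},
    include=["documents", "metadatas", "embeddings"],
)
if keep["ids"]:
    long_term.add(
        ids=keep["ids"],  # reusing ids avoids duplicates on re-runs
        documents=keep["documents"],
        metadatas=keep["metadatas"],
        embeddings=keep["embeddings"],
    )
```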
### 4) Shutdown & Offload
- Before power-down (export sketch below):
    - Persist Chroma (automatic) and VisualMemory (thumbnails)
    - Export a mission bundle to the Data Lake:
        - `mission.json` (summary, decisions, timings)
        - `memories_short_term.jsonl` (text + metadata)
        - `images/` (key frames, thumbnails)
        - `trace.log` (actions, states)
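A sketch of the bundle export matching the layout above; paths and summary fields are placeholders:

```python
# Sketch: export the mission bundle. Paths and summary fields are
# placeholders; images/ and trace.log are copied by the offload job.
import json
from pathlib import Path
import chromadb

short_term = chromadb.PersistentClient(path="./memory").get_or_create_collection(
    "short_term_m_2024_0001"
)

bundle = Path("offload/m_2024_0001")
bundle.mkdir(parents=True, exist_ok=True)

# mission.json: summary, decisions, timings
(bundle / "mission.json").write_text(json.dumps({
    "mission_id": "m_2024_0001",
    "conclusion": "The oven appears ON (indicator lit).",
}, indent=2))

# memories_short_term.jsonl: one text + metadata record per line
recs = short_term.get(include=["documents", "metadatas"])
with (bundle / "memories_short_term.jsonl").open("w") as f:
    for rid, doc, meta in zip(recs["ids"], recs["documents"], recs["metadatas"]):
        f.write(json.dumps({"id": rid, "text": doc, "meta": meta}) + "\n")
```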
- Robot powers off.
## Avatar-in-Sim (Spark Workflows)

### 5) Avatar Resume
- Avatar loads the last mission bundle and the long-term memory snapshot.
- Same Web UI front end, with an avatar indicator (optionally reduced control set).
- If the user is idle, run background workflows (deduplication sketch below):
    - Memory consolidation: summarize short-term, deduplicate, merge corrections
    - Data labeling tasks: confirm room labels, tag objects, cluster scenes
    - Synthetic experience: plan and rehearse similar tasks in sim; generate new memory-writing examples
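For the deduplication part of consolidation, one plausible approach is an embedding-similarity scan over the short-term collection; the threshold and collection name are illustrative:

```python
# Sketch: drop near-duplicate short-term memories by cosine similarity.
# Threshold and collection name are illustrative; O(n^2) is fine at
# single-mission scale.
import numpy as np
import chromadb

short_term = chromadb.PersistentClient(path="./memory").get_or_create_collection(
    "short_term_m_2024_0001"
)

recs = short_term.get(include=["embeddings"])
ids = recs["ids"]
to_delete, kept = [], []
if ids:
    emb = np.asarray(recs["embeddings"], dtype=np.float32)
    emb /= np.linalg.norm(emb, axis=1, keepdims=True)  # unit vectors
    DUP_THRESHOLD = 0.95  # near-identical, not merely related
    for i in range(len(ids)):
        if any(float(emb[i] @ emb[j]) >= DUP_THRESHOLD for j in kept):
            to_delete.append(ids[i])  # a kept record already covers this one
        else:
            kept.append(i)

if to_delete:
    short_term.delete(ids=to_delete)
```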
### 6) LoRA Entry Points (Later Phases)
- Memory Writing Adapter:
    - Trained on curated observation → normalized-JSON pairs
    - Used in avatar background workflows to improve future on-robot memory writing
- Recall Adapter:
    - Trained on (question + docs) → grounded answers
    - A/B tested in the avatar environment before being enabled on the robot
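When these phases arrive, the training side could use standard PEFT-style LoRA on whatever base model the local vLLM serves. Everything below (base model name, rank, target modules) is an assumption to be tuned per task:

```python
# Sketch: attach a LoRA adapter for the memory-writing task using PEFT.
# The base model name, rank, and target modules are placeholders to tune.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("your-org/base-model")  # hypothetical
lora = LoraConfig(
    r=16,                                  # adapter rank (placeholder)
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],   # typical attention projections
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora)
model.print_trainable_parameters()  # sanity check: only adapter weights train
# Train on curated (observation -> normalized JSON) pairs, save the adapter,
# then A/B test it in the avatar environment before enabling it on-robot.
```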
### 7) Data Governance & Retention
- Collections: `short_term_{mission}`, `long_term`, `labels`, `feedback`
- Purge: short-term after 30 days; long-term capped by importance/size (purge sketch below)
- Privacy: redact faces/audio in public datasets; store raw data only locally
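The 30-day purge maps directly onto a metadata-filtered delete, assuming the epoch `timestamp` field from the write sketch:

```python
# Sketch: 30-day retention purge, assuming the epoch `timestamp`
# metadata field from the write sketch above.
import time
import chromadb

short_term = chromadb.PersistentClient(path="./memory").get_or_create_collection(
    "short_term_m_2024_0001"
)

cutoff = time.time() - 30 * 24 * 3600
short_term.delete(where={"timestamp": {"$lt": cutoff}})
```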
## Data We Store (Day One)
- Text observations (embedded) with metadata: `{mission_id, timestamp, pose, tags, confidence}`
- Key frames / thumbnails linked by `image_id`
- Mission summary + action trace (for replay)
- User feedback records (corrections, confirmations)
## Open Questions
- Kitchen localization without a prior map: how much frontier exploration vs user guidance?
- Room labeling source of truth: human-in-the-loop vs heuristic scene classification
- Offload trigger: on-demand vs automatic at mission end
- Avatar autonomy budget: what background workflows are allowed when idle?