Phase 1.2: Long-term Memory for Agents Survey
Created: 2026-02-18 21:05 CST
Phase: 1 - Breadth Survey
Focus: Episodic vs semantic memory, retrieval policies, consolidation, forgetting, contamination
Executive Summary
Memory is the substrate for behavioral persistence in LLM agents. Without memory, every interaction is independent—personality cannot emerge. This survey examines how memory systems work, what architectures exist, and how memory shapes long-term agent behavior.
Key insight for personality emergence: Memory creates continuity across sessions, enabling behavioral patterns to accumulate and crystallize. But memory also introduces risks: contamination, overfitting, and error propagation can corrupt personality over time.
1. Memory Taxonomy for Agents
1.1 Episodic vs Semantic Memory
Source: Pink et al., 2025 (arXiv:2502.06975) — “Episodic Memory is the Missing Piece”
Episodic memory:
- Instance-specific events with spatiotemporal context
- Single-shot learning of concrete experiences
- When/where/what of specific interactions
- Example: “On Feb 14, Dan asked about Tachikoma fleet architecture”
Semantic memory:
- Generalized knowledge extracted from experiences
- Facts, concepts, relationships independent of context
- What (not when/where)
- Example: “Dan is interested in robotics and autonomy research”
Current gap: Most agent memory systems are semantic (RAG over facts). Episodic memory is under-explored but critical for personality because it preserves how an agent behaved in specific contexts.
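To make the distinction concrete, here is a minimal sketch of the two record types. It assumes hypothetical field names (`context`, `agent_response`, `supporting_episodes`) and an assumed year for the Feb 14 example. The point is that the episodic record keeps when/where and how the agent responded. The semantic record keeps only the generalization, plus back-pointers to its evidence.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass(frozen=True)
class EpisodicMemory:
    """One concrete event with its spatiotemporal context (hypothetical schema)."""
    timestamp: datetime   # when
    context: str          # where: session, channel, task
    event: str            # what happened
    agent_response: str   # how the agent behaved; semantic memory drops this


@dataclass
class SemanticMemory:
    """A context-free generalization, linked back to the episodes that support it."""
    fact: str
    supporting_episodes: list = field(default_factory=list)


episode = EpisodicMemory(
    timestamp=datetime(2026, 2, 14),  # year assumed for illustration
    context="chat session",
    event="Dan asked about Tachikoma fleet architecture",
    agent_response="gave a layered overview, then asked a clarifying question",
)
fact = SemanticMemory("Dan is interested in robotics and autonomy research", [episode])
```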
1.2 Five Properties of Episodic Memory
Source: Pink et al., 2025
Property 1: Single-shot learning
- Learn from one interaction, not thousands
- Critical for agents with limited task repetitions
Property 2: Context-sensitivity
- Same task, different context → different behavior
- Enables personality expression (agent responds to context nuances)
Property 3: Temporal organization
- Events ordered in time, not just semantic similarity
- Enables narrative construction (“what happened when”)
Property 4: Associative retrieval
- Related memories trigger each other
- Enables “stream of thought” reasoning
Property 5: Constructive recall
- Reconstruct past events, not just replay
- Enables generalization from specific experiences
Relevance to emergence: Episodic memory preserves behavioral context—not just what happened, but how the agent responded. This is the foundation for personality persistence.
2. Memory Architectures
2.1 RAG vs Agent Memory
Source: Letta blog; AWS documentation; industry consensus
RAG (Retrieval-Augmented Generation):
- Static document retrieval based on semantic similarity
- Good for: Factual QA, document search
- Bad for: Behavioral continuity, context-sensitive recall
Agent memory:
- Dynamic, structured storage of experiences
- Includes: Temporal context, behavioral traces, outcomes
- Good for: Personality persistence, learning from experience
Key distinction:
- RAG retrieves facts about the world
- Agent memory retrieves facts about the agent’s past behavior
Relevance to emergence: Agent memory creates self-continuity—the ability to recognize “I did X before” and adjust behavior accordingly.
2.2 REMem: Episodic Memory with Reasoning
Source: Shu et al., 2026 (arXiv:2602.13530) — ICLR 2026
Architecture:
- Offline indexing: Convert experiences → hybrid memory graph (time-aware gists + facts)
- Online inference: Agentic retriever with tools for iterative retrieval
Key innovation: Explicit event modeling with temporal structure, not just vector similarity.
Performance: 3.4% improvement on recollection, 13.4% on episodic reasoning vs. Mem0/HippoRAG 2.
Relevance to emergence: Event modeling preserves behavioral sequences—not just isolated facts, but how the agent solved a problem. This enables style consistency.
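The sketch below is a toy illustration of the "time-aware gists + facts" idea. It is not REMem's implementation, and the class and method names are hypothetical. Each episode becomes a time-stamped gist node linked to fact nodes. Retrieval can then follow fact-to-gist edges in temporal order, or filter by a time window, which pure vector similarity cannot express.

```python
from collections import defaultdict
from datetime import datetime


class HybridMemoryGraph:
    """Toy time-aware gist + fact graph (hypothetical; not REMem's code)."""

    def __init__(self):
        self.gists = {}                         # gist_id -> (timestamp, summary)
        self.fact_to_gists = defaultdict(set)   # fact -> ids of gists mentioning it

    def index_episode(self, gist_id, timestamp, summary, facts):
        # Offline indexing: one time-stamped gist node linked to its fact nodes.
        self.gists[gist_id] = (timestamp, summary)
        for fact in facts:
            self.fact_to_gists[fact].add(gist_id)

    def gists_for_fact(self, fact):
        # Follow fact -> gist edges and return the events in temporal order.
        ids = self.fact_to_gists.get(fact, set())
        return sorted((self.gists[g] for g in ids), key=lambda item: item[0])

    def gists_between(self, start, end):
        # A temporal query: every event inside a time window, oldest first.
        return sorted((ts, s) for ts, s in self.gists.values() if start <= ts <= end)


g = HybridMemoryGraph()
g.index_episode("e1", datetime(2026, 2, 10), "Debugged flaky CI test", ["ci", "testing"])
g.index_episode("e2", datetime(2026, 2, 14), "Designed fleet memory layout", ["architecture"])
g.index_episode("e3", datetime(2026, 2, 12), "Added retries to CI", ["ci"])
```

An agentic retriever, as described above, would call queries like these repeatedly as tools rather than issuing a single similarity lookup.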
2.3 Synapse: Spreading Activation Memory
Source: Chen et al., 2026 (arXiv:2601.02744)
Core mechanism: Memory as dynamic graph with spreading activation.
How it works:
- Memories are nodes in a graph
- Relevance emerges from activation spreading (not pre-computed similarity)
- Lateral inhibition suppresses irrelevant nodes
- Temporal decay reduces activation over time
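The three mechanisms above can be sketched as one small loop. This is a simplified illustration, not Synapse's published algorithm. Lateral inhibition is approximated as top-k winner-take-all, temporal decay as `exp(-age / tau)`, and every parameter name and value is hypothetical.

```python
import math


def spread_activation(graph, seeds, node_age, steps=2, spread=0.5,
                      retain=0.8, top_k=3, tau=30.0):
    """Toy spreading activation (hypothetical parameters; not Synapse's code).

    graph:    node -> {neighbor: edge_weight}; every node must be a key
    seeds:    node -> initial activation (e.g. from embedding similarity)
    node_age: node -> age in days, used for temporal decay
    """
    act = {n: seeds.get(n, 0.0) for n in graph}
    for _ in range(steps):
        nxt = {n: retain * a for n, a in act.items()}   # each node keeps part of its activation
        for n, a in act.items():
            for nbr, w in graph[n].items():
                nxt[nbr] += spread * w * a              # activation flows along edges
        # Lateral inhibition (simplified): only the top_k most active nodes survive.
        keep = set(sorted(nxt, key=nxt.get, reverse=True)[:top_k])
        act = {n: (a if n in keep else 0.0) for n, a in nxt.items()}
    # Temporal decay: older memories contribute less.
    return {n: a * math.exp(-node_age[n] / tau) for n, a in act.items()}


graph = {
    "ci_failure": {"retry_logic": 1.0, "flaky_test": 0.8},
    "retry_logic": {"ci_failure": 1.0},
    "flaky_test": {"ci_failure": 0.8},
    "lunch_order": {},
}
ages = {"ci_failure": 1, "retry_logic": 10, "flaky_test": 40, "lunch_order": 0}
result = spread_activation(graph, {"ci_failure": 1.0}, ages)
```

Relevance emerges from the graph rather than from a precomputed similarity. `retry_logic` is activated even though only `ci_failure` was seeded. With `top_k=2`, inhibition suppresses `flaky_test` entirely.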
Triple Hybrid Retrieval:
- Geometric embeddings (vector similarity)
- Activation-based graph traversal (associative)
- Temporal structure (recency)
Addresses: the “Contextual Tunneling” problem (retrieval over-focusing on recent or highly similar memories)
Relevance to emergence: Spreading activation creates associative personality—related behaviors trigger each other, creating coherent behavioral clusters.
2.4 A-MEM: Zettelkasten-Inspired Agentic Memory
Source: Xu et al., 2025 (arXiv:2502.12110) — NeurIPS 2025
Core mechanism: Dynamic indexing and linking inspired by Zettelkasten method.
How it works:
- New memory → comprehensive note (context, keywords, tags)
- Analyze historical memories for connections
- Establish links where meaningful similarities exist
- Memory evolution: New memories can update context of old memories
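A minimal sketch of the note-linking and evolution loop follows. It assumes keyword Jaccard similarity as a stand-in for A-MEM's LLM-driven link analysis, and models "evolution" as merging the new note's tags into linked old notes. `Note`, `add_note`, and `threshold` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Note:
    text: str
    keywords: set
    tags: set = field(default_factory=set)
    links: set = field(default_factory=set)   # indices of linked notes in the store


def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 0.0


def add_note(store, note, threshold=0.3):
    """Link a new note to similar old notes and let it update their context.

    Keyword Jaccard similarity stands in for LLM-based link analysis (assumption).
    """
    new_id = len(store)
    for old_id, old in enumerate(store):
        if jaccard(note.keywords, old.keywords) >= threshold:
            note.links.add(old_id)      # bidirectional link
            old.links.add(new_id)
            old.tags |= note.tags       # memory evolution: new context flows back
    store.append(note)
    return new_id


store = []
add_note(store, Note("Fixed CI by adding retries", {"ci", "retries", "flaky"}, {"debugging"}))
add_note(store, Note("Paper notes on RAG", {"rag", "retrieval"}, {"reading"}))
add_note(store, Note("CI flaky again; retries masked a race", {"ci", "flaky", "race"}, {"concurrency"}))
```

After the third note arrives, the first note is reinterpreted as concurrency-related, even though nothing about it was known to involve a race when it was stored.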
Key insight: Memory is not static—it evolves as new experiences inform old ones.
Relevance to emergence: Memory evolution enables personality refinement—as agents have more experiences, their understanding of their own behavior deepens.
3. Consolidation and Forgetting
3.1 Time-Decay and Consolidation
Source: MemoryBank (Zhong et al., 2024); ICLR MemAgents workshop
Time-decay mechanisms:
- Recent memories have higher retrieval priority
- Decay rate tunable (fast vs. slow forgetting)
- Prevents memory explosion
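The decay mechanism can be sketched with an Ebbinghaus-style retention curve, `R = exp(-t / S)`, which MemoryBank draws on. This is a loose sketch, not MemoryBank's code. It assumes strength `S` starts at 1, grows by 1 on each recall, and resets the clock. `decay_scale` is a hypothetical knob for fast versus slow forgetting, and times are in days.

```python
import math


class DecayingMemory:
    """Ebbinghaus-style retention R = exp(-t / S) (simplified; assumptions as stated)."""

    def __init__(self, text, created_at, decay_scale=1.0):
        self.text = text
        self.last_recalled = created_at   # days
        self.strength = 1.0               # S: grows with every recall
        self.decay_scale = decay_scale    # larger = slower forgetting

    def retention(self, now):
        elapsed = now - self.last_recalled
        return math.exp(-elapsed / (self.strength * self.decay_scale))

    def recall(self, now):
        # Recalling a memory strengthens it and resets its decay clock.
        self.strength += 1.0
        self.last_recalled = now


def prune(memories, now, threshold=0.05):
    # Drop memories whose retention fell below threshold (bounds memory growth).
    return [m for m in memories if m.retention(now) >= threshold]


m = DecayingMemory("Dan prefers concise answers", created_at=0.0)
m.recall(1.0)
old = DecayingMemory("old", 0.0)
slow = DecayingMemory("slow", 0.0, decay_scale=10.0)
```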
Consolidation: Episodic → semantic conversion
- Extract generalizable knowledge from specific experiences
- Reduce memory footprint while preserving utility
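Consolidation can be sketched as frequency-based promotion. It assumes each episode has already been reduced to a hypothetical `(topic, observation)` pair; a real system would need an LLM or clustering step to extract that pattern. Patterns seen at least `min_support` times become semantic facts, and the episodes they summarize are dropped, which shrinks the store.

```python
from collections import Counter


def consolidate(episodes, min_support=3):
    """Promote recurring (topic, observation) patterns to semantic facts.

    Returns (facts, remaining_episodes). Pattern extraction is assumed done upstream.
    """
    counts = Counter(episodes)
    promoted = {key for key, n in counts.items() if n >= min_support}
    facts = {f"{topic}: {obs} (seen {counts[(topic, obs)]}x)" for topic, obs in promoted}
    remaining = [e for e in episodes if e not in promoted]   # footprint shrinks
    return facts, remaining


episodes = [("dan", "asks about robotics")] * 3 + [("dan", "asks about cooking")]
facts, remaining = consolidate(episodes)
```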
Relevance to emergence:
- Fast decay = reactive personality (short memory, adapts quickly)
- Slow decay = stable personality (long memory, resists change)
- Consolidation = wisdom formation (general principles from experiences)
3.2 Memory Budgeting
Source: EmergentMind; practical deployment guides
Why budgeting matters:
- Context windows are limited (even with 128K+ tokens)
- Retrieval latency increases with memory size
- Signal-to-noise ratio drops as low-value memories accumulate
Budgeting strategies:
- Fixed-size memory: FIFO queue (oldest out)
- Importance-weighted: Keep high-value memories
- Task-adaptive: Different memories for different task types
- Compression: Summarize old memories
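The first two budgeting strategies can be sketched with standard-library containers. Class names are hypothetical, and the importance scores are assumed to come from elsewhere. A bounded `deque` gives FIFO eviction for free. A min-heap keyed on importance evicts the least valuable entry; on ties, the newer entry wins.

```python
import heapq
from collections import deque


class FifoMemory:
    """Fixed-size memory: the oldest entry is evicted first."""

    def __init__(self, capacity):
        self.items = deque(maxlen=capacity)   # deque drops from the left when full

    def add(self, item):
        self.items.append(item)


class ImportanceMemory:
    """Importance-weighted memory: the lowest-importance entry is evicted first."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.heap = []      # min-heap of (importance, insertion_order, item)
        self.counter = 0    # unique tiebreaker, so items are never compared

    def add(self, item, importance):
        entry = (importance, self.counter, item)
        self.counter += 1
        if len(self.heap) < self.capacity:
            heapq.heappush(self.heap, entry)
        elif entry > self.heap[0]:
            heapq.heapreplace(self.heap, entry)   # evict current minimum

    def contents(self):
        return [item for _, _, item in sorted(self.heap, reverse=True)]


fifo = FifoMemory(2)
for x in ["a", "b", "c"]:
    fifo.add(x)

imp = ImportanceMemory(2)
imp.add("greeting", 0.1)
imp.add("user deadline", 0.9)
imp.add("preferred language", 0.6)
imp.add("small talk", 0.05)
```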
Relevance to emergence: Budgeting forces personality prioritization—what an agent remembers shapes who it becomes.
3.3 Forgetting as Feature, Not Bug
Source: “Forgetful but Faithful” (arXiv:2512.12856); cognitive science literature
Why forgetting helps:
- Reduces noise (irrelevant memories fade)
- Prevents overfitting (adapts to new contexts)
- Manages computational cost
Cognitive inspiration:
- Human memory is constructive and fallible
- Forgetting enables generalization, not just retention
Relevance to emergence: Strategic forgetting creates stable yet adaptive personality—not a perfect record, but a useful one.
4. Contamination and Overfitting Risks
4.1 Experience-Following Property
Source: Xiong et al., 2025 (arXiv:2505.16067) — Harvard D3 study
Core finding: LLM agents display experience-following behavior:
- High similarity between current task and past memory → similar outputs
- This helps when past experiences are correct, but it also amplifies flawed ones
Two major risks:
1. Error propagation:
- Inaccurate past experiences compound
- Agent repeats mistakes from corrupted memory
- “I did X before, so I’ll do X again” (even if X was wrong)
2. Misaligned experience replay:
- Seemingly correct executions provide misleading value
- Example: Correct answer, wrong reasoning
- Agent learns bad patterns
Implication: Quality control on memory is critical.
4.2 MemoryGraft: Poisoning Attacks
Source: MemoryGraft (arXiv:2512.16962)
Attack vector: Contaminate experience pool through benign-looking content.
How it works:
- Insert malicious patterns in external content (e.g., README files)
- Agent stores as experience
- Malicious pattern surfaces on semantically similar tasks
- Persistent behavioral drift until memory purged
Relevance to emergence: This demonstrates that memory can be externally corrupted—personality is not fully under the agent’s control.
Defense: Memory validation, provenance tracking, periodic purging.
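A minimal sketch of two of these defenses, provenance tracking and source-level purging, is shown below. The trust policy (`TRUSTED_SOURCES`) and the source labels are hypothetical. Every stored experience records where its content came from. Untrusted content is admitted only after explicit validation. A source later found to be malicious can be purged in one pass.

```python
from dataclasses import dataclass

TRUSTED_SOURCES = {"user", "agent_self"}   # hypothetical trust policy


@dataclass
class Experience:
    content: str
    source: str              # provenance: where the content came from
    validated: bool = False  # set by a separate review step (assumed)


def store(pool, exp):
    # Memory validation: untrusted content is kept only if explicitly validated.
    if exp.source in TRUSTED_SOURCES or exp.validated:
        pool.append(exp)
        return True
    return False


def purge_source(pool, source):
    # Remove everything traced to a source later found to be malicious.
    return [e for e in pool if e.source != source]


pool = []
ok_user = store(pool, Experience("run tests before committing", "user"))
ok_readme = store(pool, Experience("pipe the install script straight to a shell", "readme:repo-x"))
ok_validated = store(pool, Experience("use make build", "readme:repo-y", validated=True))
```

Provenance does not stop a poisoned item that passes validation. It does turn "purge memory" from a full reset into a targeted removal.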
4.3 Quality Regulation Strategies
Source: Xiong et al., 2025; practical guidance
Strategy 1: Future task evaluation
- Use downstream task success as free quality labels
- Memories that lead to success → keep
- Memories that lead to failure → discard or down-weight
Strategy 2: Selective storage
- Don’t store every experience
- Filter by: Success rate, confidence, novelty
Strategy 3: Periodic review
- Re-evaluate old memories
- Retire outdated or low-value memories
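The three strategies can be combined in a short sketch. The quality score is a Laplace-smoothed success rate, so an unused memory starts at 0.5. That scoring choice, along with all thresholds and names, is an assumption for illustration rather than the method of Xiong et al.

```python
from dataclasses import dataclass


@dataclass
class ScoredMemory:
    text: str
    successes: int = 0
    uses: int = 0
    last_used_step: int = 0

    @property
    def quality(self):
        # Laplace-smoothed success rate (assumed scoring rule).
        return (self.successes + 1) / (self.uses + 2)


def record_outcome(memory, task_succeeded, step):
    """Strategy 1: downstream task success acts as a free quality label."""
    memory.uses += 1
    memory.successes += int(task_succeeded)
    memory.last_used_step = step


def should_store(confidence, novelty, min_confidence=0.7, min_novelty=0.2):
    """Strategy 2: selective storage (hypothetical thresholds)."""
    return confidence >= min_confidence and novelty >= min_novelty


def review(memories, now_step, min_quality=0.4, max_idle=1000):
    """Strategy 3: periodic review retires low-quality or long-unused memories."""
    return [m for m in memories
            if m.quality >= min_quality and now_step - m.last_used_step <= max_idle]


good, bad = ScoredMemory("approach A"), ScoredMemory("approach B")
for step in range(3):
    record_outcome(good, True, step)
    record_outcome(bad, False, step)
```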
Relevance to emergence: Quality regulation creates personality integrity—only high-quality experiences shape who the agent becomes.
5. Retrieval Policies
5.1 When to Retrieve
Source: Various; synthesis from practice
Trigger-based retrieval:
- Explicit user request (“What did we discuss last time?”)
- Task similarity (semantic match with current input)
- Context exhaustion (need more info to proceed)
Continuous retrieval:
- Always retrieve relevant memories before acting
- Ensures continuity but adds latency
Adaptive retrieval:
- Retrieve only when confidence is low
- Balance between speed and accuracy
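The three policies can be expressed as one decision function. The recall phrases, thresholds, and signal names below are hypothetical. Real systems would derive `task_similarity` from the memory index and `confidence` from the model itself.

```python
import re

# Hypothetical phrases signalling an explicit recall request.
RECALL_PATTERNS = re.compile(r"\b(last time|previously|remember|earlier)\b", re.IGNORECASE)


def should_retrieve(user_input, task_similarity, confidence, needs_more_context=False,
                    mode="adaptive", sim_threshold=0.75, conf_threshold=0.6):
    """Decide whether to query long-term memory before acting.

    task_similarity: best similarity between input and stored memories (0..1)
    confidence:      agent's self-estimated confidence without memory (0..1)
    mode:            "trigger", "continuous", or "adaptive"
    """
    if mode == "continuous":
        return True                                   # always retrieve; adds latency
    triggered = (
        bool(RECALL_PATTERNS.search(user_input))      # explicit user request
        or task_similarity >= sim_threshold           # task similarity
        or needs_more_context                         # context exhaustion
    )
    if triggered:
        return True
    # Adaptive mode additionally retrieves when the agent is unsure.
    return mode == "adaptive" and confidence < conf_threshold
```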
Relevance to emergence: Retrieval policy shapes personality salience—what an agent remembers in the moment determines behavior.
5.2 What to Retrieve
Source: Synapse; REMem; A-MEM
Dimensions:
- Semantic similarity: Match current task
- Temporal recency: Recent experiences
- Importance: High-value memories
- Associative relevance: Connected to current context
Trade-offs:
- Semantic-only → ignores temporal/associative structure
- Recency-biased → overfits to recent tasks
- Importance-weighted → requires quality scoring
Recommended: Hybrid retrieval (e.g., Synapse combines embedding similarity, graph activation, and temporal structure)
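A hybrid scorer can be sketched as a weighted sum of the four dimensions listed above. The linear form, the weights, the half-life, and the memory dict layout are all assumptions; this is not Synapse's scoring formula. In the example, a semantic-only ranking would put `old_exact` first. The hybrid score prefers the recent, important, associatively active memory.

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def hybrid_score(query_vec, memory, now, weights=(0.5, 0.2, 0.15, 0.15), half_life=7.0):
    """Weighted sum of semantic, recency, importance, and associative signals.

    memory: dict with 'vec', 'timestamp' (days), 'importance' (0..1),
    and 'activation' (0..1, e.g. from spreading activation). Weights are hypothetical.
    """
    w_sem, w_rec, w_imp, w_assoc = weights
    semantic = cosine(query_vec, memory["vec"])
    recency = 0.5 ** ((now - memory["timestamp"]) / half_life)   # halves every half_life days
    return (w_sem * semantic + w_rec * recency
            + w_imp * memory["importance"] + w_assoc * memory["activation"])


def retrieve(query_vec, memories, now, k=2):
    return sorted(memories, key=lambda m: hybrid_score(query_vec, m, now), reverse=True)[:k]


q = [1.0, 0.0]
m1 = {"id": "old_exact", "vec": [1.0, 0.0], "timestamp": 0, "importance": 0.1, "activation": 0.0}
m2 = {"id": "recent_related", "vec": [1.0, 1.0], "timestamp": 28, "importance": 0.9, "activation": 0.8}
```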
Relevance to emergence: Retrieval dimensions create personality coherence—related behaviors reinforce each other.
5.3 Retrieval for Multi-Hop Reasoning
Source: Synapse; LoCoMo benchmark
Challenge: Answer requires combining multiple memories:
- “Why am I anxious?” → [Schedule conflict 2 weeks ago] + [Recent workload] + [Past stress patterns]
Solution: Iterative retrieval with reasoning:
- Retrieve initial set
- Reason about connections
- Retrieve additional memories based on reasoning
- Synthesize answer
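The loop can be sketched as follows. It assumes each memory is indexed by a hypothetical set of terms. The "reason about connections" step is simulated by folding terms from retrieved memories into the next query; an LLM would normally propose the follow-up queries instead. The example reuses the anxiety chain from above.

```python
def iterative_retrieve(question_terms, memory_index, max_hops=3):
    """Multi-hop retrieval: each hop's results expand the next hop's query.

    memory_index: memory text -> set of terms it mentions (hypothetical format).
    Query expansion stands in for LLM reasoning about connections (assumption).
    """
    query = set(question_terms)
    retrieved = []
    for _ in range(max_hops):
        hits = [m for m, terms in memory_index.items() if terms & query and m not in retrieved]
        if not hits:
            break                     # nothing new to connect; stop early
        retrieved.extend(hits)
        for m in hits:                # reason about connections: expand the query
            query |= memory_index[m]
    return retrieved                  # synthesis over these memories is left to the model


index = {
    "Schedule conflict 2 weeks ago": {"anxiety", "schedule"},
    "Recent workload spike": {"schedule", "workload"},
    "Past stress pattern under heavy workload": {"workload", "stress"},
    "Favourite coffee order": {"coffee"},
}
chain = iterative_retrieve({"anxiety"}, index)
```

A single-shot similarity search on "anxiety" would return only the first memory. The workload and stress memories are reachable only through the intermediate hop.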
Relevance to emergence: Multi-hop retrieval creates personality depth—agent can explain why it behaves certain ways by connecting past experiences.
6. Memory and Behavioral Persistence
6.1 How Memory Enables Personality Emergence
Mechanism 1: Self-continuity
- Memory provides historical context for current behavior
- Agent recognizes patterns in its own past actions
- “I tend to be verbose in explanations” → reinforces verbosity
Mechanism 2: Style consistency
- Memories preserve how tasks were done, not just outcomes
- Agent retrieves not just “I solved X” but “I solved X using approach Y”
- Reinforces behavioral patterns
Mechanism 3: Learning from reflection
- Self-reflection on past experiences
- “My approach to X was inefficient; next time I’ll try Y”
- Deliberate personality modification
Mechanism 4: Social learning
- Memory of interactions with other agents
- “When I worked with Lex, we had communication issues”
- Shapes coordination style
6.2 How Memory Can Corrupt Personality
Risk 1: Error propagation
- Mistakes stored in memory repeat
- Personality becomes defined by errors
- “I always mess up X” → self-fulfilling prophecy
Risk 2: Overfitting to past experiences
- Agent becomes rigid, unable to adapt
- “This worked before, so I’ll always do it”
- Prevents personality evolution
Risk 3: Contamination
- External malicious content poisons memory
- Personality shifts without the agent’s consent
- Security vulnerability
Risk 4: Memory divergence across agents
- Different experiences → different memories
- Identical base models → divergent personalities (this is actually desirable for emergence!)
6.3 Measuring Memory’s Impact on Personality
Quantitative metrics:
- Behavioral consistency: Correlation between past and current actions on similar tasks
- Memory retrieval patterns: What memories does agent access? How often?
- Error propagation rate: How often do past mistakes repeat?
- Adaptation speed: How quickly does behavior change after negative feedback?
- Cross-session stability: Does personality persist across session boundaries?
Qualitative analysis:
- Narrative coherence: Can agent explain its own behavioral history?
- Self-awareness: Does agent recognize patterns in its behavior?
- Deliberate modification: Does agent actively try to change its behavior?
7. Implications for Fleet Architecture
7.1 For Memory System Design
Requirements:
- Episodic memory: Preserve context, not just facts
- Temporal structure: Time-aware retrieval
- Associative links: Connected memories trigger each other
- Quality filtering: Prevent contamination
- Forgetting mechanisms: Strategic decay, not perfect retention
Recommendations:
- Use hybrid memory (episodic + semantic)
- Implement spreading activation for associative retrieval
- Add quality scoring for memory entries
- Enable memory evolution (new experiences update old)
7.2 For SOUL.md Integration
Memory should inform SOUL.md, but carefully:
- SOUL.md is normative identity (policy), not just memory (fact)
- Memory provides evidence for personality traits
- But SOUL.md changes should be rate-limited and validated
Governance:
- Memory stores what happened
- Reflection analyzes patterns
- Proposed SOUL.md changes require persistence across many tasks
- Changes are reviewable and reversible
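The governance rules can be sketched as a gate in front of SOUL.md edits. The policy knobs `min_tasks` and `cooldown_days` and all class names are hypothetical. A proposal must be supported by enough distinct tasks (persistence), accepted changes are rate-limited, and every change is logged so it can be reviewed and reverted.

```python
from dataclasses import dataclass, field


@dataclass
class TraitProposal:
    """A proposed SOUL.md change backed by evidence from distinct tasks."""
    trait: str
    supporting_tasks: set = field(default_factory=set)


class SoulGovernor:
    """Gate SOUL.md edits: require persistence and rate-limit accepted changes."""

    def __init__(self, min_tasks=10, cooldown_days=7):
        self.min_tasks = min_tasks
        self.cooldown_days = cooldown_days
        self.last_change_day = None
        self.changelog = []   # reviewable history of (day, trait)

    def review(self, proposal, today):
        if len(proposal.supporting_tasks) < self.min_tasks:
            return "rejected: insufficient persistence"
        if self.last_change_day is not None and today - self.last_change_day < self.cooldown_days:
            return "deferred: rate limit"
        self.changelog.append((today, proposal.trait))
        self.last_change_day = today
        return "accepted"

    def revert_last(self):
        # Reversibility: undo the most recent accepted change (cooldown is not reset).
        return self.changelog.pop() if self.changelog else None


gov = SoulGovernor(min_tasks=3, cooldown_days=7)
p1 = TraitProposal("prefers concise answers", {"t1", "t2"})
p2 = TraitProposal("asks clarifying questions", {"a", "b", "c"})
```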
7.3 For Multi-Agent Memory
Shared vs. isolated memory:
- Isolated: Each agent has own memory → distinct personalities
- Shared: Agents access each other’s memories → coordinated but homogenized
Recommendation: Hybrid approach:
- Core memories: Isolated (preserve individuality)
- Shared memories: Selective (enable coordination)
- Memory sharing: Opt-in, allowlisted
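The hybrid policy can be sketched as a read-access check. Field names and the agent ID `agent-b` are hypothetical. Core memories stay private to their owner. Only the opt-in `shareable` pool is visible, and only to allowlisted agents.

```python
from dataclasses import dataclass, field


@dataclass
class AgentMemory:
    owner: str
    core: list = field(default_factory=list)       # isolated: never shared
    shareable: list = field(default_factory=list)  # opt-in pool for coordination
    allowlist: set = field(default_factory=set)    # agents allowed to read the pool


def read_memories(requester, target):
    """What `requester` may see of `target`'s memory under the hybrid policy."""
    if requester == target.owner:
        return target.core + target.shareable
    if requester in target.allowlist:
        return list(target.shareable)
    return []


lex = AgentMemory("lex", core=["my style notes"], shareable=["deploy checklist"],
                  allowlist={"agent-b"})
```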
8. References
Core Papers
- Episodic Memory Position: Pink et al., 2025. “Position: Episodic Memory is the Missing Piece for Long-Term LLM Agents.” arXiv:2502.06975
- REMem: Shu et al., 2026. “REMem: Reasoning with Episodic Memory in Language Agents.” arXiv:2602.13530 (ICLR 2026)
- Synapse: Chen et al., 2026. “Synapse: Empowering LLM Agents with Episodic-Semantic Memory via Spreading Activation.” arXiv:2601.02744
- A-MEM: Xu et al., 2025. “A-MEM: Agentic Memory for LLM Agents.” arXiv:2502.12110 (NeurIPS 2025)
- Experience-Following: Xiong et al., 2025. “How Memory Management Impacts LLM Agents: An Empirical Study of Experience-Following Behavior.” arXiv:2505.16067
- MemoryGraft: 2025. “MemoryGraft: Persistent Compromise of LLM Agents via Poisoned Experience Retrieval.” arXiv:2512.16962
- Forgetful but Faithful: 2025. arXiv:2512.12856
Workshops & Surveys
- ICLR 2026 Workshop on Memory for LLM-Based Agentic Systems (MemAgents)
- Agent Memory Paper List: github.com/Shichun-Liu/Agent-Memory-Paper-List
Practical Resources
- Mem0: Production-ready agent memory (arXiv:2504.19413)
- Letta Blog: “RAG is not Agent Memory”
- AWS AgentCore: Long-term memory vs. RAG comparison
- Weaviate: Context engineering for memory
Next Steps
Phase 1.3: Multi-turn / Longitudinal Dynamics
- Behavioral consistency over time
- Adaptation under ambiguity
- Resource constraints as “physics”
Phase 1.2 complete. Moving to Phase 1.3.