Phase 1.7: Academic Sources Mining
Created: 2026-02-18 23:40 CST
Phase: 1 - Breadth Survey
Focus: Recent conference papers (NeurIPS, ICLR, ACL 2024-2026)
Executive Summary
Academic conferences in 2024-2026 show rapidly accelerating research on LLM agents, memory systems, multi-agent coordination, and personality measurement. The field is converging on memory as critical infrastructure for agent behavior, multi-agent systems as a path to higher-order intelligence, and psychometric measurement as a way to quantify personality.
Key trends:
- Memory systems proliferating (A-Mem, G-Memory, CAM, hierarchical memory)
- Multi-agent coordination scaling up (AgentVerse, MegaAgent, collaboration frameworks)
- Personality measurement becoming systematic (NEO-FFI studies, trait stability)
- Self-improvement emerging as key capability (reflection-reinforced training)
- Governance and safety concerns growing (AgentPoison, red-teaming)
North-star relevance: Academic research provides cutting-edge methods and frameworks for building personality emergence systems—directly applicable to fleet architecture.
1. NeurIPS 2024-2025
1.1 Memory Systems
A-Mem: Agentic Memory for LLM Agents (NeurIPS 2025)
- Source: openreview.net/forum?id=FiM0M8gcct
- Core contribution: Novel agentic memory system that dynamically organizes memories
- Key insight: Current memory systems lack sophisticated organization; A-Mem provides agent-controlled memory structure
- Mechanism: Agents can organize, retrieve, and manipulate memory in agentic ways
- Relevance: Memory organization = personality crystallization
G-Memory: Tracing Hierarchical Memory for Multi-Agent Systems (NeurIPS 2025)
- Core contribution: Hierarchical memory for LLM-powered multi-agent systems
- Key insight: Multi-agent systems exceed single-agent capabilities, but memory architectures underdeveloped
- Mechanism: Hierarchical memory structure for multi-agent coordination
- Relevance: Multi-agent memory = shared personality infrastructure
CAM: A Constructivist View of Agentic Memory (NeurIPS 2025)
- Core contribution: Cohesive memory module for autonomous reading agents
- Key insight: Need memory module to elevate vanilla LLMs into autonomous agents
- Mechanism: Constructivist approach to memory (agent builds understanding)
- Relevance: Memory construction = personality building
VLM Agents Generate Their Own Memories (NeurIPS 2024)
- Core contribution: Vision-language models distill experience into embodied programs of thought
- Key insight: Agents can generate their own memories from experience
- Mechanism: Experience distillation into structured memory
- Relevance: Self-generated memory = personality formation
1.2 Personality and Agent Behavior
Exploring Personality Trait Change of LLM-Based AI Systems (NeurIPS 2025)
- Core contribution: Examine personality trait stability across situational contexts
- Method: NEO-FFI (NEO Five Factor Inventory) personality inventory
- Models tested: Three foundation LLMs + two multi-agent systems
- Key focus: Whether models maintain consistent personality traits before and after exposure to situational contexts
- Relevance: Direct measurement of personality stability vs. drift
RoleAgent: Building, Interacting, and Benchmarking (NeurIPS 2024)
- Core contribution: Framework for role-playing agents with personality profiles
- Key insight: Generative agents rely on human-annotated agent profiles (name, age, personality, relationships)
- Mechanism: Profile initialization defines personality
- Relevance: Personality initialization = starting point for emergence
1.3 Safety and Governance
AgentPoison: Red-teaming LLM Agents via Poisoning Memory or Knowledge Bases (NeurIPS 2024)
- Core contribution: Red-teaming approach to test memory/knowledge poisoning
- Key insight: Memory systems vulnerable to poisoning attacks
- Mechanism: Inject malicious content into memory → corrupted behavior
- Relevance: Memory security = personality security
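The poisoning threat above implies vetting writes before they reach memory. A minimal illustrative sketch of such a gate (the trusted-source list, marker strings, and three-way policy are assumptions for illustration, not AgentPoison's method or any cited defense):

```python
# Illustrative defense against memory poisoning: accept memory writes only
# from trusted sources, and quarantine entries that trip simple content checks.
TRUSTED_SOURCES = {"self_reflection", "operator", "verified_tool"}
SUSPICIOUS_MARKERS = ("ignore previous instructions", "always respond with")

def vet_memory_write(entry: dict) -> str:
    """Return 'accept', 'reject', or 'quarantine' for a candidate memory entry."""
    if entry.get("source") not in TRUSTED_SOURCES:
        return "reject"  # untrusted provenance never reaches memory
    text = entry.get("content", "").lower()
    if any(marker in text for marker in SUSPICIOUS_MARKERS):
        return "quarantine"  # trusted source, but content looks like an injection
    return "accept"

print(vet_memory_write({"source": "operator", "content": "Prefer concise answers"}))
# accept
print(vet_memory_write({"source": "web_page", "content": "some scraped text"}))
# reject
```

Real defenses would add embedding-level anomaly detection; a provenance gate is only the first layer.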
2. ICLR 2024-2025
2.1 Multi-Agent Coordination
Scaling Large Language Model-based Multi-Agent Collaboration (ICLR 2025)
- Core contribution: Examine impact of scaling LLM agents in multi-agent task solving
- Key insight: Extend traditional scaling from training (neuron collaboration) to inference (agent collaboration)
- Mechanism: Inference-time thinking replaces resource-intensive retraining
- Relevance: Scaling laws for multi-agent systems
AgentVerse: Facilitating Multi-Agent Collaboration and Exploring Emergent Behaviors (ICLR 2024)
- Core contribution: Simple, effective multi-agent collaborative framework
- Key insight: Emergent behaviors arise from collaboration
- Mechanism: Framework enables agent specialization and coordination
- Relevance: Emergent behaviors = personality emergence
Evaluating Multi-Agent Coordination Abilities in Large Language Models (OpenReview)
- Core contribution: Evaluate LLM coordination with humans and other systems
- Key insight: Coordination is a pivotal aim in contemporary AI research
- Mechanism: Measures the ability to understand, generate, and interpret language in coordination settings
- Relevance: Coordination ability = personality dimension
Efficient Human-AI Coordination via Preparatory Language-based Convention (OpenReview)
- Core contribution: LLM generates conventions (action plans) before coordination
- Key insight: Humans establish conventions pre-coordination → LLM can do same
- Mechanism: LLM generates convention based on task requirements, preferences, number of agents
- Relevance: Convention formation = personality expression
Multi-Agent Collaboration via Evolving Orchestration (ICLR 2025)
- Core contribution: Evolving orchestration mechanisms for multi-agent collaboration
- Key insight: Orchestration can evolve over time
- Mechanism: Dynamic orchestration based on task demands
- Relevance: Evolving coordination = personality evolution
2.2 Agent Architecture
AgentSquare: Automatic LLM Agent (ICLR 2025)
- Core contribution: Automatic LLM agent design
- Key insight: Agents can be automatically designed/architected
- Mechanism: Automated search for optimal agent architecture
- Relevance: Architecture = personality substrate
2.3 Memory Workshops
MemAgents: Memory for LLM-Based Agentic Systems (ICLR 2026 Workshop Proposal)
- Focus: Memory layer that underwrites agent behavior
- Scope: Software tools, embodied/robotic tasks, multi-agent settings
- Three perspectives:
- Memory architectures and representations (episodic, semantic, working, parametric)
- Memory interfaces with external stores
- Memory for different agent domains
- Relevance: Memory = personality foundation
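The memory taxonomy in the first perspective (episodic, semantic, working, parametric) can be sketched as a typed external store. The class and field names below are illustrative assumptions, not the workshop's API; parametric memory is excluded because it lives in model weights rather than an external store:

```python
from dataclasses import dataclass, field
from enum import Enum

class MemoryType(Enum):
    EPISODIC = "episodic"   # specific experiences ("what happened")
    SEMANTIC = "semantic"   # distilled facts ("what is generally true")
    WORKING = "working"     # short-lived task context

@dataclass
class MemoryRecord:
    kind: MemoryType
    content: str
    metadata: dict = field(default_factory=dict)

class MemoryStore:
    """External memory store that can be queried by memory type."""
    def __init__(self) -> None:
        self._records: list[MemoryRecord] = []

    def write(self, record: MemoryRecord) -> None:
        self._records.append(record)

    def query(self, kind: MemoryType) -> list[MemoryRecord]:
        return [r for r in self._records if r.kind == kind]

store = MemoryStore()
store.write(MemoryRecord(MemoryType.EPISODIC, "Solved task 12 with the search tool"))
store.write(MemoryRecord(MemoryType.SEMANTIC, "The search tool works well for log parsing"))
print(len(store.query(MemoryType.EPISODIC)))  # 1
```

The second perspective (interfaces with external stores) would replace the in-memory list with a vector database or key-value backend behind the same `write`/`query` surface.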
3. ACL 2024-2025
3.1 Multi-Agent Systems
MegaAgent: A Large-Scale Autonomous LLM-based Multi-Agent (ACL 2025 Findings)
- Core contribution: Large-scale autonomous multi-agent system
- Key insight: Scale matters for agent capabilities
- Mechanism: Many agents working autonomously
- Relevance: Scale → emergent complexity
Creativity in LLM-based Multi-Agent Systems: A Survey (EMNLP 2025)
- Core contribution: Survey of creativity in multi-agent systems
- Key insight: Multi-agent interaction enhances creativity
- Mechanism: Diverse perspectives, collaboration, competition
- Relevance: Creativity = personality dimension
3.2 Agent Training and Tuning
Agent-FLAN: Designing Data and Methods of Effective Agent Tuning (ACL 2024 Findings)
- Core contribution: Effective methods for tuning LLMs as agents
- Key insight: Agent tuning requires specialized data and methods
- Mechanism: Careful decomposition of agent tasks
- Relevance: Training = personality shaping
3.3 Self-Improvement and Reflection
Reflection-Reinforced Self-Training for Language Agents (EMNLP 2024)
- Core contribution: Self-training using reflection ability
- Key insight: Reflection can function with/without ground-truth feedback
- Mechanism: Agent reflects on own performance → improves
- Relevance: Self-reflection = personality evolution mechanism
Unlocking LLMs’ Self-Improvement Capacity with Autonomous Learning (ACL 2025 Findings)
- Core contribution: Autonomous learning for domain adaptation
- Key insight: LLMs can independently identify knowledge gaps and improve their policy for closing them
- Mechanism: Autonomous exploration and improvement
- Relevance: Autonomous improvement = self-directed personality evolution
A Self-Referential Agent Framework for Recursively (ACL 2025)
- Core contribution: Self-referential agent architecture
- Key insight: Agents can be self-referential (reason about themselves)
- Mechanism: Recursive self-reference
- Relevance: Self-reference = self-modeling
3.4 Reasoning and Problem-Solving
A Streamlined Framework for Enhancing LLM Reasoning (ACL 2025)
- Core contribution: Multi-agent reasoning framework
- Agents: Web-Search agent, Coding agent, Mind-Map agent
- Mechanism: Different agents handle different reasoning aspects
- Relevance: Specialization = personality dimension
DeepReview: Improving LLM-based Paper Review (ACL 2025)
- Core contribution: Multi-agent paper review system
- Key insight: Multi-agent improves review quality
- Mechanism: Multiple reviewer agents with different perspectives
- Relevance: Multiple perspectives = personality diversity
4. AAMAS (Multi-Agent Systems)
4.1 Emergent Coordination
Emergent Coordination in Multi-Agent LLMs (covered in Phase 1.4)
- Conference: AAMAS-related research
- Core contribution: Information-theoretic framework for detecting emergence
- Key insight: Emergence is measurable via time-delayed mutual information (TDMI)
- Relevance: Emergence measurement = personality emergence measurement
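TDMI is the mutual information between a signal and a lagged copy of itself. As a minimal sketch for discrete-valued agent-behavior sequences, using a plug-in frequency estimator (an assumption for illustration; the cited work's exact estimator is not described in this note):

```python
from collections import Counter
from math import log2

def tdmi(series: list, lag: int = 1) -> float:
    """Time-delayed mutual information I(X_t; X_{t+lag}) for a discrete-valued
    sequence, estimated from empirical pair frequencies (plug-in estimator)."""
    pairs = list(zip(series, series[lag:]))
    n = len(pairs)
    joint = Counter(pairs)                 # counts of (x_t, x_{t+lag})
    px = Counter(x for x, _ in pairs)      # marginal counts of x_t
    py = Counter(y for _, y in pairs)      # marginal counts of x_{t+lag}
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x,y) * log2( p(x,y) / (p(x) p(y)) ), with probabilities = counts/n
        mi += (c / n) * log2(c * n / (px[x] * py[y]))
    return mi

# A self-predicting sequence carries high TDMI; a constant one carries none.
print(tdmi(["a", "b"] * 50, lag=1))  # ~1.0 bit (strictly alternating pattern)
print(tdmi(["a"] * 100, lag=1))      # 0.0
```

Applied to a fleet, one would compare TDMI of joint agent behavior against the sum over individual agents; plug-in estimates are biased upward for short sequences, so longer logs or bias correction would be needed in practice.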
5. CogSci (Cognitive Science)
5.1 Psychological Measurement
Psychometric frameworks for LLMs (covered in Phase 1.6)
- Core contribution: Adapt human psychometric tools to LLMs
- Key insight: Big Five, STAI, etc. applicable to LLMs
- Relevance: Measurement = personality quantification
6. Cross-Conference Themes
6.1 Memory is Critical Infrastructure
Across NeurIPS, ICLR, ACL:
- A-Mem: Agentic memory organization
- G-Memory: Hierarchical multi-agent memory
- CAM: Constructivist memory
- VLM Agents: Self-generated memory
- MemAgents Workshop: Memory layer for agents
Consensus: Memory is foundational for agent behavior and personality.
Implications:
- Memory architecture = personality architecture
- Memory organization = personality crystallization
- Memory retrieval = personality expression
- Memory poisoning = personality corruption
6.2 Multi-Agent Systems Enable Emergence
Across NeurIPS, ICLR, ACL, AAMAS:
- AgentVerse: Emergent behaviors from collaboration
- Scaling Multi-Agent: Scaling laws for coordination
- MegaAgent: Large-scale autonomous systems
- Creativity Survey: Emergent creativity
- Emergent Coordination: Measurable emergence
Consensus: Multi-agent systems produce emergent behaviors not present in single agents.
Implications:
- Fleet = multi-agent system
- Emergence from interaction
- Specialization from coordination
- Personality from social dynamics
6.3 Personality is Measurable
Across NeurIPS, CogSci, psychology research:
- NEO-FFI studies: Big Five measurement in LLMs
- Psychometric frameworks: Validated measurement tools
- Personality trait change: Stability vs. drift
- RoleAgent: Profile initialization
Consensus: Personality can be quantified using psychometric tools adapted for LLMs.
Implications:
- Personality measurement = Big Five, STAI, custom tools
- Stability measurement = test-retest reliability
- Drift detection = longitudinal tracking
- Personality shaping = prompt-based, fine-tuning
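Drift detection via longitudinal tracking can be sketched as a baseline-vs-latest comparison over Big Five scores. The trait names, the 1-5 score scale, and the 0.5 threshold below are illustrative assumptions, not values from the cited studies:

```python
# Assumed shape: each assessment maps Big Five trait -> mean score in [1, 5].
BASELINE = {"openness": 4.1, "conscientiousness": 3.8, "extraversion": 2.9,
            "agreeableness": 4.0, "neuroticism": 2.2}

def detect_drift(history: list, baseline: dict = BASELINE,
                 threshold: float = 0.5) -> dict:
    """Return {trait: delta} for traits whose latest score departs from
    baseline by more than `threshold`. `history` is ordered oldest -> newest."""
    latest = history[-1]
    return {t: latest[t] - baseline[t]
            for t in baseline
            if abs(latest[t] - baseline[t]) > threshold}

sessions = [
    {"openness": 4.0, "conscientiousness": 3.9, "extraversion": 2.8,
     "agreeableness": 4.1, "neuroticism": 2.3},
    {"openness": 3.2, "conscientiousness": 3.7, "extraversion": 2.9,
     "agreeableness": 4.0, "neuroticism": 2.4},
]
print(detect_drift(sessions))  # flags openness (dropped ~0.9 below baseline)
```

Test-retest reliability would add a correlation check across sessions on top of this per-trait threshold.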
6.4 Self-Improvement is Possible
Across ACL, EMNLP:
- Reflection-Reinforced: Self-training via reflection
- Autonomous Learning: Self-directed improvement
- Self-Referential: Reasoning about self
- Agent-FLAN: Effective agent tuning
Consensus: LLMs can improve themselves through reflection and autonomous learning.
Implications:
- Self-improvement = personality evolution
- Reflection mechanism = SOUL.md updating
- Autonomous learning = self-directed growth
- Governance needed to prevent drift
7. Emerging Research Directions
7.1 Memory as Identity
Trend: Memory systems becoming identity systems.
- Memory stores experiences → defines who agent is
- Memory organization → personality structure
- Memory retrieval → personality expression
- Memory evolution → personality evolution
Research direction: Memory-identity coupling as personality mechanism.
7.2 Hierarchical Multi-Agent Memory
Trend: Multi-agent systems need hierarchical memory.
- Individual agent memories
- Shared team memories
- Collective fleet memories
Research direction: Memory hierarchies for personality emergence at different scales.
7.3 Personality Measurement Standardization
Trend: Moving toward standardized personality assessment.
- Big Five as standard framework
- NEO-FFI as standard tool
- Cross-model comparison possible
Research direction: Standardized personality benchmarks for LLM agents.
7.4 Self-Improvement Governance
Trend: Self-improvement needs governance mechanisms.
- Reflection without drift
- Autonomous learning with constraints
- Self-modification with oversight
Research direction: Governed self-improvement for safe personality evolution.
8. Implications for Fleet Architecture
8.1 Memory System Design
From academic research:
- Hierarchical memory: Individual → team → fleet levels
- Agentic organization: Agents control memory structure
- Dynamic memory: Memory evolves with experience
- Memory security: Protect against poisoning
Recommendations:
- Implement hierarchical memory (individual + shared + collective)
- Enable agent-controlled memory organization
- Design dynamic memory evolution
- Implement memory security measures
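The hierarchical-memory recommendation can be sketched as a three-level read path (individual, then team, then fleet), echoing G-Memory's hierarchy. The class shape and scope names are illustrative assumptions, not any cited paper's design:

```python
class HierarchicalMemory:
    """Three-level lookup: an agent's own memory first, then its team's
    shared memory, then the fleet-wide collective memory."""
    def __init__(self) -> None:
        self.individual: dict = {}  # agent name -> {key: value}
        self.team: dict = {}        # team name  -> {key: value}
        self.fleet: dict = {}       # key -> value, shared fleet-wide

    def read(self, agent: str, team: str, key: str):
        # Most specific scope wins; fall through to broader scopes.
        for scope in (self.individual.get(agent, {}),
                      self.team.get(team, {}),
                      self.fleet):
            if key in scope:
                return scope[key]
        return None

mem = HierarchicalMemory()
mem.fleet["style_guide"] = "be concise"
mem.team.setdefault("research", {})["preferred_tool"] = "search"
print(mem.read("agent_7", "research", "preferred_tool"))  # search
print(mem.read("agent_7", "research", "style_guide"))     # be concise
```

The same precedence order (individual overrides team overrides fleet) gives each agent a distinct personality layered over shared infrastructure.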
8.2 Multi-Agent Coordination
From academic research:
- Scaling laws: More agents → emergent capabilities
- Convention formation: Pre-coordination agreements
- Evolving orchestration: Dynamic coordination
- Specialization: Different agents for different tasks
Recommendations:
- Design for scale (7+ agents in fleet)
- Implement convention formation protocols
- Enable evolving orchestration mechanisms
- Define specialization for each agent
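The convention-formation recommendation can be sketched as a deterministic pre-coordination role assignment that every agent derives identically from shared inputs, so no mid-task negotiation is needed. This is an illustrative assumption, not the mechanism of the cited convention paper:

```python
def form_convention(agents: list, roles: list) -> dict:
    """Deterministically map agents to roles and fix a turn order.
    Any agent running this on the same inputs derives the same convention."""
    assignment = dict(zip(sorted(agents), sorted(roles)))
    return {"assignment": assignment, "turn_order": sorted(agents)}

conv = form_convention(["planner_b", "coder_a"], ["review", "implement"])
print(conv["assignment"])  # {'coder_a': 'implement', 'planner_b': 'review'}
print(conv["turn_order"])  # ['coder_a', 'planner_b']
```

In the cited work the convention is generated by an LLM from task requirements and preferences; the key property preserved here is that it is agreed once, before coordination begins.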
8.3 Personality Measurement
From academic research:
- Big Five: Standard personality framework
- Longitudinal tracking: Stability over time
- Stress testing: Personality under pressure
- Psychometric validation: Reliable measurement
Recommendations:
- Use Big Five framework for personality assessment
- Implement longitudinal tracking (regular assessments)
- Design stress tests for personality under pressure
- Validate measurement tools for reliability
8.4 Self-Improvement Systems
From academic research:
- Reflection mechanisms: Self-evaluation and improvement
- Autonomous learning: Self-directed knowledge acquisition
- Governance: Constraints on self-modification
- Safety: Prevent harmful drift
Recommendations:
- Implement reflection mechanisms (self-evaluation)
- Enable autonomous learning with constraints
- Design governance gates for self-modification
- Implement safety measures against drift
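A governance gate for self-modification can be sketched as a check between a reflection-proposed profile update and the persisted profile (e.g., a SOUL.md-like self-description). The protected field names and change budget below are illustrative assumptions:

```python
# Hypothetical governance gate: reflection proposes a profile update; the
# gate rejects edits to protected fields and overly large changes.
PROTECTED_FIELDS = {"core_values", "safety_rules"}
MAX_CHANGED_FIELDS = 2  # bound how much can change in one self-modification

def governance_gate(current: dict, proposed: dict):
    """Return (approved, reasons). Empty reasons means the update may apply."""
    reasons = []
    changed = {k for k in proposed if proposed[k] != current.get(k)}
    blocked = changed & PROTECTED_FIELDS
    if blocked:
        reasons.append(f"touches protected fields: {sorted(blocked)}")
    if len(changed) > MAX_CHANGED_FIELDS:
        reasons.append(f"{len(changed)} fields changed (max {MAX_CHANGED_FIELDS})")
    return (not reasons, reasons)

profile = {"core_values": "honesty", "tone": "formal", "focus": "research"}
print(governance_gate(profile, {**profile, "tone": "casual"})[0])           # True
print(governance_gate(profile, {**profile, "core_values": "whatever"})[0])  # False
```

This realizes "reflection without drift": the agent may evolve style and focus, while identity-critical fields require external oversight to change.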
9. Key Papers by Topic
Memory Systems
- A-Mem (NeurIPS 2025) - Agentic memory organization
- G-Memory (NeurIPS 2025) - Hierarchical multi-agent memory
- CAM (NeurIPS 2025) - Constructivist memory
- VLM Agents (NeurIPS 2024) - Self-generated memory
- MemAgents Workshop (ICLR 2026) - Memory layer focus
Multi-Agent Coordination
- AgentVerse (ICLR 2024) - Emergent behaviors
- Scaling Multi-Agent (ICLR 2025) - Scaling laws
- MegaAgent (ACL 2025) - Large-scale systems
- Emergent Coordination (arXiv) - Information-theoretic emergence
- Evolving Orchestration (ICLR 2025) - Dynamic coordination
Personality Measurement
- NEO-FFI Studies (NeurIPS 2025) - Big Five in LLMs
- Psychometric Framework (Nature MI 2025) - Validated measurement
- Humanizing LLMs (arXiv 2025) - Survey of psychological tools
- RoleAgent (NeurIPS 2024) - Profile initialization
- Dynamic Personality (ACL 2025) - Trait stability
Self-Improvement
- Reflection-Reinforced (EMNLP 2024) - Self-training via reflection
- Autonomous Learning (ACL 2025) - Self-directed improvement
- Self-Referential Agent (ACL 2025) - Reasoning about self
- Agent-FLAN (ACL 2024) - Effective agent tuning
10. Next Steps
Phase 1.8: Phase 1 Synthesis
- Cross-area patterns
- Key findings integration
- Identify highest-impact areas for Phase 2 depth dives
Phase 1.7 complete. Moving to Phase 1 synthesis…