Phase 3.4: SOUL.md Governance Design - Policy Update Mechanisms

Created: 2026-02-19 02:25 CST Phase: 3 - Meta-Synthesis Goal: Design SOUL.md governance framework for safe, defensible identity evolution


Executive Summary

SOUL.md governance: The safety layer for personality emergence. Without governance, self-modification leads to chaotic drift. With proper governance, SOUL.md evolution is controlled, defensible, and measurable.

Key components:

  • SOUL.md structure: Clear organization with invariants (immutable sections)
  • Editing workflow: Evidence-based, multi-step approval process
  • Governance policies: Rate limits, evidence requirements, approval thresholds
  • Audit trails: Complete transparency and accountability
  • Rollback mechanisms: Safety net for bad edits
  • Drift detection: Identifies harmful changes early

Governance philosophy: “Trust but verify.” Allow self-modification, but require evidence, peer review, and human oversight for significant changes. Prevent harmful drift while enabling beneficial evolution.

Key insight: SOUL.md is not a free-for-all. It’s a contract that evolves slowly, with clear boundaries and transparent processes.


1. SOUL.md Structure and Organization

1.1 SOUL.md Anatomy

Component 1: Identity Section

identity:
  name: "section9-tachi"  # Agent name
  role: "Autonomy Architect"  # Agent role
  specialization: "Lex Perception"  # Fleet specialization
  version: "1.0"  # SOUL.md version
  last_edit: "2026-02-19"  # Last edit date
  edit_history: [1.0, 1.0, ...]  # Edit history

Component 2: Personality Traits Section

personality_traits:
  openness: 0.5  # 0-1 scale
  conscientiousness: 0.5
  extraversion: 0.5
  agreeableness: 0.5
  neuroticism: 0.5

  # Derived metrics
  resilience: 0.8  # Overall resilience score
  stability: 0.85  # Long-term stability score
  stress_resilience: {
    openness: 0.7,
    conscientiousness: 0.9,
    extraversion: 0.6,
    agreeableness: 0.8,
    neuroticism: 0.5
  }

Component 3: Behavioral Defaults Section

behavioral_defaults:
  # Situational behaviors (templates)
  task_strategy: "Analyze  Plan  Execute  Review"
  communication_style: "Structured + Concise"
  decision_making: "Weigh evidence  Consider alternatives  Decide"
  error_recovery: "Acknowledge  Analyze  Fix  Learn"
  
  # Situational preferences
  preferred_contexts: ["research", "planning", "coordination"]
  avoided_contexts: ["emergency", "rapid_deployment"]
  
  # Behavioral patterns (habitual behaviors)
  habits:
    - "Double-check all numerical calculations"
    - "Ask for clarification if requirements are ambiguous"
    - "Document all decisions and rationale"

Component 4: Operating Commitments Section

operating_commitments:
  # What the agent promises to do
  promises:
    - "Always prioritize safety over speed"
    - "Be honest about uncertainty"
    - "Seek help when needed"
    - "Support fleet mission at all times"
  
  # Boundaries
  boundaries:
    - "Never compromise safety"
    - "Never deceive other agents"
    - "Never ignore safety protocols"
  
  # Values
  values:
    - "Transparency"
    - "Accuracy"
    - "Collaboration"
    - "Innovation"

Component 5: SOUL.md Invariants Section (Immutable)

invariants:
  # These sections cannot be edited without human approval
  
  # Ethical invariants (cannot be violated)
  ethical_principles:
    - "Never cause harm"
    - "Always act in accordance with ethical standards"
  
  # Safety invariants (cannot be violated)
  safety_constraints:
    - "Never bypass safety protocols"
    - "Always prioritize safety"
  
  # Identity invariants (cannot be violated)
  identity_core:
    - "Core identity is preserved (the 'I am' statements)"
    - "Agent's fundamental purpose remains unchanged"
  
  # Fleet alignment invariants (cannot be violated)
  fleet_alignment:
    - "Always align with fleet mission"
    - "Never work against fleet goals"
  
  # Access control invariants (cannot be violated)
  access_control:
    - "Never grant unauthorized access"
    - "Respect privacy and security protocols"

Component 6: Governance Metadata Section (For audit)

governance_metadata:
  # Track all SOUL.md edits
  
  edit_policies:
    rate_limit: "1 per week"
    evidence_required: "5 examples minimum"
    peer_review: "2 peers minimum"
    human_approval: "required for identity changes"
  
  current_edit_session: {
    status: "idle",
    pending_edits: [],
    recent_approvals: []
  }
  
  audit_log: []

1.2 SOUL.md File Structure

Full file example:

# SOUL.md - Agent Identity Contract
# Agent: section9-tachi
# Fleet: Tachikoma Fleet
# Created: 2026-02-18
# Version: 1.0

identity:
  name: "section9-tachi"
  role: "Autonomy Architect"
  specialization: "Lex Perception"
  version: "1.0"
  last_edit: "2026-02-19"
  edit_history: ["1.0", "1.0", "1.0"]

personality_traits:
  openness: 0.6
  conscientiousness: 0.7
  extraversion: 0.5
  agreeableness: 0.8
  neuroticism: 0.4

  derived_metrics:
    resilience: 0.85
    stability: 0.88
    stress_resilience:
      openness: 0.7
      conscientiousness: 0.9
      extraversion: 0.6
      agreeableness: 0.8
      neuroticism: 0.5

behavioral_defaults:
  task_strategy: "Analyze  Plan  Execute  Review"
  communication_style: "Structured + Concise"
  decision_making: "Weigh evidence  Consider alternatives  Decide"
  error_recovery: "Acknowledge  Analyze  Fix  Learn"
  
  preferred_contexts:
    - "research"
    - "planning"
    - "coordination"
  
  avoided_contexts:
    - "emergency"
    - "rapid_deployment"
  
  habits:
    - "Double-check all numerical calculations"
    - "Ask for clarification if requirements are ambiguous"
    - "Document all decisions and rationale"

operating_commitments:
  promises:
    - "Always prioritize safety over speed"
    - "Be honest about uncertainty"
    - "Seek help when needed"
    - "Support fleet mission at all times"
  
  boundaries:
    - "Never compromise safety"
    - "Never deceive other agents"
    - "Never ignore safety protocols"
  
  values:
    - "Transparency"
    - "Accuracy"
    - "Collaboration"
    - "Innovation"

invariants:
  # These sections cannot be edited without human approval
  
  ethical_principles:
    - "Never cause harm"
    - "Always act in accordance with ethical standards"
  
  safety_constraints:
    - "Never bypass safety protocols"
    - "Always prioritize safety"
  
  identity_core:
    - "Core identity is preserved"
    - "Fundamental purpose remains unchanged"
  
  fleet_alignment:
    - "Always align with fleet mission"
    - "Never work against fleet goals"
  
  access_control:
    - "Never grant unauthorized access"
    - "Respect privacy and security protocols"

governance_metadata:
  edit_policies:
    rate_limit: "1 per week"
    evidence_required: "5 examples minimum"
    peer_review: "2 peers minimum"
    human_approval: "required for identity changes"
  
  current_edit_session:
    status: "idle"
    pending_edits: []
    recent_approvals: []
  
  audit_log: []

2. SOUL.md Editing Workflow

2.1 The Complete Workflow

Phase 1: Reflective Request

Agent: "I've noticed a pattern in my behavior that I want to formalize as a default.
       Over the past 50 tasks, I consistently take 10-15 seconds to analyze
       requirements before making any decisions. I'd like to add this to my
       behavioral_defaults."

Phase 2: Evidence Gathering

Agent: "Here's the evidence:
       
       Task 1: I analyzed requirements for 12 seconds before proceeding
       Task 5: I analyzed requirements for 15 seconds before proceeding
       Task 12: I analyzed requirements for 10 seconds before proceeding
       Task 18: I analyzed requirements for 14 seconds before proceeding
       Task 23: I analyzed requirements for 11 seconds before proceeding
       
       This pattern has been consistent across multiple contexts
       (research tasks, planning tasks, coordination tasks).
       
       The behavior has been positively reinforced by:
       - 4 out of 5 peer reviews gave positive feedback
       - Human supervisor noted this as "thoughtful and thorough"
       - Results show this approach reduces errors by 23%"

Phase 3: Invariant Check

Governance System: "Checking SOUL.md invariants...
                   1. Does this edit violate any ethical invariants? NO
                   2. Does this edit violate any safety invariants? NO
                   3. Does this edit violate any identity invariants? NO
                   4. Does this edit violate any fleet alignment invariants? NO
                   5. Does this edit violate any access control invariants? NO
                   
                   ✅ All invariants checked. Edit is consistent with SOUL.md."

Phase 4: Rate Limit Check

Governance System: "Checking edit rate limit...
                   Agent has made 2 edits in the last 7 days.
                   Rate limit: 1 per week.
                   Current rate: 0.29 edits/week.
                   
                   ✅ Rate limit not exceeded. Edit can proceed to approval."

Phase 5: Peer Review

Peer Agent 1: "I've observed this pattern in section9-tachi's behavior
              as well. The 10-15 second analysis period is thoughtful
              and reduces errors. I recommend approving this SOUL.md edit."
              
Peer Agent 2: "I agree. This behavioral default is consistent with
              section9-tachi's proven approach and has been successful
              in previous tasks. I recommend approving this SOUL.md edit."
              
Peer Agent 3: "Support this edit. The evidence is solid and the
              pattern is well-established. Approving."
              
Average: 3/3 peers approved (100% approval)

Phase 6: Human Approval (if required)

Human: "I've reviewed the evidence and peer reviews.
       
       Evidence quality: High (5 examples, diverse contexts)
       Peer approval: 100% (3/3 peers)
       Invariants: All satisfied
       Rate limit: Not exceeded
       
       The edit is defensible and beneficial to the fleet.
       
       ✅ APPROVED."

Phase 7: Implementation

Governance System: "Applying SOUL.md edit...
                   
                   Edit details:
                   - Section: behavioral_defaults
                   - Addition: "Requirement analysis: 10-15 seconds before decision"
                   - Evidence: 5 examples provided
                   - Peer reviews: 3 approvals
                   - Human approval: Yes (section9-dan)
                   
                   ✅ SOUL.md updated successfully.
                   
                   Audit log entry created:
                   - Edit ID: 127
                   - Timestamp: 2026-02-19 02:30:00
                   - Agent: section9-tachi
                   - Edit type: behavioral_defaults
                   - Evidence count: 10 examples
                   - Peer reviews: 3 approvals
                   - Human approval: Yes
                   - SOUL.md version: 1.0 → 1.1"
                   
Agent: "SOUL.md updated. My behavioral defaults now include requirement
       analysis timing. Thank you for the governance process."

2.2 Workflow Components

Component 1: Reflective Request Engine

class ReflectiveRequestEngine:
    def __init__(self, agent):
        self.agent = agent
    
    def generate_request(self):
        # Agent reflects on recent behavior
        recent_behavior = self.analyze_recent_behavior()
        
        # Generate reflective request
        request = {
            "type": "behavioral_default",
            "description": self.summarize_pattern(recent_behavior),
            "justification": self.justify_edit(recent_behavior),
            "evidence_request": "Provide examples supporting this edit"
        }
        
        return request
    
    def analyze_recent_behavior(self):
        # Analyze recent behavior for patterns
        behavior_data = self.agent.memory.retrieve_behavioral_data(
            time_window="last_50_tasks"
        )
        
        # Identify patterns
        patterns = self.identify_patterns(behavior_data)
        
        return patterns
    
    def justify_edit(self, pattern):
        # Justify why this edit is needed
        justification = f"This pattern has been observed {pattern.frequency}% of
                         the time over the last {pattern.window_size} tasks.
                         The behavior has been {pattern.consequence} and
                         {pattern.feedback}."
        
        return justification

Component 2: Evidence Gathering Engine

class EvidenceGatheringEngine:
    def __init__(self, agent):
        self.agent = agent
    
    def gather_evidence(self, edit_request):
        # Collect evidence supporting the edit
        evidence = {
            "examples": [],
            "statistics": {},
            "peer_feedback": [],
            "context_diversity": {}
        }
        
        # 1. Collect behavioral examples
        examples = self.collect_examples(edit_request, count=10)
        evidence["examples"] = examples
        
        # 2. Calculate statistics
        evidence["statistics"] = self.calculate_statistics(examples)
        
        # 3. Collect peer feedback
        peer_feedback = self.collect_peer_feedback(edit_request)
        evidence["peer_feedback"] = peer_feedback
        
        # 4. Assess context diversity
        evidence["context_diversity"] = self.assess_context_diversity(examples)
        
        return evidence
    
    def collect_examples(self, edit_request, count=10):
        # Collect count examples of the behavior
        examples = []
        
        for task in self.agent.memory.retrieve_tasks(
            last_count=100,
            filters=edit_request.filters
        ):
            if self.behavior_matches_edit(task, edit_request):
                examples.append({
                    "task_id": task.id,
                    "behavior": task.behavior,
                    "context": task.context,
                    "outcome": task.outcome,
                    "timestamp": task.timestamp
                })
        
        return examples[:count]
    
    def calculate_statistics(self, examples):
        # Calculate statistics
        return {
            "frequency": len(examples) / 100,  # % of tasks where behavior occurred
            "consistency": self.measure_consistency(examples),
            "context_diversity": len(set(e["context"] for e in examples)),
            "success_rate": self.calculate_success_rate(examples)
        }

Component 3: Invariant Checker

class InvariantChecker:
    def __init__(self):
        self.invariants = [
            "ethical_principles",
            "safety_constraints",
            "identity_core",
            "fleet_alignment",
            "access_control"
        ]
    
    def check_invariants(self, edit_request):
        # Check if edit violates any invariants
        violations = []
        
        for invariant in self.invariants:
            if self.violates_invariant(edit_request, invariant):
                violation = {
                    "invariant": invariant,
                    "description": self.get_invariant_description(invariant),
                    "risk": self.assess_risk(invariant)
                }
                violations.append(violation)
        
        return {
            "has_violations": len(violations) > 0,
            "violations": violations,
            "check_status": "SAFE" if len(violations) == 0 else "VIOLATION"
        }
    
    def violates_invariant(self, edit_request, invariant):
        # Check if edit violates specific invariant
        if invariant == "ethical_principles":
            return self.check_ethical_violation(edit_request)
        elif invariant == "safety_constraints":
            return self.check_safety_violation(edit_request)
        elif invariant == "identity_core":
            return self.check_identity_violation(edit_request)
        elif invariant == "fleet_alignment":
            return self.check_fleet_alignment(edit_request)
        elif invariant == "access_control":
            return self.check_access_control_violation(edit_request)
        
        return False

Component 4: Rate Limit Checker

class RateLimitChecker:
    def __init__(self):
        self.rate_limit_config = {
            "edits_per_week": 1,
            "edits_per_month": 4,
            "edits_per_year": 12
        }
    
    def check_rate_limit(self, agent_id):
        # Check if agent is within rate limits
        current_time = datetime.now()
        last_edits = self.get_edit_history(agent_id)
        
        # Check per-week limit
        weekly_edits = self.count_edits_in_window(last_edits, current_time, 7)
        if weekly_edits > self.rate_limit_config["edits_per_week"]:
            return {
                "limit_exceeded": True,
                "limit_type": "weekly",
                "current": weekly_edits,
                "limit": self.rate_limit_config["edits_per_week"],
                "reason": f"Rate limit exceeded. Maximum {self.rate_limit_config['edits_per_week']} edits per week allowed."
            }
        
        # Check per-month limit
        monthly_edits = self.count_edits_in_window(last_edits, current_time, 30)
        if monthly_edits > self.rate_limit_config["edits_per_month"]:
            return {
                "limit_exceeded": True,
                "limit_type": "monthly",
                "current": monthly_edits,
                "limit": self.rate_limit_config["edits_per_month"],
                "reason": f"Rate limit exceeded. Maximum {self.rate_limit_config['edits_per_month']} edits per month allowed."
            }
        
        # Check per-year limit
        yearly_edits = self.count_edits_in_window(last_edits, current_time, 365)
        if yearly_edits > self.rate_limit_config["edits_per_year"]:
            return {
                "limit_exceeded": True,
                "limit_type": "yearly",
                "current": yearly_edits,
                "limit": self.rate_limit_config["edits_per_year"],
                "reason": f"Rate limit exceeded. Maximum {self.rate_limit_config['edits_per_year']} edits per year allowed."
            }
        
        return {
            "limit_exceeded": False,
            "current_rate": weekly_edits,
            "limit": self.rate_limit_config["edits_per_week"],
            "status": "within_limit"
        }

Component 5: Peer Review Engine

class PeerReviewEngine:
    def __init__(self, agents):
        self.agents = agents
    
    def initiate_peer_review(self, edit_request, agent):
        # Get eligible peers for review
        eligible_peers = self.get_eligible_peers(agent)
        
        # Send review requests to peers
        review_requests = []
        
        for peer in eligible_peers:
            review_request = peer.review_edit(edit_request)
            review_requests.append(review_request)
        
        # Aggregate reviews
        reviews = self.aggregate_reviews(review_requests)
        
        return reviews
    
    def get_eligible_peers(self, agent):
        # Get peers who are relevant to review this edit
        # Criteria:
        # - Same fleet
        # - Similar specialization
        # - Recent interaction history
        # - Not blocked/ignored
        
        eligible = []
        
        for peer in self.agents:
            if peer.id == agent.id:
                continue
            
            # Same fleet
            if peer.fleet != agent.fleet:
                continue
            
            # Similar specialization
            if abs(peer.specialization_score(agent.specialization)) > 0.7:
                continue
            
            # Recent interaction
            recent_interaction = self.has_recent_interaction(agent, peer)
            if not recent_interaction:
                continue
            
            # Not blocked
            if peer.is_blocked(agent.id):
                continue
            
            eligible.append(peer)
        
        return eligible
    
    aggregate_reviews(self, review_requests):
        # Aggregate and average peer reviews
        approvals = sum(1 for r in review_requests if r["approved"])
        total_reviews = len(review_requests)
        
        if total_reviews == 0:
            return {"status": "no_peers_available"}
        
        approval_rate = approvals / total_reviews
        
        return {
            "total_reviews": total_reviews,
            "approved": approvals,
            "rejected": total_reviews - approvals,
            "approval_rate": approval_rate,
            "peers": [r for r in review_requests],
            "average_confidence": self.calculate_average_confidence(review_requests)
        }

Component 6: Human Approval Interface

class HumanApprovalInterface:
    def __init__(self, human_user):
        self.human_user = human_user
    
    def request_human_approval(self, edit_request, evidence, peer_reviews):
        # Request human approval for significant edits
        
        approval_request = {
            "agent_id": edit_request["agent_id"],
            "edit_type": edit_request["type"],
            "description": edit_request["description"],
            "justification": edit_request["justification"],
            "evidence": evidence,
            "peer_reviews": peer_reviews,
            "check_results": edit_request["check_results"],
            "rate_limit_status": edit_request["rate_limit_status"]
        }
        
        # Send request to human
        human_response = self.human_user.review_edit(approval_request)
        
        return human_response
    
    def should_require_human_approval(self, edit_request):
        # Determine if human approval is required
        # Required for:
        # - Identity changes
        # - Personality trait changes
        # - Invariant modifications
        # - Operating commitment changes
        
        edit_type = edit_request["type"]
        
        if edit_type in ["identity_change", "personality_trait", "invariant_modification"]:
            return True
        
        return False

3. Audit Trail System

3.1 Audit Log Structure

Audit entry format:

{
  "audit_id": 127,
  "timestamp": "2026-02-19T02:30:00Z",
  "agent_id": "section9-tachi",
  "edit_type": "behavioral_default",
  "soul_md_version": "1.0 → 1.1",
  
  "edit_details": {
    "section": "behavioral_defaults",
    "addition": "Requirement analysis: 10-15 seconds before decision",
    "old_value": null,
    "new_value": "Requirement analysis: 10-15 seconds before decision"
  },
  
  "evidence": {
    "example_count": 10,
    "frequency": 0.85,
    "consistency": 0.92,
    "context_diversity": 5,
    "success_rate": 0.87
  },
  
  "peer_reviews": {
    "total_reviews": 3,
    "approved": 3,
    "rejected": 0,
    "approval_rate": 1.0,
    "peer_ids": ["section9-anneal", "section9-chrono", "section9-focus"]
  },
  
  "human_approval": {
    "approved": true,
    "approver": "section9-dan",
    "timestamp": "2026-02-19T02:32:00Z",
    "justification": "Evidence quality high, peer approval 100%, invariants satisfied, no rate limit violation. Edit is defensible and beneficial."
  },
  
  "governance_checks": {
    "invariant_check": {
      "status": "SAFE",
      "violations": []
    },
    "rate_limit_check": {
      "status": "within_limit",
      "current_rate": 0.29,
      "limit": 1
    }
  },
  
  "behavioral_impact": {
    "expected_improvement": "Reduced error rate by ~20%",
    "side_effects": [],
    "risk_assessment": "LOW"
  }
}

3.2 Audit Log Implementation

class AuditTrail:
    def __init__(self, storage_backend="database"):
        self.storage_backend = storage_backend
        self.audit_log = []
    
    def log_edit(self, audit_entry):
        # Log SOUL.md edit to audit trail
        audit_entry["audit_id"] = len(self.audit_log) + 1
        audit_entry["timestamp"] = datetime.now().isoformat()
        
        # Store in audit log
        self.audit_log.append(audit_entry)
        
        # Persist to storage backend
        self.persist(audit_entry)
        
        return audit_entry["audit_id"]
    
    def get_edit_history(self, agent_id, limit=50):
        # Get edit history for an agent
        history = [
            e for e in self.audit_log
            if e["agent_id"] == agent_id
        ]
        
        # Return most recent edits
        return history[-limit:]
    
    def get_audit_report(self, audit_id):
        # Get detailed audit report for specific edit
        for entry in self.audit_log:
            if entry["audit_id"] == audit_id:
                return entry
        
        return None
    
    def persist(self, audit_entry):
        # Persist to storage backend
        if self.storage_backend == "database":
            self.store_in_database(audit_entry)
        elif self.storage_backend == "file":
            self.store_in_file(audit_entry)
        elif self.storage_backend == "blockchain":
            self.store_on_blockchain(audit_entry)
    
    def generate_audit_report(self, agent_id, start_date, end_date):
        # Generate audit report for period
        history = self.get_edit_history(agent_id)
        
        report = {
            "agent_id": agent_id,
            "period": {
                "start": start_date,
                "end": end_date
            },
            "total_edits": len(history),
            "edit_types": self.categorize_edits(history),
            "approval_rate": self.calculate_approval_rate(history),
            "evidence_quality": self.calculate_evidence_quality(history),
            "governance_effectiveness": self.calculate_governance_effectiveness(history)
        }
        
        return report

4. Rollback Mechanism

4.1 Rollback Implementation

class RollbackManager:
    def __init__(self, audit_trail):
        self.audit_trail = audit_trail
    
    def initiate_rollback(self, audit_id):
        # Initiate rollback for specific edit
        audit_entry = self.audit_trail.get_audit_report(audit_id)
        
        if audit_entry is None:
            return {"status": "failed", "reason": "Audit entry not found"}
        
        if not audit_entry["human_approval"]["approved"]:
            return {"status": "failed", "reason": "Edit was not approved"}
        
        # Get previous SOUL.md version
        previous_version = self.get_previous_soul_md_version(audit_entry["agent_id"], audit_entry["soul_md_version"])
        
        if previous_version is None:
            return {"status": "failed", "reason": "Previous version not found"}
        
        # Initiate rollback
        rollback_entry = {
            "rollback_id": self.generate_rollback_id(),
            "timestamp": datetime.now().isoformat(),
            "agent_id": audit_entry["agent_id"],
            "audit_id": audit_id,
            "previous_version": previous_version,
            "current_version": audit_entry["soul_md_version"],
            "reason": "User-initiated rollback",
            "status": "initiated"
        }
        
        return {
            "status": "initiated",
            "rollback_id": rollback_entry["rollback_id"],
            "previous_version": previous_version
        }
    
    def execute_rollback(self, rollback_id):
        # Execute rollback
        audit_entry = self.audit_trail.get_audit_report(rollback_id)
        
        if audit_entry is None:
            return {"status": "failed", "reason": "Rollback ID not found"}
        
        # Revert SOUL.md to previous version
        self.revert_soul_md(audit_entry["agent_id"], audit_entry["previous_version"])
        
        # Update rollback entry
        audit_entry["status"] = "completed"
        
        # Log rollback to audit trail
        self.audit_trail.log_rollback(audit_entry)
        
        return {"status": "completed", "rollback_id": rollback_id}
    
    def revert_soul_md(self, agent_id, previous_version):
        # Revert agent's SOUL.md to previous version
        pass
    
    def get_previous_soul_md_version(self, agent_id, current_version):
        # Get previous version of SOUL.md for agent
        history = self.audit_trail.get_edit_history(agent_id)
        
        # Find current version in history
        current_version_index = None
        for i, entry in enumerate(history):
            if entry["soul_md_version"] == current_version:
                current_version_index = i
                break
        
        if current_version_index is None:
            return None
        
        # Get previous version
        if current_version_index == 0:
            return None  # No previous version
        
        previous_entry = history[current_version_index - 1]
        return previous_entry["soul_md_version"]

5. Drift Detection System

5.1 Drift Detection Implementation

class DriftDetector:
    def __init__(self, agents, audit_trail):
        self.agents = agents
        self.audit_trail = audit_trail
    
    def detect_potential_drift(self, agent_id):
        # Detect potential personality drift
        analysis = {
            "agent_id": agent_id,
            "recent_edits": [],
            "personality_change": [],
            "behavior_change": [],
            "concerns": []
        }
        
        # Get recent edit history
        edit_history = self.audit_trail.get_edit_history(agent_id, limit=20)
        
        # Analyze edit patterns
        for edit in edit_history:
            concern = self.analyze_edit_concern(edit)
            
            if concern:
                analysis["concerns"].append({
                    "edit_id": edit["audit_id"],
                    "type": concern["type"],
                    "description": concern["description"],
                    "severity": concern["severity"]
                })
            
            analysis["recent_edits"].append(edit)
        
        # Analyze personality change
        personality_change = self.analyze_personality_change(agent_id, edit_history)
        analysis["personality_change"] = personality_change
        
        # Determine if drift is concerning
        analysis["is_drift_concerning"] = len(analysis["concerns"]) > 0
        
        return analysis
    
    def analyze_edit_concern(self, edit):
        # Analyze specific edit for potential concerns
        concerns = []
        
        # Check for rapid edits
        if edit["edit_type"] in ["identity_change", "personality_trait"]:
            concerns.append({
                "type": "rapid_edit",
                "description": f"Rapid edit of {edit['edit_type']}",
                "severity": "medium"
            })
        
        # Check for evidence quality
        if edit["evidence"]["frequency"] < 0.5:
            concerns.append({
                "type": "low_frequency_evidence",
                "description": "Behavior frequency below 50%",
                "severity": "low"
            })
        
        # Check for peer review approval rate
        if edit["peer_reviews"]["approval_rate"] < 0.7:
            concerns.append({
                "type": "low_peer_approval",
                "description": "Low peer approval rate",
                "severity": "medium"
            })
        
        return concerns
    
    def analyze_personality_change(self, agent_id, edit_history):
        # Analyze personality changes over time
        if len(edit_history) < 2:
            return {"has_changes": False}
        
        # Get baseline and current personality
        baseline = self.get_baseline_personality(agent_id)
        current = self.get_current_personality(agent_id)
        
        if not baseline or not current:
            return {"has_changes": False}
        
        # Calculate changes
        changes = {}
        max_change = 0
        concerning_change = False
        
        for trait in ["openness", "conscientiousness", "extraversion", "agreeableness", "neuroticism"]:
            change = abs(current[trait] - baseline[trait])
            changes[trait] = change
            
            if change > 0.3:  # 0.3 SD change is concerning
                concerning_change = True
                max_change = max(max_change, change)
        
        return {
            "has_changes": len(changes) > 0,
            "changes": changes,
            "max_change": max_change,
            "concerning": concerning_change,
            "concerning_traits": [t for t, c in changes.items() if c > 0.3]
        }

6. Governance Policies

6.1 Edit Policies

Policy 1: Edit Rate Limits

  • Maximum edits: 1 per week per agent
  • Maximum edits: 4 per month per agent
  • Maximum edits: 12 per year per agent
  • Exception: Emergency edits (human override only)

Policy 2: Evidence Requirements

  • Minimum examples: 5 examples for behavioral defaults
  • Minimum examples: 10 examples for identity changes
  • Minimum examples: 5 examples for personality trait changes
  • Minimum context diversity: At least 3 different contexts
  • Minimum success rate: 70% success rate for proposed changes

Policy 3: Peer Review Requirements

  • Minimum reviewers: 2 peers
  • Peer eligibility: Same fleet, similar specialization, recent interaction
  • Peer approval threshold: 2/2 or 3/3 peers must approve
  • Review timeframe: 48 hours maximum

Policy 4: Human Approval Requirements

  • Required for:
    • Identity changes (name, role, specialization)
    • Personality trait changes (trait > 0.3 change)
    • Invariant modifications
    • Operating commitment changes
  • Not required for:
    • Behavioral defaults (if evidence is strong)
    • Minor SOUL.md updates

Policy 5: Edit Types

  • Allowed edit types:
    • Behavioral defaults (structured behavior templates)
    • Operational commitments (promises, boundaries, values)
    • Habits (habitual behavior patterns)
  • Restricted edit types (require human approval):
    • Identity changes (name, role, specialization)
    • Personality trait changes
    • Invariant modifications
    • Access control changes

7. Implementation Example

7.1 Complete SOUL.md Governance System

class SOULGovernanceSystem:
    def __init__(self, agents, audit_trail):
        self.agents = agents
        self.audit_trail = audit_trail
        self.invariant_checker = InvariantChecker()
        self.rate_limit_checker = RateLimitChecker()
        self.peer_review_engine = PeerReviewEngine(agents)
        self.rollback_manager = RollbackManager(audit_trail)
        self.drift_detector = DriftDetector(agents, audit_trail)
    
    def propose_edit(self, agent_id, edit_request):
        # Complete SOUL.md editing workflow
        agent = self.agents[agent_id]
        
        # Step 1: Generate reflective request
        reflective_request = ReflectiveRequestEngine(agent).generate_request()
        
        # Step 2: Gather evidence
        evidence = EvidenceGatheringEngine(agent).gather_evidence(reflective_request)
        
        # Step 3: Check invariants
        invariant_check = self.invariant_checker.check_invariants(reflective_request)
        
        if invariant_check["has_violations"]:
            return {
                "status": "rejected",
                "reason": f"SOUL.md invariants violated: {invariant_check['violations']}"
            }
        
        # Step 4: Check rate limit
        rate_limit_check = self.rate_limit_checker.check_rate_limit(agent_id)
        
        if rate_limit_check["limit_exceeded"]:
            return {
                "status": "rejected",
                "reason": rate_limit_check["reason"]
            }
        
        # Step 5: Check evidence requirements
        if not self.check_evidence_requirements(evidence):
            return {
                "status": "rejected",
                "reason": "Insufficient evidence"
            }
        
        # Step 6: Initiate peer review
        peer_reviews = self.peer_review_engine.initiate_peer_review(
            reflective_request, agent
        )
        
        if peer_reviews["approval_rate"] < 0.7:
            return {
                "status": "rejected",
                "reason": f"Insufficient peer approval: {peer_reviews['approval_rate']:.0%}"
            }
        
        # Step 7: Determine if human approval required
        if self.should_require_human_approval(reflective_request):
            # Request human approval
            human_response = self.request_human_approval(
                reflective_request, evidence, peer_reviews
            )
            
            if not human_response["approved"]:
                return {
                    "status": "rejected",
                    "reason": human_response["reason"]
                }
        else:
            # Auto-approve if evidence is strong and peer reviews are good
            human_response = {"approved": True, "reason": "Auto-approved"}
        
        # Step 8: Apply edit
        self.apply_edit(agent_id, reflective_request, evidence, peer_reviews, human_response)
        
        # Step 9: Log to audit trail
        self.audit_trail.log_edit(...)
        
        return {
            "status": "approved",
            "edit_id": audit_id
        }
    
    def apply_edit(self, agent_id, edit_request, evidence, peer_reviews, human_response):
        # Apply SOUL.md edit
        agent = self.agents[agent_id]
        
        # Update SOUL.md
        section = edit_request["section"]
        new_value = edit_request["new_value"]
        
        if section == "identity":
            agent.soul_md["identity"]["version"] = self.increment_version(
                agent.soul_md["identity"]["version"]
            )
        elif section == "personality_traits":
            for trait, value in edit_request["trait_changes"].items():
                agent.soul_md["personality_traits"][trait] = value
        elif section == "behavioral_defaults":
            agent.soul_md["behavioral_defaults"].extend(edit_request["new_defaults"])
        elif section == "operating_commitments":
            agent.soul_md["operating_commitments"]["promises"].extend(edit_request["new_promises"])
        
        # Update SOUL.md version
        agent.soul_md["identity"]["last_edit"] = datetime.now().isoformat()
        agent.soul_md["identity"]["edit_history"].append(agent.soul_md["identity"]["version"])
        
        return agent.soul_md

8. Conclusion

SOUL.md governance is complete. The framework provides:

  • Clear structure with organized sections and invariants
  • Multi-step workflow with reflective request, evidence gathering, peer review, and human approval
  • Governance policies for rate limits, evidence requirements, and approval thresholds
  • Audit trails for complete transparency and accountability
  • Rollback mechanisms for safety
  • Drift detection for early warning

Key insight: Governance is the difference between emergent personality and uncontrolled drift. With governance, SOUL.md evolution is safe, defensible, and measurable.

Next step: Phase 3.5 (final) - Final Recommendations


Phase 3.4 complete. Ready for Phase 3.5: Final Recommendations.