Posts

Can AI Personas Actually Make Unsafe Models Safer? Our Experiment Says: It Depends

We tested whether structured persona files can restore safety in abliterated LLMs — models where safety guardrails have been surgically removed. The results reveal a striking asymmetry that challenges conventional thinking about AI safety.

Everything Claude Code Experts Recommend, We Already Built Into SoulClaw

The Community Is Discovering What We Already Know

Two recent videos are making the rounds in the AI coding community. One breaks down CLAUDE.md best practices: how to write the context file that shapes Claude Code’s behavior. The other shares 10 tips from an Anthropic hackathon winner on getting 10x productivity from Claude Code. Both are excellent resources. And watching them, I couldn’t help but notice: nearly every recommendation maps directly to something we’ve already built into SoulClaw. ...

Paper: The Forgetting Problem — Why Perfect Memory Breaks AI Agent Identity

New Paper: The Forgetting Problem

We’ve published a new preprint exploring a counterintuitive idea: the better an AI agent’s memory, the worse its identity becomes.

📄 Read the paper on Zenodo (CC-BY 4.0, open access)

The Memory-Identity Paradox

Every major AI agent framework is racing to build better memory. MemGPT, Mem0, A-Mem, and MemoryBank all optimize for remembering more, longer, more accurately. But we identified a fundamental tension: The more faithfully an agent remembers its experiences, the more vulnerable its intended identity becomes to experiential contamination. ...

Perfect Memory Is Breaking Your AI Agent's Identity

Your AI Agent Remembers Everything. That’s the Problem.

Every agent framework is racing to build better memory. MemGPT, Mem0, and A-Mem all want your agent to remember more, longer, better. But here’s a question nobody’s asking: what happens to your agent’s personality when it remembers too much?

Humans Forget for a Reason

In psychology, there’s a concept called adaptive forgetting. Your brain doesn’t just lose information by accident; it actively suppresses memories that would interfere with your ability to function. ...

Soul Memory: A 4-Tier Adaptive Memory Architecture for AI Agents

The Problem: Your Agent Either Remembers Everything or Nothing

Every AI agent developer faces the same dilemma:

No memory → Your agent forgets everything between sessions. Every conversation starts from zero.

Full memory → Your agent remembers everything with perfect fidelity. Including that one time a user was hostile. Including outdated decisions. Including noise from 6 months ago that drowns out yesterday’s critical update.

Neither is right. Humans solved this millions of years ago: we remember what matters and forget what doesn’t. Not perfectly, but well enough to maintain a coherent identity while adapting to new experiences. ...

Soul Spec + MaatSpec: Identity and Governance as Complementary Layers for AI Agents

The Missing Half of Every AI Agent

Here’s a question that keeps coming up as AI agents get more autonomous: Who decides what an agent can do, and who decides who the agent is? These sound like the same question. They’re not.

Consider a financial advisor agent. It needs to know it’s a conservative, compliance-first advisor (identity). But it also needs hard limits on what actions it can take: it shouldn’t wire money without human approval, regardless of how confident its persona makes it feel (governance). ...

The Human in the Loop of Identity

The Question We’ve Been Circling

Over the past three posts, we’ve explored a technical problem:

1. Perfect memory breaks agent identity: accumulated experience corrupts persona
2. Soul Memory provides a practical solution: tiered architecture with strategic forgetting
3. Perfect memory without drift is architecturally impossible: Transformers can’t separate identity from experience

But underneath all the architecture diagrams and decay functions, there’s a deeper question we haven’t addressed: Who decides who an AI agent is? ...

When AI Agents Have Wallets: Why Identity Becomes a Security Problem

Stripe Just Made Agent Payments Real

On March 19, 2026, Stripe and Tempo jointly announced the Machine Payments Protocol (MPP), an open protocol for agent-to-agent payments. The code is straightforward:

    payment = stripe.PaymentIntent.create(
        amount=1000,
        currency="usd",
        payment_method_types=["crypto"],
        networks=["tempo"]
    )

An AI agent can now create a payment intent, authorize a transaction, and transfer funds, all through API calls. No human in the loop required. This changes everything about how we think about AI agent identity. ...

Why a Perfect-Memory AI Agent Without Persona Drift is Architecturally Impossible

The Dream: An Agent That Remembers Everything and Never Changes

Every AI agent developer has the same fantasy: an agent with perfect memory (one that remembers every conversation, every decision, every preference) while maintaining a rock-solid personality. It never forgets. It never drifts.

This isn’t an engineering problem we haven’t solved yet. It’s architecturally impossible with current Transformer-based models. And understanding why changes how you should design agent memory systems. ...

NeurIPS 2025 Proved It: Every LLM Says the Same Thing — Here's the Fix

“Write a metaphor about time.” Ask 25 different language models this question. Sample 50 responses from each. What do you get? 1,250 responses that collapse into exactly two metaphors: “time is a river” and “time is a weaver.” That’s it. GPT-4o, Claude, Llama, Qwen, Mixtral, DeepSeek — models built by different companies, trained on different data, with different architectures — all converging on the same two ideas. This isn’t a toy example. It’s a finding from Artificial Hivemind, a paper accepted as an oral presentation at NeurIPS 2025 by researchers from the University of Washington, CMU, Stanford, and AI2. ...