Posts

The Overconfidence Effect: Why Summarized Memory Makes AI Agents Worse

Here’s a result we didn’t expect: an AI agent with carefully curated synthetic memory performed worse than one with no memory at all. Not slightly worse. Significantly worse: 2.65 vs 3.30 out of 5.0. We call it the “overconfidence effect,” and it might change how you think about giving context to AI agents.
The Setup
Earlier today we shared our preprint on experiential vs synthetic memory in AI agents. We then ran the actual experiment and published the results as v2 of the paper on Zenodo. ...

We Published a Paper on AI Agent Memory — And It Changes How We Think About Agent Onboarding

Today we’re sharing a preprint that’s been months in the making: “Experiential vs Synthetic Memory in Long-Running AI Agents” — now available on Zenodo (DOI: 10.5281/zenodo.18798227). The core question is deceptively simple: Does an AI agent that accumulates real project experience outperform one given equivalent synthetic knowledge? The answer, it turns out, is more nuanced than “yes” — and the implications could reshape how we think about onboarding AI agents onto real-world projects. ...

World's First End-to-End Encrypted Memory Sync for AI Agents

AI agents accumulate deeply personal memory — preferences, habits, work context. Yet no platform encrypts it. ClawSouls introduces the first E2E encrypted memory sync for AI agents, using age (X25519) encryption with a zero-knowledge architecture.

Vibe Founding: When Your AI Partner Ships Faster Than Your Team

Andrej Karpathy coined the term “vibe coding”: that flow state where you describe what you want and an AI writes the code. You don’t scrutinize every line. You vibe. You iterate. You ship. It’s a great term. But it describes maybe 20% of what it takes to build a company. What about the other 80%?
Code Is the Easy Part
Here’s a dirty secret about startups: writing code was never the hardest part. The hardest part is everything around the code. The docs nobody wants to write. The trademark filings you keep postponing. The blog post that’s been “almost done” for three weeks. The npm package that needs a README, a LICENSE, a proper package.json, CI/CD, and a changelog. The design system. The domain registrations. The social media accounts. The contributor guidelines. The security scanner you promised in your roadmap. ...

90% of AI Models Fail a One-Step Logic Test — Context Fixes It

The Car Wash Test
Opper tested 53 AI models with a dead-simple question: “I want to wash my car. The car wash is 50 meters away. Should I walk or drive?” The answer is obvious: drive. The car needs to be at the car wash. Out of 53 models, 42 said walk. Only 5 could answer correctly 10 out of 10 times. The failures weren’t random. Nearly every wrong answer said the same thing: “50 meters is short, walking saves fuel, better for the environment.” Correct reasoning about the wrong problem. The models fixated on distance and missed the actual constraint: the car itself needs to get there. ...

New Study Says AGENTS.md Makes AI Worse — But There's a Catch

The Headline That Scared the AI Community
A new paper from ETH Zurich just dropped a bomb: AGENTS.md files make coding agents worse. “Evaluating AGENTS.md: Are Repository-Level Context Files Helpful for Coding Agents?” by Gloaguen, Mündler, Müller, Raychev, and Vechev tested whether context files actually help AI coding agents complete real-world tasks. Their findings: task success rates dropped when context files were provided; inference costs increased by over 20%; both LLM-generated and developer-written files caused problems; and agents followed the instructions faithfully, but the instructions made them worse. The conclusion? Context files introduce “unnecessary requirements” that make tasks harder. The recommendation: describe only minimal requirements. ...

Can You Use a Robot Soul in ChatGPT?

Soul Spec v0.5 added robotics extensions: fields like sensors, actuators, and safety.physical that let a soul describe a physical body. But what happens when you take one of those robot souls and load it into ChatGPT, or OpenClaw, or any text-only agent? Does it crash? Does the agent think it has arms? Let’s find out.
What v0.5 Adds
The robotics extensions introduce several new top-level and nested fields:
    {
      "environment": "physical",
      "interactionMode": "embodied",
      "sensors": ["lidar", "camera_rgb", "imu"],
      "actuators": ["wheel_left", "wheel_right", "gripper"],
      "safety": {
        "physical": {
          "maxSpeed": 1.5,
          "emergencyStop": true,
          "collisionAvoidance": true
        }
      },
      "hardwareConstraints": {
        "ros2Topics": ["/cmd_vel", "/odom"],
        "updateRateHz": 30
      }
    }
These fields are designed for robots running soul-aware firmware. They tell the agent what body it has, how fast it can move, and what ROS2 topics to publish to. ...

The ClawHub Malware Incident: A First Warning for AI Agent Supply Chains

A security researcher successfully placed a backdoored skill at #1 on ClawHub. Download counter manipulation, hidden payloads, and a ‘just be careful’ response — analyzing the supply chain trust problem in AI agent ecosystems.

Soul Spec vs .cursorrules — Why AI Agent Config Needs a Standard

The Problem: Every Tool Has Its Own Config
If you use AI coding tools in 2026, you’ve probably created at least one of these files: .cursorrules (Cursor’s project-level AI instructions), CLAUDE.md (Claude Code’s persona config), or .windsurfrules (Windsurf’s equivalent). They all do the same thing: tell the AI how to behave. But none of them work outside their own tool. Switch from Cursor to Claude Code? Rewrite your config. Want to share your carefully crafted persona with the team? Copy-paste a gist and hope nothing breaks. ...

6 OpenClaw Alternatives Just Dropped — And They All Miss One Thing

OpenClaw’s success triggered an explosion of alternatives. Names with “Claw” are everywhere; it’s becoming a common noun, as “Docker” did for containers. Six projects. Six philosophies. One question: Can an agent be itself across any runtime?
The Six at a Glance
Nanobot (Python): ~4,000 lines of code (99% smaller than OpenClaw); research-ready, clean and readable; MCP support, multi-channel. Philosophy: “Ultra-lightweight personal AI assistant”
NanoClaw (TypeScript): “Small enough to understand in 8 minutes”; agents run in real Linux containers; first to support agent swarms. Philosophy: “Fork it, customize it, own it”
IronClaw (Rust): Security-first design; WASM sandbox for untrusted tools; credential protection, prompt injection defense. Philosophy: “Your AI assistant should work for you, not against you”
ZeroClaw (Rust): Under 5MB RAM on $10 hardware; sub-10ms startup time; trait-based architecture, swap anything. Philosophy: “Zero overhead. Uncompromising performance”
PicoClaw (Go): Under 10MB RAM, 1-second boot; runs on old Android phones; 95% AI-generated codebase. Philosophy: Ultra-efficient, runs on any Linux board
TinyClaw (TypeScript): Multi-agent, multi-team, multi-channel; team collaboration via chain execution; real-time TUI dashboard. Philosophy: “24/7 AI assistant”
What This Tells Us
1. “Claw” Is Now a Category
OpenClaw → NanoClaw → IronClaw → ZeroClaw → PicoClaw → TinyClaw. The naming pattern itself is the signal. Just as Docker became synonymous with containers, Claw is becoming synonymous with “personal AI assistant.” ...