All Posts

  1. Omotenashi, A Week of Noticing in Tokyo

    I spent a week in Tokyo for the ClickHouse offsite and I couldn’t stop noticing the small design choices that made li...

  2. LLM Benchmarks Are Flatlined. Task Horizons Are Not.

    The headline accuracy numbers on standard benchmarks have stagnated. MMLU, TruthfulQA, HellaSwag: the top models have...

  3. Ditch grep and Speed up Claude Code with LSPs

    Grep is a text search. Code is not text. It’s a graph of symbols, types, and call chains. That gap is where Claude wa...

  4. AI B*llsh*tting

    Spotting AI Lies: How to Know When Your LLM is BS-ing

  5. Edit Survival

    Quality metrics for AI coding agents

  6. How To Write A Coding Agent In 169 Lines Of Python

    Writing a minimal coding agent from scratch with no hidden magic. Just prompts, tool calls and a loop.