Insights

Lessons from systems in production.

Evergreen notes, not a feed. Each piece is drawn from a system I actually built and shipped, and links back to that case study as its worked example. The throughline: production AI needs the same reliability rigor that safety-critical engineering demands.

How I Evaluate AI Systems

Worked example: Document Pipeline

A model's headline benchmark score is not a measure of its value. What matters is whether its output survives contact with the rest of the system: its structure, its cost, and the cleanup it forces downstream.

Where Not to Call the Model

Worked example: BooxPlanner · Document Pipeline

On a real budget, the highest-leverage decision in an AI system is often not calling the model at all. Cost tracks the work you route to it — so route only what genuinely needs it.

Safety-Critical Rigor in Production AI

Worked example: AI Social-Content Pipeline · Document Pipeline

Production LLM systems need exactly the discipline that keeps a regulated system safe: deterministic decisions, validation at every boundary, fail-safe defaults, and a human in the loop for irreversible actions.

Forthcoming

Keeping Agentic Workflows Debuggable

When an LLM can take actions, the hard part is no longer capability but observability and bounded autonomy.