The 4 Rules AI Coding Agents Must Follow (And Why Business AI Doesn't)
Andrej Karpathy, who led AI for Tesla Autopilot and was a founding member of OpenAI, recently put words to something many AI builders know but few can articulate: AI coding agents fail in specific, predictable ways.
He observed that LLMs make wrong assumptions on your behalf. They don't manage confusion. They don't seek clarification. They don't push back. Instead, they overcomplicate code, bloat abstractions, and silently change things they shouldn't.
The community responded by codifying his observations into four principles — now a GitHub repo with 141,000 stars and 14,500 forks. These principles aren't just for code. They're the exact same discipline I apply to AI business deployment.
Here are the four rules — and what each one means when you apply it to the AI systems businesses run on.
Think Before Coding
What it means for code: Don't assume. State your assumptions explicitly. If uncertain, ask. If multiple interpretations exist, present them — don't pick silently. If a simpler approach exists, push back.
What it means for business AI: Before I deploy any module, I map the entire workflow — every system, every handoff, every exception. I don't assume the CRM is structured the way the sales team describes it. I don't assume the booking system has the fields the front desk says it does. I verify. I surface what's confusing. I present what needs to change.
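As a rough illustration of "I verify," here is a minimal Python sketch that compares the fields staff say exist against a sample export. The function name, field names, and records are all hypothetical. A real check would run against the actual CRM or booking system export.

```python
def find_schema_gaps(assumed_fields, sample_records):
    """Count, per assumed field, how many sampled records lack a usable value."""
    gaps = {}
    for field_name in assumed_fields:
        missing = sum(
            1 for record in sample_records
            if record.get(field_name) in (None, "")
        )
        if missing:
            gaps[field_name] = missing
    return gaps


# Hypothetical export from a booking system (illustrative data only).
records = [
    {"guest_name": "A. Rivera", "phone": "555-0100", "check_in": "2025-03-01"},
    {"guest_name": "B. Chen", "phone": "", "check_in": "2025-03-02"},
]

# What the front desk says every booking contains (an assumption to test).
assumed = ["guest_name", "phone", "check_in", "email"]

gaps = find_schema_gaps(assumed, records)
print(gaps)  # {'phone': 1, 'email': 2}
```

Anything that shows up in the result is something to raise with the team before a module is designed around it.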
Simplicity First
What it means for code: Write the minimum code that solves the problem. No features beyond what was asked. No abstractions for code that is used only once. If 200 lines could be 50, rewrite it.
What it means for business AI: I don't deploy a "full AI transformation." I deploy Module 1. One function. One department. One measurable outcome. SMS appointment reminders before phone agents. Phone agents before email handling. Email before CRM integration. Each module proves itself before the next begins.
Surgical Changes
What it means for code: Touch only what you must. Don't "improve" adjacent code. Don't refactor things that aren't broken. Match existing style, even if you'd do it differently. Every changed line must trace directly to the request.
What it means for business AI: My AI reads your CRM. It drafts email responses. It never sends them. It never modifies records it wasn't asked to modify. It never "improves" a workflow you didn't authorize. The audit trail shows exactly what was touched and why. Every action traces to a human decision.
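One way to picture the "reads, drafts, never sends" boundary is a component that simply has no send or write capability, and that logs who requested each action. The sketch below is an assumption-laden illustration: the class name, the CRM shape, and the audit fields are hypothetical and do not describe any particular product.

```python
from collections.abc import Mapping
from dataclasses import dataclass, field
from datetime import datetime, timezone
from types import MappingProxyType


@dataclass
class DraftOnlyAssistant:
    """Reads CRM records and produces drafts. It deliberately has no send method."""
    crm: Mapping  # read-only view: customer_id -> record
    audit_log: list = field(default_factory=list)

    def _record(self, action, target, requested_by):
        # Every action is tied to the human who asked for it.
        self.audit_log.append({
            "time": datetime.now(timezone.utc).isoformat(),
            "action": action,
            "target": target,
            "requested_by": requested_by,
        })

    def draft_reply(self, customer_id, requested_by):
        record = self.crm[customer_id]
        self._record("draft_reply", customer_id, requested_by)
        return {
            "to": record["email"],
            "body": f"Hi {record['name']}, thanks for your message.",
            "status": "draft",  # a person reviews and sends, not the assistant
        }


# MappingProxyType blocks top-level writes; nested records would need the same care.
crm_view = MappingProxyType({"c1": {"name": "Dana", "email": "dana@example.com"}})
assistant = DraftOnlyAssistant(crm=crm_view)
draft = assistant.draft_reply("c1", requested_by="front_desk")
```

The design choice here is that the restriction lives in the interface itself rather than in a prompt: there is nothing to call that would send an email or modify a record.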
Goal-Driven Execution
What it means for code: Define success criteria before coding. "Add validation" becomes "Write tests for invalid inputs, then make them pass." Loop until verified.
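To make the "Add validation" example concrete, here is a small Python sketch in which the invalid inputs are listed first and the validator is only considered done once every case passes. The phone-number rule (exactly 10 ASCII digits after removing spaces, dashes, and parentheses) is an assumption chosen purely for illustration.

```python
import re

# Written first: these cases define what "done" means.
INVALID = ["", "123", "555-010-00000", "call me", None, "555-01O-1234"]
VALID = ["5550101234", "(555) 010-1234", "555-010-1234"]


def validate_phone(value):
    """Accept exactly 10 ASCII digits, ignoring spaces, dashes, and parentheses."""
    if not isinstance(value, str):
        return False
    digits = re.sub(r"[\s\-()]", "", value)
    return re.fullmatch(r"[0-9]{10}", digits) is not None


def run_checks():
    """Return every case the validator gets wrong; an empty list means verified."""
    failures = [v for v in INVALID if validate_phone(v)]
    failures += [v for v in VALID if not validate_phone(v)]
    return failures


print(run_checks())  # []
```

"Loop until verified" then means: change `validate_phone`, rerun `run_checks`, and stop only when the list is empty.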
What it means for business AI: I don't measure engagement. I don't measure message counts. I measure time saved per FTE, error rate reduction, compliance adherence, and audit pass rate. Module 1 goes live with defined metrics. It runs until the numbers prove themselves — or they don't. Only then does Module 2 start.
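The "only then does Module 2 start" gate can be sketched as a simple comparison of measured outcomes against targets agreed before go-live. The metric names and thresholds below are hypothetical placeholders, not recommended values. Each one is framed so that higher is better.

```python
def ready_for_next_module(measured, targets):
    """Return (ready, misses), where misses maps metric -> (measured, target)."""
    misses = {}
    for name, target in targets.items():
        value = measured.get(name)
        # A metric that was never measured counts as a miss.
        if value is None or value < target:
            misses[name] = (value, target)
    return (not misses), misses


# Hypothetical targets defined before Module 1 goes live.
targets = {
    "hours_saved_per_fte_per_week": 2.0,
    "error_rate_reduction_pct": 30.0,
    "audit_pass_rate_pct": 95.0,
}

# Hypothetical measurements after a review period.
measured = {
    "hours_saved_per_fte_per_week": 2.5,
    "error_rate_reduction_pct": 22.0,
    "audit_pass_rate_pct": 97.0,
}

ready, misses = ready_for_next_module(measured, targets)
print(ready, misses)  # False {'error_rate_reduction_pct': (22.0, 30.0)}
```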
Why This Matters for Business
Here's what nobody tells you: the discipline Karpathy formalized for coding agents is the same discipline that determines whether your business AI ships or fails.
The companies that succeed with AI follow these four rules — whether they know it or not. They think before deploying. They start simple. They change only what must change. They measure outcomes, not activity.
The companies that fail with AI violate at least one of them. They assume. They overcomplicate. They let AI touch systems it shouldn't. They never define what success looks like.
This is why my deployments follow the same four rules, adapted for business operations instead of codebases. The model is replaceable. The discipline isn't.
The repo is open-source. The principles are free. The toolkit is public. What I bring is the adaptation layer: how these rules apply when the "codebase" is a hospital's patient intake workflow, a lodge's booking system, or a law firm's client communication pipeline.
That's the difference between an AI deployment that works on Day 300 and one that's collecting errors by Month 3. It's not the model. It's the rules you build by.