The 4 Rules AI Coding Agents Must Follow (And Why Business AI Doesn't)

Andrej Karpathy, who led the Autopilot AI team at Tesla and was a founding member of OpenAI, recently put words to something many AI builders know but few can articulate: AI coding agents fail in specific, predictable ways.

He observed that LLMs make wrong assumptions on your behalf. They don't manage confusion. They don't seek clarification. They don't push back. Instead, they overcomplicate code, bloat abstractions, and silently change things they shouldn't.

The community responded by codifying his observations into four principles — now a GitHub repo with 141,000 stars and 14,500 forks. These principles aren't just for code. They're the exact same discipline I apply to AI business deployment.

Here are the four rules — and what each one means when you apply it to the AI systems businesses run on.

Principle 1

Think Before Coding

What it means for code: Don't assume. State your assumptions explicitly. If uncertain, ask. If multiple interpretations exist, present them — don't pick silently. If a simpler approach exists, push back.

What it means for business AI: Before I deploy any module, I map the entire workflow — every system, every handoff, every exception. I don't assume the CRM is structured the way the sales team describes it. I don't assume the booking system has the fields the front desk says it does. I verify. I surface what's confusing. I present what needs to change.

✗ Business failure: "We deployed AI to handle appointment booking. It scheduled patients into slots that didn't exist." — The AI assumed. Nobody verified.

✓ My approach: Paid audit first. Map the workflow. Surface the gaps. Only then build.
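
Part of that verification can be scripted. What follows is a minimal sketch, not a description of any real audit tool: it assumes a hypothetical export of booking records as a list of dictionaries, and the field names (`patient_id`, `slot_start`, `provider`) are invented for illustration. It counts how often a field the team expects is missing or empty, so the gap is surfaced before anything is built on top of it.

```python
from typing import Iterable

# Hypothetical field names: replace with whatever the team says the system has.
EXPECTED_FIELDS = {"patient_id", "slot_start", "provider"}


def find_field_gaps(records: Iterable[dict], expected: set) -> dict:
    """Count, per expected field, the records where it is missing or empty."""
    gaps = {name: 0 for name in expected}
    for record in records:
        for name in expected:
            value = record.get(name)
            if value is None or value == "":
                gaps[name] += 1
    # Report only the fields that have at least one problem record.
    return {name: count for name, count in gaps.items() if count > 0}


# Invented sample data standing in for a real export.
sample_export = [
    {"patient_id": "A1", "slot_start": "2024-05-01T09:00", "provider": "Dr. X"},
    {"patient_id": "A2", "slot_start": "", "provider": "Dr. X"},
    {"patient_id": "A3", "slot_start": "2024-05-01T10:00"},
]

gaps = find_field_gaps(sample_export, EXPECTED_FIELDS)
print(gaps)
```

An empty result would mean the data matches what was described. Anything else is a question to take back to the team before building.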

Principle 2

Simplicity First

What it means for code: Minimum code that solves the problem. No features beyond what was asked. No abstractions for single-use. If 200 lines could be 50, rewrite it.

What it means for business AI: I don't deploy a "full AI transformation." I deploy Module 1. One function. One department. One measurable outcome. SMS appointment reminders before phone agents. Phone agents before email handling. Email before CRM integration. Each module proves itself before the next begins.

✗ Business failure: "We bought an all-in-one AI platform. After six months, only the chatbot was working — and it was answering questions wrong." — Features nobody asked for, complexity that added nothing.

✓ My approach: "What's the ONE business function you want automated first? That's Module 1. Nothing else."

Principle 3

Surgical Changes

What it means for code: Touch only what you must. Don't "improve" adjacent code. Don't refactor things that aren't broken. Match existing style, even if you'd do it differently. Every changed line must trace directly to the request.

What it means for business AI: My AI reads your CRM. It drafts email responses. It never sends them. It never modifies records it wasn't asked to modify. It never "improves" a workflow you didn't authorize. The audit trail shows exactly what was touched and why. Every action traces to a human decision.

✗ Business failure: "The AI agent was supposed to answer support emails. It changed the billing status on three customer accounts." — Side-effect edits. Surgical failure.

✓ My approach: Human-in-the-loop (HITL) gates at every write action. AI reads. AI drafts. Human approves. Audit trail logs.
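
The read, draft, approve, log sequence can be sketched in a few lines. This is an illustrative outline under stated assumptions, not the actual implementation behind any deployment: the class names `Draft` and `ApprovalGate` are hypothetical, and appending to a list stands in for a real email send.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Draft:
    """A proposed write action produced by the AI; nothing is sent yet."""
    recipient: str
    body: str


@dataclass
class ApprovalGate:
    """Holds drafts until a named human approves them, and logs every step."""
    audit_log: list = field(default_factory=list)
    sent: list = field(default_factory=list)

    def _log(self, event: str, draft: Draft, actor: str) -> None:
        self.audit_log.append({
            "time": datetime.now(timezone.utc).isoformat(),
            "event": event,
            "recipient": draft.recipient,
            "actor": actor,
        })

    def propose(self, draft: Draft) -> Draft:
        """Record that the AI drafted something; no write happens here."""
        self._log("drafted", draft, actor="ai")
        return draft

    def approve_and_send(self, draft: Draft, approver: str) -> None:
        """Perform the write only when a human approver is named."""
        if not approver:
            raise PermissionError("A human approver is required for write actions.")
        self._log("approved", draft, actor=approver)
        self.sent.append(draft)  # Stand-in for the real send call.
        self._log("sent", draft, actor=approver)


gate = ApprovalGate()
draft = gate.propose(Draft("guest@example.com", "Your booking is confirmed."))
# At this point nothing has been sent; the draft waits for a person.
gate.approve_and_send(draft, approver="front_desk_manager")
```

The design choice worth noting is that the only code path that performs a write also requires an approver and writes to the log, so an unapproved or unlogged send is not possible through this interface.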

Principle 4

Goal-Driven Execution

What it means for code: Define success criteria before coding. "Add validation" becomes "Write tests for invalid inputs, then make them pass." Loop until verified.
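
As a small illustration of turning "add validation" into a verifiable goal, the sketch below lists the invalid inputs first and then checks an implementation against them. The function `validate_age` and its 0 to 130 range are hypothetical, chosen only to make the example concrete.

```python
def validate_age(value) -> int:
    """Return the age as an int, or raise ValueError for invalid input."""
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(value, bool) or not isinstance(value, int):
        raise ValueError(f"age must be an integer, got {value!r}")
    if not 0 <= value <= 130:
        raise ValueError(f"age out of range: {value}")
    return value


def rejects(value) -> bool:
    """True if validate_age raises ValueError for this input."""
    try:
        validate_age(value)
    except ValueError:
        return True
    return False


# Success criteria, written down before the implementation is considered done.
INVALID_INPUTS = [-1, 131, "42", None, 3.5, True]
VALID_INPUTS = [0, 42, 130]

failures = [v for v in INVALID_INPUTS if not rejects(v)]
failures += [v for v in VALID_INPUTS if rejects(v)]
print("verified" if not failures else f"still failing: {failures}")
```

The loop in "loop until verified" is simply rerunning this check after each change until `failures` is empty.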

What it means for business AI: I don't measure engagement. I don't measure message counts. I measure time saved per full-time employee (FTE), error rate reduction, compliance adherence, and audit pass rate. Module 1 goes live with defined metrics. It runs until the numbers prove themselves, or fail to. Only then does Module 2 start.

✗ Business failure: "The AI handled 10,000 conversations last month. We have no idea if any of them were correct." — Measured activity. Ignored outcome.

✓ My approach: "SMS reminders reduced no-shows from 18% to 4%. Module 1 verified. Now we talk about Module 2."
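
To make the arithmetic behind that kind of statement explicit, here is a rough sketch of a metric gate. The 18% and 4% rates come from the example above; the appointment counts, the 5% target, and the function names are assumptions made up for illustration.

```python
def no_show_rate(no_shows: int, appointments: int) -> float:
    """Fraction of booked appointments where the client did not show up."""
    if appointments <= 0:
        raise ValueError("appointments must be positive")
    return no_shows / appointments


def module_verified(baseline: float, current: float, target: float) -> bool:
    """Pass only if the rate improved on baseline and meets the agreed target."""
    return current < baseline and current <= target


# Illustrative counts chosen to reproduce the 18% and 4% rates.
baseline = no_show_rate(90, 500)
current = no_show_rate(20, 500)
target = 0.05  # Hypothetical target agreed before Module 1 went live.

relative_reduction = (baseline - current) / baseline
print(f"{baseline:.0%} to {current:.0%}, relative reduction {relative_reduction:.1%}")
print("Module 1 verified:", module_verified(baseline, current, target))
```

Going from 18% to 4% is a drop of 14 percentage points, or roughly a 78% relative reduction. Stating which of the two figures is being reported avoids confusion when the result is shared.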

Why This Matters for Business

Here's what nobody tells you: the discipline Karpathy formalized for coding agents is the same discipline that determines whether your business AI ships or fails.

The companies that succeed with AI follow these four rules — whether they know it or not. They think before deploying. They start simple. They change only what must change. They measure outcomes, not activity.

The companies that fail with AI violate them. Every time. They assume. They overcomplicate. They let AI touch systems it shouldn't. They never define what success looks like.

This is why my deployments follow the same four rules, adapted for business operations instead of codebases. The model is replaceable. The discipline isn't.

The repo is open-source. The principles are free. The toolkit is public. What I bring is the adaptation layer — how these rules apply when the "codebase" is a hospital's patient intake workflow or a lodge's booking system or a law firm's client communication pipeline.

That's the difference between an AI deployment that works on Day 300 and one that's collecting errors by Month 3. It's not the model. It's the rules you build by.

John Bianchina builds AI implementation systems for hospitality, healthcare, and professional services. His current stack includes Hermes (concierge orchestration), Paperclip (multi-agent management), and Agent Zero (autonomous research). He operates from South Africa and serves clients internationally.
