How often does the same prompt produce different code?

On a competitive-programming benchmark, repeated requests with the same prompt produced no two matching test outputs for 75.76% of tasks (Ouyang et al., ACM TOSEM 2024). Temperature is not the cause: sampling a model 1,000 times at temperature 0 still produced 80 distinct outputs (Thinking Machines, 2025).

Will better prompting fix convention drift?

No. The variance does not track what the prompt means. Prompts differing only in trivial formatting produced swings of up to 76 accuracy points (Sclar et al., ICLR 2024). What closes convention drift is persistent context: a tiered, curated core that loads your conventions every session.

Convention Drift & the Borrowed Architecture

Why does AI generate different code each time I run the same prompt? Because LLM inference is nondeterministic at the systems level, and the agent re-infers your conventions every session instead of loading them. The pattern has a name, convention drift: the agent borrows a convention from its training data instead of using yours, so the same prompt in a different session yields a different convention.

The research most people haven’t seen

Researchers ran the same coding prompt repeatedly and measured how often the generated programs behaved identically. On a competitive-programming benchmark, no two responses produced the same test output for 75.76% of tasks (Ouyang et al., ACM TOSEM 2024, arXiv:2308.02828).
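To make that number concrete, here is a minimal sketch of how a "zero matching outputs" rate could be computed. It is a simplified reading of the metric, not the paper's actual procedure; the function name `zero_match_rate` and the sample data are hypothetical.

```python
from collections import Counter


def zero_match_rate(outputs_by_task):
    """Fraction of tasks where no two repeated runs produced the same test output.

    outputs_by_task maps a task id to the list of test outputs (as strings)
    collected from repeated requests with the same prompt.
    """
    if not outputs_by_task:
        raise ValueError("need at least one task")
    zero_match = 0
    for outputs in outputs_by_task.values():
        counts = Counter(outputs)
        # A task counts as "zero matching" when every run's output is unique.
        if all(n == 1 for n in counts.values()):
            zero_match += 1
    return zero_match / len(outputs_by_task)


# Hypothetical data: four tasks, three runs each.
runs = {
    "task_a": ["1 2 3", "1 2 3", "3 2 1"],  # two runs agree
    "task_b": ["yes", "no", "maybe"],       # no two runs agree
    "task_c": ["42", "41", "40"],           # no two runs agree
    "task_d": ["ok", "ok", "ok"],           # all runs agree
}
print(f"{zero_match_rate(runs):.2%}")  # 50.00%
```

Comparing test outputs rather than source text is the design choice that matters here: two programs can differ in every line and still count as a match if they behave the same.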

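And it’s not the temperature setting. Sampling a model 1,000 times at temperature 0 produced 80 distinct outputs (Thinking Machines, 2025). Nondeterminism is a property of batched GPU inference and floating-point non-associativity, not a dial you forgot to turn. Floating-point addition gives slightly different results depending on the order of operations, and that order can change with how requests are batched on the server.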
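Floating-point non-associativity is easy to see in isolation. The sketch below is a pure-Python toy, not a GPU kernel: it sums the same three numbers in every possible order, standing in for a reduction whose order shifts from run to run. The helper `sum_in_order` is hypothetical.

```python
from itertools import permutations


def sum_in_order(values, order):
    """Add values one at a time in the given index order."""
    total = 0.0
    for i in order:
        total += values[i]
    return total


# Grouping changes the result even for three ordinary-looking numbers.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
print(left == right)  # False

# Same inputs, every summation order: the small term is lost whenever
# it is added to a huge term before the huge terms cancel.
values = [1e16, 1.0, -1e16]
results = {sum_in_order(values, order) for order in permutations(range(3))}
print(sorted(results))  # [0.0, 1.0]
```

In a model, differences this small can flip which token scores highest, and every later token then builds on a different prefix.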

Why “prompt it better” doesn’t close it

Because the variance does not track meaning. Prompts differing only in trivial formatting, such as separators and capitalization, produced swings of up to 76 accuracy points on the same task (Sclar et al., ICLR 2024, arXiv:2310.11324). The surface form, not the meaning, moves the output. You can’t word your way out of a different architecture per run.
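To see what "differing only in trivial formatting" looks like, here is a small sketch that renders one prompt in several surface forms. It is an illustration only, not the tooling used in the cited study; `SEPARATORS`, `CASINGS`, and `format_variants` are hypothetical names, and the choice of separators is an assumption.

```python
from itertools import product

# Assumed formatting knobs: how a label is joined to its value,
# and how the label is capitalized.
SEPARATORS = [": ", " - ", "\n", ":: "]
CASINGS = [str.title, str.upper, str.lower]


def format_variants(fields):
    """Return prompts that carry identical content in different surface forms.

    fields is a list of (label, value) pairs; only labels and separators vary.
    """
    variants = []
    for sep, casing in product(SEPARATORS, CASINGS):
        lines = [f"{casing(label)}{sep}{value}" for label, value in fields]
        variants.append("\n".join(lines))
    return variants


fields = [("Task", "Reverse a linked list"), ("Language", "Python")]
variants = format_variants(fields)
print(len(variants))  # 12
print(variants[0])
```

Every variant asks for exactly the same thing, so rewording the request more carefully does nothing to hold the formatting fixed.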

Convention drift is one of three forms

The Borrowed Architecture has three forms: convention drift (described above), assumption propagation (an assumption that is reasonable in isolation, passes tests, and then fails in production against an older decision), and gate erosion (your review threshold migrating down without anyone deciding to lower it). All three are decisions in your code that you never made. The diagnosis behind all of them is the Fluency Trap.

What closes convention drift

Persistent context: a tiered, curated core that loads your conventions every session, so the agent stops re-inferring them. This is not a 200-line dump (that’s a monolith) but a curated Tier-1 core you build up from tested rules. (See: context tiering.)
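As a rough sketch of the idea, assuming a simple scheme where each rule carries a tier number and a flag for whether it has been tested, a session could be assembled like this. None of these names (`Rule`, `build_core_context`, `start_session`) come from a real tool, and the rules themselves are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    text: str
    tier: int     # assumed scheme: 1 = loaded every session, higher = on demand
    tested: bool  # assumed flag: the rule has been checked against real output


def build_core_context(rules, max_rules=20):
    """Return the Tier-1 core: tested tier-1 rules only, capped in size."""
    core = [r for r in rules if r.tier == 1 and r.tested]
    if len(core) > max_rules:
        # A core that keeps growing is turning into the monolith.
        raise ValueError(f"Tier-1 core has {len(core)} rules; limit is {max_rules}")
    return "\n".join(f"- {r.text}" for r in core)


def start_session(task, rules):
    """Prepend the same core conventions to every task prompt."""
    return f"Project conventions:\n{build_core_context(rules)}\n\nTask:\n{task}"


RULES = [
    Rule("Use snake_case for module names", tier=1, tested=True),
    Rule("Raise domain errors, never return None on failure", tier=1, tested=True),
    Rule("Prefer composition over inheritance", tier=1, tested=False),
    Rule("Payment retries follow the billing runbook", tier=2, tested=True),
]

print(start_session("Add a refund endpoint", RULES))
```

The point of the sketch is that the conventions are loaded, not re-inferred: two sessions given the same task start from the same core, and untested or lower-tier rules stay out of it.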

See how persistent context is built: curiochat.ai/software-engineer
