# We Verified Code from 4 AI Platforms. Average Score: 40/100
Four AI code generators. Four projects. One question: does the code actually do what it claims?
| Platform | Score | Bugs | Critical |
|---|---|---|---|
| Replit | 44/100 | 4 | 1 |
| Bolt | 42/100 | 6 | 2 |
| Lovable | 42/100 | 5 | 2 |
| Claude | 35/100 | 6 | 3 |
21 bugs total. 8 critical. None of the four passed a basic verification audit.
These aren't toy demos. Real apps, generated by the platforms people use every day. The code compiles. It runs. It just doesn't do what it says it does.
## What went wrong
Every platform generated code with implicit claims — “this validates input,” “this handles auth,” “this sanitizes data.” Most of those claims were false.
Empty error handlers. Auth checks that don't check. Validation functions that validate nothing. The code looks right. It reads right. It ships to production. Then it breaks.
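The pattern looks like this in practice. A hypothetical sketch (the names `validateEmail` and `saveUser` are illustrative, not taken from the audited apps):

```typescript
// Hypothetical AI-generated snippet: the names promise safety the code doesn't deliver.

// Implicit claim: "this validates input." Reality: every string passes.
function validateEmail(email: string): boolean {
  return true; // no pattern check, no length check, nothing
}

// Implicit claim: "this handles errors." Reality: failures vanish silently.
function saveUser(user: { email: string }): void {
  try {
    if (!validateEmail(user.email)) throw new Error("invalid email");
    // ...persist user...
  } catch {
    // empty handler: the caller never learns the save failed
  }
}
```

Both functions compile, type-check, and read like defensive code. Neither claim is true.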
## What Assay does
Assay extracts every implicit claim your code makes, then verifies each one against the actual implementation. Not “does it compile.” Not “does it pass lint.” Does it do what it says it does.
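One way to picture claim verification, in toy form. This is an illustration of the idea, not Assay's actual engine; `validateUsername` and `claimHolds` are invented names:

```typescript
// A function named "validate*" makes an implicit claim: it rejects at least
// some bad input. An AI-generated stub often breaks that claim.

function validateUsername(name: string): boolean {
  return true; // accepts everything
}

// The claim holds only if the validator rejects at least one known-bad input.
function claimHolds(validator: (s: string) => boolean, badInputs: string[]): boolean {
  return badInputs.some((x) => !validator(x));
}

console.log(claimHolds(validateUsername, ["", "a".repeat(10_000)]));
// → false: the "validator" never rejects anything, so the claim is false
```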
Try it now:
```
npx tryassay assess /path/to/project
```

It takes about 90 seconds. You get a score, a list of every claim that failed, and what to fix.
## Why this matters
AI-generated code is already in production everywhere. The models are getting better at generating plausible code. They are not getting better at generating correct code.
Verification is not a training problem. It's an infrastructure problem. Layer 2 — external verification that sits below the model — is the fix.
Drop a repo link. I'll run it for free.
— Ty