# We Verified Code from 4 AI Platforms. Average Score: 40/100
Four AI code generators. Four projects. One question: does the code actually do what it claims?
| Platform | Score | Bugs | Critical |
|---|---|---|---|
| Replit | 44/100 | 4 | 1 |
| Bolt | 42/100 | 6 | 2 |
| Lovable | 42/100 | 5 | 2 |
| Claude | 35/100 | 6 | 3 |
21 bugs total. 8 critical. None of the four passed a basic verification audit.
These aren't toy demos. Real apps, generated by the platforms people use every day. The code compiles. It runs. It just doesn't do what it says it does.
## What went wrong
Every platform generated code with implicit claims — “this validates input,” “this handles auth,” “this sanitizes data.” Most of those claims were false.
Empty error handlers. Auth checks that don't check. Validation functions that validate nothing. The code looks right. It reads right. It ships to production. Then it breaks.
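The pattern looks like this in practice. A hypothetical sketch (the names `validateEmail` and `saveUser` are illustrative, not taken from the audited apps):

```typescript
// Hypothetical AI-generated snippet: the names promise safety the code doesn't deliver.

// Implicit claim: "this validates input." Reality: every string passes.
function validateEmail(email: string): boolean {
  return true; // no pattern check, no length check, nothing
}

// Implicit claim: "this handles errors." Reality: failures vanish silently.
function saveUser(user: { email: string }): void {
  try {
    if (!validateEmail(user.email)) throw new Error("invalid email");
    // ...persist user...
  } catch {
    // empty handler: the caller never learns the save failed
  }
}
```

Both functions compile, type-check, and read like defensive code. Neither claim is true.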
## What Assay does
Assay extracts every implicit claim your code makes, then verifies each one against the actual implementation. Not “does it compile.” Not “does it pass lint.” Does it do what it says it does.
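One way to picture claim verification, in toy form. This is an illustration of the idea, not Assay's actual engine; `validateUsername` and `claimHolds` are invented names:

```typescript
// A function named "validate*" makes an implicit claim: it rejects at least
// some bad input. An AI-generated stub often breaks that claim.

function validateUsername(name: string): boolean {
  return true; // accepts everything
}

// The claim holds only if the validator rejects at least one known-bad input.
function claimHolds(validator: (s: string) => boolean, badInputs: string[]): boolean {
  return badInputs.some((x) => !validator(x));
}

console.log(claimHolds(validateUsername, ["", "a".repeat(10_000)]));
// → false: the "validator" never rejects anything, so the claim is false
```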
Try it now:
```
npx tryassay assess /path/to/project
```

It takes about 90 seconds. You get a score, a list of every claim that failed, and what to fix.
## Why this matters
AI-generated code is already in production everywhere. The models are getting better at generating plausible code. They are not getting better at generating correct code.
Verification is not a training problem. It's an infrastructure problem. Layer 2 — external verification that sits below the model — is the fix.
Drop a repo link. I'll run it for free.
— Ty