Assay

The Proof

Benchmarks, audits, and verification results -- all independently reproducible.

New Case Study
Assay vs OpenClaw -- 5,000 files, 235 claims, 15.8% compliance
Full pipeline run against a real open-source AI assistant. 43 failures, 15 partial gaps, 1 critical security issue.

100% pass @ k=5 on HumanEval (164/164)

164 coding tasks. Four verification methods compared across k=1, k=3, and k=5 iterations.

Method         k=1     k=3     k=5
Baseline       86.6%   --      --
Self-Refine    87.2%   87.2%   87.8%
LLM-as-Judge   98.2%   99.4%   97.2%
Assay          98.8%   100%    100%

LLM-as-Judge drops to 97.2% at k=5 -- false positives accumulate across iterations. Assay converges monotonically to 100%.
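
For readers who want the mechanics behind these k values, here is a minimal sketch of a verify-and-retry loop of the kind the table compares. The generate_candidate and verify functions are hypothetical stand-ins, not Assay's actual API; the point is the control flow that makes a sound verifier monotone in k while a noisy judge can regress.

```python
def solve_with_verification(task, generate_candidate, verify, k=5):
    """Attempt a task up to k times, gated by a verifier.

    generate_candidate and verify are hypothetical placeholders,
    for illustration only. With a sound verifier, extra iterations
    can only help, so the pass rate is monotone non-decreasing in k.
    With a noisy LLM judge, every extra round is another chance to
    accept a broken candidate, which is how a score can slip
    between k=3 and k=5.
    """
    attempt = None
    for _ in range(k):
        # Feed the previous failed attempt back in, self-refine style.
        attempt = generate_candidate(task, previous=attempt)
        if verify(task, attempt):
            return attempt, True   # verifier accepted the candidate
    return attempt, False          # out of budget, return best effort
```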

300 real software engineering tasks

SWE-bench Lite: real GitHub issues from production repositories.

Baseline k=1: 18.3%
LUCID k=1: 25% (+36.6% vs. baseline)
LUCID best: 30.3% (+65.6% vs. baseline)
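
The relative gains follow directly from the absolute scores: (25 - 18.3) / 18.3 ≈ +36.6%, and (30.3 - 18.3) / 18.3 ≈ +65.6%.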

Won 7 of 10 head-to-head tasks

Real-world coding challenges scored by expert judges on correctness, security, and edge-case handling.

Baseline: 21.6/30
Forward Assay: 27.2/30
Tasks won: 7 of 10

Market Context

AI code generation market: $7.4B, growing at a 27% CAGR
Developer trust in AI code (YoY): 42% → 33%
EU AI Act enforcement begins: Aug 2026

Recent comparables:

Code Metal: $1.25B
CodeRabbit: $550M
Snyk: $7.4B

Formal verification of AI code, AI code quality gates, and developer security -- all adjacent to Assay's verification substrate.