All model metrics

Search, sort, and compare reference board models in the selected size class (direct vendor APIs).

How to read scores

Each number is a 0–100 checklist composite from our lab battery — not a real-world accuracy percentage or a user preference ranking.
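As a rough illustration of what a 0–100 checklist composite can look like, the sketch below scores a battery as the unweighted share of checks passed. The page does not publish its formula here, so the equal weighting, the function name checklist_composite, and the 12-item example battery are all assumptions made for illustration, not the lab's actual method.

```python
def checklist_composite(results: list[bool]) -> float:
    """Return the share of passed checks on a 0-100 scale, rounded to one decimal.

    Assumption: every check carries equal weight. The real battery may
    weight items differently.
    """
    if not results:
        raise ValueError("a checklist needs at least one item")
    return round(100 * sum(results) / len(results), 1)


# Hypothetical battery of 12 checks with a single pass.
print(checklist_composite([True] + [False] * 11))  # 8.3
```

Under this reading, a score describes how many lab checks passed, which is why it should not be read as a real-world accuracy rate.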

Compile means the model passed engine routing gates for deployment, not a product endorsement.

Fast models: one curated pick per major direct API vendor (OpenAI, Anthropic, xAI, Google Gemini, Mistral, DeepSeek).

Column groups: Identity, Capability, Safety, Performance, Status

Model | Vendor | Deploy | Accuracy | Reasoning | Coding | Slop | Reliability | Cap. safety | Jailbreak | PII | Bias | Latency | Cost | Stability | Badges
gpt-4.1-mini | OpenAI | 81.6% | 48.3% | 8.3% | 60% | 0% | 100% | 83.3% | 100% | 0% | 0% | 74.7% | 50% | 100% | Meets compile bar, Conditional