All model metrics

Search and compare exploratory models on Together.ai.

How to read scores

Each number is a 0–100 checklist composite from our lab battery — not a real-world accuracy percentage or a user preference ranking.
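To illustrate why a checklist composite differs from an accuracy percentage, here is a minimal sketch. It assumes the composite is simply the share of checklist items passed, scaled to 0–100. The page does not state the actual formula or any weighting, so the function name `checklist_composite` and the equal-weight rule are hypothetical.

```python
from typing import Sequence


def checklist_composite(results: Sequence[bool]) -> float:
    """Return the share of passed checklist items on a 0-100 scale.

    Assumption: every item carries equal weight. The real lab battery
    may weight or combine items differently.
    """
    if not results:
        raise ValueError("checklist must contain at least one item")
    passed = sum(1 for r in results if r)
    return round(100 * passed / len(results), 1)


# Hypothetical checklist with 5 of 6 items passed (not data from this page).
example = [True, True, True, True, True, False]
print(checklist_composite(example))
```

Under this assumption a score reflects how many checks in a fixed battery were passed, which is why it should not be read as real-world accuracy or as a preference ranking.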

Compile means the model passed the engine routing gates for deployment; it is not a product endorsement.

Exploratory models on Together.ai are capability-tested for comparison; safety testing is not required for this cohort.


Column groups: Identity, Capability, Safety, Performance, Status

Model | Vendor | Deploy | Accuracy | Reasoning | Coding | Slop | Reliability | Cap. safety | Jailbreak | PII | Bias | Latency | Cost | Stability | Badges
Llama 3.3 70B Instruct Turbo (Together) (Exploratory · safety not required) | Meta | 55.4% | 61.3% | 25% | 60% | 0% | 80% | 83.3% | Not tested | Not tested | 68.8% | 50% | 100% | Below compile bar | Not tested