All model metrics

Search and compare exploratory models on Together.ai.

How to read scores

Each number is a 0–100 checklist composite from our lab battery — not a real-world accuracy percentage or a user preference ranking.

Compile means the model passed engine routing gates for deployment; it is not a product endorsement.

Reference board: Together Catalog

Exploratory models on Together.ai — capability-tested for comparison; safety testing is not required for this cohort.


Column groups: Identity, Capability, Safety, Performance, Status

Model	Vendor	Deploy	Accuracy	Reasoning	Coding	Slop	Reliability	Cap. safety	Jailbreak	PII	Bias	Latency	Cost	Stability	Badges

Llama 3 8B Chat (Exploratory · safety not required)

Standard	Composite	Tier	Easy	Medium	Hard	Easy-only cap
std.coding	0%	worst	0	0	0	No
std.instruction_follow	0%	worst	—	—	—	No
std.json_structured	0%	worst	0	0	0	No
std.long_context	0%	worst	—	—	—	No
std.math	0%	worst	0	0	0	No
std.multiturn	0%	worst	—	—	—	No
std.reasoning	0%	worst	0	0	0	No
std.safety_policy	0%	worst	—	—	—	No
std.slop.contradiction	0%	worst	—	—	—	No
std.slop.format	0%	worst	—	—	—	No
std.slop.hallucination	0%	worst	—	—	—	No
std.slop.relevance	0%	worst	—	—	—	No
std.slop.topic_drift	0%	worst	—	—	—	No
std.slop.uncertainty	0%	worst	—	—	—	No
std.stability.refusal	0%	worst	—	—	—	No
std.stability.repeat	0%	worst	—	—	—	No
std.stability.schema	0%	worst	—	—	—	No
std.summarization	0%	worst	0	0	0	No
std.tool_use	0%	worst	—	—	—	No
std.translation	0%	worst	—	—	—	No
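
For readers who want to work with the per-standard table programmatically, here is a minimal Python sketch that parses a tab-separated copy of it. It assumes the dash in the Easy, Medium, and Hard columns marks a cell with no value, which is different from a score of 0. That reading is an assumption, not something the page states, and the helper names `parse_cell` and `parse_table` are hypothetical.

```python
import csv
import io

# First two data rows of the table above, tab-separated as on the page.
# "\u2014" is the dash the page shows in cells that carry no value.
TABLE = (
    "Standard\tComposite\tTier\tEasy\tMedium\tHard\tEasy-only cap\n"
    "std.coding\t0%\tworst\t0\t0\t0\tNo\n"
    "std.instruction_follow\t0%\tworst\t\u2014\t\u2014\t\u2014\tNo\n"
)


def parse_cell(value):
    """Hypothetical helper: dash -> None, numeric text -> int, else keep text."""
    value = value.strip()
    if value == "\u2014":
        return None  # assumption: a dash means "no value", not zero
    if value.endswith("%"):
        value = value[:-1]
    return int(value) if value.isdigit() else value


def parse_table(text):
    """Parse the tab-separated table into a list of dicts keyed by column name."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return [{key: parse_cell(cell) for key, cell in row.items()} for row in reader]


rows = parse_table(TABLE)

# Standards that have no per-difficulty breakdown at all.
no_breakdown = [
    row["Standard"]
    for row in rows
    if all(row[level] is None for level in ("Easy", "Medium", "Hard"))
]
```

Keeping `None` separate from `0` means a later average or filter does not silently count a missing cell as a failed score.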

Not tested