AgMoDB by @mistakeknot

Model picks

Current defaults by use case.

Product

Production assistants and internal tools.

Default

Claude Sonnet 4.6 (Non-reasoning, High Effort)

Anthropic

AgMoBench 85.8 | $6.00/M | 49 tok/s

Reliable product default.

Value

GPT-5.4 mini (xhigh)

OpenAI

AgMoBench 53.4 | $1.69/M | 167 tok/s

Lower-cost product lane.

Ceiling

GPT-5.5 (xhigh)

OpenAI

AgMoBench 64.8 | $11.25/M | 60 tok/s

Higher ceiling, higher spend.

Human frontier

1. Anthropic: Claude Opus 4.7 | Anthropic | Human Frontier 95.5 | $10.00/M | speed n/a
2. Claude Opus 4.6 (Non-reasoning, High Effort) | Anthropic | Human Frontier 95.2 | $10.00/M | 48 tok/s
3. GLM-5.1 (Reasoning) | Z AI | Human Frontier 93.6 | $2.15/M | 71 tok/s
4. Claude Sonnet 4.6 (Non-reasoning, High Effort) | Anthropic | Human Frontier 93.4 | $6.00/M | 49 tok/s
5. Claude Fable 5 (Adaptive Reasoning, Max Effort, Opus 4.8 Fallback) | Anthropic | Human Frontier 92.2 | $20.00/M | 65 tok/s
6. Claude Opus 4.5 (Non-reasoning) | Anthropic | Human Frontier 92.2 | $10.00/M | 53 tok/s

Worth discovering

Kimi K2.6

Kimi

Strong frontier/value ratio.

Cheap reasoning

DeepSeek V4 Flash (Reasoning, Max Effort)

DeepSeek

Aggressive reasoning price/performance.

Fast batch work

Gemini 3.1 Flash-Lite

Google

Fast, cheap high-throughput lane.

Qwen3.6 35B A3B (Reasoning)

Alibaba

Open-ish frontier compression.