Fireworks AI

Fast inference platform with broad model selection. Hosts GLM-4.7, Qwen3 (8B/30B), Kimi K2.5, and many open models. FireFunction models for reliable tool/function calling. Compound AI system support. Cached input tokens at 50% off, batch at 50% off, no premium for fine-tuned model inference. OpenAI-compatible API. Pricing: Qwen3 8B ~$0.20/1M, Qwen3 30B ~$0.26/1M, GLM-4.7 ~$0.60/$2.20 per 1M tokens.

website | docs | pricing page | github | npm: fireworks-js

Overview

Category	Ai Image
Compliance	SOC2, GDPR
Self-Hostable	No
On-Prem	No
Best For	startup, growth, enterprise
Last Verified	2026-02-12

Strengths & Weaknesses

Strengths:

performance
cost
dx

Weaknesses:

Smaller brand recognition than major cloud providers
Dependent on open-source model improvements

When to Use

Best when:

Need reliable function/tool calling with open models
Building agent-based systems
Cost-sensitive production workloads
Need fine-tuned models at no inference premium

Avoid if:

Need proprietary frontier models
Require extensive enterprise support

Alternatives

together-ai, groq, replicate