stacksherpa

API provider directory

Groq

Ultra-fast inference on custom LPU (Language Processing Unit) hardware with sub-100ms latency. Hosts Llama 4 Scout, Llama 3.3 70B, Qwen3 32B, Llama 3.1 8B, Gemma, and other open models; reaches ~814 tok/s on Gemma 7B (5-15x faster than other providers) and sub-200ms latency on Qwen3 32B and Llama 4 Scout. OpenAI-compatible API with a generous free tier. Pricing: Llama 3.3 70B ~$0.59/$0.79 per 1M input/output tokens; Llama 3.1 8B ~$0.06 per 1M tokens (blended). Batch requests run at a 50% discount.
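The pricing above reduces to simple per-request arithmetic. A minimal cost sketch, using the Llama 3.3 70B rates and batch discount quoted in this entry (the function name and example token counts are illustrative):

```typescript
// Back-of-envelope cost estimator for Groq's listed Llama 3.3 70B rates:
// $0.59 per 1M input tokens, $0.79 per 1M output tokens, 50% off for batch.
function costUSD(inputTokens: number, outputTokens: number, batch = false): number {
  const inRate = 0.59 / 1_000_000;  // USD per input token
  const outRate = 0.79 / 1_000_000; // USD per output token
  const raw = inputTokens * inRate + outputTokens * outRate;
  return batch ? raw * 0.5 : raw;
}

// e.g. 2M input + 0.5M output tokens:
// realtime: 2 * 0.59 + 0.5 * 0.79 = 1.18 + 0.395 = $1.575
// batch:    half of that          = $0.7875
```

The same shape works for any of the listed models; swap in the per-token rates from the pricing page.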

website | docs | pricing page | github | npm: groq-sdk
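Because the API is OpenAI-compatible, any HTTP client works against the standard chat-completions payload shape. A minimal sketch; the base URL and the model id `llama-3.3-70b-versatile` are assumptions to verify against the docs:

```typescript
// Sketch: calling Groq's OpenAI-compatible chat completions endpoint with plain fetch.
// ASSUMPTIONS: base URL and model id below -- confirm current values in Groq's docs.
const GROQ_BASE = "https://api.groq.com/openai/v1";

type ChatMessage = { role: "system" | "user" | "assistant"; content: string };

// Build the request separately from sending it, so the payload shape is easy to inspect.
function buildChatRequest(apiKey: string, model: string, messages: ChatMessage[]) {
  return {
    url: `${GROQ_BASE}/chat/completions`,
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${apiKey}`,
      },
      body: JSON.stringify({ model, messages }),
    },
  };
}

// Send with any HTTP client; Node 18+ provides global fetch.
async function chat(apiKey: string, prompt: string): Promise<string> {
  const { url, init } = buildChatRequest(apiKey, "llama-3.3-70b-versatile", [
    { role: "user", content: prompt },
  ]);
  const res = await fetch(url, init);
  const data: any = await res.json();
  return data.choices[0].message.content; // same response shape as OpenAI's API
}
```

The official `groq-sdk` npm package wraps this same endpoint with an interface that mirrors the OpenAI SDK.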

Overview

Category: AI
Compliance: SOC2
Self-Hostable: No
On-Prem: No
Best For: hobby, startup, growth
Last Verified: 2026-02-13

Strengths & Weaknesses

Strengths:

Weaknesses:

When to Use

Best when:

Avoid if:

Alternatives

cerebras, together-ai, fireworks-ai