Replicate

Run open-source models through an API with pay-per-second billing. The catalog lists 50K+ models, including LLMs, image generation (FLUX, Stable Diffusion), video (Wan 2.2, Kling 2.6 Pro with audio), speech (Whisper), and community fine-tunes. Custom models can be deployed as Cog containers. Predictions can be given a deadline so they are cancelled automatically, and webhooks are signed so receivers can verify them. Replicate is being acquired by Cloudflare. Pricing varies by model and hardware, typically $0.0001-0.005 per second of compute.
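Webhook signing is only useful if the receiver checks the signature. The sketch below shows one way to do that in Python, using only the standard library. It assumes the scheme Replicate documents at the time of writing: a `whsec_`-prefixed base64 secret, `webhook-id` / `webhook-timestamp` / `webhook-signature` headers, and an HMAC-SHA256 over `id.timestamp.body`. The function name `verify_webhook` and the 300-second tolerance are illustrative choices, so check the current docs before relying on it.

```python
import base64
import hashlib
import hmac
import time


def verify_webhook(secret, webhook_id, timestamp, signature_header, body,
                   tolerance_s=300, now=None):
    """Return True if any signature in the header matches the raw body bytes."""
    # Reject stale or malformed timestamps to limit replay attacks.
    now = time.time() if now is None else now
    try:
        if abs(now - int(timestamp)) > tolerance_s:
            return False
    except (TypeError, ValueError):
        return False

    # Assumption: the secret is "whsec_" followed by a base64-encoded key.
    key = base64.b64decode(secret.removeprefix("whsec_"))
    signed = f"{webhook_id}.{timestamp}.".encode() + body
    expected = base64.b64encode(
        hmac.new(key, signed, hashlib.sha256).digest()
    )

    # Assumption: the header holds space-separated "v1,<base64 signature>" entries.
    for entry in signature_header.split():
        version, _, candidate = entry.partition(",")
        if version == "v1" and hmac.compare_digest(candidate.encode(), expected):
            return True
    return False
```

Pass the raw request body as bytes, before any JSON parsing, since re-serialising the payload can change the bytes that were signed.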

website | docs | pricing page | github | npm: replicate

Overview

Category	AI Image
Compliance	SOC 2
Self-Hostable	No
On-Prem	No
Best For	hobby, startup, growth
Last Verified	2026-02-12

Strengths & Weaknesses

Strengths:

Developer experience (DX)
Cost
Customization
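Because billing is per second of compute, a rough cost estimate is simple arithmetic: seconds per run, times the per-second rate, times the number of runs. The sketch below uses the $0.0001-0.005 per second range quoted in the description as bounds. The function name `estimate_cost` and the example workload (1,000 runs of 8 seconds) are hypothetical, and actual rates depend on the model and hardware.

```python
def estimate_cost(seconds_per_run, rate_per_second, runs=1):
    """Rough compute cost in USD: billed seconds times the per-second rate."""
    if seconds_per_run < 0 or rate_per_second < 0 or runs < 0:
        raise ValueError("inputs must be non-negative")
    return seconds_per_run * rate_per_second * runs


# Bounds taken from the typical range quoted in the description.
LOW_RATE, HIGH_RATE = 0.0001, 0.005

# Hypothetical workload: 1,000 runs of 8 seconds each.
low = estimate_cost(8, LOW_RATE, runs=1000)
high = estimate_cost(8, HIGH_RATE, runs=1000)
print(f"${low:.2f} to ${high:.2f}")
```

For this workload the estimate spans roughly $0.80 to $40.00, which shows how much the choice of model and hardware matters.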

Weaknesses:

Cold starts can be 10-30 seconds for infrequently used models
Less optimized LLM inference than specialized providers (Groq, Cerebras)
Being acquired by Cloudflare, so the platform's future may change
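With cold starts of 10-30 seconds, a client that waits for a result needs a time budget longer than the cold start plus the run itself. The sketch below polls until a terminal status or a client-side deadline. `get_status` is a hypothetical callable standing in for whatever fetches the prediction status, and the status names are assumed to match Replicate's prediction statuses. This is a client-side wait budget, not the platform's prediction deadline feature.

```python
import time

# Assumed terminal statuses for a prediction.
TERMINAL = {"succeeded", "failed", "canceled"}


def wait_for_prediction(get_status, deadline_s=60.0, interval_s=1.0,
                        clock=time.monotonic, sleep=time.sleep):
    """Poll get_status() until a terminal status or the deadline passes.

    Returns the terminal status, or "timed_out" if the deadline passes first.
    clock and sleep are injectable so the loop can be tested without waiting.
    """
    start = clock()
    while True:
        status = get_status()
        if status in TERMINAL:
            return status
        if clock() - start >= deadline_s:
            return "timed_out"
        sleep(interval_s)
```

A deadline of 60 seconds or more leaves room for a 30-second cold start; for latency-sensitive paths, a webhook avoids polling altogether.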

When to Use

Best when:

Need to run niche or community fine-tuned models
Want to deploy custom models without managing GPUs
Image/video/audio generation with 50K+ model options
Prototyping with many different models

Avoid if:

Need lowest latency for LLM inference
Production workloads requiring consistent performance
Enterprise compliance requirements

Known Issues (1)

[low] `require function is used in a way in which dependencies cannot be statically extracted` (Next.js, 1.0.1)
GitHub issue

Alternatives

together-ai, fireworks-ai, huggingface