AI (62)
- AI21 Labs - Enterprise NLP platform with Jamba 1.5 models using a hybrid SSM-Transformer architecture for efficient long-context processing. Jamba 1.5 Large: 94B active parameters, 256k context, 65.4 Arena Hard score. Jamba 1.5 Mini: 12B active parameters, 256k context. Jamba Reasoning 3B for compact reasoning with up to 1M context. Pricing: Large ~$2/$8 per 1M input/output tokens, Mini ~$0.20/$0.40 per 1M tokens. Specialized APIs for summarization, paraphrase, and grammar correction. Supports 8 languages. Compliance: SOC2, GDPR.
- Activepieces - Open-source workflow automation platform combining no-code ease with developer control, featuring 379+ integrations, AI agents, and a community-driven extension ecosystem. Free tier available.
- Akka - Enterprise platform for building high-performance, distributed AI systems with message-driven orchestration of AI agents and real-time, fault-tolerant agentic workflows.
- Amazon Bedrock - AWS managed multi-model API. Access Claude Opus 4.6 (1M context), Llama 4 Maverick, DeepSeek V3.2, Mistral Large 3, MiniMax, GLM, Kimi, Qwen, Cohere, AI21, Stability AI, and Amazon Titan through a unified API. Project Mantle for distributed inference with OpenAI API compatibility. Structured Outputs for JSON schema compliance. Server-side tool use for web search, code execution, and DB updates. Knowledge Bases for RAG. Guardrails API. Prompt caching with 1-hour TTL. Pricing varies by model. Compliance: SOC2, HIPAA, GDPR, PCI-DSS, ISO 27001. Platform: AWS.
- Anthropic - Claude model family. Claude Opus 4.6 is #1 on LMArena (~1502 ELO, ~1506 with thinking) and #1 on Code Arena (~1560 ELO). Best-in-class for coding, with the highest score on Terminal-Bench 2.0. Claude Sonnet 4.5 (~1451 ELO) is the best balance of speed and quality. Claude Haiku 4.5 ($1/$5) is the fast/cheap option. Opus 4.6 supports a 1M-token context window and 128k output. Pricing: Haiku $1/$5, Sonnet $3/$15, Opus $5/$25 per 1M tokens (long context >200k at 2x). Batch API at 50% off. Agent teams functionality in Opus 4.6. Compliance: SOC2, GDPR, HIPAA.
- Anyscale - Managed platform built on Ray for orchestrating large-scale AI and ML workloads with simplified infrastructure management and production deployment.
- Apache Airflow - Industry-standard open-source workflow orchestration platform using directed acyclic graphs (DAGs) for scheduling, monitoring, and executing complex data and ML pipelines. License: Apache 2.0.
- Azure OpenAI - OpenAI and open models hosted on Microsoft Azure with enterprise features. GPT-5.2, GPT-5.2-Codex, o3, the GPT-4.1 family, plus Llama 4 Maverick, DeepSeek V3.1/R1, and other open models. GPT-image-1.5 for image generation, Sora for video, Realtime API for voice. Azure AI Foundry for unified model management. Private endpoints, VNet integration, managed identity auth. Content filtering built in. Regional deployment options. Provisioned throughput for guaranteed capacity. Pricing matches OpenAI rates. Compliance: SOC2, HIPAA, GDPR, PCI-DSS, ISO 27001. Platform: Azure.
- Bifrost by Maxim AI - An enterprise-grade LLM gateway designed for production workloads with intelligent load balancing, automatic failover, and semantic caching.
- Cerebras - Fastest LLM inference engine, powered by wafer-scale chips. Hosts Llama 3.1 (8B, 70B, 405B), Llama 4 Maverick 400B, and other open models with industry-leading throughput: Llama 4 Maverick 400B at 2,500+ tok/s, Llama 3.1 70B at 2,100 tok/s (8x faster than H200), Llama 3.1 405B at 969 tok/s. Pricing from ~$0.10/1M tokens (Llama 3.1 8B) to ~$0.60/1M (70B). OpenAI-compatible API with inference speeds up to 75x faster than major cloud providers.
- Cohere - Enterprise-focused AI platform specializing in RAG, embeddings, and search. Command A is the flagship: 111B params, 256k context, open weights, runs on 2 H100s. Command A variants: Reasoning (complex agentic tasks), Vision (image understanding), Translate (specialized). Embed 4 ($0.12/1M tokens) is a top-tier embedding model. Rerank 3.5 ($2/1K searches) for search relevance. 23 languages supported. Pricing: Command R+ ~$2.50/$10, Command R ~$0.50/$1.50, Command R7B ~$0.04/$0.15 per 1M tokens. Compliance: SOC2, GDPR, HIPAA.
- CrewAI - Lightweight Python framework for orchestrating role-based autonomous AI agents that work together as crews to complete complex tasks and workflows.
- DSPy - Python framework for building modular AI systems with declarative language model programming, prompt optimization, and multi-hop reasoning orchestration.
- DeepSeek - Chinese AI lab with frontier-quality models at dramatically lower cost. DeepSeek-V3.2 (~1421 ELO on LMArena, ~1423 with thinking) offers near-frontier quality. V3.2 pricing: $0.28/$0.42 per 1M tokens (cache hit: $0.028, a 90% savings). Off-peak hours at 50% off. DeepSeek-R1 for reasoning at $0.12/$0.20 per 1M tokens. V4 expected mid-Feb 2026 with 1M+ token context via Sparse Attention and Engram memory for agentic tasks. Open-weight (V3 is a 671B MoE, 37B active per token). 95% cheaper than GPT-5.
- Deepinfra - Provides API access to open-source large language models including Llama and Gemma.
- Extend - Production-ready document processing API that converts unstructured documents (PDFs, scanned images, forms, invoices) into structured JSON using vision models and LLM-based extraction. Delivers 95-99%+ accuracy with an agentic Composer that automatically tunes schemas and optimizes extraction pipelines without manual tuning.
- Fireworks AI - Fast inference platform with broad model selection. Hosts GLM-4.7, Qwen3 (8B/30B), Kimi K2.5, and many open models. FireFunction models for reliable tool/function calling. Compound AI system support. Cached input tokens at 50% off, batch at 50% off, no premium for fine-tuned model inference. OpenAI-compatible API. Pricing: Qwen3 8B ~$0.20/1M, Qwen3 30B ~$0.26/1M, GLM-4.7 ~$0.60/$2.20 per 1M tokens. Compliance: SOC2, GDPR.
- Flyte - Open-source, Kubernetes-native workflow orchestration platform for building scalable, reproducible data, ML, and analytics workflows with a Python SDK.
- Google AI - Gemini model family. Gemini 3 Pro is #2 on LMArena (~1486 ELO), the first model to cross the 1500 threshold. Gemini 3 Flash (~1473 ELO) rivals larger models at a fraction of the cost ($0.50/$3 per 1M tokens). Gemini 3 Deep Think for science and research challenges. Gemini 2.5 Pro still available for production ($1.25/$10). Natively multimodal: text, images, audio, video, code. Up to 1M-token context (2M coming) with 99.7% recall. Generous free tier on AI Studio. Grounding with Google Search. Gemini 3 Pro Image for image generation/editing. Compliance: SOC2, GDPR, HIPAA, ISO 27001. Platform: Google Cloud.
- Groq - Ultra-fast inference using custom LPU (Language Processing Unit) hardware with sub-100ms latency. Hosts Llama 4 Scout, Llama 3.3 70B, Qwen3 32B, Llama 3.1 8B, Gemma, and other open models at ~814 tok/s on Gemma 7B (5-15x faster than other providers). Achieves sub-200ms latency on Qwen3 32B and Llama 4 Scout. OpenAI-compatible API with a generous free tier. Pricing: Llama 3.3 70B ~$0.59/$0.79 per 1M input/output tokens, Llama 3.1 8B ~$0.06 per 1M tokens (blended). Batch requests at 50% discount. Compliance: SOC2.
- Gumloop - AI orchestration and testing layer for building, comparing, routing, and managing multiple LLMs across different providers with prompt management.
- Haystack - Open-source Python framework for building production-ready LLM applications with RAG pipelines, agents, and orchestrated component-based workflows.
- Hugging Face - The open-source AI hub with a unified inference API. The Hugging Face Inference API provides OpenAI-compatible endpoints for 15+ inference providers (Together AI, AWS SageMaker, Google Cloud, Azure, etc.) with automatic failover under a single HF token and billing. Hosts 2M+ models, datasets, and Spaces. The Transformers library is the de facto standard for NLP. Text Generation Inference (TGI) for self-hosted production serving. Free tier with rate limits; Pro at $9/mo includes 8x GPU quota, H200 priority, 100GB storage, and monthly inference credits. Compliance: SOC2, GDPR.
- IBM watsonx Orchestrate - Enterprise AI orchestration platform using natural language processing to coordinate ML skills, workflows, and generative AI capabilities for business automation. Compliance: SOC2, ISO 27001.
- Inferact - An AI inference optimization platform built on the vLLM engine that provides managed deployments, enterprise support, and performance optimizations for serving large language models efficiently.
- Kestra - Open-source, declarative workflow orchestration platform using YAML for ETL and data pipelines with built-in monitoring, scheduling, and event-driven execution.
- Kimi - Offers the Kimi K2 large language model through an API for AI applications.
- Kubeflow - Kubernetes-native open-source platform for end-to-end ML lifecycle management with Kubeflow Pipelines for orchestrating portable and scalable ML workflows.
- LangChain - Open-source framework for building LLM applications with composable components for chains, agents, memory management, and retrieval-augmented generation workflows.
- LangGraph - Graph-based orchestration framework from LangChain for building stateful, multi-agent AI systems with cyclic workflows and sophisticated decision logic.
- LangSmith - Managed platform from LangChain for developing, testing, deploying, and monitoring LLM applications with tracing, evaluation, and observability capabilities.
- Lindy AI - No-code platform for creating AI agents (Lindies) that automate everyday work with natural language configuration and visual workflow building.
- LiteLLM - A lightweight, open-source gateway that standardizes access to multiple model providers through an OpenAI-compatible API.
- LlamaIndex - Open-source data orchestration framework for building LLM applications with a focus on retrieval-augmented generation, workflows, and multi-agent systems.
- Make - No-code automation platform with visual workflow builder, AI agents, and 400+ pre-built integrations for orchestrating business processes and AI-powered automations.
- Meta Llama - Open-source LLM family from Meta. Llama 4 is the latest generation, using a Mixture-of-Experts architecture. Llama 4 Scout: 17B active params, 16 experts, up to a 10M-token context window (longest in the industry), fits on a single H100. Llama 4 Maverick: 17B active, 128 experts, ~1417 ELO on LMArena, beats GPT-4o and Grok 3. Llama 4 Behemoth (288B active, still training). The first natively multimodal open model. Fully open weights. No direct API; access via Together AI, Groq, Fireworks, Cerebras, AWS Bedrock, Azure.
- Microsoft AutoGen - Open-source framework from Microsoft for building autonomous AI agents and multi-agent systems with customizable conversable agents leveraging advanced LLMs.
- Mistral AI - European AI lab with strong open-weight models. Mistral Large 3 is the flagship: 675B MoE (41B active), Apache 2.0 license, ~1418 ELO on LMArena, 256k context, multimodal. #2 among open-source non-reasoning models. Codestral 25.08 for production code generation (80+ languages). Mistral Medium 3 ($0.40/$2) for efficient tasks. Voxtral Transcribe 2 for speech (4B params, runs on-device). EU-based with strong data sovereignty. Pricing: Medium ~$0.40/$2, Large ~$2/$6 per 1M tokens. Compliance: SOC2, GDPR.
- Moonshot AI - Provides access to Kimi large language models through its API.
- OpenAI - Broadest model suite in the industry. GPT-5.2 is the flagship (400k context, vision, $1.75/$14 per 1M tokens). GPT-5.3-Codex is the top agentic coding model (state-of-the-art on SWE-Bench Pro). o3 for chain-of-thought reasoning ($10/1M input), o4-mini for cost-efficient reasoning ($1.10/$4.40). GPT-5.1-high scores ~1457 ELO on LMArena. DALL-E for image generation, Whisper for speech-to-text, Realtime API for voice. Batch API offers a 50% discount. GPT-5.2 Pro with extended thinking at $21/$168 per 1M tokens. Free tier available. Compliance: SOC2, GDPR, HIPAA.
- Parseur - AI-powered document parsing platform that extracts structured data from PDFs, emails, invoices, and scanned documents using AI or template-based extraction. Supports OCR and zonal OCR, with integrations to send parsed data to your apps via no-code workflows or API. Free tier available.
- Perplexity - Search-augmented LLM API. Sonar models combine LLM generation with real-time web search for grounded, cited responses. Sonar ($1/$1 per 1M tokens) for quick lookups. Sonar Pro ($3/$15) for complex research. Sonar Deep Research ($2/$8 plus $3/1M reasoning tokens) for multi-step research queries. Search API at $5/1K requests. Citation tokens are no longer billed for standard Sonar and Sonar Pro (a cost reduction vs 2025). Built-in web search eliminates the need for a separate RAG pipeline. Compliance: SOC2.
- Pipedream - Low-code platform for building event-driven workflows with 2,000+ pre-built apps, custom code support, and integrated AI agent capabilities. Free tier available.
- Prefect - Python-native workflow orchestration platform for building, scheduling, and monitoring data and ML pipelines with automatic retries and dynamic task dependencies.
- Qwen - Alibaba's large language model family, available through an API in a range of sizes and capabilities.
- Ray - Distributed computing framework for orchestrating machine learning workloads, large-scale data processing, and parallel AI computations across clusters.
- Relay.app - Low-code workflow automation platform combining workflow orchestration with AI agents, supporting 500+ integrations and visual workflow design.
- Replicate - Run any open-source model via API with pay-per-second billing. 50K+ models including LLMs, image generation (FLUX, Stable Diffusion), video (Wan 2.2, Kling 2.6 Pro with audio), speech (Whisper), and community fine-tunes. Custom model deployment via Cog containers. Prediction deadlines for auto-cancellation. Webhook signing for security. Being acquired by Cloudflare. Pricing varies by model and hardware, typically $0.0001-0.005 per second of compute. Compliance: SOC2.
- SGLang - Open-source project for efficient serving and inference of large language models.
- SambaNova - LLM API provider that uses a custom Reconfigurable Dataflow Architecture to deliver high and stable token generation speeds for large transformer models.
- Semantic Kernel - Microsoft's SDK for orchestrating AI capabilities with enterprise-grade LLM orchestration, available for .NET and Python with seamless model and plugin composition.
- Supermemory - Memory API and SDKs for building AI applications with persistent, contextual memory. Enables developers to create AI systems that learn and remember user interactions across sessions.
- Temporal - Open-source platform for building durable, fault-tolerant workflows with automatic state management, crash-proof execution, and support for long-running processes.
- Together AI - Fast inference platform for open-source and frontier models. Hosts 200+ models including Llama 4, DeepSeek, Qwen, Mixtral, FLUX, and proprietary options. Competitive pricing, often 2-5x cheaper than proprietary APIs. Serverless and dedicated endpoints. Fine-tuning support (LoRA and full, SFT and DPO). GPU cloud with instant clusters ($2.20-5.50/hr/GPU). OpenAI-compatible API. Batch inference at 50% discount. Compliance: SOC2, GDPR.
- Vellum AI - Enterprise AI orchestration platform for building, evaluating, and governing AI agents with a visual workflow builder, prompt management, and production observability.
- Vercel AI SDK - TypeScript SDK for building AI-powered applications. Unified API across 20+ LLM providers (OpenAI, Anthropic, Google, Mistral, xAI, etc.) with a single import. Streaming-first with React/Next.js hooks (useChat, useCompletion). Supports tool calling, structured output (Zod schemas), and multi-step agents. Not an LLM provider: it's an orchestration layer that wraps provider SDKs. Free and open-source. Platform: Vercel.
- Workato - Enterprise automation platform combining RPA, iPaaS, and AI capabilities with 1,200+ pre-built connectors for enterprise workflow and AI agent orchestration.
- Zapier - No-code workflow automation platform connecting 8,000+ apps with AI capabilities, Canvas for visual workflow design, and natural language AI orchestration.
- Zhipu - Develops and provides access to GLM large language models through its API.
- n8n - Open-source, fair-code workflow automation platform with 400+ integrations, a visual editor, and full self-hosting capabilities for building complex AI-powered automations.
- vLLM - Open-source project that optimizes the speed and affordability of deploying large language models for inference.
- xAI - Grok model family offering cutting-edge LLMs and multimodal APIs. Grok 4.1 Thinking (#3 on LMArena at ~1475 ELO) and Grok 4.1 standard (~1464 ELO) support up to a 2M-token context window, the largest in fast inference. Grok 4.1 Fast at $0.20/$0.50 per 1M tokens delivers top-tier quality at exceptional value. Premium Grok 4 available at $3/$15. Includes Grok 2-Vision for vision tasks, Grok Imagine API for image/video generation, and Voice Agent API at $0.05/min. OpenAI-compatible API, $25 signup credits, and $150/mo data-sharing rewards. Grok 3 open-sourcing planned.
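Many of the entries above (LiteLLM, Groq, Together AI, Fireworks AI, Cerebras, xAI, and others) advertise an "OpenAI-compatible API", meaning they all accept the same chat-completions request shape, so one client can be pointed at any of them. A minimal sketch of that payload; the model string is a placeholder and no network call is made:

```python
import json

def chat_request(model, user_prompt, system=None, temperature=0.7):
    """Build an OpenAI-style chat-completions request body.

    This is the shape gateways like LiteLLM normalize across providers;
    only the request body is constructed here, nothing is sent.
    """
    messages = []
    if system:
        # The optional system message comes first in the messages array.
        messages.append({"role": "system", "content": system})
    messages.append({"role": "user", "content": user_prompt})
    return {"model": model, "messages": messages, "temperature": temperature}

# Placeholder model name; swap in whatever the target provider serves.
body = chat_request("llama-3.3-70b", "Hello!", system="Be brief.")
print(json.dumps(body, indent=2))
```

Because the body is identical everywhere, switching providers usually amounts to changing the base URL, API key, and model string.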
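The Anthropic entry's per-1M-token rates, batch discount, and long-context multiplier can be turned into a quick cost estimate. A sketch using only the numbers quoted in that entry; the helper and model keys are illustrative, not part of any Anthropic API:

```python
# (input, output) USD per 1M tokens, as listed in the Anthropic entry above.
RATES = {
    "haiku-4.5": (1.00, 5.00),
    "sonnet-4.5": (3.00, 15.00),
    "opus-4.6": (5.00, 25.00),
}

def estimate_cost(model, input_tokens, output_tokens, batch=False, long_context=False):
    """Estimate USD cost for one request from the listed rates."""
    inp, out = RATES[model]
    cost = (input_tokens * inp + output_tokens * out) / 1_000_000
    if long_context:  # prompts beyond 200k tokens bill at 2x per the entry
        cost *= 2
    if batch:         # Batch API at 50% off per the entry
        cost *= 0.5
    return round(cost, 6)

# 100k input / 10k output on Sonnet: 0.1*$3 + 0.01*$15 = $0.45
print(estimate_cost("sonnet-4.5", 100_000, 10_000))
```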
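DeepSeek's pricing combines three levers from its entry: base V3.2 rates ($0.28/$0.42 per 1M tokens), a cache-hit input rate of $0.028, and a 50% off-peak discount. A sketch of how they compose; the function is illustrative arithmetic, not a DeepSeek API:

```python
def v32_cost(input_tokens, output_tokens, cache_hit_tokens=0, off_peak=False):
    """Estimate USD cost for a DeepSeek-V3.2 request from the listed rates."""
    miss = input_tokens - cache_hit_tokens
    # Cache misses at $0.28/1M, cache hits at $0.028/1M, output at $0.42/1M.
    cost = (miss * 0.28 + cache_hit_tokens * 0.028 + output_tokens * 0.42) / 1_000_000
    if off_peak:  # off-peak hours at 50% off per the entry
        cost *= 0.5
    return round(cost, 6)

# 1M fully cached input + 100k output: $0.028 + $0.042 = $0.07
print(v32_cost(1_000_000, 100_000, cache_hit_tokens=1_000_000))
```

With a warm cache the input side drops by 90%, which is where most of the quoted savings over frontier-priced APIs comes from.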
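Perplexity's Sonar Deep Research is unusual in billing a third token class: $2/1M input and $8/1M output plus $3/1M reasoning tokens, per the entry above. A sketch of the combined bill; the helper name is illustrative:

```python
def deep_research_cost(input_tokens, output_tokens, reasoning_tokens):
    """Estimate USD cost for a Sonar Deep Research query from the listed rates."""
    return round(
        (input_tokens * 2 + output_tokens * 8 + reasoning_tokens * 3) / 1_000_000, 6
    )

# 10k in, 5k out, 50k reasoning: $0.02 + $0.04 + $0.15 = $0.21
print(deep_research_cost(10_000, 5_000, 50_000))
```

For multi-step research queries the reasoning tokens can dominate, so they are worth tracking separately when comparing against plain Sonar or Sonar Pro.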