Deploy AI Models Instantly on Hugging Face

Run 15,000+ Models Instantly!
Hugging Face’s Inference Providers give developers access to thousands of machine learning models, powered by world-class inference providers. They are also integrated into our client SDKs (for JS and Python), making it easy to run serverless inference against models on your favorite providers.

Partners
Our platform integrates with leading AI infrastructure providers, giving you access to their specialized capabilities through a single, consistent API. Current partners include:

      • Cerebras
      • Cohere
      • Fal AI
      • Featherless AI
      • Fireworks
      • Groq
      • HF Inference
      • Hyperbolic
      • Nebius
      • Novita
      • Nscale
      • OVHcloud AI Endpoints
      • Public AI
      • Replicate
      • SambaNova
      • Scaleway
      • Together
      • WaveSpeedAI
      • Z.ai

Why Choose Inference Providers?
When you build AI applications, it’s tough to juggle multiple provider APIs, compare model performance, and deal with varying reliability. Inference Providers solves these challenges by offering:

Instant Access to Cutting-Edge Models: Go beyond mainstream providers to access thousands of specialized models across multiple AI tasks. Whether you need the latest language models, state-of-the-art image generators, or domain-specific embeddings, you’ll find them here.

Zero Vendor Lock-in: Unlike being tied to a single provider’s model catalog, you get access to models from Cerebras, Groq, Together AI, Replicate, and more — all through one consistent interface.
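Here is a minimal sketch of what that consistent interface looks like in practice. The `provider` parameter is taken from `huggingface_hub`'s `InferenceClient`; the `chat_call` helper and the short provider list are purely illustrative, not part of any SDK, and the live call is left in comments since it needs credentials.

```python
# Hypothetical sketch: one call shape, many backends. The provider
# names mirror the partner list above; `chat_call` is an illustrative
# helper, not part of any SDK.
PROVIDERS = {"cerebras", "groq", "together", "replicate"}

def chat_call(provider: str, model: str, prompt: str) -> dict:
    """Build identical chat-completion arguments for any provider."""
    if provider not in PROVIDERS:
        raise ValueError(f"unsupported provider: {provider}")
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

# Live usage (needs `pip install huggingface_hub` and an HF token in the env):
#   from huggingface_hub import InferenceClient
#   client = InferenceClient(provider="groq")   # switching backends = changing this string
#   out = client.chat_completion(**chat_call("groq", "openai/gpt-oss-120b", "Hello!"))
#   print(out.choices[0].message.content)
```

Because the request shape never changes, swapping Groq for Together (or any other partner) is a one-line change rather than a rewrite.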

Production-Ready Performance: Built for enterprise workloads with the reliability your applications demand.

Here’s what you can build:

Text Generation: Use large language models with tool-calling capabilities to power chatbots, content generation, and code assistance
Image and Video Generation: Create custom images and videos, including support for LoRAs and style customization
Search & Retrieval: State-of-the-art embeddings for semantic search, RAG systems, and recommendation engines
Traditional ML Tasks: Ready-to-use models for classification, NER, summarization, and speech recognition
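To make the Search & Retrieval item concrete, here is a small self-contained sketch of the ranking step behind semantic search. In practice the vectors would come from an embedding model served through the API; the three-dimensional vectors below are made up for illustration, but the cosine-similarity math is the same.

```python
import math

# Sketch of the retrieval step behind a RAG system: given embedding
# vectors (hard-coded here; normally fetched from an embedding model),
# rank documents by cosine similarity to the query vector.

def cosine(a: list, b: list) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

docs = {
    "doc_a": [0.9, 0.1, 0.0],
    "doc_b": [0.1, 0.8, 0.2],
}
query = [0.85, 0.15, 0.05]

ranked = sorted(docs, key=lambda d: cosine(query, docs[d]), reverse=True)
print(ranked)  # → ['doc_a', 'doc_b']
```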
⚡ Get Started for Free: Inference Providers includes a generous free tier, with additional credits for PRO users and Enterprise Hub organizations.

Key Features
🎯 All-in-One API: A single API for text generation, image generation, document embeddings, NER, summarization, image classification, and more.
🔀 Multi-Provider Support: Easily run models from top-tier providers like fal, Replicate, SambaNova, Together AI, and others.
🚀 Scalable & Reliable: Built for high availability and low-latency performance in production environments.
🔧 Developer-Friendly: Simple requests, fast responses, and a consistent developer experience across Python and JavaScript clients.
👷 Easy to integrate: Drop-in replacement for the OpenAI chat completions API.
💰 Cost-Effective: No extra markup on provider rates.
Getting Started
Inference Providers works with your existing development workflow. Whether you prefer Python, JavaScript, or direct HTTP calls, we provide native SDKs and OpenAI-compatible APIs to get you up and running quickly.

We’ll walk through a practical example using openai/gpt-oss-120b, a state-of-the-art open-weights conversational model.
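As a sketch of that walkthrough in Python: since the API is OpenAI-compatible, the standard `openai` client can point at the router. The router base URL and the `HF_TOKEN` environment variable name are assumptions drawn from the OpenAI-compatible setup, and `build_request` is an illustrative helper; the live call is shown in comments because it requires a token.

```python
MODEL = "openai/gpt-oss-120b"
ROUTER_URL = "https://router.huggingface.co/v1"  # assumed OpenAI-compatible endpoint

def build_request(prompt: str, temperature: float = 0.7) -> dict:
    """Assemble a chat-completions payload for the model above."""
    return {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
    }

# Live usage with the OpenAI client (`pip install openai`, token in HF_TOKEN):
#   import os
#   from openai import OpenAI
#   client = OpenAI(base_url=ROUTER_URL, api_key=os.environ["HF_TOKEN"])
#   resp = client.chat.completions.create(**build_request("Say hello"))
#   print(resp.choices[0].message.content)
```

The drop-in compatibility mentioned above is exactly this: existing OpenAI-client code keeps working once the base URL and key are swapped.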

Inference Playground
Before diving into integration, explore models interactively with our Inference Playground. Test different chat completion models with your prompts and compare responses to find the perfect fit for your use case.

Join Hugging Face today!

Compiled List of LLMs

As of late 2025, the landscape of Large Language Models (LLMs) spans a diverse range of proprietary and open-source systems. Below is a representative list, categorized by developer:
Proprietary Models
These models are typically accessed via web interfaces or paid APIs.
    • OpenAI: GPT-5 (released August 2025), GPT-4.5, o1, and o3.
    • Anthropic: Claude 4.1 (released August 2025), Claude 3.7 Sonnet, and Claude 3.5 Sonnet.
    • Google DeepMind: Gemini 2.5 Pro, Gemini 2.0 Flash, and PaLM 2.
    • xAI: Grok 4 (released July 2025) and Grok 3.
    • Inflection AI: Inflection-2.5.
Open-Source & Weights-Available Models
These models are often available for download and local hosting via platforms like Hugging Face.
    • Meta: Llama 4 Scout (released April 2025), Llama 3.1 (405B), and Llama 3.
    • DeepSeek: DeepSeek-R1 (671B), DeepSeek-V3, and DeepSeek-V2.5.
    • Mistral AI: Mistral Large 2, Mixtral 8x22B, and Mistral 7B.
    • Alibaba Cloud: Qwen 3, Qwen 2.5-Max, and Qwen 1.5.
    • TII (Technology Innovation Institute): Falcon 180B and Falcon 40B.
    • NVIDIA: Nemotron-4 (340B).
    • Microsoft: Phi-3 (Mini, Small, Medium).
    • Cohere: Command R+ and Command R. 
Other Notable Models
    • Gemma: Open models by Google based on Gemini technology.
    • Jamba: A hybrid Mamba-Transformer model from AI21 Labs.
    • DBRX: A fine-grained mixture-of-experts model by Databricks.
    • Stable LM: Developed by Stability AI.