Overview

RunPod is a cloud infrastructure provider specializing in Graphics Processing Unit (GPU) compute resources, primarily targeting artificial intelligence (AI) and machine learning (ML) workloads. Founded in 2021, the platform is designed for developers and technical buyers who require scalable and cost-effective access to GPUs for tasks such as training large language models (LLMs), running inference for AI applications, and deploying custom AI APIs.

The service offers two primary consumption models: on-demand GPU instances and a serverless GPU platform. On-demand instances provide dedicated virtual machines with configurable GPU types, allowing users to select specific NVIDIA GPUs like the A100 or H100, along with CPU, RAM, and storage allocations. This model is suited for long-running training jobs or persistent development environments. Users can launch instances with pre-configured templates, including popular ML frameworks, or deploy custom Docker images. Persistent storage options allow data to be maintained across instance sessions, a common requirement for iterative model development and dataset management.
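As a minimal sketch of why persistent storage matters for iterative work, the snippet below checkpoints a toy training loop to a directory and resumes from it in a later "session". The directory is a temporary stand-in for a persistent volume mount point, and the names `load_checkpoint`, `save_checkpoint`, and `train` are hypothetical helpers for illustration, not part of any RunPod tooling.

```python
import json
import os
import tempfile
from pathlib import Path


def load_checkpoint(path: Path) -> dict:
    """Return the saved training state, or a fresh state if none exists."""
    if path.exists():
        return json.loads(path.read_text())
    return {"epoch": 0, "loss": None}


def save_checkpoint(path: Path, state: dict) -> None:
    """Write the state via a temp file so an interrupted write cannot corrupt it."""
    path.parent.mkdir(parents=True, exist_ok=True)
    tmp = path.with_suffix(".tmp")
    tmp.write_text(json.dumps(state))
    os.replace(tmp, path)  # atomic rename within the same filesystem


def train(path: Path, total_epochs: int) -> dict:
    """Run (or resume) a toy training loop, checkpointing after each epoch."""
    state = load_checkpoint(path)
    for epoch in range(state["epoch"], total_epochs):
        # Placeholder for a real training step on the GPU.
        state = {"epoch": epoch + 1, "loss": 1.0 / (epoch + 1)}
        save_checkpoint(path, state)
    return state


# Stand-in for a persistent volume mount point (assumption: substitute the
# path where your volume is actually mounted).
volume_dir = Path(tempfile.mkdtemp())
ckpt = volume_dir / "checkpoints" / "state.json"

first = train(ckpt, total_epochs=2)    # first session stops after 2 epochs
resumed = train(ckpt, total_epochs=5)  # a later session picks up at epoch 2
```

Because the checkpoint lives on storage that outlives the instance, the second call only runs the remaining epochs instead of starting over.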

The serverless GPU offering abstracts away infrastructure management, allowing users to run code on GPUs without provisioning or managing virtual machines. This model is designed for event-driven, burstable workloads such as inference requests or short-duration training tasks that benefit from per-second billing and automatic scaling. RunPod's serverless platform supports custom Docker images and provides API endpoints for triggering functions, making it suitable for integrating AI capabilities into web applications or microservices. The platform also includes an "AI Endpoints" feature, which allows users to deploy their models as managed API services, handling the underlying infrastructure, load balancing, and scaling.
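The request-driven model described above can be pictured as a single function that receives one job and returns one result. The sketch below is an assumption about the general shape of such a worker: the job layout (`{"id": ..., "input": {...}}`), the `prompt` field, and the function name `handler` are illustrative, and the uppercase transform is a placeholder for real GPU inference.

```python
def handler(job: dict) -> dict:
    """Process one request; job["input"] is assumed to carry the caller's payload."""
    payload = job.get("input", {})
    prompt = payload.get("prompt")
    if not isinstance(prompt, str) or not prompt:
        # Return a structured error instead of raising, so the caller gets a response.
        return {"error": "input.prompt must be a non-empty string"}
    # Placeholder for real model inference running on the GPU.
    return {"output": prompt.upper(), "tokens": len(prompt.split())}


# Local smoke test with a hand-built job (no platform SDK required).
result = handler({"id": "local-test", "input": {"prompt": "hello world"}})
```

In a deployed worker, a function like this would be registered with the platform's worker SDK inside the custom Docker image; consult the RunPod serverless documentation for the exact entry point, since it is not shown here.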

RunPod positions itself as an alternative to hyperscale cloud providers for GPU compute, offering a range of NVIDIA GPUs at competitive price points. Cost-effectiveness across cloud GPU providers is a common concern for ML engineers, and analyses by publications such as The New Stack often highlight the pricing of specialized GPU clouds relative to general-purpose cloud offerings The New Stack on Cloud GPU Computing. The platform provides an API and command-line interface (CLI) for programmatic control over resources, which supports automation and integration into CI/CD pipelines. This covers instance management, storage orchestration, and monitoring of GPU utilization.

Key features

  • On-Demand GPU Instances: Access to dedicated NVIDIA GPUs (e.g., A100, H100) with configurable CPU, RAM, and storage, billed hourly. Users can select from pre-built templates or deploy custom Docker containers for various ML frameworks and applications RunPod On-Demand Pods documentation.
  • Serverless GPUs: Execute GPU-intensive code without managing underlying infrastructure, with per-second billing and automatic scaling. Designed for inference, short training runs, and event-driven AI workloads RunPod Serverless FAQ.
  • AI Endpoints: Deploy trained AI models as managed API services, handling scaling, load balancing, and infrastructure management. This feature simplifies the deployment of production-ready AI applications RunPod AI Endpoints documentation.
  • Persistent Storage: Attach network storage volumes to GPU instances, allowing data and model checkpoints to persist across sessions and be shared between different instances. This is critical for data-intensive ML workflows RunPod Persistent Storage guide.
  • Custom Docker Support: Users can bring their own Docker images to both on-demand and serverless environments, providing flexibility for custom environments, libraries, and application dependencies RunPod Docker Images documentation.
  • API and CLI: Programmatic access to manage GPU instances, serverless functions, storage, and other platform resources, enabling automation and integration into existing workflows RunPod API Reference.
  • NVIDIA GPU Selection: Wide range of NVIDIA GPUs available, including high-end options suitable for training large-scale deep learning models, such as the NVIDIA A100 and H100 GPUs RunPod GPU Prices page.

Pricing

RunPod's pricing model is based on the consumption of GPU resources, with distinct structures for on-demand instances and serverless functions. On-demand GPUs are billed hourly based on the specific GPU type, CPU cores, RAM, and storage allocated. Serverless GPUs are priced per second of usage, reflecting the ephemeral nature of these workloads. The pricing details below are illustrative and were accurate as of May 8, 2026. For the most current pricing, refer to the official RunPod pricing page RunPod GPU Prices.

Service/Product | Billing Model | Example Rate | Notes
--- | --- | --- | ---
On-Demand GPU Instances | Hourly | NVIDIA A100 80GB: ~$2.39/hour | Billed for active time. Includes configurable CPU, RAM, and storage. Price varies by GPU model and region.
Serverless GPUs | Per-second | NVIDIA A100 80GB: ~$0.00066/second | Compute billed only when code is running. Price varies by GPU model and region. Includes cold start time.
Persistent Storage | Monthly per GB | ~$0.04/GB/month | Network storage for data persistence across instances.
Networking | Per GB data transfer | Outbound Data Transfer: ~$0.01/GB | Inbound data transfer typically free. Outbound charges apply.
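To make the billing models concrete, the following sketch combines the illustrative rates from the table above into a rough monthly estimate. The workload figures (100 pod-hours, 50,000 requests of 2 seconds each, 200 GB stored, 500 GB of outbound transfer) are invented for the example, and the function name `monthly_estimate` is hypothetical.

```python
# Illustrative rates copied from the table above (USD). They are examples,
# not current prices; check the official pricing page before budgeting.
ON_DEMAND_PER_HOUR = 2.39        # A100 80GB, on-demand
SERVERLESS_PER_SECOND = 0.00066  # A100 80GB, serverless
STORAGE_PER_GB_MONTH = 0.04
EGRESS_PER_GB = 0.01


def monthly_estimate(pod_hours: float, busy_seconds: float,
                     storage_gb: float, egress_gb: float) -> dict:
    """Return a per-line-item cost breakdown, each rounded to cents."""
    items = {
        "on_demand": round(pod_hours * ON_DEMAND_PER_HOUR, 2),
        "serverless": round(busy_seconds * SERVERLESS_PER_SECOND, 2),
        "storage": round(storage_gb * STORAGE_PER_GB_MONTH, 2),
        "egress": round(egress_gb * EGRESS_PER_GB, 2),
    }
    items["total"] = round(sum(items.values()), 2)
    return items


# Made-up workload: 100 training hours, 50,000 inference requests at 2 s each.
estimate = monthly_estimate(
    pod_hours=100,
    busy_seconds=50_000 * 2,
    storage_gb=200,
    egress_gb=500,
)

# Hourly equivalent of the per-second rate, useful for comparing the two models.
serverless_hourly_equivalent = round(SERVERLESS_PER_SECOND * 3600, 3)
```

At these example rates the serverless price works out to roughly the same hourly figure as the on-demand price, so the practical difference is that serverless charges only for seconds in which code runs, while an on-demand pod is billed for the whole time it is active.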

Common integrations

  • PyTorch / TensorFlow: Directly supported through pre-built templates or custom Docker images, enabling training and inference for popular deep learning frameworks RunPod PyTorch Template.
  • Hugging Face Transformers: Applications using the Hugging Face library can be deployed on RunPod's GPU infrastructure for model fine-tuning and inference RunPod Templates Overview.
  • FastAPI / Flask: Deploy AI models as web services by running Python applications built with these frameworks on serverless or on-demand GPUs, accessible via custom API endpoints RunPod AI Endpoints guide.
  • Docker: Core to the platform, enabling users to package custom environments, dependencies, and applications for consistent deployment across instances RunPod Docker Images.
  • Kubernetes (via API/CLI): RunPod is not a managed Kubernetes service, but its API and CLI can be called from existing Kubernetes workflows to provision external GPU resources for ML workloads RunPod API Reference.

Alternatives

  • Paperspace: Offers a range of GPU-accelerated cloud computing services, including Gradient for ML development and GPU VMs, often competing on price and developer experience Paperspace website.
  • Lambda Labs: Specializes in GPU cloud services, selling both hardware and cloud access to NVIDIA GPUs, frequently cited for competitive pricing on high-end GPUs Lambda Labs website.
  • CoreWeave: Provides specialized cloud infrastructure built on NVIDIA GPUs for high-performance computing and AI workloads, and is known for large-scale enterprise AI deployments CoreWeave website.
  • Google Cloud (Vertex AI / GKE with GPUs): Offers enterprise-grade ML platforms and GPU-enabled Kubernetes clusters, providing a broader ecosystem but potentially higher costs for raw GPU compute Google Cloud Vertex AI.
  • AWS (SageMaker / EC2 with GPUs): Provides extensive managed ML services and a wide selection of GPU instances, suitable for large enterprises with existing AWS investments, but can be complex for pure GPU access AWS SageMaker.

Getting started

To get started with RunPod, you typically use the API or CLI to provision resources. The following example launches an on-demand GPU instance by sending a request to the RunPod API with curl, requesting a single NVIDIA GeForce RTX 4090. It assumes you have an API key and sufficient credits in your RunPod account.

curl -X POST \
  https://api.runpod.io/v2/user/pods \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer YOUR_RUNPOD_API_KEY' \
  -d '{
    "cloudType": "SECURE",
    "gpuType": "NVIDIA GeForce RTX 4090",
    "containerDiskInGb": 20,
    "minCpuCount": 4,
    "minMemoryInGb": 32,
    "gpuCount": 1,
    "templateId": "runpod-pytorch-2-1-cuda-12-1",
    "name": "my-first-gpu-pod",
    "volumeInGb": 100
  }'

In this curl command:

  • YOUR_RUNPOD_API_KEY should be replaced with your actual RunPod API key, which can be generated from your account settings RunPod API documentation.
  • gpuType specifies the desired GPU model (e.g., "NVIDIA A100 80GB"). You can find available GPU types and their specifications on the RunPod pricing page.
  • templateId refers to a pre-configured Docker image template; runpod-pytorch-2-1-cuda-12-1 is an example of a PyTorch environment. You can also specify a custom imageUrl for your own Docker image.
  • containerDiskInGb sets the size of the ephemeral disk for the container.
  • minCpuCount and minMemoryInGb define the minimum CPU cores and RAM for the instance.
  • volumeInGb allocates persistent storage, which can be attached and detached from pods RunPod Persistent Storage.

After executing this command, the RunPod API will return a response containing details about the newly created pod, including its ID and status. You can then use the RunPod CLI or further API calls to connect to the pod via SSH, upload data, and start your GPU-accelerated workloads.
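For scripted use, the same request can be assembled in Python with only the standard library. This is a sketch rather than an official client: the endpoint URL and field names are copied from the curl example as given and have not been verified against the current API, and `build_create_pod_request` is a hypothetical helper name. The request is built but deliberately not sent, since sending it would provision a billable pod.

```python
import json
import urllib.request

# Endpoint exactly as it appears in the curl example (unverified assumption).
API_URL = "https://api.runpod.io/v2/user/pods"


def build_create_pod_request(
    api_key: str,
    gpu_type: str = "NVIDIA GeForce RTX 4090",
    name: str = "my-first-gpu-pod",
) -> urllib.request.Request:
    """Build (but do not send) the POST request from the curl example."""
    payload = {
        "cloudType": "SECURE",
        "gpuType": gpu_type,
        "containerDiskInGb": 20,   # ephemeral container disk
        "minCpuCount": 4,
        "minMemoryInGb": 32,
        "gpuCount": 1,
        "templateId": "runpod-pytorch-2-1-cuda-12-1",
        "name": name,
        "volumeInGb": 100,         # persistent volume size
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


request = build_create_pod_request("YOUR_RUNPOD_API_KEY")

# To actually send it (this would create a billable pod):
# with urllib.request.urlopen(request, timeout=30) as response:
#     pod = json.load(response)
```

Keeping request construction separate from sending makes the payload easy to inspect or unit-test before any resources are created.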