
DeepInfra Raises $107 Million Series B to Scale Inference Infrastructure

  • Writer: Karan Bhatia
  • May 5
  • 2 min read

DeepInfra, which helps developers accelerate their AI applications with developer-friendly APIs designed for performance and cost-efficiency, has raised $107 million in Series B funding to scale its inference cloud and expand its global capacity. The round was co-led by 500 Global and Georges Harik, with participation from A.Capital Ventures, Crescent Cove, Felicis, NVIDIA, Peak6, Samsung Next, Supermicro, and Upper90. The company is led by Nikola Borisov, Georgios Papoutsis, and Yessenzhar Kanapin.


Inference is becoming the primary bottleneck in enterprise AI. At DeepInfra, the view is that inference, not training, now drives most workloads, as open-source models approach parity with proprietary systems and agent-based applications generate continuous, high-volume token demand.


This shift turns inference into the core system constraint, exposing limits in traditional cloud infrastructure and driving the need for systems purpose-built for inference efficiency, performance, and security.


Inference requires its own stack. At DeepInfra, inference is treated as a full-stack challenge, requiring specialized hardware, purpose-built networking, and software optimized for high-throughput, low-latency workloads.


General-purpose cloud infrastructure, designed for mixed and bursty workloads, struggles with continuous token generation, leaving performance and cost inefficiencies. DeepInfra addresses this by co-designing across all layers to optimize for the demands of agentic AI workloads.


DeepInfra is built as a vertically integrated inference platform, designed specifically for high-volume, low-latency AI workloads. It operates its own GPU infrastructure across multiple U.S. data centers, enabling tighter control over cost and performance than general-purpose cloud providers.


Optimized for the agentic era, it focuses on continuous token generation as the default workload. In collaboration with NVIDIA, it supports advanced inference stacks and next-generation GPUs to improve efficiency, while offering enterprise-grade security, compliance, and API compatibility for production use.
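
The announcement does not describe the API itself. As a rough illustration of what OpenAI-compatible, streaming inference access typically looks like, the sketch below uses the openai Python client pointed at a DeepInfra-style base URL and an open-source model name; both are assumptions for illustration, not details taken from the announcement.

```python
# Illustrative sketch only: the base_url and model name are assumptions,
# not taken from the announcement. Any OpenAI-compatible endpoint is
# called the same way.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.deepinfra.com/v1/openai",  # hypothetical endpoint
    api_key="YOUR_API_KEY",
)

# Streaming keeps tokens flowing continuously, the workload pattern the
# article describes for agentic applications.
stream = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3.1-70B-Instruct",  # placeholder open-source model
    messages=[{"role": "user", "content": "Summarize why inference is the new bottleneck."}],
    stream=True,
)

for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
```

Because the request shape matches the OpenAI chat-completions format, existing production code can switch providers by changing only the base URL, API key, and model name.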


What’s next for DeepInfra


The funding will accelerate the expansion of global compute capacity, improvement of developer tooling, and support for next-generation open-source and agentic models.


The focus remains on enabling production-grade inference at scale, as inference becomes a key driver of enterprise AI deployment.
