Inference Cloud Gimlet Announces Series A Raise

  • Writer: Karan Bhatia
  • 1 hour ago
  • 1 min read

Gimlet Labs, an applied research lab dedicated to building next-generation AI infrastructure, has announced an $80M Series A raise led by Menlo Ventures, with participation from Eclipse, Factory, Prosperity7, and Triatomic. The company was founded by Zain Asgar, Michelle Nguyen, Omid Azizi, Natalie Serrino, James Bartlett, Beltir Caglar-Dayanik, and their team.


Gimlet is an inference cloud built to run AI agents. In the five months since launch, demand has surged, with the customer base tripling to include a leading frontier lab and a hyperscaler. The platform now runs proprietary frontier models, workloads rarely entrusted to inference providers.


Gimlet was founded to address AI inference as a defining infrastructure challenge, one intensified by agentic workloads. As agents generate far more tokens and increasingly interact with one another, reducing latency has become critical in the industry's ongoing inference speed wars.


As demand for high-throughput, low-latency agent serving grows, existing homogeneous infrastructure is hitting limits. Inference workloads are inherently heterogeneous; different phases like prefill, decode, and attention have varying performance needs, some suited to GPUs and others to alternative architectures. Agents amplify this complexity, chaining multiple models, tools, code, and data retrieval, making inefficiencies compound across the entire workflow.
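The phase split described above can be sketched in a few lines: prefill is dominated by large, compute-bound matrix multiplies, while token-by-token decode is memory-bandwidth-bound, so a phase-aware scheduler can send each to the hardware pool best suited to its bottleneck. The pool names and `PhaseRequest` type below are illustrative assumptions for this sketch, not Gimlet's actual API.

```python
from dataclasses import dataclass

# Hypothetical device pools. Real heterogeneous serving stacks are far more
# involved; this only illustrates mapping inference phases to hardware.
DEVICE_POOLS = {
    "prefill": "compute-optimized",   # prefill: compute-bound (large matmuls)
    "decode": "bandwidth-optimized",  # decode: memory-bandwidth-bound
}

@dataclass
class PhaseRequest:
    request_id: str
    phase: str  # "prefill" or "decode"

def route(request: PhaseRequest) -> str:
    """Route an inference phase to the pool suited to its bottleneck."""
    try:
        return DEVICE_POOLS[request.phase]
    except KeyError:
        raise ValueError(f"unknown phase: {request.phase!r}")
```

In a disaggregated design like this, the same logical request visits both pools in sequence, which is why inefficiencies at any one phase compound across an agent's whole workflow.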


Instead of relying on homogeneous GPUs, Gimlet was designed around hardware heterogeneity, with a software stack that orchestrates complex agent workloads across diverse hardware and a new datacenter connecting these accelerators. This approach delivers 3–10× speedups on >1T-parameter frontier models within the same power envelope.


Building this infrastructure requires solving novel challenges, from connecting incompatible hardware to managing mixed thermal profiles, effectively reinventing the datacenter. As agents grow in complexity, the demand for heterogeneous infrastructure will only increase, enabling faster and more capable AI workflows.
