Fractile is Building the Next Generation of Inference Hardware
- Karan Bhatia
- 5 hours ago
- 2 min read

Fractile, the AI inference hardware company led by Walter Goodwin and his team, has raised $220M to accelerate the path to getting its first chips and systems into customers’ hands. The financing round was led by Accel, Factorial Funds, and Founders Fund, with participation from Conviction, Gigascale, O1A, Felicis, Buckley Ventures, and 8VC, investing alongside its brilliant existing backers.
Fractile, founded in 2022, was built on the belief that frontier AI is ultimately constrained by inference speed. The company’s focus has been on rethinking hardware to reduce time from query to output and unlock the full value of advanced models.
As AI systems improve, longer reasoning chains and multi-million-token outputs are becoming more important, making latency the key bottleneck to capability. At the same time, inference economics have become a major constraint: inference is both the core revenue driver of AI and the limiting factor on its scale.
Frontier AI performance has long scaled with inference-time compute. Systems like DeepMind’s AlphaGo showed this through repeated neural network evaluations in tree search rather than single-step predictions.
With reasoning models, the same pattern now applies to LLMs: higher-value use cases increasingly rely on long, multi-step generation, where complex tasks require extended sequential computation.
Hard problems often require long sequences of intermediate steps that only become valuable when later synthesized. Andrew Wiles’ work on Fermat’s Last Theorem illustrates this: years of exploration produced extensive dead ends and partial insights that ultimately connected into a proof.
Similarly, frontier LLMs are increasingly being pushed toward long-context, multi-step reasoning, generating large bodies of intermediate work to solve complex problems that require sustained exploration and synthesis.
Today’s LLMs can already generate outputs approaching 100 million tokens for hard problems, but at ~40 tokens per second, this can take weeks to complete. Inference speed and memory bandwidth are now the key constraints limiting progress.
Compressing that timeline to a day requires ~1,200 tokens per second while sustaining long-context reasoning at scale. Fractile has been built specifically to address this bottleneck from first principles.
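The figures above can be checked with back-of-envelope arithmetic. A short sketch, using the token count and generation speeds stated in the text as assumed inputs:

```python
# Back-of-envelope check of the inference-speed figures quoted above.
# Inputs are the assumptions from the text, not measured values.
tokens = 100_000_000      # ~100 million output tokens for a hard problem
current_rate = 40         # tokens per second, today's typical generation speed
seconds_per_day = 86_400

# At ~40 tok/s, generating 100M tokens takes roughly a month.
weeks = tokens / current_rate / seconds_per_day / 7
print(f"{weeks:.1f} weeks at {current_rate} tok/s")          # → 4.1 weeks

# To finish the same output in a single day instead:
required_rate = tokens / seconds_per_day
print(f"{required_rate:.0f} tok/s to finish in one day")     # → 1157 tok/s
```

The ~1,200 tokens-per-second target in the text is this required rate, rounded: roughly a 30x speedup over today's ~40 tok/s.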
The real impact of this hardware shift is not just faster existing workloads, but entirely new ones. Compressing long computation cycles makes far more ambitious AI use cases economically viable.
From drug discovery to software and materials science, extended inference will enable large-scale intellectual exploration across fields. Those who push this frontier will define the next wave of value creation. Fractile aims to increase the clock speed of global progress, one chip at a time.
Making this possible starts with people. Since founding, the team’s work has spanned the full stack, from AI research to foundry processes to chip micro-architecture, pursuing approaches that break the cost–latency trade-off and push beyond the current inference frontier, enabling new capabilities for frontier AI systems.