Exa Takes a Step Toward Building the Fastest and Best Search API in the World, Introduces Exa 2.0
- Menlo Times

Exa, the search engine built for AI and led by Will Bryk, Jeff Wang, and others, has introduced Exa 2.0, the next generation of its search endpoints.
Specifically, Exa 2.0 includes three major updates (a usage sketch follows the list):
- Exa Fast: the fastest search API in the world, at sub-350ms latency
- Exa Auto: the default search type, now much higher quality
- Exa Deep: a new search type that agentically retrieves the highest-quality results
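As a rough illustration of how these endpoints might be selected from code, here is a minimal sketch using the exa-py Python SDK. The "fast" and "deep" values for the type parameter are assumptions inferred from the endpoint names announced here; consult Exa's documentation for the exact interface.

```python
# Minimal sketch: selecting the three Exa 2.0 search types via exa-py.
# The "fast" and "deep" type values are assumptions based on the
# endpoint names; "auto" is the documented default.
from exa_py import Exa

exa = Exa(api_key="YOUR_EXA_API_KEY")  # placeholder key

# Exa Fast: lowest latency, for latency-sensitive agents and chatbots
fast_results = exa.search("latest fed interest rate decision", type="fast")

# Exa Auto (the default): balances latency and quality
auto_results = exa.search("latest fed interest rate decision", type="auto")

# Exa Deep: agentic multi-step retrieval for the highest-quality results
deep_results = exa.search("latest fed interest rate decision", type="deep")

for result in auto_results.results[:3]:
    print(result.title, result.url)
```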
Exa’s mission is to build the perfect search engine, one that delivers exactly the information needed as fast as physically possible via a seamless API. The launch of Exa 2.0 marks a major step toward this goal.
Exa 2.0 was built on a vastly expanded index that now crawls and parses tens of billions of webpages, with content refreshed every minute. Precise semantic search over this index is powered by an embedding model, pretrained and fine-tuned for over a month on a 144x H200 cluster using new embedding techniques developed over the past six months.
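To make the idea concrete, below is a toy sketch of embedding-based semantic search, the general technique behind an index like this. The embed function here is a hypothetical stand-in for a real embedding model, not Exa's.

```python
# Toy sketch of embedding-based semantic search: rank documents by
# cosine similarity between query and document vectors.
import numpy as np

def embed(text: str) -> np.ndarray:
    """Hypothetical stand-in for a real embedding model; returns a
    deterministic unit vector so the demo runs end to end."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.normal(size=512)
    return v / np.linalg.norm(v)

docs = ["page about rust", "page about embeddings", "page about search"]
doc_vecs = np.stack([embed(d) for d in docs])  # shape: (n_docs, dim)

query_vec = embed("semantic search")
scores = doc_vecs @ query_vec          # cosine similarity (unit vectors)
ranked = np.argsort(scores)[::-1]      # best match first
print([docs[i] for i in ranked])
```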
Serving these embeddings at record-low latency required major updates to Exa’s in-house vector database, including new clustering algorithms, lexical compression, and assembly-level optimizations, all implemented in Rust.
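One common family of clustering techniques for low-latency vector search is inverted-file (IVF) indexing: vectors are grouped under centroids, and a query probes only the few nearest clusters instead of scanning the full index. The sketch below illustrates that general idea in Python; it is not Exa's Rust implementation, and the centroid selection is simplified (real systems typically run k-means).

```python
# Sketch of IVF-style approximate nearest-neighbor search: assign vectors
# to clusters up front, then probe only the closest clusters at query time.
import numpy as np

rng = np.random.default_rng(0)
dim, n_vecs, n_clusters, n_probe = 64, 10_000, 32, 4

vectors = rng.normal(size=(n_vecs, dim)).astype(np.float32)
vectors /= np.linalg.norm(vectors, axis=1, keepdims=True)

# "Training": pick centroids (simplified; real systems use k-means)
centroids = vectors[rng.choice(n_vecs, n_clusters, replace=False)]
assignments = np.argmax(vectors @ centroids.T, axis=1)

def search(query: np.ndarray, k: int = 5) -> np.ndarray:
    # Probe only the n_probe closest clusters, trading recall for latency
    nearest = np.argsort(query @ centroids.T)[::-1][:n_probe]
    candidates = np.flatnonzero(np.isin(assignments, nearest))
    scores = vectors[candidates] @ query
    return candidates[np.argsort(scores)[::-1][:k]]

q = vectors[123]  # querying with an indexed vector should return itself
print(search(q))
```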
The first API update, Exa Fast, delivers sub-350ms end-to-end P50 latency, 30% faster than the next fastest API. In these few hundred milliseconds, billions of webpages are processed to identify the most relevant results and extract the context most useful for AI consumption. Exa Fast enables near-instant search, allowing LLMs to access world knowledge without delay, and already powers hundreds of latency-sensitive AI applications, from financial agents to high-speed chatbots.
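For readers who want to sanity-check latency claims themselves, a simple client-side P50 measurement might look like the sketch below. The exa-py calls and the "fast" type value are assumptions carried over from the earlier example, and client-side numbers include network overhead that a provider's own figures may not.

```python
# Sketch: measure client-side P50 latency of repeated search calls.
import statistics
import time

from exa_py import Exa

exa = Exa(api_key="YOUR_EXA_API_KEY")  # placeholder key
queries = ["nvidia earnings", "rust async runtimes", "fed rate decision"]

latencies_ms = []
for q in queries * 10:  # repeat for a more stable percentile estimate
    start = time.perf_counter()
    exa.search(q, type="fast")  # "fast" type value is an assumption
    latencies_ms.append((time.perf_counter() - start) * 1000)

print(f"P50 latency: {statistics.median(latencies_ms):.0f} ms")
```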
The second API update, Exa Deep, excels in search quality, topping nearly every benchmark. While slower, with a P50 of 3.5s, it builds on the faster endpoints by agentically searching, processing, and re-searching to deliver the highest-quality information. Exa Deep is ideal for workloads requiring deep insights and complements Exa Auto, which balances latency and quality.
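Exa has not published how Exa Deep's agentic loop works internally; the sketch below only illustrates the general search-process-re-search pattern the article describes, with refine_query as a hypothetical stand-in for an LLM step that rewrites the query based on what came back.

```python
# Sketch of a generic agentic search loop: search, process the results,
# refine the query, and search again. Not Exa's internal logic.
from exa_py import Exa

exa = Exa(api_key="YOUR_EXA_API_KEY")  # placeholder key

def refine_query(query: str, results) -> str:
    """Hypothetical stand-in: a real system would ask an LLM to inspect
    the results and produce a sharper follow-up query."""
    return query  # no-op here; the loop structure is the point

def deep_search(query: str, rounds: int = 3):
    collected = []
    for _ in range(rounds):
        response = exa.search(query, num_results=10)
        collected.extend(response.results)             # process this round
        query = refine_query(query, response.results)  # then re-search
    return collected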
To evaluate search APIs in a RAG setting, the same LLM harness, following Perplexity's setup, was used for every test. All evaluations ran GPT-4.1 as the RAG model and GPT-4o-mini as the grader. For fairness, when a public benchmark with comparable settings existed, the higher of the measured and published scores was taken.
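A harness along the lines described might look like the sketch below: retrieve with the search API, answer with one model, grade with another. The prompts, the search_and_contents call, and the grading scheme are assumptions for illustration; only the model names (GPT-4.1 for answering, GPT-4o-mini for grading) come from the article.

```python
# Sketch of a RAG evaluation harness: retrieve, answer, then grade.
from exa_py import Exa
from openai import OpenAI

exa = Exa(api_key="YOUR_EXA_API_KEY")  # placeholder key
client = OpenAI()  # reads OPENAI_API_KEY from the environment

def rag_answer(question: str) -> str:
    # Retrieve pages with text contents, then answer from that context
    results = exa.search_and_contents(question, num_results=5)
    context = "\n\n".join(r.text or "" for r in results.results)
    resp = client.chat.completions.create(
        model="gpt-4.1",
        messages=[{"role": "user",
                   "content": f"Context:\n{context}\n\nQuestion: {question}"}],
    )
    return resp.choices[0].message.content

def grade(question: str, answer: str, reference: str) -> str:
    # A second model judges the answer against a reference
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user",
                   "content": (f"Question: {question}\nReference: {reference}\n"
                               f"Answer: {answer}\nReply CORRECT or INCORRECT.")}],
    )
    return resp.choices[0].message.content

question, reference = "Who leads Exa?", "Will Bryk"
print(grade(question, rag_answer(question), reference))
```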