How Protege is Unlocking Access to Data for AI Development
- Karan Bhatia

- Jan 9

Protege, a company building the data layer for AI training, has announced a $30 million Series A extension led by Andreessen Horowitz (a16z). The company is led by Bobby Samuels, Travis May, Engy Ziedan, and Richard Ho. The financing expands the company’s $25 million Series A from August 2025 and brings total funding to $65 million since the company’s founding in 2024. Returning investors include Footwork, CRV, Bloomberg Beta, Flex Capital, and Shaper Capital, among others.
Bobby Samuels, CEO and co-founder of Protege, said the demand for real-world data is outpacing the market’s ability to supply it responsibly. With data highly fragmented and difficult to operationalize at scale, Protege serves as a trusted source of curated, AI-ready datasets while unlocking new revenue for data providers. Backing from Andreessen Horowitz will help expand this model and deliver high-quality, use-case-specific data to AI research teams.
Protege streamlines access to private and proprietary real-world datasets across formats, including media content, audio, de-identified health records, and medical imaging. The company aggregates and licenses data from trusted partners, then curates and optimizes it for AI training and evaluation. Protege now supports institutions and AI companies worldwide, including most of the “Magnificent Seven” technology companies.
Travis May, Chairman and co-founder, noted that access to data is the biggest bottleneck in AI progress. The next wave of AI will rely on proprietary data generated through everyday human activity, and Protege is creating safe, compensated pathways for accessing it.
In 2025, Protege grew its partner network to hundreds of organizations, enabling aggregated access to new data sources and formats. Partners earn a share of the revenue each time one of their datasets is used.
Daisy Wolf, Partner at Andreessen Horowitz, emphasized that the next era of AI will be shaped by those who can responsibly unlock real-world data. Protege’s platform respects data complexity while making it usable for modern AI development.
The new funding will accelerate product development, expand Protege’s data network into new domains, deepen institutional partnerships, and scale the infrastructure required to deliver rights-protected, AI-ready real-world data.


