Two years ago, I wrote The Experimentation Gap, reflecting on how democratized, advanced statistical decision-making was becoming a mission-critical advantage for many tech startups, yet remained increasingly out of reach for the vast majority of companies.
That article drew on my time at Google, where I saw the contrast between one of the best experimentation platforms in the world in the Search organization and a completely manual experimentation process on the Android/Pixel team.
It also drew on the immense pain I had heard from so many product and growth teams about experimentation over the preceding years: they couldn't run enough experiments, they couldn't evangelize experimentation within their companies, they couldn't rely on experimental analyses or results, and they were limited to simple A/B testing methodologies that broke down in more complex situations (e.g. where treatments can't be perfectly randomized, or where different types of users have different optimal treatments).
Many of these issues stemmed from the numerous ways experimentation was changing. A/B testing had transitioned from marketers changing copy on landing pages to product, growth, and data science teams optimizing everything from activation funnels to ML models. Centralization around data warehouses meant it no longer made sense for each function to use its own disconnected experimentation tool. And the rise of AI meant that statistical evaluation of models was increasingly table stakes for basically every company at scale.
It was clear to me at the time that there was an immense opportunity for a new breed of experimentation platform that solved these challenges by taking the lessons from the most advanced experimentation cultures, like Airbnb and Stitch Fix, and building them into a product that democratized statistical decision-making for modern organizations.
And so, after tracking this space for well over four years, I couldn’t be more excited to announce our lead Series B investment in Eppo, which I believe is in the best position to actually solve the experimentation gap.
Eppo is used by companies like Twitch, Zapier, Miro, DraftKings, Perplexity, and Descript to evaluate and optimize everything from product launches to lifecycle marketing campaigns to large language models. The product is not only easy to use, which is critical for democratizing experimentation across an organization; it has also consistently been at the technical forefront of the space. It was the first experimentation platform to be data-warehouse native, the first to integrate advanced variance reduction techniques, the first to launch contextual bandits, and so much more.
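To give a sense of what variance reduction means in practice, here is a minimal sketch of a CUPED-style adjustment, the classic technique in this family (this is an illustration, not a claim about Eppo's implementation); the data and variable names are purely hypothetical. By subtracting the part of the metric predicted by each user's pre-experiment behavior, the treatment-effect estimate keeps the same expected value while its variance shrinks, so experiments reach significance sooner.

```python
# A minimal, illustrative sketch of CUPED-style variance reduction.
# All data and variable names here are made up for demonstration.
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Hypothetical users: a pre-experiment covariate (e.g. prior revenue) that is
# correlated with the in-experiment metric, plus a random treatment assignment.
pre_metric = rng.gamma(shape=2.0, scale=5.0, size=n)
treatment = rng.integers(0, 2, size=n)
metric = pre_metric + rng.normal(0, 3, size=n) + 0.5 * treatment

# CUPED adjustment: subtract the best linear predictor based on the covariate.
theta = np.cov(pre_metric, metric)[0, 1] / np.var(pre_metric)
adjusted = metric - theta * (pre_metric - pre_metric.mean())

# The estimated treatment effect is unchanged in expectation, but the metric's
# variance drops roughly by a factor of (1 - correlation^2).
for name, y in [("raw", metric), ("CUPED", adjusted)]:
    effect = y[treatment == 1].mean() - y[treatment == 0].mean()
    print(f"{name:>5}: effect={effect:.3f}, metric variance={y.var():.2f}")
```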
I first met Che, Eppo's CEO and founder, when he was building out the experimentation platform at Webflow, having previously built much of Airbnb's core data platform. Che founded Eppo to build the product he wished had existed back when he was at Webflow and Airbnb, and it has been such a pleasure to watch him execute almost flawlessly over the last four years.
Che approaches company building with the methodical rigor of a data scientist. The result is a product truly built from first principles to enable modern companies to win via rigorous, fast-paced decision-making, and a team that is undoubtedly the strongest statistics engineering team in the world. The market has started to recognize this: many companies that had built extensive in-house experimentation platforms are now migrating fully to Eppo.
What excites me most about Eppo, though, is what’s to come. Experimentation has traditionally been a highly fragmented market, with completely different tools used for performance marketing experimentation (“channel incrementality”), offline/online experiments (e.g. APT), ML experiments, personalization, and product/growth experiments.
Yet, at their core, these systems are all doing the same thing, and much of the value comes from unifying these tests end to end, such as connecting your advertising experiments all the way through to product activation. The opportunity for Eppo is to be the central statistics engine that powers your entire organization, bringing experimental rigor into as many decisions as possible.
The importance of this unified statistical engine will grow tremendously as the products we interact with become AI-enabled, probabilistic systems. Evaluation is the hardest part of building AI products, and one of the only reliable ways to evaluate such systems at scale is to experiment. As a result, tools like Eppo will likely become an essential component of every AI-native company moving forward.
This investment in Eppo reflects our longstanding interest in core computing infrastructure and applications of advanced mathematics, statistics, and machine learning. We couldn’t be more excited to partner with Che as he builds the core decision-making engine for enterprises.