In the world of paleoclimatology, scientists face a significant hurdle. They want to understand the climate of Earth from thousands or even millions of years ago, but thermometers and weather stations have only existed for a tiny fraction of that time. To solve this, they use proxy data. This includes natural recorders of climate variability like tree rings, ice cores, and fossil pollen. These records are not the temperature itself, but they respond to temperature in predictable ways. By studying the thickness of a tree ring or the chemical composition of air bubbles trapped in an ice core, scientists can reconstruct a picture of the past.
As a founder, you are often in a similar position. You are building something that has never existed before. You do not have ten years of customer retention data. You do not have a massive database of historical pricing sensitivity. You are operating in a vacuum of direct information. This is where the concept of proxy data becomes an essential tool for the entrepreneur. In a business context, proxy data consists of indirect measurements that stand in for the actual data you wish you had but cannot yet access.
Understanding the Role of Proxies in Startups
Startups live and die by their ability to make decisions under uncertainty. If you wait until you have perfect, direct data to make a move, you will likely be outpaced by a competitor or run out of capital. Proxy data provides a bridge. It allows you to gather evidence that a particular direction is correct before the final results are visible.
Consider the challenge of measuring long-term customer loyalty in a company that is only six months old. You cannot measure five-year churn because your company has not existed for five years. Instead, you look for proxies. You might look at the frequency of logins within the first thirty days. You might track how many users recommend the product to a friend. These are not direct measurements of five-year retention, but they are indicators that suggest whether a user is finding value.
In science, a proxy must be calibrated. Scientists compare modern tree rings with modern thermometer readings to ensure the relationship is reliable. In business, you must do the same. You have to ask if the proxy you are tracking actually correlates with the outcome you want. If people are logging in every day but not actually performing the core task of your software, the login frequency might be a false proxy.
- Proxy data is an indirect measurement.
- It replaces data that is currently impossible to collect.
- It requires a logical link between the proxy and the target metric.
- It helps founders move faster by providing early signals.
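The calibration step described above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed method: the user data is invented, the variable names are hypothetical, and a Pearson correlation on eight users is far too small a sample to trust in practice. It only shows the shape of the check: compare each candidate proxy against an outcome you have already observed, and prefer the one that tracks it more closely.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical cohort, invented for illustration: one entry per user.
logins_first_30_days = [2, 25, 4, 18, 30, 1, 22, 6]
core_tasks_first_30_days = [0, 14, 1, 9, 2, 0, 12, 2]
still_active_at_6_months = [0, 1, 0, 1, 0, 0, 1, 0]  # 1 = retained

# Calibrate each candidate proxy against the observed outcome.
login_r = pearson(logins_first_30_days, still_active_at_6_months)
task_r = pearson(core_tasks_first_30_days, still_active_at_6_months)

print(f"logins vs retention:     r = {login_r:.2f}")
print(f"core tasks vs retention: r = {task_r:.2f}")
```

In this made-up cohort, one user logs in thirty times but rarely performs the core task and does not stay. That single pattern is enough to make core task usage the stronger proxy of the two, which mirrors the false-proxy warning above.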
The Difference Between Proxy and Direct Data
Direct data is the gold standard. It is the actual measurement of the thing you care about. If you want to know how much revenue you made yesterday, you look at your bank account and your accounting software. That is direct data. It is factual, historical, and requires very little interpretation.
Proxy data is different because it requires an inferential leap. You are measuring X to understand Y. For example, a startup might use the number of people who join a waitlist as a proxy for market demand. The direct data would be the number of people who actually put in their credit card details and buy the product. Since the product is not finished yet, the waitlist is the best available substitute.
Direct data is usually a lagging indicator. It tells you what happened in the past. Proxy data is often a leading indicator. It gives you a hint about what might happen in the future. The risk with direct data is that by the time you have enough of it, it might be too late to change course. The risk with proxy data is that your inference might be wrong. You might have ten thousand people on a waitlist, but if the product price is too high, very few of them may convert. In that case, the proxy was a poor predictor of the direct metric.
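The inferential leap from waitlist to buyers can be made explicit with simple arithmetic. The sketch below assumes a waitlist of ten thousand people, as in the example above, and applies three conversion rates that are purely hypothetical placeholders, not benchmarks. The function name `projected_buyers` is likewise made up for this illustration.

```python
def projected_buyers(waitlist_size, conversion_rate):
    """Estimate paying customers implied by a waitlist and an assumed rate."""
    if waitlist_size < 0:
        raise ValueError("waitlist_size must be non-negative")
    if not 0.0 <= conversion_rate <= 1.0:
        raise ValueError("conversion_rate must be between 0 and 1")
    return round(waitlist_size * conversion_rate)


waitlist_size = 10_000

# Assumed conversion rates, invented for illustration only.
scenarios = {
    "price too high": 0.00,
    "cautious": 0.02,
    "optimistic": 0.10,
}

projections = {
    name: projected_buyers(waitlist_size, rate)
    for name, rate in scenarios.items()
}

for name, buyers in projections.items():
    print(f"{name:>15}: {buyers} buyers")
```

Writing the assumption down as a number makes it testable. Once even a small batch of real purchases exists, the assumed rate can be replaced with a measured one.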
Scenarios Where Proxy Data is Essential
One of the most common scenarios for using proxy data is during the fundraising process. Investors are looking for evidence of what they call traction. If a company is pre-revenue, there is no direct data on financial success. The founder must present proxies instead. These might include the growth rate of a free user base, the level of engagement in an online community, or the results of a pilot program with a reputable brand. Each of these serves as a stand-in for future revenue potential.
Another scenario involves product development. When testing a new feature, you might not be able to measure its impact on the company's bottom line for several months. Instead, you look at proxy metrics like time on page, click-through rates on a specific button, or the volume of support tickets related to that feature. These short-term signals help the product team decide whether to iterate, pivot, or double down.
Hiring is also a field dominated by proxies. When you interview a candidate, you cannot observe their actual work performance over the next year. You use proxies like their previous experience, their performance on a technical test, and how they answer behavioral questions. You are gathering data points to predict a future outcome that is currently unknowable.
The Unknowns and Scientific Risks
We must remain skeptical of our proxies. A scientific approach requires us to look for ways our data might be lying to us. One major unknown is the decay of a proxy. A metric that worked as a great proxy during your first year might become useless as you scale. For example, early adopters are often more engaged than the general public. If you use the engagement of your first one hundred users as a proxy for how the next ten thousand will behave, you might be making a grave error.
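One way to watch for proxy decay is to recompute the proxy's track record for each new cohort rather than trusting the number measured on early adopters. The following sketch assumes you can label each user as engaged or not and retained or not. The cohorts, the helper `retention_given_engaged`, and the `MAX_ACCEPTABLE_DROP` threshold are all hypothetical and chosen only to illustrate the comparison.

```python
def retention_given_engaged(users):
    """Share of engaged users who were later retained.

    `users` is a list of (engaged, retained) boolean pairs.
    """
    engaged_outcomes = [retained for engaged, retained in users if engaged]
    if not engaged_outcomes:
        raise ValueError("no engaged users in this cohort")
    return sum(engaged_outcomes) / len(engaged_outcomes)


# Hypothetical cohorts, invented for illustration.
early_adopters = [(True, True)] * 8 + [(True, False)] * 2 + [(False, False)] * 5
later_signups = [(True, True)] * 4 + [(True, False)] * 6 + [(False, False)] * 5

early_rate = retention_given_engaged(early_adopters)  # 8 of 10 engaged users stayed
later_rate = retention_given_engaged(later_signups)   # 4 of 10 engaged users stayed

# Assumed tolerance: how far the rate may fall before the proxy is re-examined.
MAX_ACCEPTABLE_DROP = 0.2
proxy_has_decayed = (early_rate - later_rate) > MAX_ACCEPTABLE_DROP

print(f"early: {early_rate:.0%}, later: {later_rate:.0%}, decayed: {proxy_has_decayed}")
```

Here engagement predicted retention well for the first cohort and much less well for the second, so the same dashboard number no longer means the same thing.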
There is also the problem of Goodhart’s Law, a close cousin of Campbell’s Law. It states that once a metric becomes a target, it loses its value as an indicator. If you tell your marketing team that email signups are the primary proxy for success, they might find ways to get signups from people who have no intention of ever using the product. The proxy has been gamed, and the link between the proxy and the actual goal is severed.
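A gamed proxy often shows up as a widening gap between the proxy and the behavior it was supposed to predict. The sketch below pairs weekly signups with weekly activations (users who went on to actually use the product) and flags weeks where signups rose while the activation rate fell well below the first week's rate. The data, the function `flag_gamed_weeks`, and the `min_rate_ratio` threshold are hypothetical choices for illustration, not an established detection rule.

```python
def flag_gamed_weeks(weekly_counts, min_rate_ratio=0.5):
    """Return indexes of weeks where signups rose but the activation rate
    fell below `min_rate_ratio` times the first week's activation rate.

    `weekly_counts` is a list of (signups, activations) pairs, oldest first.
    """
    if not weekly_counts:
        raise ValueError("weekly_counts must not be empty")
    if any(signups <= 0 for signups, _ in weekly_counts):
        raise ValueError("every week needs at least one signup")

    baseline_signups, baseline_activations = weekly_counts[0]
    baseline_rate = baseline_activations / baseline_signups

    flagged = []
    for week in range(1, len(weekly_counts)):
        signups, activations = weekly_counts[week]
        previous_signups = weekly_counts[week - 1][0]
        rate = activations / signups
        if signups > previous_signups and rate < min_rate_ratio * baseline_rate:
            flagged.append(week)
    return flagged


# Hypothetical weekly (signups, activations), invented for illustration.
weekly_counts = [(200, 60), (220, 64), (400, 50), (650, 55)]

suspicious_weeks = flag_gamed_weeks(weekly_counts)
print("weeks to investigate:", suspicious_weeks)
```

A flag is a prompt to investigate, not proof of gaming. A new acquisition channel with weaker intent would produce the same pattern.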
We should constantly ask these questions:
- What is the specific mechanical link between this proxy and my goal?
- Are there external factors that could change this relationship?
- Am I tracking this because it is easy to measure or because it is meaningful?
- How will I know if this proxy is no longer accurate?
By treating business metrics with the same rigor that a paleoclimatologist treats an ice core, founders can build more resilient organizations. You do not need to have all the answers today. You just need to find the right proxies that point the way toward the truth of your market and your product.

