What is Overfitting?

7 mins
Ben Schmidt
Author
I am going to help you build the impossible.

You spend months analyzing customer feedback. You look at every single complaint and every single feature request. You look at the usage data from your beta users. You decide to build a product that addresses every single one of those points perfectly.

On paper, this sounds like good business. You are listening to the customer. You are being data-driven.

But in reality, you might be walking into a trap known as overfitting.

In the world of data science and machine learning, overfitting is a modeling error. It occurs when a function is too closely fit to a limited set of data points. The model learns the detail and noise in the training data to the extent that it negatively impacts the performance of the model on new data.

Essentially, the model memorizes the data rather than learning the general rules.

For a startup founder, the definition is almost exactly the same, but the consequences play out in strategy and product development rather than in code. It is the act of optimizing your business for the specific quirks of your current situation rather than for the general realities of the market you want to conquer.

The Difference Between Signal and Noise

When you are starting out, data is scarce. You might have five customers. Or maybe you have fifty. In the grand scheme of a scalable business, these are small numbers.

Every piece of feedback feels massive because the sample size is so small. If one customer out of five hates the color blue, it feels like 20 percent of your market hates the color blue. If you change your entire brand palette based on that one person, you are overfitting.

You are treating noise as if it were signal.
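A confidence interval makes the small-sample problem concrete. The sketch below uses the Wilson score interval, a standard interval for proportions chosen here for illustration; the article does not name a method. The helper `wilson_interval` is a hypothetical name, and `z = 1.96` gives roughly 95 percent coverage.

```python
import math

def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a proportion; z=1.96 gives roughly 95% coverage."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return max(0.0, center - half_width), min(1.0, center + half_width)

# One complaint out of five customers: the point estimate is 20 percent...
small_low, small_high = wilson_interval(1, 5)
print(f"1 of 5:     {small_low:.0%} to {small_high:.0%}")

# ...the same 20 percent observed across 500 customers is far more certain.
large_low, large_high = wilson_interval(100, 500)
print(f"100 of 500: {large_low:.0%} to {large_high:.0%}")
```

For 1 customer out of 5, the interval runs from roughly 4 percent to 62 percent. Five data points cannot tell you whether the complaint is rare or common. At 100 out of 500, the interval narrows to roughly 17 to 24 percent.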

Signal represents the underlying truth of the market. It is the general trend that applies to the majority of your future customers. Noise is the random variance found in any data set. It is the specific preference of a single user that does not apply to anyone else.

When you overfit, you build a product that is perfect for the past but fragile in the future. You create a solution so specific to the test group that it fails completely when you try to sell it to the general public.

A model that is overfitted looks impressive at first glance. It can score close to 100 percent accuracy on the data it has seen, yet have little or no predictive power for the data it has not seen yet.
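The gap between seen and unseen data is easy to reproduce. This minimal sketch uses hypothetical synthetic data: a "true rule" (a customer converts when `x >= 500`) plus 20 percent random label flips as noise. `Memorizer` stores every training example verbatim. `Threshold` learns a single cutoff. Both classes are illustrative names, not a library API.

```python
import random

def true_rule(x: int) -> int:
    """Hypothetical ground truth (the signal): a customer converts when x >= 500."""
    return int(x >= 500)

def noisy_label(x: int, rng: random.Random, flip_prob: float = 0.2) -> int:
    """Observed outcome: the true rule, flipped at random 20% of the time (the noise)."""
    label = true_rule(x)
    return 1 - label if rng.random() < flip_prob else label

class Memorizer:
    """Overfit model: stores every training example verbatim."""
    def fit(self, xs, ys):
        self.table = dict(zip(xs, ys))
        return self

    def predict(self, x):
        # Perfect recall for inputs it has seen; a fixed guess (0) for anything new.
        return self.table.get(x, 0)

class Threshold:
    """Simple model: learns a single cutoff that minimizes training errors."""
    def fit(self, xs, ys):
        def errors(c):
            return sum(int(x >= c) != y for x, y in zip(xs, ys))
        self.cutoff = min(sorted(set(xs)), key=errors)
        return self

    def predict(self, x):
        return int(x >= self.cutoff)

def accuracy(model, xs, ys):
    return sum(model.predict(x) == y for x, y in zip(xs, ys)) / len(xs)

rng = random.Random(42)
train_x = list(range(0, 1000, 2))  # even inputs: seen during training
test_x = list(range(1, 1000, 2))   # odd inputs: never seen
train_y = [noisy_label(x, rng) for x in train_x]
test_y = [noisy_label(x, rng) for x in test_x]

memo = Memorizer().fit(train_x, train_y)
simple = Threshold().fit(train_x, train_y)
for name, model in (("memorizer", memo), ("threshold", simple)):
    print(f"{name}: train {accuracy(model, train_x, train_y):.2f}, "
          f"test {accuracy(model, test_x, test_y):.2f}")
```

The memorizer should report perfect training accuracy but only about 50 percent on unseen inputs, no better than a coin flip. The threshold model should land near 80 percent on both sets, which is roughly the ceiling the 20 percent noise allows.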

Overfitting vs. Underfitting

To understand this concept better, it helps to look at the opposite problem. This is called underfitting.

Underfitting happens when you ignore the data too much. You assume a straight line where there should be a curve. In a startup context, this looks like a founder who ignores customer feedback entirely because they believe they have Steve Jobs-level vision.

They build a generic solution that solves no one’s problem specifically enough to matter.

This is a spectrum.

On the far left, you have underfitting. The product is too vague. It is a Swiss Army knife that is dull on every edge. It tries to please everyone and pleases no one.

On the far right, you have overfitting. The product is a key that fits only one specific lock. It is incredible for the customer you already have, but useless for the customer you are trying to acquire next.

Your goal is to find the middle ground. Modelers sometimes call this the Goldilocks zone; in statistics, the property you are after is called generalization. You want a model, or a business strategy, that captures the main trends without getting confused by the random fluctuations.

You want to build for the rule, not the exception.
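The underfitting-to-overfitting spectrum can be sketched with k-nearest-neighbors regression. It is chosen here for illustration because a single knob, `k`, slides the model from one end of the spectrum to the other. The data is hypothetical: a smooth signal `y = x**2` plus random noise. All function names are illustrative.

```python
import random

def make_data(n, rng, noise=1.0):
    """Hypothetical data: a smooth signal y = x**2 plus Gaussian noise."""
    xs = [rng.uniform(-3, 3) for _ in range(n)]
    ys = [x**2 + rng.gauss(0, noise) for x in xs]
    return xs, ys

def knn_predict(train_x, train_y, x, k):
    """Average the targets of the k training points closest to x."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:k]
    return sum(train_y[i] for i in nearest) / k

def mse(train_x, train_y, xs, ys, k):
    """Mean squared error of the k-NN model on the points (xs, ys)."""
    errors = [(knn_predict(train_x, train_y, x, k) - y) ** 2 for x, y in zip(xs, ys)]
    return sum(errors) / len(errors)

rng = random.Random(0)
train_x, train_y = make_data(200, rng)
test_x, test_y = make_data(200, rng)

# k=1: maximally flexible (overfits). k=200: always predicts the overall mean (underfits).
results = {}
for k in (1, 15, len(train_x)):
    results[k] = (mse(train_x, train_y, train_x, train_y, k),
                  mse(train_x, train_y, test_x, test_y, k))
    print(f"k={k:3d}  train MSE={results[k][0]:.2f}  test MSE={results[k][1]:.2f}")
```

With `k = 1`, the training error is exactly zero because each point is its own nearest neighbor, yet the test error is high: the model has memorized the noise. With `k = 200`, the model is a flat line at the average and misses on both sets. The intermediate setting should have the lowest error on the unseen test data.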

Signs You Are Overfitting Your Startup

It is easy to define this abstractly. It is much harder to spot when you are in the trenches. The pressure to please early customers is immense. You want revenue. You want retention. It feels safer to say yes to a feature request than to say no.

Don’t build custom software for one.
Here are a few scenarios where overfitting commonly occurs in early-stage companies.

The “Whale” Client Trap

You land a massive client early on. They are paying you more than everyone else combined. They ask for a specific reporting feature that only applies to their internal workflow. You build it. Then they ask for a custom integration. You build that too.

Six months later, you realize you have not built a SaaS product. You have built custom software for one company. You overfitted your product roadmap to a single data point. When you try to sell to the next whale, they have a different workflow, and your product is now rigid and bloated with features that only one entity uses.

The Pivot Based on a Bad Week

You run a marketing campaign. It flops. You get zero leads for three days. You panic and decide that your messaging is wrong, your pricing is wrong, and your target audience doesn’t exist.

You completely overhaul your landing page based on three days of silence.

This is overfitting to a short time horizon. Business is noisy. Sometimes you just have a bad week. If you change your strategy every time the data dips, you are chasing noise. You are not letting the model run long enough to see the actual trend line.

The Feature Factory

You have a backlog of 200 features because every time a user asks for something, you add it to the list. You believe that a better product is one that has more features.

The result is a complex, unusable mess. Instead of solving the core problem elegantly, you have patched together a solution for every edge case you have ever encountered. New users sign up and are overwhelmed. They leave because the learning curve is too steep.

You overfitted for the power users at the expense of the new users.

How to Avoid the Trap

A standard way to catch overfitting in data science is cross-validation. You train the model on one portion of the data, test it on a different portion that the model has never seen before, and repeat this across several different splits so that no single lucky split decides the result.
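Here is a minimal k-fold cross-validation sketch in plain Python, assuming a simple nearest-neighbors regressor as a stand-in model and hypothetical synthetic data. The data is split into folds. Each fold is scored by a model trained on all the other folds, and the scores are averaged. The function names are illustrative rather than a library API; in practice, scikit-learn provides `KFold` and `cross_val_score` for this.

```python
import random

def k_fold_indices(n, folds, rng):
    """Shuffle the indices 0..n-1 and deal them into `folds` disjoint groups."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[f::folds] for f in range(folds)]

def knn_predict(xs, ys, x, neighbors):
    """Stand-in model: average the targets of the nearest training points."""
    nearest = sorted(range(len(xs)), key=lambda i: abs(xs[i] - x))[:neighbors]
    return sum(ys[i] for i in nearest) / neighbors

def cross_val_mse(xs, ys, neighbors, folds=5, seed=0):
    """Mean held-out MSE; every fold is scored by a model that never saw it."""
    scores = []
    for held_out in k_fold_indices(len(xs), folds, random.Random(seed)):
        held = set(held_out)
        train = [i for i in range(len(xs)) if i not in held]
        tr_x = [xs[i] for i in train]
        tr_y = [ys[i] for i in train]
        errs = [(knn_predict(tr_x, tr_y, xs[i], neighbors) - ys[i]) ** 2 for i in held_out]
        scores.append(sum(errs) / len(errs))
    return sum(scores) / len(scores)

# Hypothetical data: signal y = x**2 plus noise.
data_rng = random.Random(1)
xs = [data_rng.uniform(-3, 3) for _ in range(150)]
ys = [x**2 + data_rng.gauss(0, 1.0) for x in xs]

# Choose the setting with the lowest *held-out* error, not the lowest training error.
scores = {n: cross_val_mse(xs, ys, n) for n in (1, 5, 15, 50, 120)}
best = min(scores, key=scores.get)
print({n: round(s, 2) for n, s in scores.items()}, "best:", best)
```

The most flexible setting (`neighbors=1`) would win on training error, but cross-validation exposes it. The flattest setting (`neighbors=120`, which simply predicts the average) loses too. The winner should sit somewhere in between.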

In business, you need to do the same thing.

Validate with Cohorts

When you get feedback that suggests a major change, do not implement it for everyone immediately. Test it on a new group of customers. Does this feature request from your first ten customers actually help the next ten customers? Or does it confuse them?

If the new cohort doesn’t value the change, you were likely overfitting to the first group.

Look for Patterns, Not Anecdotes

Stop reacting to single data points. Wait until you hear the same thing three, five, or ten times. A pattern is a signal. A single angry email is an anecdote.

It requires discipline to sit on your hands when a customer is shouting. But you have to ask yourself if fixing this one person’s problem breaks the system for the hundred people who are currently happy.
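Treating feedback as a pattern rather than an anecdote amounts to counting before acting. This tiny sketch assumes feedback has already been tagged by theme, for example by hand while reading support emails. The tags and the `recurring_themes` helper are hypothetical. The `threshold` parameter plays the role of the "three, five, or ten times" rule.

```python
from collections import Counter

def recurring_themes(feedback_tags, threshold=3):
    """Return (theme, count) pairs mentioned at least `threshold` times, most frequent first."""
    counts = Counter(feedback_tags)
    return [(theme, n) for theme, n in counts.most_common() if n >= threshold]

# Hypothetical tagged feedback from early customers.
tags = ["slow export", "dark mode", "slow export", "pricing", "slow export",
        "custom report for Acme", "pricing", "pricing", "slow export"]
print(recurring_themes(tags, threshold=3))
# → [('slow export', 4), ('pricing', 3)]
```

The one-off requests ("dark mode", the custom report for a single client) stay on the list but do not drive the roadmap until they recur.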

Simplify Your Model

In statistics, complex models are more prone to overfitting. Simple models are more robust. The same is true for business models and products.

Complexity is the enemy of scale. If your sales process requires twenty exceptions to close a deal, you have overfitted your sales process. If your product requires a manual onboarding call to explain the interface, you have overfitted your design.

Aim for simplicity. A simple solution that works for 80 percent of the market is infinitely more valuable than a complex solution that works perfectly for 10 percent.

Building a company is about pattern recognition. You are looking for the universal truths in your market. You will never find them if you are too busy obsessing over the microscopic details of your current dataset. Step back. Look at the trend line. Build for the future, not just for the data you have today.