You hear the term thrown around constantly in technical circles and venture capital meetings. Everyone talks about the model. They talk about the architecture. But the real value and the real cost usually lies in something slightly more obscure.
Model weights.
If you are building a startup that leverages artificial intelligence or machine learning, you cannot treat the underlying technology as a total black box. You do not need to hand-code the backpropagation algorithm. You do need to understand what you are actually owning, buying, or renting.
When you download an open source model or pay for an API, you are interacting with the weights. They are the difference between a blank slate and a useful tool.
Understanding weights helps you estimate costs. It helps you understand why training a model is so expensive compared to running one. It clarifies why data is considered the new oil.
Data is the fuel. The weights are the engine that the fuel built.
Defining Model Weights
At its core, a neural network is a collection of neurons arranged in layers. These neurons are connected to one another. The arrangement loosely mimics the structure of a biological brain, though in a much simpler mathematical form.
Model weights are the specific numbers that determine the strength of the connection between these neurons.
Think of a neural network as a massive switchboard. Data comes in one side. It passes through thousands or billions of connections. It comes out the other side as a prediction or a generation of text.
The weights determine how much influence one piece of data has on the next layer of the network. They are the learnable parameters.
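To make the switchboard metaphor concrete, here is a minimal sketch of a single neuron in plain Python. The numbers are toy values invented for illustration; real networks have billions of these weights, not three.

```python
# A single neuron: each input is multiplied by its weight, summed with a
# bias, then passed through a simple activation. The weights decide how
# much influence each input has on the output.

def neuron(inputs, weights, bias):
    # Weighted sum of inputs, then a ReLU activation (negatives become 0).
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    return max(0.0, total)

inputs = [0.5, 0.8, 0.1]      # incoming signals
weights = [0.9, -0.3, 0.4]    # the learnable parameters
output = neuron(inputs, weights, bias=0.1)
print(output)
```

Change any weight and the same input produces a different output. Training is nothing more than choosing those numbers well.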
When a model is first initialized, the weights are usually random numbers. The model knows nothing. If you put an image of a cat into an untrained model, the output will be noise.
Training is the process of adjusting these weights.
The model looks at data. It makes a guess. It checks if the guess is right. Then it goes back and tweaks the weights slightly to reduce the error. This happens billions of times.
Eventually, those random numbers settle into a specific configuration. That configuration represents the knowledge of the model. That collection of numbers is the model weights.
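The guess-check-tweak loop described above can be sketched in a few lines. This is a toy one-weight model on made-up data, not a real training setup, but the shape of the loop is the same one that runs billions of times at scale.

```python
# Toy training loop: one weight, learned by nudging it to reduce error.
# Target behavior: output = 3 * input. The weight starts at an arbitrary
# value and converges toward 3 through repeated small corrections.

data = [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0)]  # (input, correct answer)
w = 0.5        # initial "random" weight
lr = 0.01      # learning rate: how big each tweak is

for _ in range(1000):            # many passes over the data
    for x, target in data:
        guess = w * x            # the model makes a guess
        error = guess - target   # how wrong was it?
        w -= lr * error * x      # tweak the weight to shrink the error

print(round(w, 3))
```

After enough passes, the weight settles near 3. That settled value is the "knowledge"; saving it to disk is saving the model.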
Architecture Versus Weights
It is easy to confuse the model architecture with the model weights. They are distinct concepts and this distinction matters for business strategy.
The architecture is the design. It is the code that defines how the layers are arranged. It defines the rules of how data flows. Examples of architectures include Transformers, RNNs, and CNNs.
The weights are the state of that architecture after training.
Imagine a textbook. The architecture is the structure of the book. It is the chapters, the page numbers, the grammar rules, and the empty lines waiting to be filled.
The weights are the actual words written on the pages.
If you possess the architecture but not the weights, you have an empty shell. You have the potential for intelligence but no actual capability. This is why you can find the code for many famous model architectures on GitHub, but you cannot simply run them and get ChatGPT level performance without the weights.
The weights are what companies spend millions of dollars on computing power to create. They are the asset.
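Here is a minimal sketch of that separation, with a hypothetical two-number "weights file" standing in for the multi-gigabyte real thing. The class is the architecture; the JSON file is the asset.

```python
import json

# The "architecture" is this code: a fixed rule for how data flows.
# The "weights" are just numbers stored in a separate file. The same
# class is an empty shell until trained weights are loaded into it.

class TinyModel:
    def __init__(self):
        self.weights = [0.0, 0.0]   # untrained: knows nothing

    def load_weights(self, path):
        with open(path) as f:
            self.weights = json.load(f)

    def predict(self, x):
        w, b = self.weights
        return w * x + b

# Someone spent compute producing these numbers; we just load them.
with open("weights.json", "w") as f:
    json.dump([2.0, 1.0], f)        # pretend these came from training

model = TinyModel()
model.load_weights("weights.json")
print(model.predict(3.0))
```

Anyone can copy the class from GitHub. Without the file, `predict` returns nothing useful.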
The Cost of Creation
This leads to the economics of model weights. Why are they so valuable? Because creating them requires massive resources.
To settle on the correct weights for a large language model, you have to process trillions of tokens of text. You need thousands of GPUs running for weeks or months. This consumes electricity and requires massive engineering oversight.
This process is called pre-training.
The resulting file is simply a list of numbers. It might be a few gigabytes or a few terabytes in size. But that file represents millions of dollars of compute time.
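A back-of-envelope sketch of that math. Every number below is an illustrative assumption (cluster size, duration, hourly rate), not real pricing or a real training run.

```python
# Rough pre-training cost sketch. All inputs are assumed, round numbers
# chosen only to show the shape of the calculation.

gpus = 2048                # assumed number of GPUs in the cluster
days = 30                  # assumed wall-clock training time
price_per_gpu_hour = 2.50  # assumed cloud rate in dollars

cost = gpus * days * 24 * price_per_gpu_hour
print(f"${cost:,.0f}")
```

Even with conservative inputs, the result lands in the millions. The output of all that spend is one file of numbers.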
This creates a specific dynamic in the startup ecosystem. Very few companies can afford to create high performance weights from scratch. Most startups will fall into one of two categories.
They will rent access to weights hosted by a provider like OpenAI or Anthropic. Or they will use open weights models like Llama, where a large company has done the heavy lifting and released the parameters for public use.
This decision impacts your margins and your control. If you rent, you have no control over the weights. If the provider changes them, your product changes. If you use open weights, you have to host them yourself, which brings its own operational complexities.
Fine Tuning and Specialization
There is a middle ground between renting and building from scratch. This is where the concept of fine-tuning comes in.
Fine-tuning is the process of taking a model that already has good weights and adjusting them slightly for a specific task.
Imagine you have a general-purpose employee who is smart but knows nothing about your specific industry. You do not send them back to kindergarten. You put them through a two-week training program.
In machine learning, you take the pre-trained weights and continue the training process on a much smaller, specific dataset. You are not randomizing the weights and starting over. You are nudging the existing weights to favor your specific terminology or format.
This is much cheaper than pre-training. It allows a startup to take a generic model and turn it into a specialist.
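Continuing with a toy one-weight model, fine-tuning uses the same update rule as training, but it starts from an already-good weight and takes gentle steps on a small task-specific dataset. The numbers here are invented for illustration.

```python
# Fine-tuning sketch: start from a pre-trained weight and nudge it on a
# small, niche dataset. We do not restart from random values.

w = 3.0                               # pre-trained weight (heavy lifting done)
task_data = [(1.0, 3.5), (2.0, 7.0)]  # small dataset for our niche: y = 3.5x
lr = 0.005                            # small learning rate: gentle nudges

for _ in range(2000):
    for x, target in task_data:
        error = w * x - target
        w -= lr * error * x           # same update rule, tiny steps

print(round(w, 2))
```

The weight drifts from the general value toward the specialist one. Scale this idea up by ten orders of magnitude and you have commercial fine-tuning.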
The strategic question here is about data quality. Fine-tuning only works if your dataset is high quality enough to improve the weights. If your data is messy, you might actually degrade the model's performance. A related risk is catastrophic forgetting, where the model learns the new material but loses some of its old general knowledge.
The Storage and Latency Trade-off
From an operational standpoint, weights act as a constraint on speed and cost.
The more parameters a model has, the more weights there are to store and process. A 70 billion parameter model has 70 billion weights. Every time you ask the model a question, the computer has to perform calculations using all those weights.
This requires memory (VRAM). If the weights do not fit on the graphics card, the model cannot run fast enough for real-time applications.
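A rough feasibility check makes the constraint concrete. The figures below are illustrative assumptions: a 70 billion parameter model stored in 16-bit precision, and a GPU with 80 GB of VRAM.

```python
import math

# Does the model fit in GPU memory? Back-of-envelope only.
params = 70_000_000_000    # assumed 70B weights
bytes_per_weight = 2       # 16-bit (fp16) precision = 2 bytes each

needed_gb = params * bytes_per_weight / 1e9        # VRAM just for weights
gpu_vram_gb = 80                                   # assumed high-end GPU
gpus_needed = math.ceil(needed_gb / gpu_vram_gb)   # before overhead

print(needed_gb, gpus_needed)
```

Note this counts only the weights; serving real traffic needs additional memory for activations and caching, so the true requirement is higher.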
This introduces a trade-off. Larger models with more weights are generally smarter and more capable. They capture more nuance. But they are slower and more expensive to run.
Founders often default to wanting the biggest model. However, for many specific business tasks, a smaller model with fewer weights might be sufficient. If you are just classifying customer support tickets, you do not need a model that knows how to write poetry in French.
Techniques like quantization exist to help with this. Quantization reduces the precision of the weights. Instead of storing each weight as a 16-bit floating point number, you might store it in as few as 4 bits. You lose a tiny bit of intelligence, but you make the model much smaller and faster.
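A simplified sketch of the idea: map each weight onto one of 16 integer levels (4 bits), then scale back on the way out. Real quantization schemes are considerably more sophisticated, but the trade is the same.

```python
# Quantization sketch: squeeze floats into 4-bit integers.
# Each weight is mapped to a small integer in the signed 4-bit
# range (-8..7), trading a little precision for a ~4x smaller file.

weights = [0.12, -0.98, 0.53, 0.70, -0.31]

scale = max(abs(w) for w in weights) / 7       # fit the largest weight in range
quantized = [round(w / scale) for w in weights]  # small integers, 4 bits each
restored = [q * scale for q in quantized]        # approximate originals

print(quantized)
print([round(w, 2) for w in restored])
```

Comparing `weights` with `restored` shows the cost: most values come back slightly off. For many business tasks, that error is invisible; the latency and memory savings are not.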
Intellectual Property and Moats
Are model weights a defensive moat? This is a question the industry is still figuring out.
If you train your own model or fine-tune one heavily, those weights are proprietary. They contain the distilled essence of your data. If your data is unique, your weights are unique.
However, weights are just files. They can be stolen or leaked. Unlike a SaaS platform where the code is hidden behind a server, if you distribute your model to run on a user’s device, they technically have access to your weights.
Furthermore, the shelf life of weights is short. The state of the art moves fast. A model that was cutting edge six months ago might be obsolete today because a better architecture or better training method was discovered.
Founders should view weights as a snapshot of value. They are an asset, but a depreciating one. The real long-term value lies in the pipeline that creates the weights: the proprietary data acquisition and the evaluation systems that tell you whether the weights are actually good.
When you are building, ask yourself what you are actually relying on. Are you relying on the generic intelligence provided by someone else’s weights? Or are you building a system to cultivate your own custom weights that understand your customers better than anyone else?
The answer to that defines your infrastructure costs, your team composition, and ultimately your valuation.

