AI - It's All Downhill From Here?
Written by Mike James   
Wednesday, 31 December 2025

Variety and Generalization

This idea is related to an older and even more general principle:

The principle of requisite variety, also known as Ashby's Law or the First Law of Cybernetics, states that, if you are trying to control a system, the control mechanism has to have at least as much variety as the system it controls.

In more modern terms, Ashby's law can be stated in terms of degrees of freedom, or dimensions of variation - roughly, how many ways something can vary. This is a subtle idea. For example, how many degrees of freedom, dimensions or variety does color have? It looks as if there is an almost infinite number of colors, but in fact you can make any color from just three primary colors. Thus the degrees of freedom, dimensionality or variety of color is exactly three, not infinity.

What Ashby's law tells us about color is that, if you want to control it, you need at least three knobs. You don't need four or more, but there might be good reasons for over-providing. What it most definitely tells you is that you can forget any idea of controlling color with fewer than three control knobs.
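To make this concrete, here is a minimal sketch in Python. The mixing matrix, knob settings and target color are all invented for illustration - the point is simply that three knobs are exactly enough:

```python
import numpy as np

# Three knobs, each driving one invented "primary" lamp. Each row is
# the RGB output of one lamp at full strength.
PRIMARIES = np.array([[1.0, 0.1, 0.0],
                      [0.0, 1.0, 0.1],
                      [0.1, 0.0, 1.0]])

def mix(knobs):
    """Knob settings -> the color the three lamps produce together."""
    return knobs @ PRIMARIES

# Because PRIMARIES is an invertible 3x3 matrix, every target color is
# reachable by exactly one setting of the three knobs...
target = np.array([0.9, 0.6, 0.3])
knobs = np.linalg.solve(PRIMARIES.T, target)
print(mix(knobs))   # reproduces the target

# ...whereas two knobs only reach a plane in color space, so most
# targets are simply out of reach.
```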

So the story so far is that we have gradient descent as an efficient way of optimizing the performance of any system, and we have the idea that the system should have the requisite variety for the task in hand.
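In code, the first of those ideas is nothing more than a loop that keeps stepping downhill. A minimal sketch, with an invented one-dimensional loss whose gradient we can write down by hand:

```python
# Minimal gradient descent: repeatedly step against the gradient.
def gradient_descent(grad, x, lr=0.1, steps=100):
    for _ in range(steps):
        x -= lr * grad(x)    # downhill is minus the gradient
    return x

# Example: minimize (x - 3)^2, whose gradient is 2(x - 3).
print(gradient_descent(lambda x: 2 * (x - 3), x=0.0))   # close to 3.0
```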

These two ideas are at the core of modern AI and of much of the way that the world works.

When we set a neural network to learn language we rely on the fact that it has enough variety, dimensions or degrees of freedom to model the language and it is trained using gradient descent. 

And it works.

However, things are just a little more subtle than Ashby's law suggests. Suppose you have a neural network, or any differentiable black box, and you want to train it to produce any color by showing it different colors. If the system has many more than three degrees of freedom then it will learn to produce colors, but it might well miss the fact that three controls are enough. It might do the job, but it might not find a natural solution. If, however, the system is constrained to have only three degrees of freedom, the solution it finds has to be the right one - the one that corresponds to the internal structure of the data we are trying to control.

So, if you have a differentiable black box with three degrees of freedom, gradient descent will, most likely, give you a model which matches the physical reality of three primary colors. Getting the degrees of freedom right means that the black box not only learns to control the colors, but it also learns a model which corresponds to the structure of the problem.
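A sketch of that experiment, assuming the simplest possible differentiable black box - just three knob parameters feeding the invented mixing matrix from the earlier sketch - trained by gradient descent on squared error:

```python
import numpy as np

# The same invented mixing matrix as in the earlier sketch.
MIX = np.array([[1.0, 0.1, 0.0],
                [0.0, 1.0, 0.1],
                [0.1, 0.0, 1.0]])

true_knobs = np.array([0.8, 0.2, 0.5])
target = true_knobs @ MIX            # the color we want the box to produce

knobs = np.zeros(3)                  # exactly three degrees of freedom
lr = 0.2
for _ in range(500):
    error = knobs @ MIX - target     # how far off the produced color is
    knobs -= lr * 2 * error @ MIX.T  # squared-error gradient, stepping downhill

print(knobs)   # converges to [0.8, 0.2, 0.5] - the physically meaningful settings
```

With three knobs the minimum is unique, so the settings found are the meaningful ones; with thirty knobs the same loop would still match the color, but via any one of infinitely many settings.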

This is also key to generalization. Suppose we only train the box on a small portion of the color spectrum. If it manages to learn that any color it has seen can be made from three primary colors, then it will cope correctly with colors that never appeared in the training set. In other words, what it learns generalizes because it hasn't just memorized how to create the colors it was shown; it has learned the deeper model of color generation.
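Again a sketch, under the same invented setup: fit a model with exactly the right variety on examples from one small corner of the space, then test it well outside that corner:

```python
import numpy as np

rng = np.random.default_rng(1)
MIX = np.array([[1.0, 0.1, 0.0],
                [0.0, 1.0, 0.1],
                [0.1, 0.0, 1.0]])

# Training examples come only from a small corner of the space.
train_knobs = rng.random((50, 3)) * 0.2      # every setting below 0.2
train_colors = train_knobs @ MIX

# A model with exactly the right variety: one 3x3 matrix, fitted by
# least squares (the same minimum gradient descent would reach).
W, *_ = np.linalg.lstsq(train_knobs, train_colors, rcond=None)

# Inputs far outside the training region are still handled correctly,
# because the model's structure matches the structure of the data.
test_knobs = rng.random((5, 3)) * 0.8 + 0.2
print(np.allclose(test_knobs @ W, test_knobs @ MIX))   # True
```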

Of course, real data hides the number of degrees of freedom it has - it isn't at all obvious that the infinite variability of color conceals just three degrees of freedom. The actual dimensionality of a dataset is usually called its "latent dimension" or "latent structure" and it is a very important idea.
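You can, however, estimate it. One standard trick is to look at the singular values of the data. A sketch, assuming a made-up ten-sensor camera measuring colors that really have only three degrees of freedom:

```python
import numpy as np

rng = np.random.default_rng(2)

# Ten made-up sensors, each responding to its own mix of three primaries.
SENSORS = rng.random((10, 3))
colors = rng.random((1000, 3))      # 1000 colors: three true degrees of freedom
readings = colors @ SENSORS.T       # 1000 x 10 - looks ten-dimensional

# The singular values of the centered data expose the latent dimension:
s = np.linalg.svd(readings - readings.mean(axis=0), compute_uv=False)
print(np.round(s, 4))               # three sizeable values, then near-zeros
```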

We can now extend Ashby's law a little:

"If you are trying to control a system then the control mechanism has to have exactly the variety of the system if it is to accurately represent it."

So it is with neural networks learning other things - language, for example. In practice, it isn't essential to get the number of degrees of freedom exactly right. Simply trying to squeeze the representation into a smaller number of dimensions than it appears to have forces the representation closer to the underlying structure of the data. What this means for neural networks is that we shouldn't throw too many neurons at a problem, or we simply end up with rote learning rather than deep learning. Of course, the number of degrees of freedom in a language model, say, is well beyond the number of neurons we can practically throw at it.
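The standard embodiment of that squeeze is an autoencoder: force the data through a bottleneck narrower than it appears to need, and reconstruction is only possible if the network finds the latent structure. A minimal linear sketch, using the same kind of made-up ten-sensor data as above:

```python
import numpy as np

rng = np.random.default_rng(3)
SENSORS = rng.random((10, 3))
readings = rng.random((1000, 3)) @ SENSORS.T   # 10-D data, latent dimension 3

# Linear autoencoder: encode 10 -> 3 through the bottleneck, decode 3 -> 10.
enc = rng.normal(scale=0.1, size=(10, 3))
dec = rng.normal(scale=0.1, size=(3, 10))

lr, n = 0.05, len(readings)
for _ in range(5000):
    code = readings @ enc                          # squeeze into three numbers
    err = code @ dec - readings                    # reconstruction error
    grad_dec = code.T @ err / n                    # mean-squared-error gradients
    grad_enc = readings.T @ (err @ dec.T) / n
    dec -= lr * grad_dec
    enc -= lr * grad_enc

print(np.abs(err).mean())   # small: three numbers carry essentially everything
```

Make the bottleneck ten units wide and reconstruction becomes trivially perfect, but nothing then forces the code to mean anything.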

It is the bottleneck of restricted degrees of freedom in the representation that forces us to find good theories of the behavior.

Yes, it's a practical form of Occam's Razor.

So, as far as AI goes, along with many other physical phenomena, it is downhill all the way, with a side order of just enough variety.


