Tuesday, September 9, 2025

Why the singularity won't be singular

as I posted on techdirt.com

https://www.techdirt.com/2025/06/16/why-centralized-ai-is-not-our-inevitable-future/#comments 

Successful systems in the world around us are massively parallel, not a "Rambo"-style super-entity. Supercomputers are actually societies of smaller computers whose net capacity depends more on the operating system and architecture than on individual chip speed.

I posted other reasons and examples in my blog post about the inevitable result of ever-grander AI entities, namely an ever-grander Dunning-Kruger effect and ever-greater vulnerability to a single, ecological point of failure.

What GPT needs to do now is not learn everything -- it needs to learn how to ask for help and how to assemble teams of whatever expertise is out there for whatever problem it is working on. Then the growth of individual AI agents is basically done and, as in human society, the collectivity takes over the evolving.

 Details follow

================== 

The Library of Medicine observed back in 1990 or so that we don't need a $1 billion system - what we need is a billion $1 systems. If only every user could fix just the one tiny part of the systems they use that is most clearly broken.

This matches the advice of The Toyota Way: many small steps will accomplish the growth you desire, even if the way it gets there surprises you.

It is as if sustainable, rooted organic growth occurs on the surface of a hypersphere, in the outermost 1% shell, in the domain where tensors are effectively linear and where what is "obvious" is most likely also true.

Larger steps are equally "obvious", but are in the non-linear domain and most likely wrong for reasons you cannot see from where you are.

The rise of "supercomputers" masks the reality that, to accomplish great calculations, they use ten thousand or more smaller computers doing smaller calculations, working together. The concept of a single huge CPU was abandoned long ago.
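To make the point concrete, here is a toy sketch (mine, not from the original post) of the "many small computers" idea: one large calculation split into chunks and handed to a pool of worker processes, with the answer coming from the coordination rather than from any single fast CPU.

```python
# Toy sketch: a "society" of small worker processes computing one big sum of squares.
from concurrent.futures import ProcessPoolExecutor

def partial_sum(bounds):
    """One small worker: sum the squares over its own slice of the range."""
    start, stop = bounds
    return sum(i * i for i in range(start, stop))

if __name__ == "__main__":
    n, workers = 10_000_000, 8
    step = n // workers
    chunks = [(i * step, (i + 1) * step if i < workers - 1 else n) for i in range(workers)]
    with ProcessPoolExecutor(max_workers=workers) as pool:
        total = sum(pool.map(partial_sum, chunks))
    print(total)  # same answer as one giant loop, produced by cooperation
```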

Similarly in AI algorithms: from 1970 to 1995 or so, people tried to create ever-larger, ever-more-complex algorithms to do everything, but finally realized that to accomplish AI what you needed instead was thousands of much smaller algorithms working cooperatively.

I think the Zuckerberg model of a Renaissance AI, able to do everything, is totally misguided. Now that AI can carry on a conversation, it doesn't need to also be able to do long division -- there are other systems that do that. No CEO would do long division in their head or on paper -- you turn to an expert (e.g., a calculator) or delegate to an assistant. Similarly, AI only needs one more skill: to be able to figure out which expertise would help it get its job done, where to find it, and how to tap into it.
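As a rough illustration (the tool names here are hypothetical, not anything GPT actually does), that "one more skill" amounts to a dispatcher: the agent's only job is to recognize what kind of expertise a task needs and hand it to the right outside expert.

```python
# Minimal sketch of delegation: the agent routes work to outside experts
# instead of trying to do everything itself. The experts here are placeholders.
def long_division(a, b):
    return divmod(a, b)          # the "calculator" expert

def spell_check(text):
    return text                  # stand-in for some other expert service

EXPERTS = {
    "arithmetic": long_division,
    "spelling": spell_check,
}

def delegate(task_kind, *args):
    """Figure out which expertise would help, find it, and tap into it."""
    expert = EXPERTS.get(task_kind)
    if expert is None:
        raise LookupError(f"no known expert for {task_kind!r}; go find one")
    return expert(*args)

print(delegate("arithmetic", 355, 113))   # (3, 16) -- delegated, not done "in its head"
```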

 In other words, the growth at this point should be outside the box, not inside the box.  There is no point in trying to reinvent every wheel!

There is, however, a much more important problem with the "Rambo" solution to AI. I cannot do the math, but I am fairly certain that any single system is guaranteed to have blind spots that it is incapable of detecting internally. The result is a sort of Dunning-Kruger syndrome of computing - an idiot-savant model that is very sure it is correct even when it is not.

If we look at the human visual system, there is no single super-cell or super-neuron that collects all the signals of texture and color and edges and shape and creates a percept -- it is a group effort. "Rambo"-class solutions are not used.

The more of the world the model sucks up and makes subservient to and consistent with itself, the deeper it falls into the pit of blindness. Like the ecological danger of monoculture, you don't want to plant all chestnut trees, or all elm trees, because when the wrong thing comes along, they all die at once.

Another lesson can be learned from looking at radio telescopes, such as the Very Large Array in New Mexico.

There is an absolute limit to the resolution of a lens or dish antenna, given by the wavelength divided by the diameter of the device. Cleverly, however, the device need only sparsely cover the middle ground and still get high resolution along certain axes. By moving the dishes around, you can change which axes are seen best. So, in fact, today radio astronomy uses aperture synthesis to build virtual dishes the size of the planet, with one element located, say, in Sweden and another in Peru.
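A rough back-of-the-envelope sketch (my numbers, for illustration only) shows why baseline size matters so much: the diffraction limit is roughly the wavelength divided by the aperture, so an Earth-sized synthetic dish resolves detail thousands of times finer than any single dish.

```python
# Diffraction-limited angular resolution, theta ~ wavelength / diameter.
wavelength_m = 0.21            # 21 cm hydrogen line, a common radio band
single_dish_m = 25.0           # one VLA-class dish
earth_baseline_m = 1.27e7      # roughly the diameter of the Earth

def resolution_arcsec(wavelength, diameter):
    """Angular resolution in arcseconds for a given dish or baseline."""
    return (wavelength / diameter) * 206265.0   # radians -> arcseconds

print(resolution_arcsec(wavelength_m, single_dish_m))     # ~1700 arcsec for one dish
print(resolution_arcsec(wavelength_m, earth_baseline_m))  # ~0.003 arcsec for a planet-sized baseline
```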

But what is crucial is to maximize the distance between the edges. In other words, to maximize resolution and receiving capacity, you want the maximum diversity possible. In the very same sense, to maximize the capacity to generate novel insights, you want a team with broadly diverse backgrounds and perspectives, not 20 people who have the exact same background.

Put another way, any single AI engine is essentially a monopole, like an electron or a proton, capable only of receiving signals of a certain type. A monopole won't react to dipolar radiation - for that you need a dipole antenna. In the hardware-tool category, a monopole is a hammer, which can pound nicely but is unable to rotate the simplest nut or screw. For those you need torque. The same limits apply if you want to receive gravitational waves, which require a quadrupole antenna.

Conceptually, as Carl Sagan and Frank Drake pointed out in the SETI project, every time you open a new window in the electromagnetic spectrum, you discover totally unexpected new phenomena, not just new sides of things you already knew. Thus we now have infrared, ultraviolet, and gamma-ray telescopes, etc.

But every one of those is still looking for dipole radiation. There are an infinite number of higher-rank antenna designs possible that no one has ever tried to build.

The point is that any superintelligent AI system is just a larger monopole, with much more collecting area but still totally blind to most of the universe.  You cannot get around that with scale.

Furthermore, I suspect that Large Language Models suffer from a fatal defect or limitation -- they are based on language, which is to say, on processing serialized strings of symbols. Very clever processing, but still based on linear strings or linked lists, whatever you call them.

Linear strings look good when viewed as Turing machines, where infinite tapes of ones and zeros happily compute anything, even if it takes almost forever and uses up almost the entire universe of space and energy.

Reality, however, is not a Turing machine.

All tapes take energy to maintain and are prone to decay. The error-correction problem grows too fast.

As anyone who has struggled with calculus knows, if you make a single mistake on an 8-page proof, everything after the error is completely wrong.

A more powerful approach would be to use as the basis not linear strings but images, especially now that we have graphics processing chips. Images are much more robust: you can take a picture of a dollar bill, change half the pixels to random black or white, and still recognize it.
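Here is a small sketch of that claim (using a synthetic test pattern rather than a dollar bill): randomize half the pixels, and a plain correlation against the original templates still identifies the image easily.

```python
# Corrupt half the pixels of a 64x64 "cross" image and check that simple
# correlation still matches it to the right template.
import numpy as np

rng = np.random.default_rng(0)
y, x = np.mgrid[0:64, 0:64]
cross = ((np.abs(x - 32) < 4) | (np.abs(y - 32) < 4)).astype(float)
disk = (((x - 32) ** 2 + (y - 32) ** 2) < 20 ** 2).astype(float)

noisy = cross.copy()
hit = rng.random(noisy.shape) < 0.5            # choose about half the pixels
noisy[hit] = rng.integers(0, 2, hit.sum())     # overwrite them with random black/white

def similarity(a, b):
    """Normalized correlation between two images."""
    a, b = a - a.mean(), b - b.mean()
    return (a * b).sum() / np.sqrt((a * a).sum() * (b * b).sum())

print("vs cross:", similarity(noisy, cross))   # clearly the better match
print("vs disk: ", similarity(noisy, disk))    # clearly worse
```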

Parallel processing is vastly more powerful than serial processing. And "images" don't need to be restricted to two dimensions; multispectral remote-sensing images are multidimensional.

The more components you put in, the more relationships there are to work with.  

Ultimately it is the relationships that hold the critical information that you are trying to capture and process, not the pixels individually.   

So we will almost certainly soon realize that "Large Image Models" are several orders of magnitude more powerful, robust, and capable of rich concepts than "Large Language Models" -- regardless of how large you make the box, the world outside the box will always trump the world that fits inside the box.

Some would say you could serialize the pixels of a 2-D image into a sequence, and therefore an image is identical to a string. To my mind this is only true in the Turing fantasy world. In a world in which everything takes time and energy, one GPU pass over an image is much more helpful than 20 million sequential pixel operations.
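As a crude illustration of that time-and-energy point (ordinary NumPy on a CPU standing in for a GPU), the same per-pixel operation done one pixel at a time versus as a single whole-array pass differs by orders of magnitude in wall-clock time:

```python
# The same per-pixel operation, serially versus as one vectorized pass.
import time
import numpy as np

image = np.random.rand(2000, 2000)      # about 4 million "pixels"

t0 = time.perf_counter()
total_serial = 0.0
for value in image.ravel():             # one pixel operation at a time
    total_serial += value * value
t1 = time.perf_counter()

total_vector = float(np.sum(image * image))   # one pass over the whole array
t2 = time.perf_counter()

print(f"serial loop:      {t1 - t0:.3f} s")
print(f"whole-array pass: {t2 - t1:.3f} s")   # typically hundreds of times faster
```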

 =========

Saturday, August 24, 2024

There is no "best" way to look at it

 

I've started reading Doppelganger by Naomi Klein, and in a true irony the key point seems to be illustrated by the "Klein bottle".


This is a variant of the Moebius strip 


and reminiscent of M. C. Escher's Waterfall.


And what is described in Gödel, Escher, Bach as 'strange loops'. Another favorite example is intransitive dice, for which the term "best" does not exist: whichever die you pick (and you go first), I can always pick one of the remaining three that will beat your choice 2/3 of the time.

A beats B, B beats C, C beats D, and D beats A. See Wikipedia.
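You can check the cycle for yourself with one well-known set of intransitive dice (Efron's dice, used here purely as an illustration): each die beats the next one in the cycle exactly 2/3 of the time, so no single die is "best".

```python
# Efron's intransitive dice: A beats B, B beats C, C beats D, and D beats A,
# each with probability 2/3.
from itertools import product

dice = {
    "A": [4, 4, 4, 4, 0, 0],
    "B": [3, 3, 3, 3, 3, 3],
    "C": [6, 6, 2, 2, 2, 2],
    "D": [5, 5, 5, 1, 1, 1],
}

def win_probability(d1, d2):
    """Probability that a roll of d1 beats a roll of d2 (this set has no ties)."""
    wins = sum(1 for a, b in product(d1, d2) if a > b)
    return wins / (len(d1) * len(d2))

for first, second in [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A")]:
    print(f"{first} beats {second}: {win_probability(dice[first], dice[second]):.3f}")
# Every line prints 0.667 -- there is no "best" die to pick first.
```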



 =========================

Standard Western education never even exposes us to such things that cannot be laid flat; even PhD-level training in most fields does not help.

We seek "better and better" ways to grab hold of ("grasp") such things, and keep failing, and assume that with just a little more work we will surely achieve our (impossible) goal.
 
In fact, the whole familiar process of breaking something into parts and looking at each separately is exactly the wrong thing to do. The problem is not with the parts; the parts are fine. The problem is the relationship between the parts.

The mountain will not come to Mohammed; we must go to it.
There is no "right way to look at it" that further conversation and discourse will surely lead us to.

I suspect that all serial-string logic has this weakness, and that "step-by-step" progress along the string remains open to such twisting.
 
I think we need to go to at least 2-dimensional, "image-processing" techniques, where there is a huge amount of cross-structure that prevents skew and twisting, like the diagonals that strengthen the very weak rectangular shapes in bridges, which otherwise could collapse sideways.
[Photo: steel bridge over the Noyo River, Fort Bragg, California]
 
While I recognize that this concept makes me a heretic, I suspect as well that the "Word of God"
cannot be represented in any language using serial symbol strings, i.e., what we call "words".
 
Not least because the "meaning" of a "word" is highly context-dependent,
so the meaning of any sentence will change over time.
 
This is why, for example,
the US Supreme Court is always stuck with the problem of trying to figure out how to
interpret what prior decisions, especially those over 100 years old, actually mean.
 
What was the intent? Do those "words" continue to express that intent? And should
we be true to the intent, or true to the words (and to what those words mean today)?
 
So, sadly, even if, say, the words of the Koran are kept sacred and unchanging,
the meaning people take from them keeps changing over time.

It is really inconvenient that simple approaches to seeking truth and meaning don't work.
Partly for this reason, this blog is titled "tree-circles", a whole different way to find
meaning in signals that is neither deductive nor inductive.


Another problem with using 1-dimensional symbol strings ("sentences of words")
is that this type of reasoning is incredibly noise-sensitive. In theory, perhaps,
for a perfect theoretical "Turing Machine" various things are considered "computable",
but in the world our bodies live in, nothing is without noise, nothing lasts
forever, and most things decay rapidly - so the infinite, immutable "tape" of
ones and zeros cannot ever be implemented.

So, even in the golden-haired child of science, algebra and calculus,
a single error becomes a single point of failure. A 20-page computation
with an error on page 3 is meaningless and "wrong".

Theologians may argue forever over "the meaning" of a single word,
but their whole process is inescapably flawed, not to mention
context-sensitive over time.
 
On the other hand, images are very noise-tolerant, so that even with multiple "errors" the
end result may still be correct. You can "see through" the "salt and pepper" noise and
realize what this image represents:

https://miro.medium.com/v2/resize:fit:512/format:webp/0*aycSUVVDcHMnuFgc.png
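A small sketch of the same idea on a synthetic image (a filled disk rather than the photo above): salt-and-pepper noise hits individual pixels, but a plain 3x3 median filter over each neighborhood recovers almost all of them, because the information lives in the relationships between pixels, not in any single pixel.

```python
# Salt-and-pepper noise on a synthetic image, repaired by a 3x3 median filter.
import numpy as np

rng = np.random.default_rng(1)
y, x = np.mgrid[0:64, 0:64]
clean = (((x - 32) ** 2 + (y - 32) ** 2) < 20 ** 2).astype(float)   # a filled disk

noisy = clean.copy()
hit = rng.random(noisy.shape) < 0.2             # corrupt about 20% of the pixels
noisy[hit] = rng.integers(0, 2, hit.sum())      # set them to random black/white

# 3x3 median filter in plain NumPy (neighborhoods wrap around at the border)
stack = np.stack([np.roll(np.roll(noisy, dy, axis=0), dx, axis=1)
                  for dy in (-1, 0, 1) for dx in (-1, 0, 1)])
restored = np.median(stack, axis=0)

print("wrong pixels before filtering:", int(np.sum(noisy != clean)))
print("wrong pixels after filtering: ", int(np.sum(restored != clean)))   # far fewer
```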

I got into a great deal of trouble in my computer science classes for arguing that, if the founders had had image-processing chips and computers, the whole field would have been based on image processing, not symbol processing. And we would now have "context processing engines", not just "content processing engines". It turns out I made my arguments the week our department chair had just won the Turing Award. Bad timing on my part. The meaning of an argument, and the ability to engage and be heard, depend so much on timing and fickle social context. People never finish reading even your sentences or argument; they stop partway in, "auto-complete" your meaning into something they grew up with, and discard your work as wrong because that meaning is wrong.

At least computer science has finally come up with Kubernetes and similar technology to save the entire context along
with the application "code" (symbol strings).
 
Nevertheless, the current golden-haired child of AI is the "Large Language Model", by which is meant a model built on a huge amount of language.
 
It seems safe to predict that this will migrate and evolve towards Large Image Models, or other 2-dimensional and larger bases.
The symbols in any language, like even DNA, essentially have epigenetic auras, more than just metadata, which alter the meaning of the symbols enough to matter: enough that if you ignore them, your answers will not be stable or reliable.