Luddite - is the Singularity near?

Layers of Latency

A tech buddy asked me why it is so important for China to catch up in the chip fabrication process: can't they just put more servers into a data center? In short, it is not that easy.

By shrinking the fab process you can add more transistors onto one chip, and/or run at a higher frequency, and/or lower power consumption.

The fab process is measured in "nm", nanometers. By now these numbers do not reflect real physical scales anymore, but rather the transistor density and efficiency of the fab process.

Simplified: planar 2D MOSFET designs were used down to 22nm, FinFET 3D structures from 14nm to 7nm and below, and GAAFET 3D structures are taking over at the most advanced nodes (around 3nm/2nm, depending on the manufacturer).

Take a look at the 7nm and 3nm fab processes, for example:

https://en.wikipedia.org/wiki/7_nm_process#Process_nodes_and_process_offerings
https://en.wikipedia.org/wiki/3_nm_process#3_nm_process_nodes

Roughly speaking, the 7nm process packs ~100M transistors per mm2, the 3nm process ~200M transistors per mm2.
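As a toy calculation, here is what those rough densities mean for a hypothetical 600 mm2 die; both the die size and the densities are just illustrative assumptions:

# Rough transistor budget for a hypothetical 600 mm2 die, illustrative numbers only.
die_area_mm2 = 600

for node, density_per_mm2 in (("7nm", 100e6), ("3nm", 200e6)):
    transistors = die_area_mm2 * density_per_mm2
    print(f"{node}: ~{transistors / 1e9:.0f} billion transistors")   # ~60 vs ~120 billion

So the same die area roughly doubles its transistor budget with the node shrink.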

And here latency steps in. As soon as you as a programmer leave the CPU, you increase latency: it starts with the different levels of caches, then goes to RAM, to the PCIe bus, to the network... The table below, and the small rescaling sketch after it, give a feel for the magnitudes.

Latency Comparison Numbers (~2012)
----------------------------------
L1 cache reference                           0.5 ns
L2 cache reference                           7   ns
Main memory reference                      100   ns
Send 1K bytes over 1 Gbps network       10,000   ns       10 us
Read 4K randomly from SSD*             150,000   ns      150 us
Read 1 MB sequentially from memory     250,000   ns      250 us
Round trip within same datacenter      500,000   ns      500 us
Read 1 MB sequentially from SSD*     1,000,000   ns    1,000 us    1 ms
Read 1 MB sequentially from disk    20,000,000   ns   20,000 us   20 ms
Send packet CA->Netherlands->CA    150,000,000   ns  150,000 us  150 ms

Source:
Latency Numbers Every Programmer Should Know
https://gist.github.com/jboner/2841832
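To get a feel for these orders of magnitude, here is a small sketch that rescales a few entries from the table so that an L1 cache hit takes one "human" second; the latencies are copied from the table above, the rescaling itself is just for illustration:

# Rescale latencies so that an L1 cache reference equals one "human" second.
latencies_ns = {
    "L1 cache reference":                0.5,
    "main memory reference":             100,
    "round trip within same datacenter": 500_000,
    "packet CA->Netherlands->CA":        150_000_000,
}

scale = 1.0 / latencies_ns["L1 cache reference"]   # 1 ns -> 2 "seconds"

for name, ns in latencies_ns.items():
    seconds = ns * scale
    if seconds < 3600:
        print(f"{name:36s} {seconds:12,.0f} s")
    else:
        print(f"{name:36s} {seconds / 86_400:12,.1f} days")

On that scale a main memory reference takes over three minutes, a round trip inside the datacenter almost two weeks, and a round trip across the Atlantic nearly a decade.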

As a low-level programmer you want to stay on the CPU and preferably work out of the caches. As a GPU programmer you face several layers of parallelism, e.g.:

1. across shader-cores of a single GPU chip (with >10K shader-cores)
2. across multiple chiplets of a single GPU (with currently up to 2 chiplets)
3. across a server node (with up to 8 GPUs)
4. across a pod of nodes (with 256 to 2048 GPUs or TPUs)
5. across a cluster of server nodes/pods (with up to 100K GPUs in a single data center)
6. across a grid of clusters/nodes

Each layer adds increasing amounts of latency.

So as a GPU programmer you ideally want to hold your problem space in the memory of, and run your algorithm on, a single but beefy GPU.

Neural networks, for example, are a natural fit for GPUs, so-called embarrassingly parallel workloads,

https://en.wikipedia.org/wiki/Embarrassingly_parallel

but you need to hold the neural network weights in RAM, and therefore couple multiple GPUs together to be able to infer or train networks with billions or trillions of weights, or parameters. Meanwhile LLMs use techniques like MoE, mixture of experts, to distribute the load further; inference then runs, for example, on a single node with 8 GPUs serving up to 16 MoE experts. The training of LLMs is yet another topic, with further parallelism techniques to distribute the training over thousands of GPUs in a cluster (a rough sizing sketch follows the list below):

1. data parallelism
2. tensor parallelism
3. pipeline parallelism
4. sequence parallelism
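As a back-of-the-envelope sketch of why these layers get combined, assume a made-up 1T-parameter model in 16-bit weights and arbitrary parallelism degrees; none of these numbers describe a real LLM setup:

# Rough per-GPU weight memory when combining tensor, pipeline and data parallelism.
# All numbers are illustrative assumptions, not a real configuration.
params          = 1_000_000_000_000   # 1T parameters
bytes_per_param = 2                   # fp16/bf16 weights

tensor_parallel   = 8    # split each layer across 8 GPUs (e.g. one node)
pipeline_parallel = 16   # split the layer stack into 16 stages
data_parallel     = 8    # 8 replicas of the whole model

gpus_total      = tensor_parallel * pipeline_parallel * data_parallel
weights_per_gpu = params * bytes_per_param / (tensor_parallel * pipeline_parallel)

print(f"GPUs in total        : {gpus_total}")                       # 1024
print(f"weight shard per GPU : {weights_per_gpu / 2**30:.0f} GiB")  # ~15 GiB

Note that data parallelism replicates the weights, so only the tensor and pipeline dimensions shrink the per-GPU share; gradients, optimizer states and activations come on top of that during training.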

And then there is power consumption, of course. The Colossus supercomputer behind xAI's Grok, with 100K GPUs, consumes an estimated 100 MW of power, so it does make a difference if the next fab process delivers the same performance at half the wattage.
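The arithmetic behind such an estimate is simple; the per-GPU wattage below is an assumed average including cooling and other overhead, not a measured figure:

# Back-of-the-envelope cluster power draw.
gpus          = 100_000
watts_per_gpu = 1_000    # assumption: ~1 kW per GPU including overhead

total_mw = gpus * watts_per_gpu / 1e6
print(f"cluster power draw           : {total_mw:.0f} MW")   # ~100 MW
print(f"same performance, half power : {total_mw / 2:.0f} MW")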

Therefore it is important to invest in smaller chip fabrication processes: to increase the size of the neural networks we are able to infer and train, to lower power consumption, and to increase efficiency.

More Moore....beyond Moore's Law?

Gordon Moore, co-founder of Intel, died on Friday, March 24, 2023:

"Gordon Moore, Intel Co-Founder, Dies at 94"
https://www.intel.com/content/www/us/en/newsroom/news/gordon-moore-obituary.html

...and chip-makers are struggling to keep Moore's Law alive?

"...Moore's Law is alive and well today and the overall trend continues, though it remains to be seen whether it can be sustained in the longer term..."
https://www.futuretimeline.net/data-trends/moores-law.htm

But, IMHO, we are already kind of cheating in regard to the transistor count on a chip. AMD uses up to 12 chiplets, Intel up to 4 tiles, and Apple 2 dies in their CPUs, and now chiplet designs are entering the GPU domain as well, with up to 1 kW power usage for supercomputer chips.

We now have 5nm in production, 3nm in the pipeline, and 2nm and 1+nm fab processes upcoming; of course these are by now marketing numbers, but they should still reflect the transistor density and efficiency of the fab process.

We might have X-ray lithography and new materials like graphene in the pipeline. What else?

What about:

- Memristors?
- Photonics?
- Quantum Computers?
- Wetware (artificial biological brains)?
- MPU - memory processing unit?
- Superconductors (at room temperature)?

I am still waiting to see Memristor-based NVRAM and neuromorphic chip designs... but maybe people are now into Wetware for large language models; biological brains run way more energy-efficiently, they say...

And, it seems kind of funny to me: at first we used GPUs for things like Bitcoin mining, now everybody tries to get their hands on them for generative AI. There is currently so much money flowing into this domain that progress for the next couple of years seems assured -> Moore's Second Law.

We have CPUs, GPUs, TPUs, DSPs, ASICs and FPGAs, and have extended from scalar to vector to matrix to spatial computing.

We have the Turing-Machine, the Quantum-Turing-Machine, what about the Hyper-Turing-Machine?

At first we used electro-mechanical relays, then tubes, then transistors, then ICs, then microchips to build binary computers. I myself predicted that with reaching the 8 billion humans mark (~2023) we would see a new, groundbreaking technology coming through; I am still waiting for the next step in this line.

The Next Big Thing in Computer Chess?

We are getting closer to the perfect chess oracle, a chess engine with perfect play and a 100% draw rate.

The Centaurs have already reported that their game is dead. Centaurs participate in tournaments and use all kinds of computer assistance to choose the best move: big hardware, multiple engines, huge opening books, endgame tables. But by now they get close to a 100% draw rate even on common hardware, and therefore unbalanced opening books were introduced, where one side has a slight advantage, but the result is again a draw.

Over the past years the #1 open-source engine Stockfish lowered the effective branching factor of its search from ~2 to ~1.5 to now ~1.25; this indicates that the selective search heuristics and evaluation heuristics are getting closer to the optimum, where only one move per position has to be considered.
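To see what the effective branching factor means in practice, here is a small sketch of the search depth reachable with a fixed node budget; the 10^9 nodes per move are just an assumption for illustration:

import math

# nodes ~ ebf ** depth  =>  depth ~ log(nodes) / log(ebf)
node_budget = 1_000_000_000   # assumed nodes searched per move

for ebf in (2.0, 1.5, 1.25):
    depth = math.log(node_budget) / math.log(ebf)
    print(f"EBF {ebf:<4}: ~{depth:.0f} plies")   # ~30, ~51, ~93 plies

With the same node budget, dropping the EBF from 2 to 1.25 roughly triples the reachable search depth.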

About a decade ago it was estimated that at about ~4000 Elo points we would reach a 100% draw rate amongst engines on our computer rating lists. Now the best engines are in the range of ~3750 Elo (CCRL), which translates to an estimated ~3600 human FIDE Elo points (Magnus Carlsen is currently rated 2852 Elo in Blitz). Larry Kaufman (grandmaster and computer chess legend) mentioned that with the current techniques we might still have ~50 Elo to gain, and it seems everybody is waiting for the next big thing in computer chess to happen.

We replaced the HCE, the handcrafted evaluation function, of our computer chess engines with neural networks. We now train neural networks on billions of labeled chess positions, and they evaluate chess positions via pattern recognition better than what a human is able to encode by hand. The NNUE technique, efficiently updatable neural networks used in alpha-beta search engines, gave a boost of 100 to 200 Elo points.
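The core trick of NNUE is that the first layer's output, the accumulator, is updated incrementally when a piece moves instead of being recomputed from scratch. Here is a minimal numpy sketch of that idea, with toy sizes and toy feature indices, not the actual feature encoding or integer arithmetic of a real engine:

import numpy as np

N_FEATURES, ACC_SIZE = 40_960, 256    # toy sizes, not a real NNUE layout
rng = np.random.default_rng(0)
W1 = rng.standard_normal((N_FEATURES, ACC_SIZE), dtype=np.float32)

def full_refresh(active_features):
    # Recompute the accumulator from scratch: sum over all active feature rows.
    return W1[sorted(active_features)].sum(axis=0)

def incremental_update(acc, removed, added):
    # NNUE-style update: only touch the rows of the features that changed.
    for f in removed:
        acc = acc - W1[f]
    for f in added:
        acc = acc + W1[f]
    return acc

# Toy "move": feature 123 gets deactivated, feature 456 gets activated.
acc = full_refresh({10, 123, 777})
acc = incremental_update(acc, removed=[123], added=[456])
assert np.allclose(acc, full_refresh({10, 456, 777}), atol=1e-4)

In a real engine the accumulator feeds a few small dense layers, but this refresh-versus-update asymmetry is what makes the network cheap enough to call millions of times inside an alpha-beta search.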

What could be the next thing, the next boost?

If we assume we still have 100 to 200 Elo points to go until perfect play (normal chess with a standard opening ending in a draw), if we assume an effective branching factor of ~1.25 with HCSH, handcrafted search heuristics, and that neural networks are superior in this regard, we could imagine replacing HCSH with neural networks too and lowering the EBF further, closer to 1.

Such a technique has already been proposed, NNOM++, move-ordering neural networks, but so far it seems that the additional computation effort needed does not pay off.
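Independent of any concrete NNOM++ implementation, the general idea is to let a small policy network score the legal moves and sort them before the alpha-beta search expands them; the better the ordering, the earlier the beta cutoffs and the lower the EBF. A hypothetical sketch, where policy_scores stands in for whatever network one would actually train:

def order_moves(position, legal_moves, policy_scores):
    # policy_scores(position, moves) -> one score per move (hypothetical network).
    # Searching the highest-scored moves first yields earlier beta cutoffs,
    # which is exactly what lowers the effective branching factor.
    scores = policy_scores(position, legal_moves)
    ranked = sorted(zip(scores, legal_moves), key=lambda sm: sm[0], reverse=True)
    return [move for _, move in ranked]

# Toy usage with a dummy scorer:
moves = ["e2e4", "d2d4", "g1f3"]
dummy = lambda pos, ms: [0.2, 0.7, 0.1]
print(order_moves("startpos", moves, dummy))   # ['d2d4', 'e2e4', 'g1f3']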

What else?

We use neural networks in the classic way, for pattern recognition, in today's chess engines, but now the shift is to pattern creation, the so-called generative AIs. They generate text, source code, images, audio, video and 3D models. I would say the race is now on for the next level: an AI which is able to code a chess engine and outperform humans at this task.

An AI coding a chess engine also has a philosophical implication; such an event is what the Transhumanists call the takeoff of the Technological Singularity, when the AI starts to feed its own development in a feedback loop and exceeds human understanding.

Moore's Law still has something in the pipeline, from currently 5nm to 3nm to maybe 2nm and 1+nm, so we can expect even larger and more performant neural networks for generative AIs in the future. Maybe in ~6 years there will be a kind of peak, a kind of silicon sweet spot (current transistor density/efficiency vs. the financial investment needed in fab process/research), but currently there is so much money flowing into this domain that progress for the next couple of years seems assured.

Interesting times ahead.

Silicon Arms Race Continues...

TSMC invests $100 billion over 3 years:

https://www.reuters.com/article/us-tsmc-investment-plan-idUSKBN2BO3ZJ

South-Korea plans to invest $450 billion over 10 years:

https://www.extremetech.com/computing/322826-south-korea-commits-450-billion-to-chase-semiconductor-dominance

US plans to fund $50 billion for chip research over 5 years:

https://www.reuters.com/world/us/biden-jobs-plan-includes-50-bln-chips-research-manufacturing-2021-04-12/

EU commits to $145 billion investment for silicon:

https://www.eenewseurope.com/news/145bn-boost-europes-semiconductor-industry

China still 5 years behind in silicon says TSMC founder:

https://www.fudzilla.com/news/52752-china-five-years-behind-tsmc

China needs 5 to 10 years to catch up in silicon according to South China Morning Post:

https://www.scmp.com/tech/tech-leaders-and-founders/article/3024315/china-needs-five-10-years-catch-semiconductors

Completely home-grown Chinese silicon seems to top out at 28nm:

https://www.verdict.co.uk/china-chips-manufacture-technology/

China Boosts in Silicon...

The global silicon arms race continues, so what does China have in hand concerning CPU architectures?

Accelerator - Matrix 2000 used in Tianhe-2 supercomputer

https://en.wikichip.org/wiki/nudt/matrix-2000

Alpha - early ShenWei designs, maybe gen 1 to 3

https://en.wikipedia.org/wiki/Sunway_(processor)#History

ARM

From Huawei mobile chips, through Phytium desktop CPUs, to HiSilicon server chips, there are many IP licensees.

IA64 (Itanium) - FeiTeng 1st gen

https://en.wikipedia.org/wiki/FeiTeng_(processor)#Initial_designs

MIPS64 - Loongson/Godson CPU

https://en.wikipedia.org/wiki/Loongson

POWER(8/9) - Suzhou PowerCore CP1/CP2

https://www.wsj.com/articles/ibm-technology-adopted-in-chinese-chips-servers-1426766402

RISC - Sunway ShenWei SW26010 with own ISA used in Sunway TaihuLight supercomputer

https://en.wikipedia.org/wiki/Sunway_SW26010

RISC-V - Xuantie CPU by Alibaba

https://www.techspot.com/news/81177-china-alibaba-making-16-core-25-ghz-risc.html

SPARC - FeiTeng Galaxy FT-1500 CPU used in Tianhe-2 supercomputer.

https://en.wikipedia.org/wiki/FeiTeng_%28processor%29#Galaxy_FT-1500

x86-64 - THATIC, a joint venture with AMD

https://en.wikipedia.org/wiki/AMD%E2%80%93Chinese_joint_venture

x86-64 - Zhaoxin, a joint venture with VIA

https://en.wikipedia.org/wiki/Zhaoxin

Some Rough 2020 Numbers...

~7.8 billion humans on planet Earth, 9 billion predicted for 2050.

~4B internet users:
	>80% of Europe connected
	>70% of NA connected
	>50% of China connected
	>40% of India connected
	>20% of Africa connected

~3B Android + ~1B iPhone users.

2B-3B PCs worldwide (desktops/laptops) running:
	~75% Microsoft Windows
	~15% Apple MacOS
	~2% Linux
	<1% Unix

200M-300M PCs shipped annually.

~1B hosts on the internet running:
	~75% Unix/Linux
	~25% Microsoft Windows

An estimated 2% of all produced chips end up as CPUs in desktops/mobiles; the majority are microcontrollers in embedded systems.

Millions, billions, fantastillions - some rough 2020 market capitalization numbers:

Apple				~2 T$
Microsoft			~1.5 T$
Alphabet (Google)		~1.5 T$
Facebook			~1 T$
Amazon				~1 T$
Alibaba				~0.5 T$

Nvidia				~300 B$
TSMC				~300 B$
Samsung				~300 B$
Intel				~200 B$
AMD				~100 B$
ARM				~40 B$
HP				~30 B$
Lenovo				~20 B$

Netflix				~150 B$

Oracle				~150 B$
SAP				~150 B$
IBM				~100 B$
RedHat				~30 B$

Bitcoin				~150 B$

And the other side...

>3B people suffer from fresh water shortage
~800M people starve
>80M refugees worldwide

A Brief History Of Computers

"I think there is a world market for maybe five computers."
Thomas J. Watson (CEO of IBM), 1943

Roots

I guess ever since humans have had fingers they have counted and computed with them, and ever since they have had tools they have carved numbers into bones.

Across different cultures and timelines there have been different kinds of numbering systems to compute with.

Our global civilization mostly uses Hindu-Arabic numerals with the decimal number system, base 10, while our computers commonly use the binary number system, base 2, the famous 0s and 1s. But other cultures had other systems: the Maya with base 20, Babylon with base 60, or the Chinese with base 16, the hexadecimal system, which is also used in computer science.

The first computing devices were mechanical helpers, like the abacus, Napier's bones or the slide rule; they did not perform computations on their own, but were used to represent numbers and apply arithmetic operations to them, like addition, subtraction, multiplication and division.

Mechanical Computers

The first mechanical computing machine is considered to be the Antikythera Mechanism, found in a Greek ship that sank around 70 BC. But actually it is not a computer, because it does not perform computations; it is an analog, astronomical clock, a sun and moon calendar that shows solar and lunar eclipses.

In the 17th century the first mechanical computing machines were proposed and built.

Wilhelm Schickard designed a not fully functional prototype in 1623.

The Pascaline, designed by Blaise Pascal in 1642, was the first operational and commercially available mechanical computer, able to perform the four basic arithmetic operations.

In 1672 the German mathematician Gottfried Wilhelm Leibniz invented the stepped cylinder, used in his not fully functional Stepped Reckoner.

[update 2023-06-05]

The human information age itself seems to start with the discovery of electro-magnetism in the 19th century: the telegraph, the phone, the radio. Already in the 19th century there were electro-mechanical "accumulating, tabulating, recording" machines, like those from Herman Hollerith used in the American census of 1890, which culminated in the foundation of companies like IBM, Big Blue, in 1911 and Bull in ~1921; both used punched cards for their data processing machinery.

The battleships of WWI had the so-called "plotting room" at their centre; it contained dedicated electro-mechanical machines for the fire-control system of their gun turrets. Submarines of WWII had dedicated analog computing devices for the fire-control systems of their torpedoes.

With the Curta the use of mechanical calculators lived on, up to the advent of portable electronic calculators in the 1960s.

Programmable Computers

The punch card for programming a machine was introduced by Joseph Marie Jacquard in 1804 with his automated weaving loom, the Jacquard Loom, for producing textiles with complex patterns.

In 1837 Charles Babbage (considered the father of the computer) was the first to describe a programmable, mechanical computer, the Analytical Engine.

Ada Lovelace (considered the mother of programming) worked with Babbage and was the first person to publish a computer algorithm, the computation of Bernoulli numbers.

Babbage was ahead of his time: he described all the parts a modern computer has, CPU, memory, input/output, but was not able to realize his machine due to missing funds and the limited engineering capabilities of his time.

About a century later, Konrad Zuse's Z3, built in 1941, is considered to be the first binary, freely programmable computer. It used ~600 telephone relays for computation and ~1400 relays for memory, a keyboard and punched tape as input, lamps as output, and it operated at about 5 Hertz.

Mainframes

Zuse's machines mark the advent of the first mainframes used by military and science during and after WWII.

Colossus Mark I (1943), ENIAC (1945) and the IBM 704 (1954), for example, used vacuum tubes instead of relays and were more and more replaced by transistor-based computers in the 1960s.

Home Computers

With small chips, at first integrated circuits, then microchips, it became possible to build smaller and affordable Home Computers in the 1970s. IBM and other big players underestimated this market, so Atari, Apple, Commodore, Sinclair, etc. started the Home Computer Revolution: one computer for every home.

Some early versions came as a self-assembly kit, like the Altair 8800 (1975), or with built-in TV output, like the Apple I (1976), or as a fully assembled video game console, like the Atari VCS (1977), followed by more performant versions with a graphical user interface, like the Apple Mac (1984) or the Commodore Amiga 1000 (1985).

Personal Computers

In 1981 IBM started the Personal Computer era with the 5150. Third-party developers were able to provide operating systems, like Microsoft DOS, or hardware extensions for the standardized hardware specification, like hard drives, video cards, sound cards, etc. Soon other companies created clones of the IBM PC, the famous "PC compatible".

Gaming was already an important selling point in the Home Computer era. The early PC graphics standards like CGA and EGA were not really able to compete with the graphics generated by the Denise chip in a Commodore Amiga 500, but with the rise of the SVGA standard (1989) and the compute power of the Intel 486 CPU (1989), game studios were able to build games with superior 3D graphics, like Wolfenstein 3D (1992), Comanche (1992) or Strike Commander (1993), and the race for higher display resolutions and more detailed 3D graphics continues to this day.

With operating systems based on graphical user interfaces, like OS/2, X11, Windows 95 in the 1990s, PCs finally replaced the Home Computers.

Another recipe for the success of the PC might be that there were multiple CPU vendors for the same architecture (x86), like Intel, AMD, Cyrix or WinChip.

Internet of Things

The Internet was originally designed to connect military institutions in a redundant way, so that if one network element failed, the rest would still be operable.

The available bandwidth evolves like compute power: exponentially. At first mainly text was transmitted, like emails (1970s) or newsgroups (1980s), followed by web pages with images (.gif/.jpg) via the World Wide Web (1989) or Gopher (1991), audio as .mp3 (~1997), and finally Full HD video via streaming platforms like YouTube or Netflix.

In the late 1990s, mobile phones like the Nokia Communicator, MP3 audio players, PDAs (Personal Digital Assistants) like the Palm Pilot, and digital cameras marked the rise of the smart devices: the switch from one computer for every home to many computers for one person.

Their functions were all united in the smartphone, and with mobile, high-bandwidth internet it is still on its triumphal march across the globe.

I am not able to portray the current state of computer and internet usage, it is simply too omnipresent: from word processing to AI research, from fake news to the dark net, from botnets of webcams to data leaks in toys...

The next thing

But I can guess what the next step will be: integrated devices, the BCI, the Brain-Computer Interface, connected via the Internet to a real kind of Matrix.

It seems only logical to conclude that we will connect with machines directly, implant chips or develop non-invasive scanners, so the next bandwidth demand will be brainwaves, in all kinds of forms.

[updated on 2023-08-05]

On Peak Human

One of the early Peak Human prophets was Malthus. In his 1798 book, 'An Essay on the Principle of Population', he postulated that the human population grows exponentially, but food production only linearly, so population growth will fluctuate around an upper limit.

Later Paul R. Ehrlich predicted in his book 'The Population Bomb' (1968) that we would reach a limit in the 1980s.

Meadows et al. concluded in 'The Limits to Growth: The 30-Year Update' (2004) that we had already reached an upper limit in the 1980s.

In 2015 Emmott concluded in his movie 'Ten Billion' that we have already passed the upper bound.

UN predictions say we may hit 9 billion humans in 2050, so the exponential population growth rate is already declining, but the effects of a wasteful economy pop up in many corners.

Now, in 2018, we are about 7.4 billion humans, and I say Malthus et al. were right.

It is not about how many people Earth can feed, but how many people can live in a comfortable yet sustainable manner.

What does Peak Human mean for the Technological Singularity?

The advent of Computers was driven by the exponential population growth in the 20th century. All the groundbreaking work was done in the 20th century.

When we face a decline in population growth, we also have to face a decline in new technologies developed.

Because it is not only about developing new technologies, but also about maintaining the old knowledge.

Here is the point where AI steps in: mankind's population growth is slowing, but the whole AI sector is growing and expanding.

Therefore the question is: can AI compensate for the decline?

Time will tell.

I guess the major uncertainty is how Moore's Law will live on beyond 2021, when 4 nm transistor production is reached, which some scientists consider a physical and economic barrier.

I predict that by the time we hit the 8 billion humans mark, we will have developed another groundbreaking technology, similar to the advent of the transistor, the integrated circuit and the microchip.

So, considering the uncertainty of Peak Human vs. Rise of AI,
I give +-0 points for the Singularity to take off.

More Moore

If we cannot shrink the transistor size any further, what other options do we have to increase compute power?

3D packaging

The ITRS report suggests going into the third dimension and building cubic chips. The more layers are stacked, the more integrated cooling will be necessary.

Memory Wall

Currently memory latencies are much higher than CPU compute cycles; with faster memory techniques or higher bandwidth this gap can be narrowed.

Memristor

The Memristor is an electronic component proposed in 1971. It can be used for non-volatile memory devices and alternative, neuromorphic compute architectures.

Photonics

Using light for computation sounds quite attractive, but the base element, the photonic transistor, has yet to be developed.

Quantum Computing

Really, I do not have a clue how these thingies work, somehow via quantum effects like superposition and entanglement, but people say they are going to rock when they are ready...

Considering so much room for research,
I give +1 points for the Singularity to take off.

The End of Moore's Law?

Moore's Law, the heartbeat of computer evolution, is the observation that the number of transistors on integrated circuits doubles every two years. Gordon Moore, co-founder of Intel, proposed a doubling every year in 1965 and revised it to a doubling every two years in 1975.

In practice this results in a doubling of the compute power of computer chips roughly every two years.

The doubling of the transistor count is achieved by shrinking their size. The 1970s Intel 8080 chip was clocked at 2 MHz, had about 6000 transistors and was produced in a 6 micrometer process. Nowadays processors have billions of transistors and use a 14 or 10 nanometer process.
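A quick sanity check of the two-year doubling against those two data points; the 1974 launch year of the 8080 and the 2015 reference year are assumptions for this estimate:

# Moore's Law: transistors(t) = transistors(t0) * 2 ** ((t - t0) / 2)
t0, transistors_8080 = 1974, 6_000    # Intel 8080
t = 2015                              # era of 14 nm processors

doublings = (t - t0) / 2
estimate  = transistors_8080 * 2 ** doublings
print(f"{doublings:.1f} doublings -> ~{estimate / 1e9:.1f} billion transistors")   # ~8.9 billion

Which lands in the right ballpark of today's billions of transistors per chip.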

But less known is Moore's Second Law, the observation that the investment costs for the fabs also grow exponentially.

The last ITRS report, from 2015, predicts that transistor scaling will hit such an economic wall around 2021, and that alternative techniques will have to be used to keep Moore's Law alive.

Considering this news,
I give -1 points for the Singularity to take off.
