Kuro5hin.org: technology and culture, from the trenches

A Tale of Two CPUs

By MK77 in Technology
Sun Feb 10, 2002 at 05:17:27 PM EST
Tags: Technology (all tags)

We are fast approaching a major change in the CPU architecture of the computers on our desktops. Since the introduction of the 386 in the 1980s, most of us have been using CPUs based upon Intel's 32-bit x86 instruction set architecture. That is all about to change as we make the transition from 32 bits to the new world of 64-bit machines. Soon, for the first time in computing history, 64-bit CPUs will venture out of the high-end workstation market and fight for consumer mindshare during the Super Bowl.

In the article below, we'll take a brief look at the need for 64-bit CPUs, the history of the x86 architecture, the options for the transition to 64-bits, and what it might mean for you.

32-bits is Not Enough

For two very interesting years of my life, I worked for a small company in Silicon Valley called Pacific Data Images. The history of PDI goes back some twenty-odd years and includes some of the earliest commercially produced computer-generated imagery, but their most recent claim to fame is the production of the movie Shrek. When I joined the company in early 1999, they were just beginning the transition from SGI-manufactured, MIPS-based machines to commodity x86-based PCs running Linux as their machine of choice for the cluster colloquially known as their "render farm".

The render farm, or just the Farm, was simply the collection of machines contained in a locked, well air-conditioned room on the first floor of the PDI building. During the course of Shrek's production, the Farm grew from a few hundred machines to over a thousand. The newest computers were dual-CPU Pentium III's with a couple gigs of RAM, each. The motivation for moving away from SGI machines to Linux PCs was clear: you could get equivalent CPU power for just a fraction of the price. However, these nice Linux boxes weren't any better than their Irix-based equivalents in one respect: 32 bits of addressing was all you could get.

To understand why 32 bits just isn't enough in all circumstances, you've got to understand a little about how your operating system and your applications use memory. While I'm explaining this, remember that our goal is to produce the complex imagery you see in Shrek. First of all, you've got a lot of data. If you find yourself watching the movie, take a look at the trees. Notice the grass. The directors didn't really want you to focus your attention on such things the first time you saw the movie, but try to get an estimate of the polygon count for these things.

You're probably thinking that those leaves aren't modelled by hand. You'd be right; they're generated procedurally, so they're not stored explicitly on disk, but when you want to generate an image, they do make an impact. Remember, even if we stream the polygons through memory without explicitly storing all the vertices at once, we've still got to anti-alias these things -- which means that we can't simply store just one RGB triple for each pixel -- and remember that a film image is about two thousand pixels wide. You're starting to get some idea. Remember also that we're probably loading lots of very high resolution texture maps for every object in the shot, and the characters alone are several million polygons.

What does this all add up to? A lot of RAM. We're talking, "We're not kidding Mr. Gates, we're having a hard time fittin' everything in two-gigs, let alone your laughable 640k" kind of memory usage.

At this point the astute reader will note that I said the machines only had two gigs of RAM, and a 32-bit architecture should give you four gigs of address space. Figuring that your performance would only be decent if your working set is less than twice as big as the amount of physical RAM on-hand, there shouldn't be any problems, right? Right?


When Linux actually runs a collection of tasks, it gives each process its own four gig address space. So far, so good. The problem is that Linux splits up the address space into a myriad of regions. First, the kernel reserves a portion of the address space for its own use, leaving the rest for the process to use. In a default configuration, the 2.2 Linux kernel claimed the upper two gigs of address space for its own use, leaving the lower two gigs to user space. That's half the address space right there! Furthermore, the lowest point of the application's space is reserved as unmappable, because you want your NULL pointer dereferences to produce exceptions. The executable image is loaded a bit above the reserved NULL space at the bottom of memory. Just above the executable's image, we've got the sbrk memory heap. Higher up is the region for mmap()ed pages of memory, which is also where shared libraries get mapped in. Finally, at the top of user memory space and below the kernel's segment of address space, we've got the main stack of the process.

With all that sectioning off of memory, you'd be lucky to allocate one gig of memory for your process to use. You can tune things somewhat, but the problem of limited memory allocation is exacerbated by the two strategies used by the default implementation of malloc() in Linux's standard C library. Small memory allocations use the sbrk heap, which slowly grows just above the executable's image in memory. Large memory allocations call mmap() and are mapped in higher up in the address space. This dichotomy makes sense; allocating small amounts of memory with mmap() would be very wasteful, and allocating large regions with sbrk() would prevent the process from giving memory back to the operating system after it is finished using it. The downside is that you've got two memory regions with a relatively small maximum size, and attempting to increase the size of one decreases the size of the other. If you are running a variety of programs, some of which allocate thousands and thousands of small amounts of memory, and others which allocate a few very large chunks, then you're in trouble, because you can't trade off one kind of memory for another.

What's the effect of all this? Well, your applications run out of memory space, malloc() starts returning NULL pointers and you probably experience a crash soon thereafter. There were certainly more than a few frames of Shrek which crashed because they were out of address space. Note how this is different from running out of memory.

Maybe you're able to discount all this as a problem only experienced in certain high-end niches, but I think that applications and the data they work on are getting bigger all across the industry, and it won't be long before this problem starts to show up when you are running Photoshop at home, doing video editing of the family trip to the Grand Tetons, or doing page-layout for that indie punk rock magazine you publish. You will see the problem, and you'll be pissed off because your applications are crashing just when you added a bit more data. You'll start to want more address space. You'll start to want a 64-bit architecture.

A Brief History of x86

In the beginning, there was the 8088. Intel said, "Let there be a one-chip CPU." And there was a one-chip CPU, and it was good.

Although I'm too young to remember it, I'm sure the 8088 was a nice CPU when it was introduced. You've got eight semi-general purpose 16-bit registers, AX, BX, CX, DX, SI, DI, BP and SP. I call them "semi-general purpose", because some instructions of the 8088 only operated on some registers, so BX is more useful for multiplication than other registers, CX is more useful for a loop counter, and SP is always the stack pointer, but mostly you can use those registers as you please.

As far as address space goes, you'd expect the 8088, with 16-bit registers, to only be able to address 64k of memory, but Intel used something slightly clever and slightly annoying: segmented memory addressing. This meant that addresses for the 8088 consisted of two numbers, a segment and an offset, and looked something like 0080:0CD1. Internally, the 8088 would multiply the first number by 16 and then add the second number to find the actual desired memory location. Instead of 64k of address space, you've got about one megabyte of addressable space. You've got to use some of the address space for the ROM BIOS and video memory and other hardware needs. If you started the hardware region of memory at, say, A000:0000 and left everything below that as usable by the operating system and user applications, you'd have 640k of RAM available. And that's exactly what happened.

The 8088 was fine for operating systems like DOS, but to run something like a real Unix operating system, a few CPU features were missing. The most important of these was memory protection. Intel's answer to this problem? The 286. Memory protection allows the CPU to expose certain areas of memory to some programs while protecting other areas of memory at the same time. When the 286 ran in protected mode, instead of using a segment and offset, memory addressing would use a selector and an offset. The selector was a 16-bit number stored in the segment register, but to get the base address, the CPU no longer multiplied this number by 16, but instead used this number as an index into a table of memory protection entries maintained by the operating system. Furthermore, the CPU could set up permissions on the memory protection entries so that a process could access some entries, but not others, based on its permission level.

There are a couple of interesting implications about memory protection. First, notice that the base address stored in the memory protection table doesn't necessarily have to be the same size as the registers of the CPU, so a CPU could theoretically use more RAM than address space by divvying up sections of RAM between processes through clever use of the memory protection table. Second, we are actually providing support for security at the CPU level. Without memory protection, malicious programs would be able to subvert Unix-style file system permissions by simply patching the OS kernel in memory to skip file permission checks. Memory protection prevents such attacks from succeeding.

Memory protection was nice, but the 286 still had the problem that it used 16-bit registers for offsets, meaning that you could only access 64k at a time without changing your memory selectors for every pointer you followed. Intel fixed this with the 386. The 386 widened all the existing 16-bit registers to 32-bit. Our old friend AX? Now we'll call him EAX if we want all 32 bits and AX if we want only the lower 16-bits. Similarly, we've got EBX, ECX, EDX, ESI, EDI, EBP and ESP. Wow, suddenly we've got four gigs of address space instead of 64k. Neat, isn't it? Of course, we've still got memory protection too.

Even though your PC from Dell boots up in the same 16-bit Real Mode used by the 8088, nearly any operating system you are likely to be running on your Pentium family CPU is using the 32-bit protected mode introduced with the 386. Linux uses it. FreeBSD uses it. Windows 2000 uses it. Windows XP uses it. BeOS uses it. AtheOS uses it. The fact is, as far as most assembly programmers are concerned, the Pentium III isn't anything other than a 386 tricked out for maxiumum speed. Intel has made a few additions with MMX and SSE, but the core instruction set has remained exactly the same.

And Now for Something Completely Different

Perhaps you've heard about Intel's Itanium CPU. The design of the Itanium CPU is a joint project between Intel and HP, and is their answer for the 64-bit era. The instruction set it uses is called IA-64, but it really doesn't have much in common with the 16-bit instruction set of the 8088 or the IA-32 instruction set of the 386. In fact, it's quite a significant departure from what has come before.

First of all, instead of the familiar eight registers of the x86 architecture, we've now got 128 general purpose registers. The first 32 of these registers are global registers, accessible in the same way throughout the lifetime of a process. The remaining 96 registers are managed through the ALLOC instruction, which allows a function or method to reserve a particular number of these registers for its use, shifting an internal index such that some of the calling function's registers are hidden from view while the ones which remain visible are used to pass function arguments.

If that wasn't a big enough change, the Itanium is also a VLIW (Very Long Instruction Word) CPU. This means that CPU instructions aren't the same atomic units they once were. Now CPU instructions come in bundles of three. Each instruction bundle also happens to be exactly 128-bits long, whereas old x86 instructions were of varying length and aligned willy-nilly throughout memory. Furthermore, instructions are organized into groups which can potentially be executed in parallel. Need to add some numbers and multiply some others? No problem, with IA-64 you can tell the CPU it can do both at once, without waiting for one or the other to complete. To further complicate things, the group boundary can actually be in the middle of an instruction bundle!

As you can see, code for Itanium CPUs is at least as different from plain-old x86 code as code for, say, PowerPC CPUs would be. This has both advantages and disadvantages. On the up-side, CPUs don't have to spend a lot of time decoding the archaic and baroque x86 instruction format anymore. On the downside, all the work spent on getting gcc to generate good x86 code will be useless for Itanium CPUs. Kaffe and Mono's x86 JITers? They will have to be rewritten. Optimized graphics libraries and drivers? They will need to be reoptimized for Itanium's architecture. VMware's instruction stream analyzer? It'll need a complete rewrite.

Now, the Itanium does actually have an x86 backwards-compatibility mode on the CPU, but that really makes the Itanium like two CPUs in one. You'd flip a few bits, and suddenly you are using completely different circuitry to decode and execute your instructions. Running x86 applications on an Itanium is analogous to running Windows applications under Linux with WINE -- maybe it works, but only because you've duplicated one completely different environment inside another.

Back to the Past

Obviously there's a lot of work involved in transitioning from the world of x86 to the Itanium world. Maybe there's another way? That's exactly what AMD is hoping. With the introduction of their Hammer CPU architecture, AMD is introducing the x86-64 instruction set.

Remember, where IA-64 is completely different from IA-32, x86-64 is more of the same. We've got the same old eight registers, and AMD has chosen to add eight more, for a total of sixteen. The instruction opcodes are the same familiar ones from the 8088. There aren't any bundles or groups here. Assembly programmers won't need to learn many new techniques to write streamlined code for x86-64, because all their old knowledge will be immediately applicable.

The clear advantage of x86-64 is that developers can get applications working on it much more quickly than with IA-64. No need to write a new back-end for your compiler; that old one will do, with a few 64-bit modifications. That graphics library you've got in the corner which emits x86 code based on the current state? A few modifications to it, and you'll be fine. Oh, you wanted streamlined performance and explicit parallelism in your instruction stream? Sorry, x86-64 is the same thing we've been dealing with for the past twenty-five years, warts and all.

And the winner is...

It's too early to tell who will win the battle of the 64-bit CPUs, but it is clear that the conflict will explode in the next few years. AMD has pragmatism on their side, but Intel has the theoretical performance advantage and the 500-pound gorilla advantage. If the situation were to be reversed, with Intel going the route of backwards compatibility and AMD coming up with the new-fangled instruction set, my money would be on Intel for certain, but as it is, it's very tough for me to make a call.

Will the performance advantage of VLIW make good, or will AMD be crowned the new CPU king?

Only time will tell.




Which will it be?
o IA-64 25%
o x86-64 48%
o 32 bits is good enough 6%
o Other 18%

Votes: 74

Related Links
o Pacific Data Images
o Shrek
o Itanium CPU
o x86-64
o Also by MK77

A Tale of Two CPUs | 89 comments (76 topical, 13 editorial, 0 hidden)
Segmented addressing (4.30 / 10) (#2)
by ucblockhead on Sun Feb 10, 2002 at 03:15:20 PM EST

but Intel used something slightly clever and slightly annoying: segmented memory addressing.
Slightly annoying!? Slightly!? My God, man, you're making light of the bane of my coding existence from 1986 to 1990!
This is k5. We're all tools - duxup
<AOL> (4.00 / 2) (#28)
by wiredog on Sun Feb 10, 2002 at 07:52:06 PM EST

Yeah, I was doing embedded/industrial on 286/386/486 processors. Lots of assembly coding. Remember the tiny, small, large, and huge memory models?

Peoples Front To Reunite Gondwanaland: "Stop the Laurasian Separatist Movement!"
[ Parent ]
Oh yeah! (4.00 / 3) (#29)
by ucblockhead on Sun Feb 10, 2002 at 07:56:10 PM EST

I was doing TSR work...Love that Tiny model, only 64k of data and code! (And hey, you forgot "medium" and "compact"!)

"Gee, is that a 'near' pointer or a 'far' pointer"?

"Remember, you can't compare pointers with pointers for equality!"

That last was my favorite bit. Two pointers could point at the same thing and yet not have equal values!
This is k5. We're all tools - duxup
[ Parent ]

Don't forget ... (none / 0) (#50)
by Herring on Mon Feb 11, 2002 at 07:50:27 AM EST

... huge arrays. You better be sure boy that your structure size divides into 64K or there's gonna be that one element over the boundary ...

How did we put up with that crap?

A few months ago, I remember reading the stuff on AWE in Win2K AS. Far pointers back from the grave. Nightmare.

Anyhow, on the off chance that no one else has linked this, here's more information on Itanium than you need.

Say lol what again motherfucker, say lol what again, I dare you, no I double dare you
[ Parent ]
Huge pointer normalization, DMA, and EMS (none / 0) (#69)
by pin0cchio on Mon Feb 11, 2002 at 04:34:27 PM EST

You better be sure boy that your structure size divides into 64K or there's gonna be that one element over the boundary

So you're saying that size of structures in arrays referenced through huge pointers needed to be a power of 2. Wrong. Renormalizing a pointer let it point anywhere in the first MiB. For example, if you had a structure starting at 3040:ffc4, you could renormalize that to 403c:0004 and access it just fine. Most image manipulation software renormalized huge pointers at the beginning of each scanline (i.e. out of the innermost loop), which let a single scanline be up to 64KiB. Limiting scanline stride to multiples of 16 bytes made renormalization fast as you only needed to add constant values to segment registers DS and ES.

The problem was with DMA, which couldn't cross a 64KiB bank boundary. Most apps just allocated a 32KiB arena in low memory and used whatever 16KiB that didn't cross a bank boundary, the other 16KiB being used for other purposes. Another problem was with EMS, which provided four 16KiB memory windows to access expanded memory, and you couldn't access a structure that crossed a 16KiB window boundary unless you used the relatively easy workaround of loading two segments into two consecutive windows. (This worked because the EMS3 spec required the four windows to be consecutive in the upper memory address space.)

[ Parent ]
It was the making of mine! (none / 0) (#66)
by deefer on Mon Feb 11, 2002 at 02:42:24 PM EST

Because when you were using large amounts of data (image processing) and you needed fast access, relying on the catch-all-situations compiler was slooooooow in huge...

So you'd lob some assembler halfway in your routine, some clever SHL and INC stuff IIRC, make sure your index variables were in the right registers, and you'd be up with a performance boost of up to 250% over a straight compile. And all the other programmers used to worship my 1337ness... Aaaaah them were the days! </nostalgia>

Kill the baddies.
Get the girl.
And save the entire planet.

[ Parent ]

Disclaimer. (3.00 / 8) (#3)
by ti dave on Sun Feb 10, 2002 at 03:20:05 PM EST

I haven't written a program since 1983, but after reading this story,
it seems to me that IA-64 will prevail due to Human Laziness.

"If you dial," Iran said, eyes open and watching, "for greater venom, then I'll dial the same."

64-bit Render Farm (4.08 / 12) (#4)
by danimal on Sun Feb 10, 2002 at 03:26:54 PM EST

Congrats on Shrek. I've been working for the last five years now at Blue Sky Studios. We've just finished production on our first movie, Ice Age. We evaluated the 32-bit x86 architecture, but decided not to go with it. Instead we went with Compaq Alpha computers to render on.

This afforded us three major things:

  1. the fastest architecture on the market (not only at the time some 2.5-3 years ago, but still faster than x86 offerings today)
  2. a larger memory address space if we needed it
  3. fewer render machines needed than any other solution.
Amazingly enough we only had 512 of these Alpha machines, all with 1GB of RAM. Yes, 1GB of RAM. Of course our rendering architecture allows us to be more efficient than those that tessellate to polygons (we track directly to the NURBS surface). Oh yeah, we also ray traced the entire movie. There's a first for you.


<tin> we got hosed, tommy
<toy> clapclapclap
<tin> we got hosed

If 64-bit gets people buying (1.70 / 10) (#5)
by Ken Pompadour on Sun Feb 10, 2002 at 03:28:46 PM EST

Then it can't possibly be a bad thing! Buy Buy-Co!

...The target is countrymen, friends and family... they have to die too. - candid trhurler
It should also be noted... (4.28 / 7) (#6)
by DeadBaby on Sun Feb 10, 2002 at 03:46:11 PM EST

That there seems to be a x86/64bit hybrid project inside Intel. It may never see production but if AMD's x86-64 chips are successful there's a very good chance it will.

Intel may design Hammer-like chip

"Our planet is a lonely speck in the great enveloping cosmic dark. In our obscurity -- in all this vastness -- there is no hint that help will come from elsewhere to save us from ourselves. It is up to us." - Carl Sagan
x86-64, the Inquirer has been covering it ... (4.66 / 3) (#16)
by joegee on Sun Feb 10, 2002 at 05:15:48 PM EST

The following stories have recently been published on Mike Magee's tech site The Inquirer:

"Intel steps up X86-64 skunkworks"

"Intel will can Itanic if boy Hammer does too well"

"Intel has free access to AMD X86-64 code"

"Home of Intel Skunkworks pinpointed"

Intel would appear to be creating an x86-64 compatible processor in parallel to their ongoing IA-64 efforts, just in case AMD's approach to 64 bit computing catches on with Hammer. According to comments I have read, Microsoft beta testers will remember seeing a folder dealing with AMD64 on their 64 bit Windows XP beta CD ROMs. There is a 64 bit version of Windows ready for Hammer, but whether or not it sees the light of day will probably be decided by whether or not Mr. Gates is feeling particularly fond of Mr. Grove at the particular moment the decision is made.

<sig>I always learn something on K5, sometimes in spite of myself.</sig>
[ Parent ]
RE: It should be noted (none / 0) (#76)
by paulcourry on Tue Feb 12, 2002 at 02:51:07 AM EST

I think what you will find is that IA-64 will be most common in the high end server world where software is kept updated by yearly maintenance contracts, and thus the cost of going to a 64 bit version of an app will be amortized by the software developers over a period of years through maintenance contract revenue.

On the consumer end where Micro$shaft reigns the software developers will get their money upfront, thus giving a BIG cost advantage to AMD. No one will want to repurchase all their apps at list price and this is where the AMD approach scores big.

So you might just see a two pronged approach by Intel. IA-64 at the top of the market and an x86-64 approach for the bottom end to prevent a complete AMD takeover.

[ Parent ]

Performance advantage (4.80 / 5) (#9)
by twodot72 on Sun Feb 10, 2002 at 04:20:15 PM EST

Thanks for a good article. I just want to comment on this quote from the article:

Will the performance advantage of VLIW make good

It is not yet demonstrated that VLIW has a performance advantage in the general case. AMD is not convinced, neither is IBM.

For regular applications, like many floating-point intensive calculations, it seems to be a good idea, and therefore it has been used before in DSPs (signal processing) which almost exclusively does such calculations. But this kind of parallelism can also be exploited quite successfully with SIMD units, such as SSE2 in the Pentium 4.

Other programs -- and I'd say most -- are more tricky; you have to find instruction-level parallelism in order to take advantage of a VLIW core. This is not easy, as demonstrated by Itanium's poor performance on the SPECint benchmarks. It remains to be seen if the compiler wizards at Intel will be able to dig out any more parallelism from these programs. The Itanium needs this; being a very large and complex design, it does not seem to be able to run very fast (currently below 1GHz).

A further problem for Intel is that the Itanium is a very large chip, and the successor will be even larger. This means high production cost, and high power consumption/heat dissipation (something server makers especially dislike). AMD's chip will be only insignificantly larger than the current Athlon design, and backwards compatible without performance penalty.

On the server/workstation side, Itanium has to compete with IBM's POWER4 chip, which at the moment seems to be significantly faster. IBM shunned the VLIW concept in favor of a classical superscalar design, and a very fast and wide memory system. This is a good idea, since the memory system in general is a significant performance bottleneck.

That said, I think Intel has the manpower and financial resources to make Itanium a success despite these problems. It just won't be the easy victory pundits expected a few years ago.

Ah, well, that was a long rant. Enough already :-)

In order (none / 0) (#37)
by srichman on Sun Feb 10, 2002 at 10:02:26 PM EST

One thing you fail to mention is that the Itanium VLIW implementation is obviously strictly in-order. This is a big departure from previous generations, which spent quite a few transistors on out-of-order execution and all its concomitant hazard, speculation, and exception precision headaches. The simplified in-order design should, in the long run, allow Intel to push the clock speed higher than what would have otherwise been possible. I attribute the current low clock speeds to Intel's inexperience with the new architecture, and expect it to improve.

Compiler technology has advanced to the point where VLIW is viable. The current compilers may not be perfect, but, again, they'll only improve as VLIW becomes more widespread. I firmly believe that static analysis and instruction reordering can do as well as all the transistors on my desk being spent on dynamic reordering.

Finally, to clarify something from the original article:

First of all, instead of the familiar eight registers of the x86 architecture, we've now got 128 general purpose registers.
x86 has eight GP registers that can be named by your code, but the register renaming unit maps to a much greater number of physical registers at execution time. In the Pentium 4, for instance, there are 40 physical registers that are transparently used.

[ Parent ]
not that far off (none / 0) (#39)
by Pink Daisy on Sun Feb 10, 2002 at 11:48:22 PM EST

Intel has to be thinking already about what it will take to make an OO Itanium family processor. With all the logic that goes into predication and speculation (stuff like NaT's and delayed exceptions in particular), they can't be gaining a whole lot over a traditional OO processor without all that cool stuff that compiler writers hate.

[ Parent ]
iAPX-432 - intel *DID* have OO CPU. (none / 0) (#46)
by artemb on Mon Feb 11, 2002 at 02:32:17 AM EST

Actually, Intel already did it. There was a pretty interesting project in the late 70s - early 80s called iAPX-432. Unfortunately it turned out to be quite a bit ahead of its time. Despite all the innovative ideas that went into the iAPX-432, technology at the time was not good enough to compensate for the chip's complexity and its impact on performance. There's not that much information available about the project these days. Here is one page that has some more info on the Intel iAPX-432 Micromainframe. Google also has a cached copy of an article What Ever Happened to... Intel's Dream Chip? published 2/97 in Computer Shopper.

[ Parent ]
cute (none / 0) (#57)
by Pink Daisy on Mon Feb 11, 2002 at 09:32:40 AM EST

Something even harder to build a compiler for than Itanium, I suspect. It would never catch on, though. Requiring all those assembly language programmers to learn object oriented methods would just be mean.

[ Parent ]
high-level assembly (none / 0) (#68)
by artemb on Mon Feb 11, 2002 at 03:36:12 PM EST

Yep. Ada-like assembly with real (i.e. supported in hardware) objects - it's something a bit more interesting than all those AX/BX/CX/DX-es.. :-) As for compilers, the iAPX-432 was designed with the Ada language in mind, and as such writing an Ada compiler for it would probably be not as complex as it would be for the rest of the notorious IA-32 family. I used to work with some guys who developed a system with Modula-like CPU instructions. According to those guys, writing a Modula compiler for the system was a piece of cake - the syntax tree was almost directly mapped into machine code. Well, the fate of the project was very similar to Intel's. It worked, but was soon forgotten. Go figure what it takes to be successful.

[ Parent ]
Complexity (none / 0) (#45)
by twodot72 on Mon Feb 11, 2002 at 02:14:20 AM EST

Yes, apparently it is in-order. However, from what I've seen of the architecture (and from what you can infer from the die size figures), it is not a simple chip. They've added much more stuff than they've taken out. It seems like the next generation Itanium chip will not have a clock speed much above 1GHz.

Compilers. Yes, I've heard so much about what wonders they can do with static analysis methods, like parallelizing compilers and whatnot. I hope it starts to materialize soon, in the meantime I remain sceptical. On the other hand, the Intel C++ compiler with vectorization is a good omen in that respect.

[ Parent ]

Architecture Lifetime (4.80 / 5) (#19)
by Bad Harmony on Sun Feb 10, 2002 at 05:33:56 PM EST

IBM is still producing computers based on extended and enhanced versions of the IBM 360 architecture, which is more than 35 years old (1965).

Address space limitations killed the PDP-11 and other 16-bit minicomputers. There were kludges to extend the address space, but they only delayed their demise.

I don't think the Intel IA-32 architecture has a lock on the future, at least not for architectural reasons. Operating systems and applications are more processor independent than they used to be. Assembly language is rarely used in modern software.

As a friend of mine used to say, Intel's big advantage is not their microprocessor designs, it's their fabs. They can manufacture huge quantities of complex microprocessors and sell them at low prices.

5440' or Fight!

Processor independence (3.66 / 3) (#22)
by mech9t8 on Sun Feb 10, 2002 at 05:59:14 PM EST

I don't think the Intel IA-32 architecture has a lock on the future, at least not for architectural reasons.

If x86-64 wins, I don't think it'll be for architectural reasons themselves, but for a side-effect of the architecture: namely, that Hammers will run x86-32 code far faster than Itaniums. People won't want to run the majority of applications at a snail's pace in order to run 64-bit applications when they can run both at a decent speed on an x86-64 chip.

Intel's going to have to really encourage developers to distribute multiple binaries (or, perhaps, fat binaries) so users don't even have to think about what their architecture is: Intel has to make sure that, in a few years, everything for Windows will include x86 and IA-64 binaries, even if it doesn't use the 64-bit features.

(In theory, that should just require a recompile - Microsoft made it so 64-bit instructions have to be included explicitly, so a Win32 application can be compiled to Win64 with no modifications. But it'll still have to be tested, so Intel should give every programmer an Itanium box to test the Itanium binaries... they can start with me...<g>)

Failing that, it might be worthwhile to just include an x86 core (either on the chip or in some sort of dual-processor arrangement) instead of relying on the x86-to-IA64 conversion, which is apparently dead slow. If they could somehow package, say, an Itanium chip with a Pentium 4 (running at double the Itanium clock speed) then they'd compete a lot better.

Of course, adding the P4 core to the Itanium chip would make the chips way too big and complicated to be manufactured cheaply.... and I have no idea how to handle a dual-processor system with two different architectures (and, probably, clock speeds). But that ain't my job... I'm just here to post impossible expectations... ;)

[ Parent ]

K5 bug? (1.85 / 7) (#20)
by Bad Harmony on Sun Feb 10, 2002 at 05:38:12 PM EST

Please kill parent. The preview function looked good but as a side effect, it corrupted the contents of the "comment" text box.

5440' or Fight!

I'd guess PPC (3.50 / 2) (#21)
by Pink Daisy on Sun Feb 10, 2002 at 05:56:44 PM EST

With MIPS and Sparc not long for this world, I'd have to say that IBM is the only one able to stand up to Intel and the x86 giant.

Intel's only chance is if they can pull a miracle from both their compiler people and their next generation Itanium chips.

Next generation Itanium chips (3.00 / 1) (#26)
by DJBongHit on Sun Feb 10, 2002 at 07:32:23 PM EST

Intel's only chance is if they can pull a miracle from both their compiler people and their next generation Itanium chips.

This is true, but keep in mind that they have a lot of very talented people working on the next generation of the Itanium - Compaq's Alpha team (at least a good chunk of it) is going to Intel specifically to make the Itanium not suck. They did a damn good job with the Alpha, and they may be able to pull it off again.


GNU GPL: Free as in herpes.

[ Parent ]
mixed feelings about alpha (3.00 / 1) (#38)
by Pink Daisy on Sun Feb 10, 2002 at 11:44:36 PM EST

Alpha hit its target bang on--the market for really expensive PCs that performed great but were a financial drain on both the owners of the chips and the owners of the architecture. Intel may be aiming for the same market, or even more extravagant ones, and do fine at that for a while. The problem is, if they do that, the market will eventually get eroded from beneath them by cost-effective chips that have much broader appeal and hence many more R&D dollars behind them. Unless they eventually introduce a cheap workstation or desktop version of the chip, they'll get eaten alive by whatever architecture has such a thing. Chances are that would be x86-64 or PPC.

That shouldn't be a surprise to anyone. Alpha, sparc, MIPS and PA-RISC have all fallen to IA-32, a competitor that isn't even in the same class. The problem is that keeping a processor family alive for a niche market is a very expensive proposition, and these companies can't keep up when Intel and AMD bump the speed of their processors every four months. I guess my hope for PPC domination is based more on a blind hatred for x86-64, but unless Intel comes up with new technology and a new business plan within three to four years, I don't see any hope for Itanium.

[ Parent ]
Intel can win either way (3.00 / 1) (#44)
by mech9t8 on Mon Feb 11, 2002 at 02:06:58 AM EST

Even if IA64 fails, Intel can just add x86-64 instructions to their x86 line. They've got the resources to maintain both lines until there's a clear victor, and they've got the resources to make sure that they win either way.

The only way Intel could lose is if AMD's Hammer is so spectacular that it overwhelms the market, taking away Intel's market share very quickly - but I don't think that AMD even has the manufacturing capacity to do so. And I don't see a scenario where PowerPC wins, nice chip though it is.

...Unless Microsoft decides to adopt it or something. Get people to make most software for the .NET CLR instead of x86 binaries, re-release Windows for the PowerPC (with a virtual PC-based subsystem for x86 compatibility), perhaps release a Microsoft PC based on PowerPC and voila. Can't see their motivation for doing so, though... ;)

[ Parent ]
In the Beginning Quibble (4.14 / 7) (#23)
by localroger on Sun Feb 10, 2002 at 06:33:47 PM EST

Although I'm too young to remember it, I'm sure the 8088 was a nice CPU when it was introduced.

Wrong, sonny. The 8086, followed by the 8088, were godawful atrocious processors which were only adopted by IBM because the main competing CPUs were already in use by IBM's main competitors -- Zilog's by Tandy, Motorola's by Apple and Atari, TI's by themselves. Nobody else had a suitable CPU (especially a 16-bit CPU) ready to go and so Intel got the bid.

The original x86 was so far ahead of its time that, at the time, it was a pure-D piece of crap. Typical x86 software was slower than Z80 code running at half the MHz, the object files tended to be twice the size, and the hated segmentation registers overcomplicated everything. (A few of these flaws have been "fixed," but many more introduced into this CISC architecture as it tries to balance backward-compatibility with forward motion.)

In those days when 64K of RAM could set you back $100, this was a big deal. Time has been very forgiving of what, in any other industry, would have been a series of unbelievable fuckups on Intel's part.

And off the topic of your historical sense, your modern problems are not 32-bit problems; they are Linux problems. A clean 32-bit architecture gives you 4 GB of RAM, and 4 GB ^2 with paging. (If you don't remember what paging is, you have no business critiquing CPU design.) A 64-bit architecture will give you less than a double speed improvement, since some narrow operations can't be done in parallel. This can be accomplished at 32 bits with the kind of pipelining and caching schemes which were pioneered by the x86. But if RAM access is your limitation, you need to look to the software.

I can haz blog!

4 GB max per process for flat address space (4.60 / 5) (#25)
by MK77 on Sun Feb 10, 2002 at 07:06:35 PM EST

You're right that a cleaner OS could offer the full four gigs of address space for your process, but if you want a flat address space, that's all it's going to get you, per process. It's much less of a pain to recompile your application for a 64 bit architecture than to worry about paging things in and out of your address space, or splitting up your work between multiple processes.

As for the 8088 being poor at the time of arrival, I'm sure you're right. I really have no idea.

Mmm... rageahol
[ Parent ]

You don't need flat address space (none / 0) (#53)
by localroger on Mon Feb 11, 2002 at 08:41:43 AM EST

This, again, is an algorithm problem. VB 4.0 will cheerfully allow you to declare a 2-megabyte array on a '486 under Windows 3.1. You take a performance hit (magnitude depending on your algorithm) but a teeny little assembly language routine can make this hit almost invisible if your main code is written in a higher level language like C.

I can haz blog!
[ Parent ]

Quibble (1.66 / 3) (#30)
by ucblockhead on Sun Feb 10, 2002 at 08:17:08 PM EST

A minor quibble...the 8088 and the 8086 were released at the same time. (The 8088 being an 8-bit chip, the 8086 being the 16-bit chip.)

Can't argue with any of the rest of that, though.
This is k5. We're all tools - duxup
[ Parent ]

Wrong! (3.00 / 3) (#36)
by Inoshiro on Sun Feb 10, 2002 at 09:34:56 PM EST

The 8088 had an 8-bit data bus width which, as the parent to your reply noted, is not the same size as the word of the architecture in question. The only 8bit proc I can think of from Intel is the famed 4004.

The 8088 was used in the PC XT, and (as it had an 8-bit data bus width) sucked compared to the 16-bit data bus of the 8086, which moves double the amount of data per clock tick.

[ イノシロ ]
[ Parent ]
Intel 8-bit (5.00 / 1) (#52)
by localroger on Mon Feb 11, 2002 at 08:38:50 AM EST

After the 4004:

8008, 8080A, 8085

The 8080 begat Zilog clone Z80, which was binary-compatible with 8080A and greatly extended the instruction set.

Intel's x86 line was alleged to be "source code" compatible, in that a suitable translator could convert 8080A source code into x86 assembly. This worked, sometimes. A lot of early PC apps were ported from CP/M in this way.

I can haz blog!
[ Parent ]

PC vs. XT (none / 0) (#67)
by fluffy grue on Mon Feb 11, 2002 at 03:13:51 PM EST

The PC used the 8088. The XT (the "high-end" system) used the 8086.
"Is not a quine" is not a quine.
I have a master's degree in science!

[ Hug Your Trikuare ]
[ Parent ]

64K of RAM could set you back several thousand :) (4.00 / 2) (#31)
by joegee on Sun Feb 10, 2002 at 08:29:03 PM EST

Ask anyone who bought an Apple II or a TRS-80 Model IV. :)

With everything else you're dead on. Around 1982, as I recall, the TI-99/4A was the only commercially available machine that had a 16-bit CPU. Apple, Commodore, and Atari used the eight-bit 6502, Tandy used another 8-bit CPU (I do not remember the maker/model) in its Color Computer, and the Z80 was used in Tandy's "business" machines.

"Cheap" (sub $400) computers like the Timex/Sinclair 1000 and the Commodore VIC-20 shipped with between 1K and 5K, but could be expanded up to 16K.

<sig>I always learn something on K5, sometimes in spite of myself.</sig>
[ Parent ]
All depends on when (none / 0) (#51)
by localroger on Mon Feb 11, 2002 at 08:35:36 AM EST

A full set of RAM chips for my first computer (16K) was about $150 in 1978. Triple-supply 4116s, which had a tendency to explode (literally!) if the 5V supply came on before +12. Before that the price was even higher, of course, and after that less. I was thinking circa 1982, but of course you're right too, if you back up to '75 or so.

I can haz blog!
[ Parent ]

16-bits and Tandys (none / 0) (#58)
by Mr Z (The Z is silent) on Mon Feb 11, 2002 at 10:05:47 AM EST

In 1982, there were three microcomputers out with 16-bit CPUs: The TI-99/4A (using the 9900 CPU), the IBM PC (using the 8088 CPU, a 16-bit CPU with 8-bit bus), and the Intellivision video game system (using the little-known General Instruments CP-1610 CPU). Ok, that last one wasn't a computer unless you bought the ECS attachment.

As for the Tandy computers -- their TRS-80 Color Computers used Motorola 6800-series CPUs. I believe the CoCo 2 and 3 used 6809s -- can someone correct me if I'm wrong?


[ Parent ]
6809, that's right :) (none / 0) (#79)
by joegee on Tue Feb 12, 2002 at 11:03:42 AM EST

And it had a lovely register you could POKE with a value to double the speed of your machine. :) I forgot about Intellivision, and the 8088 based IBM machines were so expensive they were practically nothing more than rumors. :)

<sig>I always learn something on K5, sometimes in spite of myself.</sig>
[ Parent ]
My pop had a TRS-80 Model IV... (none / 0) (#74)
by demi on Mon Feb 11, 2002 at 10:15:50 PM EST

Once it was retired from his business (they got a VAX to replace it) I played around with it for a time. Basically my first computer, but there was nothing fun at all to do with it except enter new accounts receivable. I think the price tag on that beast was about $6,000 with a printer and 15 MB hard drive in the early 1980's.

[ Parent ]

on the complexity (3.00 / 3) (#32)
by turmeric on Sun Feb 10, 2002 at 08:46:38 PM EST

when i was a 16 year old i really didnt
know a bit from a byte, and i was trying
to learn Assembler on the only machine
i had, the IBM PC.

now, i got borlands 'book of assembler instructions'
for Turbo Assembler, and i got 'teach yourself
assembly', and i got 'assembly language for
pascal programmers', and basically, i had no idea
that there was some 'other assembly language' out
there, i didnt know CISC And RISC from a hole
in the ground, i never met anyone who knew
unix until i was in college 1500 miles away
from my hometown.

the point is, that joe blow programmer doesnt
know what 'complex' is, if she/he has never
seen anything else, and she/he just knows that
if you do this and that you can get shit done.

i remember a chapter in my Compute Architecture
book from college that describe how crappy
the intel arch is (like the stack based
floating point shit for example) and how
there was so much better stuff out there,
but this pervasiveness and appeal to the
masses and backwards compatibility
made it last for 20 years. . . .
i mean i didnt find any 'sparc assembly'
books at the bookstore, and i didnt find any
RISC machines that i could learn programming

it is not unlike unix, which has some really shitty
things about it but due to some really nice
things about it has lasted 30 years... and
once i got in college i dont really have the
chance to play with windows NT kernel programming
or with any of the fancy stuff out there, i had
free linux and free unix to work with.

seems as though intel is 'reversing' its old
strategy of 'worse is better' it used all
through the 80s and 90s, and AMD is picking it up.

[ Parent ]
Meta: let the damned text box wrap for you. (3.75 / 8) (#35)
by Inoshiro on Sun Feb 10, 2002 at 09:31:49 PM EST

Please don't hit enter at the edge of the text box. It will automagically wrap the text for you. There is no need to force your comment to 4 columns wide on my 256 column display -- it just makes it harder to read. Thank you, that is all.

[ イノシロ ]
[ Parent ]
Meta: let the damned web site wrap for you. (none / 0) (#75)
by nakaduct on Tue Feb 12, 2002 at 12:58:26 AM EST

Please don't design a web site that passes textbox-aligned line breaks through by default. Automagic detection and removal of same has been implemented by (among others) Microsoft and s/(?:^|(?<=\n))(.{30,55}?)[^\S\n]*\n(?=\S)/$1 /g. Thank you, and that is all.


[ And no tag-related guff about that regexp, I said 'by default' ]

[ Parent ]
Meta: mail the patch (none / 0) (#83)
by Inoshiro on Wed Feb 13, 2002 at 08:40:18 PM EST

I'm the sysadmin, not the perl monkey. Send to Hurstdog, Panner, or rusty.

[ イノシロ ]
[ Parent ]
Wrong, Wrong, WRONG! (3.75 / 4) (#34)
by Talez on Sun Feb 10, 2002 at 09:09:15 PM EST

At this point the astute reader will note that I said the machines only had two gigs of RAM, and a 32-bit architecture should give you four gigs of address space.

A 32-bit address bus will limit you to 4 gigs of memory and, as most astute readers will know, the size of the architecture != the size of the address bus. Two examples of this are the 8086 (16-bit architecture, 20-bit address bus) and the Pentium Pro (32-bit architecture, 36-bit address bus).

Just because everyone is used to it doesn't mean it's set in stone... geez...

Si in Googlis non est, ergo non est
Nothing wrong, just read carefully (4.50 / 2) (#40)
by BlowCat on Mon Feb 11, 2002 at 12:13:10 AM EST

I think it was clear that the author means 4 gigs of unsegmented, easy-to-use memory per process, not 4 gigs for the whole processor. The author actually mentions it, not in "architecture" terms, but rather from the software point of view:

First, notice that the base address stored in the memory protection table doesn't necessarily have to be the same size as the registers of the CPU.

The thing is, if your segments are 4 gigs and you want to work with 8 gigs, you have to switch segments back and forth on the fly.

Thanks for the information about the Pentium Pro, but your criticism is not really justified.

[ Parent ]

Why stop at 64bits? (4.00 / 1) (#41)
by redelm on Mon Feb 11, 2002 at 12:40:35 AM EST

Why not 128bit CPUs that can read an IPv6 address in one machine word? Or more?

Because there are real costs to longer machine words. Die & machine complexity for one, and code density [size] for another. It already takes longer to fetch once-thru code than to execute it.

There are cases like your render farm, or database farms, that can profitably use 64 bits. But they are only a few thousand farms with a few million machines total. The bulk of the machines are user PCs which have no foreseeable need for 64 bits. Truth be told, they don't even need 32 bits. The main reason that most [MS-Windows] apps are 32-bit is that MS withheld Windows95 certification from any apps that wouldn't run under MS-WinNT [32 bit]. Yes, I program in ASM & hate segment fixups. A real PITA.

I had a 4.77 MHz 8088 with 512 kB RAM. It was huge and hard to fill, at least in text mode. That was ~10 MHz/MB. Later I had a 486/33 with 4 MB RAM. It was easy to fill doing Linux kernel compiles or graphics. That was ~8 MHz/MB with much (~8x) improved instructions-per-clock. Now, I have a 1.2 GHz K7 with 1 GB for only 1.2 MHz/MB at still better IPC. I'm 'way long on memory, just like in the 8088 days.

I just don't see anything that would drive 64bit in the mass market. No killer app. There may be some marketing fluff. AMDs x86-64 is likely to win because of code compatibility. I see IA64 heading the same way as i860 & i960 [Intel's 64bit CPUs from 10 years ago]. Niche products.

640k is enough for anyone! ;) (4.00 / 1) (#42)
by mech9t8 on Mon Feb 11, 2002 at 01:49:12 AM EST

The main reason that most [MS-Windows] apps are 32-bit is that MS withheld Windows95 certification from any apps that wouldn't run under MS-WinNT [32 bit].

Dunno about that. Win16 apps aren't pre-emptively multitasked, aren't in protected memory, don't have access to Win32 niceties like long file names or Win95 widgets. They might not have specifically thought they needed 32-bit memory access (since the Windows libraries handled all the memory stuff anyway), but there were plenty of compelling reasons to go to Win32 besides a little "Windows 95 compatible" logo.

Now, Win64 doesn't offer any niceties like those - it's just Win32 with 64-bit extensions. So, initially, consumers will get 64-bit chips without any software to take advantage of them - just like early adopters of 32-bit chips.

But, eventually, consumer software will take advantage of 64-bit instructions if they're there. Video editing applications are an obvious example (being able to cut and paste whole segments of DVD or HDTV video without accessing the hard drive), but I'm sure others will come up. But 64-bit chips, like 32-bit chips, will first appear in the market not because of their memory architecture, but because they're simply going to be the fastest chips available. High-end servers need 64-bit, and it'll be cheaper to just let that technology trickle down to consumers instead of developing separate 32- and 64-bit chips for the different markets.

[ Parent ]

64-bitness (3.00 / 1) (#43)
by twodot72 on Mon Feb 11, 2002 at 01:58:22 AM EST

The die size penalty is not that great in the case of AMD's x86-64; less than 10% larger than the current 32-bit Athlon core is a figure I've seen, which is not a huge problem since the Athlon is quite small by today's standards.

Even so, the danger is of course that you might lose clock speed; all operations have to be extended to 64-bit, which might increase the logic depth of things like the adder, but it seems like they have some solution for that if you look at projected frequencies of future x86-64 products.

While I think user PCs will eventually have use for more than 4 Gigs of RAM, I do think you are right on the spot when calling IA64 a "niche product", at least for a long time to come. For now, it is a large and power-hungry beast.

[ Parent ]

64 Bits Needed Now! (none / 0) (#55)
by Bad Harmony on Mon Feb 11, 2002 at 09:19:29 AM EST

I just took a quick look at pricewatch. 4 GB of DRAM (PC1600 DDR 256MB) is selling for about $800. That is damn cheap, and it will get cheaper. Historically, I've spent about $500 on average, per PC, for DRAM. It won't be long before that $500 will buy 4 GB.

If the RAM is available, and cheap, someone will figure out how to take advantage of it to increase system performance.

5440' or Fight!
[ Parent ]

Binary compatibility (3.00 / 1) (#47)
by Paul Johnson on Mon Feb 11, 2002 at 05:53:17 AM EST

Well, there goes binary compatibility for the next generation of PCs. Well, maybe the next-but-one for home/desktop PCs.
There are going to be two competing architectures, plus the old one still hanging around. That means 3 copies of anything executable and all the configuration headaches that go with it. I recall the old days when we transitioned from Sun 3 (Motorola 68030) to Sun 4 (SPARC). Admittedly disks were much more expensive in those days, so we had /usr mounted over NFS. But it's still going to be a headache.
At present in our shop I compile things only twice (once for BSD, once for Linux). If we have 3 CPU architectures as well then I'm going to be compiling everything six times. Urgle.
You are lost in a twisty maze of little standards, all different.
not a problem (none / 0) (#64)
by ajaxx on Mon Feb 11, 2002 at 12:24:14 PM EST

ia64 can run ia32 programs. linux on ia64 can do this now (and has had the capability for quite some time). so can x86-64. calm down.

granted when ia32 gets phased out you'll want to be compiling everything for native targets. it's not like multiple builds are hard though.

$ for i in $TARGETS
> do
> mkdir $i
> cd $i
> ../src/configure --target=$i # ...
> make
> cd ..
> done

VPATH is your friend. plus by the time you need to do separate builds CPU speed will likely have stepped up to compensate.

[ Parent ]
All good except the various LInux details (3.00 / 1) (#48)
by Delirium on Mon Feb 11, 2002 at 05:58:00 AM EST

This was a great article, except the Linux memory-handling details seem a bit out of place. Sure, Linux is implemented poorly in that it reserves half the address space for itself, but this is not an inherent limitation of the hardware. And I could hardly care less which particular system calls are the reason for this limitation...

You had the source so.... (3.00 / 2) (#49)
by maroberts on Mon Feb 11, 2002 at 06:28:37 AM EST

Regarding the graphics rendering farm I'm surprised that you didn't take a number of steps to help your situation.

a) Why didn't you modify the 2GB/2GB kernel/user split to, say, 1GB/3GB, which would certainly have helped your memory situation?

b) It's not beyond reason to write a new memory allocation routine. I've not tried it for Linux, but I have been involved in writing memory allocators for embedded real-time (680x0) operating systems (VRTX) because the library versions weren't fast enough and weren't able to trace memory leaks.

The Linux source is meant to be available to make changes such as this.

P.S. Thanks for an interesting article.

The greatest trick the Devil pulled was to convince the world he didn't exist -- Verbil Kint, The Usual Suspects
Actually, I did (none / 0) (#59)
by MK77 on Mon Feb 11, 2002 at 10:13:00 AM EST

I did reconfigure the kernel to use only one gig of address space, and I also bumped up the base address for memory mapped regions so that the sbrk heap had more space. It did help, but there were still some problems.

Also, I was hired as a programmer there, and reconfiguring the kernels on all the machines made the system administrators uncomfortable, even after I explained it to them.

I thought about writing a custom malloc that always allocates memory from mmap'ed regions, never from the sbrk heap. Small requests would be managed by splitting up a page into small chunks. I never did this, but if I had, we could have reconfigured the kernel again to make things much better.

Mmm... rageahol
[ Parent ]

Thanks (none / 0) (#54)
by mcherm on Mon Feb 11, 2002 at 09:16:48 AM EST

Thanks... educational article!

-- Michael Chermside
64-bit coding (none / 0) (#56)
by pattern on Mon Feb 11, 2002 at 09:25:47 AM EST

Ah, I remember messing with a Multia Alpha five years ago. Debugging was quite a trip when an uninitialized pointer would give you:

$1 =(char *) 0x974a23f083d1be4f

and an uninitialized integer would give you:

$2 = 12938745834758932001


Coincidence or what? (none / 0) (#60)
by dollyknot on Mon Feb 11, 2002 at 10:22:24 AM EST


I wrote an essay discussing the philosophical implications of 64 bit architecture, uploading it last Thursday. I also mentioned Shrek.

Link to my essay

Speaking for myself, I just wished they had scaled up the 6502 architecture. Ah dreams of a gig zero page.

They call it an elephant's trunk, whereas it is in fact an elephant's nose - a nose by any other name would smell as sweetly.

plate o' shrimp (none / 0) (#87)
by erp6502 on Sat Feb 16, 2002 at 12:23:21 AM EST

Some would contend that the 6502 was the first RISC architecture. I like to think of the X and Y registers as the hands and the Accumulator as the mouth of the CPU.

There've been some faithful extensions of the 6502 to 16 data / 24 addr bits, most notably by WDC (no, not Western Digital).

And you can bet that when I've got a bit of down time my next Verilog core is going to be a stripped-down 32-bit 6502, as you might guess from my uname. c(^8

[ Parent ]

mail me (none / 0) (#88)
by dollyknot on Sun Feb 17, 2002 at 01:44:02 PM EST

They call it an elephant's trunk, whereas it is in fact an elephant's nose - a nose by any other name would smell as sweetly.
[ Parent ]
8080 was the first single chip CPU (2.00 / 2) (#61)
by peace on Mon Feb 11, 2002 at 11:00:23 AM EST

In the beginning, there was the 8088. Intel said, "Let there be a one-chip CPU." And there was a one-chip CPU, and it was good.

The first one-chip Intel CPU was the 8080. You needed an external timing circuit, but I think you needed one for the 8086 as well. It had a 16-bit address bus and an 8-bit data bus. The accumulator was 8 bits, and then it had register pairs BC, DE and HL as well as a dedicated stack pointer. Each register of a pair could be used as a single 8-bit register, e.g. B or C, or combined for use with some instructions, e.g. BC. The HL (high/low) pair had some basic indexing capabilities. DE had some basic math instructions, though no multiplication or division.

Then they came out with the 8085. The 8085 was a step back in single-chip design, as it needed another chip to decode its multiplexed address/data bus. The data bus and the lower 8 bits of the address bus ran over the same pins at different stages of the timing cycle. One pin was dedicated to communicating with the external multiplexer/demultiplexer. What this did was free up the 8 pins that were dedicated to the data bus in the 8080. They used the freed pins for hardware interrupts as well as an on-chip serial interface (the only added instructions over the 8080 were RIM and SIM, for reading and sending serial data).

Someone mentioned the Z80 as compared to the 8086. The Z80 was Zilog's clone and extension of the 8080 (no RIM and SIM, no multiplexed data/address bus). It added a few huge instructions like multiplication/division and indexed addressing.

It might be a little nitpick, but the 8085 was the first CPU I ever owned. I had a Tandy 102 portable and used to program it in assembly. It's how I became interested in programming, so I feel compelled to stand up for it.

Kind Regards

4004 (none / 0) (#63)
by cameldrv on Mon Feb 11, 2002 at 11:21:04 AM EST

The Intel 4004 was the first single-chip microprocessor that I know of. I have heard various rumors that there was a microprocessor developed for the military that predates the 4004 but was part of a classified project, however I don't have any documentation on this.

[ Parent ]
The 4004 was not single chip (none / 0) (#65)
by peace on Mon Feb 11, 2002 at 12:51:16 PM EST

I knew I should have at least mentioned the 4004 and 8008. The 4004 was the first general-purpose programmable "CPU", but it was not a single-chip design; it was a cluster of chips. My book on the making of the 4004 and other early CPUs is packed away, or I would give you the chip designations.

Kind Regards

[ Parent ]

Fair enough. (none / 0) (#90)
by cameldrv on Thu Feb 21, 2002 at 01:54:54 AM EST

Thanks for the info.

[ Parent ]
Z80 (4.00 / 1) (#70)
by nusuth on Mon Feb 11, 2002 at 05:29:19 PM EST

The Z80 had no multiplication or division instruction. It had 9(*) new 16-bit registers (A'F', B'C', D'E', H'L', SP', IX, IY, IX', IY') and a huge number of new instructions (253*2 IIRC). None of the new instructions did anything that an 8080 couldn't do; their base instruction was the same, but their operands were different, and those operands had a new addressing mode too.

*: I'm not sure whether IX, IY and SP actually had shadows. IC definitely didn't, so it is quite probable SP didn't have one either. IY may actually be IX's shadow. It has been 14 years since I wrote my last Z80 program, so my memory is a bit fuzzy.

[ Parent ]

no. (none / 0) (#71)
by porkchop_d_clown on Mon Feb 11, 2002 at 07:07:09 PM EST

AFAI remember, the 8008 was the first single chip CPU. The 8080A was the first widely used one, though.

The Z80 was sweet, though - the first chip I learned machine language for. After that, the 6502 seemed like a toy.

When ruling an evil empire, keep in mind that no matter how attractive that captured rebel is, you can probably find someone else who doesn't act
[ Parent ]

Z80 = yech (none / 0) (#80)
by ebh on Tue Feb 12, 2002 at 02:36:24 PM EST

The 6502 may not have had many registers, and may have required a bunch of instructions to get anything done (TAX and TAY, anyone?), but after over a year of Z-80 assembly programming I was still looking in my quick reference to see if such-and-such an instruction could use so-and-so register.

If you don't believe me, ask anyone who ever had to write a compiler back-end for that POS.

And those looping instructions were nothing but bloody-hard-to-debug microcode bloatware.

All in all, give me a 6809.

[ Parent ]
8080, 8080A, 8085 8 bit; 8086, 8088 16 bit (none / 0) (#81)
by morganw on Tue Feb 12, 2002 at 09:37:33 PM EST



The 8080 and 8080A were messes. They required 3 power supplies (+12, +5 and -5), an external clock controller 8224 *and* a control bus encoder/decoder 8228.

The 8085 was an 8080 with the 8224 and 8228 rolled in. It was still an 8 bit chip and it did have a multiplexed address/data bus. I guess a latch was simpler/cheaper than an 8228 though. Also, Intel made RAM, ROM and peripherals that had latches built in, so you could build a teeny microcontroller with 3 or 4 chips.

The 8086 was the 16 bit external data bus version & actually came first. The 8088 was the 16 bit internal (registers/datapath), 8 bit external data bus part that was used in the PC. The 8088 might have been a "step back" but that would be due to its data bus width 'cause BOTH of these parts had multiplexed busses- 8086 had the first 16 address lines muxed w/ data, 8088 had only the first 8 (these were 20 address line parts).

The 8088 also had support chips with built in latches (they might have even been the same ones used with the 8085)- Steve Ciarcia used 'em in an article for Byte called "Ease into 16 bit Computing."

When remembering the 1st single chip CPUs, don't forget the first CMOS one- the RCA COSMAC 1802!

[ Parent ]
286 vs 386 protected mode (none / 0) (#62)
by pb on Mon Feb 11, 2002 at 11:04:36 AM EST

If I remember correctly, (and I'm sure many people here can correct me if I'm wrong :) the real reason 286 protected mode didn't catch on was because it was missing some basic mechanisms for managing protected mode. There was no good way to get back into real mode after entering 286 protected mode, and no decent way to run some real mode apps under 286 protected mode.

For DOS, at the time, this was a real show-stopper; you'd basically have to write a 286 version of DOS. I think that Windows did have support for 286 protected mode, but it tried to take over your machine anyhow. :)

Anyhow, with the introduction of the 386, things got much better in that there was support for ending protected mode and returning to real mode, and also they added the v86 stuff so you could run your real mode apps under protected mode. And that's why everything from Linux to Future Crew Demos used it, not to mention the massive speed improvements from rewriting code for a native 32-bit architecture...

I think this is all basically correct, but feel free to nitpick, guys...
"See what the drooling, ravening, flesh-eating hordes^W^W^W^WKuro5hin.org readers have to say."
-- pwhysall
286 Protected Mode (5.00 / 1) (#72)
by StrangeQuark on Mon Feb 11, 2002 at 09:50:08 PM EST

Actually, IIRC it was designed to prevent you from returning to real mode. After all, in a protected operating system you generally wouldn't want to return to real mode -- that would be a security risk (OK, it's a privileged instruction, but still).

Besides, programmers found a way around it. The processor would go into protected mode to access the high memory, copy it into the EMS buffer, set a flag, then force a warm reset. A segment of the BIOS startup would check whether the flag existed in memory -- if so, the processor was coming back from protected mode and would return normally to the calling program.

I'm sure there's a lot more detail than that -- but suffice it to say you had >1 meg of memory on a 286 under DOS.
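(For the curious, the 1 meg ceiling that trick works around falls straight out of real-mode address arithmetic. A quick sketch in Python, just to illustrate the numbers:)

```python
# Real-mode x86 forms a 20-bit physical address from a 16-bit
# segment and a 16-bit offset: physical = segment * 16 + offset.
def real_mode_address(segment, offset):
    return (segment << 4) + offset

# The largest address reachable this way barely exceeds 1 MB:
top = real_mode_address(0xFFFF, 0xFFFF)
print(hex(top))           # 0x10ffef -- about 1 MB + 64 KB
print(top > 1024 * 1024)  # True: that extra ~64 KB is the "high memory area"
```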

Ahhh, those were the days with my 286.

[ Parent ]

Why so few registers for AMD? (none / 0) (#73)
by Toojays on Mon Feb 11, 2002 at 10:02:05 PM EST

Can someone tell me why AMD are only adding 8 registers, when Intel are moving to a processor with 128 regs?

I haven't seen the IA-64 instruction set, but I doubt it could be worse than x86. From an assembly programming perspective, the choice between 128 regs and 16 regs is a no brainer, especially if the 128 register CPU has a cleaner instruction set.

Registers (4.00 / 1) (#82)
by twodot72 on Wed Feb 13, 2002 at 12:35:18 PM EST

I remember reading somewhere that AMD came to the conclusion (through simulations) that adding more registers gave only a marginal performance improvement. At the same time, a larger register file is likely to be slower, and therefore might negatively affect the clock speed.

There is also a difference in the instruction set architecture: RISC-like ISAs generally make more intensive use of registers by design. The x86 instruction set, for instance, has many more addressing modes than your average RISC ISA, which is one of the reasons you don't need as many temporary registers. The RISC way has its advantages, but AMD has to work with the x86 ISA and therefore doesn't need to add as many registers as it would for a RISC machine (the Itanium ISA is similar to RISC in this respect).

Of course, from an assembly programming perspective it's nice to have registers to burn; but not many people need to do assembly programming on these processors anyway -- at least, that's not the target audience.

[ Parent ]

64 bit architecture is already in the marketplace (4.00 / 1) (#77)
by paulcourry on Tue Feb 12, 2002 at 03:10:10 AM EST

Let us remember that HP, Sun, IBM and others already have a sizeable installed base of 64 bit architecture hardware with 64 bit operating systems, compilers, libraries, databases and all the goodies. These are called servers, aka minis, mainframes and other names. The IA-64 chip already has a future at HP in their servers.

This article is about bringing 64 bit architecture to the masses.

VLIW please (none / 0) (#78)
by maroberts on Tue Feb 12, 2002 at 07:29:00 AM EST

It's about time that a sensible processor layout, with a register set that doesn't do your head in every time you look at it, took over from the mishmash that is the 80x86 register layout.

Personally I always regard it as a damn shame that the original PC wasn't built around something like a 68000, which internally was 32 bits from the word go, and which in the main had general purpose registers [it wasn't perfect, but true RISC was still a few years away]. I regard 80x86 assembly as the bane of my life, occupying a number of dark years from 1985 to about 1990.
The greatest trick the Devil pulled was to convince the world he didn't exist -- Verbil Kint, The Usual Suspects
Looking a little farther into the future (none / 0) (#84)
by norge on Wed Feb 13, 2002 at 09:14:15 PM EST

Processor design is a process that involves many engineering tradeoffs. Some of the many metrics that processor architects have to keep in mind: die area, power consumption, clock rate, latency, op rate. For the vast majority of the history of processor architecture, die area was a major limiting factor. Increasing die area and transistor density is one of the reasons why we have seen processors move from 8 to 16 to 32 to 64 bit words. With more transistors a processor can do more computation in parallel, and one of the easiest ways of realizing this parallelism is by increasing the word size.

As a few people have pointed out in this thread, 64 bit words are starting to get a little bit ridiculous. Being able to address 2^64 words of memory may be useful; being able to address 2^128 words of memory is clearly ludicrous (obviously it's dangerous to call a capacity increase ludicrous in the computer world, but whatever).

The questionable utility of these huge words is indicative of a more general problem that processor architects have been facing over the past ten to twenty years: too many transistors, too few useful things for them to do. We have seen architects try to engineer away this problem first with pipelining, and later with superscalar and VLIW designs. These techniques have proven remarkably useful, but Moore's "Law" shows no sign of faltering in the next five to ten years and the most recent designs (Pentium, Alpha, Sparc, etc.) are beginning to struggle mightily to keep their pipelines filled.

What does an architect do when transistors are so cheap and useful work for them is hard to come by? One design direction is to make smaller, cheaper, less power consumptive chips. We have already begun to see a trend in this direction and I expect that small low power chips will become more common in the future.

However, there will always be computer users who want more power. What is wrong with current systems that starves processors of useful work? I think the answer is good old assembly language. Almost all assembly languages used widely today are based on the idea that instructions execute in a strict sequential order. Of course, modern processors reorder instructions like crazy and modern compilers are designed to try to make it easy for processors to "discover" parallelism hidden in the sequential structure of assembly code.

I believe that there should be a new language for talking to processors that facilitates parallel computation. Obviously parallel execution was one of the main motivations behind the EPIC project from HP and Intel. However, I think that VLIW-like instruction sets are just a patch. Now we have two, three or four pieces of computation per instruction, but the processor still has to maintain the fiction that the instructions are executing in order.

What is this great new language? I'm not sure. I do have some ideas: I have studied this problem for a couple of years in school and industry. However, my fingers are starting to get tired and it would take a long time to get into it, so I'll leave the question open. Anyone have any brilliant ideas?


So, What's that Bill? (none / 0) (#85)
by cpbell on Thu Feb 14, 2002 at 02:38:55 PM EST

Wasn't it Bill Gates who supposedly said "Who would ever need more than 640K"? I think it was. Things happen for a reason. Technology is growing faster and faster, bigger and bigger. You cannot think about what you need; you have to think about what you can have.

Think outside the box,


[ Parent ]
What box are you talking about? (none / 0) (#86)
by norge on Fri Feb 15, 2002 at 02:11:57 PM EST

I think that throwing away assembly language as we know it is thinking pretty far outside the box.

On memory requirements and exponential capacity increases: For many decades computers have been following trends like "the number of transistors in a processor will double every 18 months". So we can reasonably expect that the amount of memory in a computer will follow some sort of exponentially increasing pattern. However, when we double the width of the words that computers use we *way* more than double the amount of memory that they can natively address. Some data:

bit width = addressable words
8 = 256
16 = 65,536
32 = ~4,300,000,000
64 = ~18,000,000,000,000,000,000
128 = ~340,000,000,000,000,000,000,000,000,000,000,000,000

I am pretty confident that it will be many years before we have many programs that require more than 18 billion billion words of memory. I am not trying to suggest that we will never outgrow 64 bit addresses, but rather that they will not be seriously constraining for at least several decades.
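(The figures above are easy to reproduce; a quick sketch in Python prints the exact values for each address width:)

```python
# Number of words natively addressable with an n-bit address.
def addressable_words(bits):
    return 2 ** bits

for bits in (8, 16, 32, 64, 128):
    # The "," format spec groups digits by thousands for readability.
    print(f"{bits:>3} bits -> {addressable_words(bits):,} words")
```

Note how each doubling of the address width squares the addressable space, rather than merely doubling it -- which is why 64 bits buys so many decades of headroom.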


[ Parent ]
I agree with that. (none / 0) (#89)
by cpbell on Wed Feb 20, 2002 at 08:54:37 AM EST


I agree. I wasn't implying that we need that much space right now... I was implying that we shouldn't think we will never use it.



[ Parent ]