Kuro5hin.org: technology and culture, from the trenches

Informative Pentium 4 rant

By itsbruce in Technology
Mon Jan 08, 2001 at 05:39:10 AM EST
Tags: Hardware

Darek Mihocka runs Emulators, Inc. He writes MacOS and Atari emulators for Intel hardware. As a result, he knows the internals of Intel chips (and those of their competitors) pretty well. His review of the Pentium 4 chip is devastatingly dismissive (the second section is headed "How Intel blew it").

This is more than an enjoyable rant; it's highly informative as well.


This guy doesn't hesitate to lay in when he thinks he sees sloppy design. Previous rants include "Why not to buy a Pentium 3" and "Windows Millenium, WHAT A DISASTER!!", and he has also been strongly critical of Mac OS X and of what he sees as Apple abandoning its most loyal customer base.

He always backs up his rants with strong technical information and often also a historical overview of the subject in question. This is certainly the case here, and he provides an excellently laid out history of the development of Intel (and, to a lesser extent, AMD) processors from the 8088 onwards.

For balance, I should say that, though he quotes a critical review of the P4 on Tom's Hardware, Tom's final review of the P4 is more favourable. For my part, I'm convinced by Darek's argument that the Athlon's design philosophy of making existing code run faster is superior to the Pentium 4's requirement that developers write code specifically optimised for it. Ironically, Tom changed his review of the Pentium 4 (favourably) after Intel rewrote the test software to be optimised for the Pentium 4. As Darek points out, it will be several years before most consumer software has Pentium 4 optimisations - and why should developers be having to do that when the Athlon development path shows that processors can be designed to do the optimisation for us and run existing code faster?

One interesting sidenote is his very favourable review of the Crusoe chip's performance in his tests. Go here and page up.


Poll
The best chip for the PC is..
o Pentium IV 2%
o Athlon 61%
o Crusoe 3%
o Buy a G4, for God's sake! 32%

Votes: 97



Informative Pentium 4 rant | 14 comments (7 topical, 7 editorial, 0 hidden)
Bad data on a few counts. (4.33 / 12) (#7)
by Christopher Thomas on Mon Jan 08, 2001 at 12:49:16 AM EST

Take this article with a grain of salt.

Mistakes I've found so far:

  • Citing premature P4 vs. Thunderbird benchmarks.
    Tom's initial benchmarks showed that the Thunderbird beat the pants off the P4. Later benchmarks showed that that was just because nobody has a decent compiler for the P4 yet. Revised benchmarks with optimized code on both fronts put the P4 marginally ahead (though at a much higher cost).

    The critical importance of shipping compilers on time and distributing them far and wide is a big concern, but was completely ignored in the article.

  • Citing the P3 as being much slower than the Athlon.
    Simply not true. Again, this was in part a compiler issue. Code compiled for the P3 *with* SSE enabled vs. code compiled for the Athlon *with* 3DNow! enabled gives you more or less a tie. Both sides had solid chips.

  • Claiming that the PIII was a PII with a marketing gimmick added.
    Um, no. SSE was what MMX was *supposed* to be - a SIMD extension to the instruction set that was actually *useful*. Tom, Sharky, et al. were getting about a 25% speedup in games vs. the PII, among other things. No other architectural changes? Well, if the new chip _works_, and performs better (due to SSE), where's the problem?

  • Sketchy understanding of RISC vs. CISC tradeoffs.
    The big advantage to RISC isn't that it's easier to increase the clock speed - it's that it's *MUCH* easier to pipeline the chip and to make it superscalar. This is why virtually all chips - including the x86 chips - have RISC-like cores, regardless of the external instruction set.

  • Sketchy understanding of instruction load buffers.
    Grouping of instructions in later processors wasn't as important as the author thinks it is. In practice, you fetch many more than just the next two or three instructions. You keep fetching instructions 2 or 3 at a time no matter how many you execute, until you fill up the scheduling window, which is typically 16 or more instructions high. Instructions are selected and executed from this window arbitrarily (as long as dependencies can be satisfied).

    Yes, brain-deadness in the decoders in the later Intel chips makes this a problem if you've been running the chip flat-out for several clocks, but I've seen nothing to indicate that he knows about the scheduling window at all.

  • Simplistic assumptions re. L1 cache.
    The author takes the "larger is always better" tack, but doesn't seem to realize that larger is usually also slower. Choice of cache associativity is also a difficult decision, involving substantial tradeoffs between hit latency and miss probability. This issue is completely ignored in the cache discussion, even though Intel and AMD have made very different decisions regarding it.

Most of the historical timeline in the article, OTOH, looks pretty accurate. There are also a few technical details about the later Intel chips that I hadn't seen before.

The architectural flaws the author notes in the P4 are, by and large, valid; however, he fails to offer conclusive proof that the chip's performance is really that abysmal (instead, he cites known-bad benchmarks).

yeah, well.. (3.80 / 5) (#8)
by wolfie on Mon Jan 08, 2001 at 05:21:35 AM EST

You call these "mistakes", but while you're right that Darek M. has oversimplified and exaggerated a bit, you end up marginally proving the point of the k5 article. I only take issue with your first two bulleted points.

As itsbruce said:
>For my part, I'm convinced by Darek's argument that the Athlon's
>design philosophy of making existing code run faster is superior to the
>Pentium 4's requirement that developers write code specifically optimised
>for it.

And then you say:

>Citing premature P4 vs. Thunderbird benchmarks.
>Tom's initial benchmarks showed that the Thunderbird beat the pants off
>the P4. Later benchmarks showed that that was just because nobody has
>a decent compiler for the P4 yet. Revised benchmarks with optimized
>code on both fronts puts the P4 marginally ahead (though at a much
>higher cost).

I don't see how this is a valid point at all, computer hardware is pathetically cheap, compared to programmer cost of optimizing code and compilers, not to mention the time investment.

>The critical importance of shipping compilers on time and distributing them
>far and wide is a big concern, but was completely ignored in the article.

you haven't demonstrated here that this is a "big concern", and a big concern to whom exactly? sounds FUD-like.

>Citing the P3 as being much slower than the Athlon. Simply not true.
>Again, this was in part a compiler issue. Code compiled for the P3 *with*
>SSE enabled vs. code compiled for the Athlon *with* 3Dnow enabled
>gives you more or less a tie. Both sides had solid chips.

As I said above..


[ Parent ]
Compilers cost little compared to the developers. (4.00 / 3) (#11)
by Christopher Thomas on Mon Jan 08, 2001 at 11:44:24 AM EST

>> Tom's initial benchmarks showed that the Thunderbird beat the pants off the P4. Later benchmarks showed that that was just because nobody has a decent compiler for the P4 yet. Revised benchmarks with optimized code on both fronts puts the P4 marginally ahead (though at a much higher cost).

> I don't see how this is a valid point at all, computer hardware is pathetically cheap, compared to programmer cost of optimizing code and compilers, not to mention the time investment.

Actually, no. The compiler license might cost the software houses $5k/seat for all of the bells and whistles. The programmers sitting at the terminals will cost 10-40 times that, depending on the length of the development cycle for whatever is being written. Using an obsolete compiler only makes sense when speed is not an issue.

Upgrading old software? Easy, just get the new version of your compiler and hit the "rebuild" button. Or, if you made a bad choice for your original compiler, spend a week moving the project to the new one (a drop in the bucket development-time-wise).

You don't have to make your software P4-only. It's straightforward to compile the speed-sensitive routines for multiple architectures and to switch between them at load time. This is already done for MMX and 3DNow! support.


Intel has dropped the ball here by *not* providing a good optimizing compiler by the P4's release date. Their compilers have historically been excellent. This would have solved most performance problems found in the benchmarks at no developer effort (as above).


>> The critical importance of shipping compilers on time and distributing them far and wide is a big concern, but was completely ignored in the article.

> you haven't demonstrated here that this is a "big concern", and a big concern to whom exactly? sounds FUD-like.

With a good compiler, code on the P4 runs about twice as quickly as without a good compiler (as per Tom's benchmarks).

Seems pretty important to me.

It is strongly in the interest of both the chip manufacturer and developers to have good optimizing compilers ready at the release of any new architecture. It _is_ a big concern, as it directly and strongly affects the performance of all code written for the new architecture.

[ Parent ]
response (4.00 / 1) (#13)
by wolfie on Fri Jan 12, 2001 at 07:19:37 AM EST

> Tom's initial benchmarks showed that the Thunderbird beat the pants off
> the P4. Later benchmarks showed that that was just because nobody has
> a decent compiler for the P4 yet. Revised benchmarks with optimized
> code on both fronts puts the P4 marginally ahead (though at a much
> higher cost).

>> I don't see how this is a valid point at all, computer hardware is
>> pathetically cheap, compared to programmer cost of optimizing code and
>> compilers, not to mention the time investment.

> Actually, no. The compiler license might cost the software houses
> $5k/seat for all of the bells and whistles. The programmers sitting at the
> terminals will cost 10-40 times that, depending on the length of the
> development cycle for whatever is being written. Using an obsolete
> compiler only makes sense when speed is not an issue.

I was referring to the resources spent optimizing the compiler. My wording was a little confusing, sorry.
Yes, I realize this is not a tremendously important issue; still, consider that there will be *lots* of different compilers to be optimized, and they will progress at different speeds.



> Intel has dropped the ball here by *not* providing a good optimizing
> compiler by the P4's release date. Their compilers have historically been
> excellent. This would have solved most performance problems found in
> the benchmarks at no developer effort (as above).

I don't see how anyone can disagree with this statement.

> The critical importance of shipping compilers on time and distributing
>them far and wide is a big concern, but was completely ignored in the
>article.

Agreed, this is why having the chip run existing code better, rather than making existing code run better on a chip, makes a little more sense to me.

>> you haven't demonstrated here that this is a "big concern", and a big
>> concern to whom exactly? sounds FUD-like.

> With a good compiler, code on the P4 runs about twice as quickly as
> without a good compiler (as per Tom's benchmarks).

> Seems pretty important to me.

My apologies, I don't know what exactly I was thinking when I wrote this.
I must have misinterpreted what you said or something.


[ Parent ]
Obsolete compilers (4.66 / 6) (#10)
by Spinoza on Mon Jan 08, 2001 at 07:16:13 AM EST

I thought this was a big part of what the article was driving at: both of the last two Intel architectures have placed the load on programmers to optimise their code for the chip. That is, assembly programmers have had to think about grouping instructions to get the best performance, and for compiler writers this is even more important. He seems to be saying that MS VC++ still hasn't caught up with Intel's last generation, and indicates that this may be a sign of things to come. (Is Visual C++ a good example of the state of the art in compilers? Do other companies update faster?)

Saying the slowdown is caused by out-of-date compilers is neither here nor there. There is a slowdown (or rather, AMD is in the lead on performance. Don't construe that as "P4 is slower than P3"), and this will be the case until compilers are updated. When will this happen? Who knows? It certainly won't help me run existing software any faster, will it?

Your first point, on the Thunderbird vs. P4 benchmarks, pretty much mirrors what he said: code optimisation brought the P4 into the lead. Of course, he also points out that few of us have Intel programmers willing to visit our houses and fix up our old programs.

[ Parent ]

I'm not so sure (3.33 / 6) (#9)
by StrontiumDog on Mon Jan 08, 2001 at 05:32:52 AM EST

... that Intel's decision to pursue clock speed over efficiency is such a bad idea in itself. Techies and hard-core gamers aside, the average consumer uses clock speed as the yardstick (if he has a yardstick at all) for estimating computer performance. Most computer users wouldn't know a benchmark from a birthmark, and if Intel can ship processors marked with a high enough clock speed, they will buy them in preference to processors with lower clock speeds but higher benchmark performances.

In my brief stint as sysadmin 5 years ago CPU speeds were absolutely the least of my concerns during computer buy-ins. Memory, price, reliability, warranties, and HDD speeds (in that order) were far more important. As a user nowadays, I attach more importance to the peripherals than the CPU speed. For a fixed budget it makes more sense to forgo the highest end CPUs and spend the money saved on a better video card, for instance. What's all the fuss about Intel/AMD chip performances?

why the design changes in PIV? (none / 0) (#14)
by lleukkun on Fri Jan 12, 2001 at 06:45:20 PM EST

I've seen a lot of stories about how the new Pentium IV is different in various ways but very few actually care to make an educated explanation of why these changes in the design were made. I don't claim to know too much about designing a processor but here are some thoughts and questions.

Take the very small L1 cache, for example. If you think about how fast these things are going to run three to five years from now, I'd imagine it's probably an awful lot cheaper to have only 8k of L1 going really fast than to have, say, 64k of it. Together with the pressure to make the overall size smaller, I'd say it would be weird if they didn't make an effort to simplify things as well (removing some of the execution units? not doing as much optimization work on the chip?). Now if the new CPU is only 30% slower today but allows 10x speed increases (or more) in the long run, it sure is worth taking the hit. Unless someone manages to do it better, that is, but so far we have no hard evidence.

Many articles state that the PIII & friends are hitting the upper limit at ~1GHz. Why? If the Athlon is so similar in design to the PIII, then why can it run at speeds at least 50% higher without problems? Just the manufacturing? That would be news. Intel has spent unknown amounts of money building its factories and then along comes this startup and does it better...

BTW. If you know any links of relevance to this stuff I'd appreciate them. Preferably not the theoretical math intensive university variety though ;-)

So basically, would it be safe to assume that Intel actually has a good solid product _fundamentally_ but was just forced to ship too early? I don't think anybody would complain about a slight decrease in punch/cycle if the new PIV had started from 2GHz upwards. Open source users in particular ought to be happy: as soon as there is a gcc version with decent PIV optimizations, they can rebuild their stuff and laugh at people stuck with their underperforming win stuff. I can't imagine anybody compiling Office 2000, but Emacs is actually very easy to handle and kernel compiles are really almost too easy.

Any ideas how the next generation AMD chips are going to compare in design against the PIV and the current Athlon? Are they doing the same stuff as Intel, or do they have some magic ability to just keep on upping the speed of the current (complex?) basic design? If they too are going the Intel route, then all this ongoing Intel bashing is a waste of bytes.
