Kuro5hin.org: technology and culture, from the trenches

40 GHz superconducting chips available *now*.

By Christopher Thomas in News
Sun Dec 10, 2000 at 12:07:07 PM EST
Tags: Hardware

Superconducting microprocessors have finally arrived. This month's IEEE Spectrum has an article about the integrated fabrication of "Rapid Single Flux Quantum" (RSFQ) devices - superconducting chips that implement conventional logic by using Josephson junctions to transfer single quanta of magnetic flux.

At least one company - Hypres, Inc. - is already fabricating chips commercially. They contain 10,000 gates per square cm, run at 20-40 GHz, and dissipate about 300 *micro*watts. Another group - SUNY - promises 100,000 gates Really Soon Now.


The catch, of course, is that these devices run at liquid helium temperatures. However, their use would still be very practical in mainframes and supercomputers. Add to this the fact that you can design RSFQ logic with standard design tools, and we may be poised to bring Big Iron back into its glory days.




40 GHz superconducting chips available *now*. | 15 comments (12 topical, 3 editorial, 0 hidden)
Beowulf Cluster... (1.34 / 29) (#2)
by k5er on Sun Dec 10, 2000 at 01:11:16 AM EST

Just imagine!
Long live k5, down with CNN.
It's a long way... (2.00 / 3) (#5)
by Luke Scharf on Sun Dec 10, 2000 at 11:46:51 AM EST

It's a long way from having a working lab-prototype to an actual production model. I wouldn't expect to buy something like this for at least five years...

Big Big Iron... :-)~



We've already passed the biggest milestone. (3.50 / 4) (#6)
by Christopher Thomas on Sun Dec 10, 2000 at 01:23:30 PM EST

It's a long way from having a working lab-prototype to an actual production model. I wouldn't expect to buy something like this for at least five years...

Having the chips in a non-vapour stage is a *big* improvement. The hard part has been done; the rest is just taking off-the-shelf CAD tools and off-the-shelf cryogenic equipment and putting together a server. Compare this to, say, qubit-based computing or photonic crystals, and you see what I mean (both may or may not revolutionize computing in 10-30 years).

An RSFQ-based computer built with today's chips wouldn't be as powerful as a conventional machine, but it could still be built fairly readily.

[ Parent ]
You make it sound so easy... (4.33 / 3) (#7)
by Luke Scharf on Sun Dec 10, 2000 at 02:05:24 PM EST

Having the chips in a non-vapour stage is a *big* improvement. The hard part has been done; the rest is just taking off-the-shelf CAD tools and off-the-shelf cryogenic equipment and putting together a server.

I agree - this is a major step that brings this much closer to reality.

But you've got to design a motherboard, design peripherals, build or port an operating system, and make it so that it won't break when the customer does something dumb. You've got to train people to build these things, to sell these things, to do technical support, and to make the machines do what customers will pay for.

Bringing out fundamentally different hardware of any kind takes time and effort. I see natural gas cars on the road, but how many of you actually drive one?



[ Parent ]
Re: We've already passed the biggest milestone (4.00 / 2) (#12)
by bemann on Mon Dec 11, 2000 at 08:01:05 AM EST

An RSFQ-based machine would be much faster than conventional machines (40 GHz!), but would not be as dramatically fast, because the low gate density forces stuff like branch prediction, floating point, and superscalar pipelining to be thrown out.



[ Parent ]
Performance of current RSFQ chips. (4.50 / 2) (#14)
by Christopher Thomas on Mon Dec 11, 2000 at 01:03:22 PM EST

An RSFQ-based machine would be much faster than conventional machines (40 GHz!), but would not be as dramatically fast, because the low gate density forces stuff like branch prediction, floating point, and superscalar pipelining to be thrown out.

Actually, it would be dog slow, for three reasons:

  • No on-die cache.
    You have no room for *any* on-die cache with only 10k gates. This means that your level-1 cache runs at a maximum of 1-2 GHz off-die on conventional CMOS. This will make the processor _crawl_.

  • No multiply, divide, or other complex functional units.
    You *might* be able to fit an unpipelined shift-and-add multiplication unit - for integers - into a 10k gate machine, but probably not, with all of the stuff you'd need for general infrastructure before it. Forget about a division unit - your multiplication unit takes up the space if you have one. Most likely, integer mult/divide would be done in software, and heaven help you if you have to use floating-point for anything. This gives a *big* performance hit.

  • Simple argument of gates * frequency.
    The simplest argument: The amount of work that a processor can perform is (roughly!) proportional to the number of gates in the processor times the frequency. Even ignoring cache, you have at least 1 million gates in the core of a processor built with a modern fab. 1M gates at 2 GHz vs. 10k gates at 20 GHz gives a factor of _10_. Even without branch prediction and reordering, a conventional processor would kick the can of _current_ RSFQ chips. In a few years this may change, but not today.
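
The gates-times-frequency estimate in the last point can be checked with a few lines of Python. This is only a back-of-envelope sketch: `relative_work` is a made-up helper name, and the "what-if" figures at the end (100k gates at 40 GHz) are an assumption that combines the SUNY gate count with the top quoted clock rate, not a measured result.

```python
def relative_work(gates: int, clock_hz: float) -> float:
    """Crude work estimate: gate count times clock frequency (ignores cache, IPC, etc.)."""
    return gates * clock_hz


# Figures quoted in the comment above.
cmos_work = relative_work(1_000_000, 2e9)   # 1M gates at 2 GHz
rsfq_work = relative_work(10_000, 20e9)     # 10k gates at 20 GHz
advantage = cmos_work / rsfq_work

print(f"conventional / current RSFQ: {advantage:.0f}x")

# Hypothetical what-if (assumption): 100k gates running at 40 GHz.
future_rsfq_work = relative_work(100_000, 40e9)
print(f"conventional / what-if RSFQ: {cmos_work / future_rsfq_work:.1f}x")
```

By this crude measure the conventional core wins by the factor of 10 quoted above, and the gap only closes once gate counts go up by roughly another order of magnitude.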

It turns out that branch prediction and superscalar issue, while important, wouldn't introduce more than a factor of 2-3 or so slowdown. Typical instructions-per-clock under real use is about 1.5-2, no matter how many the processor _says_ can be issued (it's bound by cache, mainly, and somewhat by branch mispredicts). A simple in-order pipeline with a delay slot to mask branch latency could be built in 10k gates, and wouldn't be more than 2x-3x less efficient than an aggressive superscalar core, assuming both have decent cache (which the RSFQ chip *won't*).
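
For anyone wondering what the shift-and-add multiply from the second point looks like when it has to be done in software, here is a minimal sketch. It assumes unsigned operands and a 32-bit word by default; the function name and the `width` parameter are illustrative only, not taken from any real RSFQ toolchain.

```python
def shift_add_multiply(a: int, b: int, width: int = 32) -> int:
    """Multiply two unsigned `width`-bit integers using only shifts, adds, and bit tests."""
    mask = (1 << width) - 1
    a &= mask
    b &= mask
    product = 0
    for _ in range(width):      # one iteration per multiplier bit
        if b & 1:               # low multiplier bit set: add the shifted multiplicand
            product += a
        a <<= 1                 # multiplicand moves up one bit position
        b >>= 1                 # consume one multiplier bit
    return product              # fits in 2 * width bits


# Sanity check against Python's built-in multiplication.
assert shift_add_multiply(1234, 5678) == 1234 * 5678
```

The loop runs once per bit, so a 32-bit multiply costs on the order of 32 add/shift steps instead of one pass through a hardware unit, which is where the performance hit comes from.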

[ Parent ]
Not Ready For Computers Yet, But... (5.00 / 2) (#15)
by sigwinch on Mon Dec 11, 2000 at 09:31:44 PM EST

Not everything needs massive amounts of gates and cache, but there are some things that need raw speed (for comparison, a photon of light travels only 7.5 mm between consecutive 40 GHz clock edges). Some ideas:

  1. Direct, real-time digital processing of radar signals. You could create a strangely time-shifted radar echo of a plane and screw with your opponent's radar.
  2. Direct, real-time synthesis, modulation, and demodulation of 2 GHz radio carriers.
  3. Brute-force discovery of cryptographic keys. At a key trial rate of 40 GHz, you can break 40-bit DES encryption in about 13 seconds on average (searching half the keyspace).
  4. Clock Turing machines and cellular automata at ridiculous rates, and search for their final states. I can't imagine what this would be useful for, but you could sure do it fast.
  5. Search for patterns in DNA sequences and similar information very fast. (But pipelined CMOS logic would probably be cheaper. I haven't thought this one through.)
  6. Real-time measurement of fast processes. Count every photon striking a photomultiplier tube, at rates of billions of photons per second, for a dynamic range of 1:10^9. That's a precision of 30 bits! I usually think of 22 bits as painfully precise.
  7. Directly control ludicrously fast processes. E.g., modulate the beam path for the next pulse of a mode-locked laser based on the previous pulse. This assumes, of course, that you can find a fast enough modulator.
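
The numbers quoted above (7.5 mm per clock period, about 13 seconds for a 40-bit key, 30 bits of precision) can be reproduced with a short calculation. This is a rough sketch that assumes one key trial per clock cycle and an average-case search that covers half the keyspace; the variable names are just for illustration.

```python
import math

SPEED_OF_LIGHT_M_S = 299_792_458   # in vacuum
CLOCK_HZ = 40e9                    # 40 GHz

# Distance light covers in one clock period.
period_s = 1 / CLOCK_HZ
distance_mm = SPEED_OF_LIGHT_M_S * period_s * 1000

# Brute-force search of a 40-bit keyspace at one trial per clock (assumption).
keyspace = 2 ** 40
worst_case_s = keyspace / CLOCK_HZ
average_s = worst_case_s / 2       # on average the key turns up halfway through

# Bits needed to represent a 1:10^9 dynamic range.
dynamic_range_bits = math.log2(1e9)

print(f"light per clock period: {distance_mm:.2f} mm")
print(f"40-bit key search: {average_s:.1f} s average, {worst_case_s:.1f} s worst case")
print(f"1:10^9 dynamic range: {dynamic_range_bits:.1f} bits")
```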

--
I don't want the world, I just want your half.
[ Parent ]

Maybe I'm Missing Something... (4.33 / 3) (#11)
by Tim C on Mon Dec 11, 2000 at 05:55:46 AM EST

...but doesn't the story say that at least one company is already fabricating these things commercially? That doesn't sound much like a "working lab-prototype" to me...


Cheers,

Tim

[ Parent ]
It's a long way from a fabbed chip to Circuit City (5.00 / 2) (#13)
by Luke Scharf on Mon Dec 11, 2000 at 11:36:24 AM EST

...but doesn't the story say that at least one company is already fabricating these things commercially? That doesn't sound much like a "working lab-prototype" to me...

A couple of friends of mine are doing the layout for an experimental processor. It's getting sent to the fab in a week or so, and we should have 20 or so of them back a month or so after that.

This doesn't mean that this chip will be showing up in Circuit City next quarter. It doesn't even mean that it will be showing up in anyone else's lab for months. And this is being done in VLSI with standard transistors.

The company is fabricating devices with this kind of cool logic, but there's a lot more that goes into a computer than just a processor, and to make a machine "safe" for customers[0] to use seems tricky.

[0] In our lab, we managed to slow-roast a well-designed, well-supported, not-quite-experimental board. This particular board has evolved far enough that it didn't have any hand-rework on it, and consists of off-the-shelf parts. What we did to roast this board is the kind of thing that happens every day (multiple users opening and programming it simultaneously) in the real world, but that you might not think to guard against when you're just trying to make it work in the lab. A computer built out of RSFQ is a much bigger jump than some PCI card.



[ Parent ]
10,000 gates per square cm? (3.50 / 4) (#8)
by swr on Sun Dec 10, 2000 at 04:21:57 PM EST

Is it just me, or does 10,000 gates per square cm not sound very good? 100,000 still sounds like a long way from modern processors.

If I recall correctly, modern CPUs have tens of millions of gates in a package that is several square cm.



Gate density (4.25 / 4) (#9)
by Christopher Thomas on Sun Dec 10, 2000 at 04:59:11 PM EST

Is it just me, or does 10,000 gates per square cm not sound very good? 100,000 still sounds like a long way from modern processors.

This is correct. The milestone is that they've achieved integration at all.

The claims from SUNY suggest that they have a good handle on how to scale this up into something competitive.

[ Parent ]
But you don't need a "modern" gate densi (4.75 / 4) (#10)
by bemann on Sun Dec 10, 2000 at 11:44:52 PM EST

With a 40 GHz processor, you could probably get away with throwing away stuff like instruction reordering, microcode engines, floating point instructions, etc. At that speed, most of what conventional processors need is either unnecessary or can be done purely in software (such as floating point). Parts of the processor such as the instruction and data caches could be kept as conventional dies.

Condition registers and special conditional branch instructions could be done away with and completely replaced with predication for all instructions (a conditional branch would just be a predicated normal branch).

Register logic could be implemented conventionally, because there will probably be a large number of registers, and they will probably be 32 or 64 bits each. You could just force compilers to take register access latency fully into account, and have the processor signal an exception if a register that isn't ready is accessed. This would get rid of the logic that handles such things in current processors.

You could have two sets of registers: fast superconducting registers used by most instructions, and slow registers that can only be used in transfers to or from fast registers or memory. This way the processor doesn't have to take normal register latency into account (the compiler would handle it instead). Note that transfers between slow and fast registers could be handled separately from everything else (for the most part), and could proceed while unrelated instructions execute.

Note that the implication of this sort of architecture is that there would be no such thing as precompiled software - software would have to be compiled to take things like internal processor latency into account. Goodbye ISA - you'd essentially be programming processors in something not too different from *microcode*. Having to take latency and the like into account would make trying to implement an ISA pointless anyway.
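
To make the idea concrete, here is a toy model of a core where the compiler, not the hardware, is responsible for register latency: reading a register before its value has arrived raises an exception, and every instruction takes a predicate. Everything here (the `ToyCore` class, the method names, the latency numbers) is a hypothetical illustration of the scheme described above, not a real design.

```python
class RegisterNotReady(Exception):
    """Raised when an instruction reads a register before its value has arrived."""


class ToyCore:
    def __init__(self, num_regs: int = 8):
        self.cycle = 0
        self.values = [0] * num_regs
        self.ready_at = [0] * num_regs   # cycle at which each register becomes readable

    def read(self, reg: int) -> int:
        # No interlock hardware: an early read is simply an error.
        if self.cycle < self.ready_at[reg]:
            raise RegisterNotReady(f"r{reg} not ready until cycle {self.ready_at[reg]}")
        return self.values[reg]

    def load_slow(self, dst: int, value: int, latency: int, pred: bool = True) -> None:
        """Slow-to-fast register transfer; the value is usable `latency` cycles later."""
        if pred:
            self.values[dst] = value
            self.ready_at[dst] = self.cycle + latency
        self.cycle += 1

    def add(self, dst: int, a: int, b: int, pred: bool = True) -> None:
        """Predicated add: acts as a no-op when `pred` is false."""
        if pred:
            self.values[dst] = self.read(a) + self.read(b)
            self.ready_at[dst] = self.cycle + 1
        self.cycle += 1

    def nop(self) -> None:
        self.cycle += 1


core = ToyCore()
core.load_slow(0, 5, latency=3)
core.load_slow(1, 7, latency=3)
core.nop()                 # the "compiler" fills the latency with other work
core.nop()
core.add(2, 0, 1)          # both operands have arrived by now
print(core.read(2))
```

Drop the two `nop()` calls and the `add` raises `RegisterNotReady`, which is exactly the burden this scheme shifts onto the compiler.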



[ Parent ]