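Unless I'm mistaken, this leaves the StrongARM as the "purest" contender for the "RISC" throne, and it is incidentally also descended from one of the -first- commercial RISC chips ever built.
(Well, OK, strictly that was its predecessor, the original ARM chip, and research designs like the IBM 801 and Berkeley RISC came earlier still, but that's nit-picking.)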
IMHO, the RISC concept of an ultra-simplified, reduced instruction set is by far the best design. Complex processors are more vulnerable to unexpected side-effects, chip bugs, etc., simply because you've got to check a much greater range of possible instructions and states.
In fact, IMHO, the "ideal" CPU would not be "central" at all, but rather consist of possibly hundreds of "trivial" processing elements, each essentially independent, apart from some basic communications mechanism.
Now, I've argued this before, for running Java applications at the hardware level, WITHOUT having any "virtual machine" layer. One of the counter-points was that the communications would become a bottleneck, essentially wiping out any gains the vastly simplified structure could achieve.
I'm going to answer that point. To have a bottleneck, you must have more data wanting to go through a specific channel than that channel can support. Much like the US road network, in fact. :)
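To make that definition concrete, here is a tiny Python sketch. The function name and all the numbers are made up purely for illustration; the only point is that a channel is a bottleneck when demand exceeds capacity, and not otherwise.

```python
def is_bottleneck(demand_bits_per_cycle, capacity_bits_per_cycle):
    """A channel is a bottleneck only if more data wants through than it can carry."""
    return demand_bits_per_cycle > capacity_bits_per_cycle


# Hypothetical case 1: 1024 processors, each wanting to move 32 bits per
# cycle, all sharing a single 64-bit bus.
shared_bus_blocked = is_bottleneck(1024 * 32, 64)

# Hypothetical case 2: one processor with its own 32-bit link, moving
# 32 bits per cycle.
private_link_blocked = is_bottleneck(32, 32)

print(shared_bus_blocked)    # True: demand far exceeds the shared channel
print(private_link_blocked)  # False: the link carries everything offered
```

So the question is never "is there communication?" but "is any one channel asked to carry more than it can?".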
However, let's imagine a 3-layer system. Layer 1 is memory. Main memory, cache, register stacks, etc. It's just high-speed memory, to be used as the processor needs.
Layer 2 is the hardware side of the processors. It won't be just one processor; it'll be as many processors as you can possibly cram onto the silicon. My guess would be you could fit around 1024 light-weight RISC processors onto a 3" wafer.
Layer 3 is a network layer, and is the key to having no bottlenecks. In essence, every processor element could have enough bus width to dump and load its entire state in a single cycle. This would effectively allow you to compose "complex" instructions that are as fast as, if not faster than, the ones on a CISC chip. It is easy to envisage a pipe switcher that would operate as fast as any instruction look-up system.
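Here is a rough toy model of that idea in Python, just to show the shape of it. Everything in it is an assumption for illustration: the names `ProcessingElement` and `run_route` are invented, each element does exactly one trivial operation, and a whole-state transfer is simply counted as one step to stand in for the "wide bus" above. It models the data flow only, not real hardware timing.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ProcessingElement:
    """A 'trivial' processor: a small register file and exactly one operation."""
    op: Callable[[List[int]], List[int]]
    regs: List[int] = field(default_factory=list)

    def load_state(self, state):
        # Assumption: the bus is wide enough to load the whole state at once.
        self.regs = list(state)

    def step(self):
        # Apply this element's single operation to its registers.
        self.regs = list(self.op(self.regs))

    def dump_state(self):
        # Assumption: the whole state can also be dumped at once.
        return list(self.regs)


def run_route(elements, route, state):
    """Play the 'pipe switcher': pass the state through elements in route order."""
    steps = 0
    for name in route:
        pe = elements[name]
        pe.load_state(state)
        pe.step()
        state = pe.dump_state()
        steps += 1
    return state, steps


# Two trivial elements: one multiplies, one adds.
elements = {
    "mul": ProcessingElement(op=lambda r: [r[0] * r[1], r[2]]),
    "add": ProcessingElement(op=lambda r: [r[0] + r[1]]),
}

# A "complex" multiply-add (a * b + c) composed by routing through both.
result, steps = run_route(elements, ["mul", "add"], [3, 4, 5])
print(result, steps)  # [17] 2
```

The "complex" instruction here is nothing more than a route through simple elements, which is the whole argument: the switcher picks the route, and no single element ever has to be complicated.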