Kuro5hin.org: technology and culture, from the trenches
Review: Virtual Machine Design and Implementation in C/C++

By Sfivo in Technology
Sun Jun 30, 2002 at 07:42:59 PM EST
Tags: Books (all tags)

I recently had the pleasure to read Bill Blunden's "Virtual Machine Design and Implementation in C/C++". It describes how he built the HEC virtual machine, and gives you a real grasp of how to accomplish this yourself.

HEC is a complete, 64-bit, register-based virtual machine. It's pretty straightforward for anyone somewhat familiar with the x86 platform and some assembly-level programming. The entire book describes his design process and the trade-offs that were made during its implementation.


For those that are not of the "really-boring-tech" genus, a virtual machine is a pseudo-machine that runs in software, as opposed to actual physical hardware. The Java runtime environment is an example of this, as are traditional emulators. The trick is to provide a platform on which programs execute, similar to normal hardware.

I found this book more than helpful while writing Klea, my own yet-to-be-finished virtual machine, even though our goals are very different. HEC is a much lower-level virtual machine than Klea, and is register-based as opposed to stack-based. Still, a lot of the concepts are universal. Blunden explains everything pretty clearly, and HEC is more than I expected from an 'example' virtual machine.
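To make "register-based" concrete, here is a minimal sketch of a fetch-decode-execute loop. The opcodes, the one-byte encoding, and the names ToyVM and toy_vm_run are all hypothetical and chosen for illustration; this is not HEC's instruction set or Blunden's code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical opcodes for illustration only; NOT HEC's instruction set. */
enum { OP_HALT = 0, OP_LOADI = 1, OP_ADD = 2, OP_MUL = 3 };

#define NUM_REGS 8

typedef struct {
    uint64_t regs[NUM_REGS]; /* 64-bit general-purpose registers */
    size_t   ip;             /* instruction pointer into the bytecode */
} ToyVM;

/* Runs bytecode until OP_HALT. Returns 0 on a clean halt and -1 on a
 * malformed program (bad opcode, bad register, or truncated operands). */
int toy_vm_run(ToyVM *vm, const uint8_t *code, size_t len)
{
    assert(vm != NULL && code != NULL);
    vm->ip = 0;
    while (vm->ip < len) {
        uint8_t op = code[vm->ip++];               /* fetch */
        switch (op) {                              /* decode */
        case OP_HALT:
            return 0;
        case OP_LOADI: {                           /* LOADI rd, imm8 */
            if (len - vm->ip < 2) return -1;
            uint8_t rd  = code[vm->ip++];
            uint8_t imm = code[vm->ip++];
            if (rd >= NUM_REGS) return -1;
            vm->regs[rd] = imm;
            break;
        }
        case OP_ADD:
        case OP_MUL: {                             /* OP rd, ra, rb */
            if (len - vm->ip < 3) return -1;
            uint8_t rd = code[vm->ip++];
            uint8_t ra = code[vm->ip++];
            uint8_t rb = code[vm->ip++];
            if (rd >= NUM_REGS || ra >= NUM_REGS || rb >= NUM_REGS) return -1;
            vm->regs[rd] = (op == OP_ADD) ? vm->regs[ra] + vm->regs[rb]
                                          : vm->regs[ra] * vm->regs[rb];
            break;
        }
        default:
            return -1;                             /* unknown opcode */
        }
    }
    return -1; /* ran off the end without reaching OP_HALT */
}
```

Operands name registers directly, so "MUL r2, r0, r1" is a single instruction. A stack-based design would instead push both values and issue an operand-less multiply.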

The book leads with a very interesting foreword, describing the uses of virtual machines today and in the past, and why they're generally a good idea. His slightly off-topic historical notes are some of the most enjoyable parts of the book. I'd suggest this as reading for anyone who really wants a good understanding of how some of the "black magic" of today came to be.

Mr. Blunden describes each instruction meticulously. HEC's instruction set is not cluttered with meaningless once-in-a-blue-moon operations; it is a minimal, functional set that is more than adequate for its purpose as a learning tool. He also spends a few paragraphs on the implementation of the not-so-obvious instructions. He supplements this with many working examples of HEC assembly programs that demonstrate most of the functionality of HEC.

The author explains in detail how stack allocation works, in easy-to-understand terms, but is a little vague and undecided about the heap. VM design isn't an exact science, so you can live with this. Most core memory management topics are covered in depth, with a lot of nice illustrations to clarify exactly what (should) be going on.

One of the more interesting chapters concerns Interprocess Communication (IPC). Several mechanisms are explored and rated in terms of portability, speed, and ease of use, with a bias towards sockets. TCP/IP is dissected in enough detail to build a clean interface to it from the virtual machine. Sockets, like many "higher level" operations such as file I/O, are handled via an interrupt handler, which shows a powerful but simple method for the reader to extend HEC with more advanced functionality without overcomplicating the instruction set.
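The interrupt approach can be sketched as a table of host functions indexed by vector number. Everything here (IntVM, the vector numbering, the single "write character" service that reads its argument from register 0) is a hypothetical illustration of the general technique, not HEC's actual interrupt interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_REGS    8
#define NUM_VECTORS 4
#define OUT_CAP     64

typedef struct IntVM IntVM;
typedef int (*IntHandler)(IntVM *vm);

struct IntVM {
    uint64_t   regs[NUM_REGS];
    IntHandler vectors[NUM_VECTORS]; /* host-side service routines */
    char       out[OUT_CAP];         /* stands in for a real output device */
    size_t     out_len;
};

/* Hypothetical service on vector 0: append the character held in r0. */
static int int_write_char(IntVM *vm)
{
    if (vm->out_len >= sizeof vm->out - 1) return -1;
    vm->out[vm->out_len++] = (char)vm->regs[0];
    vm->out[vm->out_len] = '\0';
    return 0;
}

/* Zero the VM state and install the built-in services. */
void int_vm_init(IntVM *vm)
{
    assert(vm != NULL);
    *vm = (IntVM){0};
    vm->vectors[0] = int_write_char;
}

/* What a single INT-style instruction would do: look up the vector and
 * call into the host. Returns -1 for an unknown or empty vector. */
int int_vm_raise(IntVM *vm, unsigned vector)
{
    assert(vm != NULL);
    if (vector >= NUM_VECTORS || vm->vectors[vector] == NULL) return -1;
    return vm->vectors[vector](vm);
}
```

Adding sockets or file I/O under this scheme means filling another slot in the table; the instruction set itself does not grow.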

Interfacing the virtual machine to native code is a complex topic. He discusses various problems with calling conventions, conversions, and type compatibility when crossing the barrier between various languages and the HEC VM. Although I thought this area was slightly over-engineered, as he used XML as the interchange format, it was informative. The important part is that it shows the basics of allowing a low-level machine to "talk" to higher-level languages through a consistent method. Unfortunately, this entire topic's implementation is worthless unless you're on Windows, or want to edit a few core HEC files. This, sadly, is one of the features that was stripped from the Linux version, as discussed later.

He then discusses debuggers, and walks through his implementation of a hardware-level debugger, embedded into the VM runtime. This is somewhat crude, but is very easy to understand from the viewpoint of someone that does not intimately know the codebase. The time spent on the debugger was pretty short and to-the-point, in a good way. I, personally, think the debugging section is the most well-written and informative chapter in the book.

The entire HEC toolchain consists of the VM, debugger, assembler, and a few assorted 'inspection' utilities. It's clear how they all work together in the system, but the example code is oftentimes redundant and inelegant, and contains enough macros to piss off your preprocessor.

A lot of wheel-reinventing goes on in this book, and I thought it was a little off-track for it to invest more than a couple of pages in common, easy tasks like command line parsing. The lack of focus sometimes hurts more than it helps. The author tends to look really in depth into a problem, and then usually solves it in a completely wrong way.

It is impossible for me to be completely impartial about some of the design decisions, as I have studied this area intensely for the last year or so, and came to some really different conclusions. But nonetheless, these are old, ongoing, holy wars, so I was not quick to judge.

I was really disappointed that he swept kludges and shortcomings under the "next version" rug, especially seeing that he has no apparent online location for errata, fixes, and code updates. I felt really short-changed: I was expecting this book to provide a lot more concrete examples, rather than string me along to the revised edition, possibly years from now. For example, a lot of the source code, even as printed in the pages of the book, has fragments of a broken threading feature, but he instructed me to ignore that, as it'll only be useful in future versions.

There are also times where it seems he implemented obviously unacceptable limitations (relating to dynamic allocation, for example) for the sake of brevity. It appeared he didn't want to get into the core of any real solution, usually making excuses and sour-grapes statements along the way. This is not what I expected from a book that spent 20 or so pages describing his own vector and linked-list structures, and even more over-explaining how interrupts work.

Overall, he paints a clear picture of what is involved in virtual machine design, from both a maintenance and performance point of view. He wanders off the path often, but it's readable, and pretty informative.

There are, however, a few things presented in this book that I found slightly misleading, or outright ignorant.

The first problem is its title. The HEC virtual machine is written entirely in C. I found the title of "Implementation in C/C++" very misleading. This is true, however, because standard C is, by definition, valid C++. The HEC assembler was, however, truly written in "C/C++", meaning "C++ using unsafe C standard library functions".

I was amazed as he exclaimed that the project was unmanageable in C and therefore required a rewrite in C++, only to write his own std::list and std::vector (extendable array) moments later. This guy is obviously coming from a C background, and writes in sketchy C++ using cstdio and iffy re-implementations of proven (and portable) STL types. This is the point where I lost a bit of confidence in the book. I'm not an object-oriented freak, but using the improved, safe C++ library routines makes a lot of sense in C++, if for nothing else than the clarity of teaching.

In the last chapter, Blunden spouts off half-truths and blatant misinformation about Linux. I think this stems from his fear that "technology is moving out from under him", as he described in the first chapter concerning his transition from DOS to Windows. Blunden boasts of HEC's portability goals throughout the book, but I think this was mostly market-speak: I don't consider a half-working, feature-stripped Linux port true "portability" like he claims. He describes his painful attempt to port HEC to Caldera OpenLinux, and complains about "Linux's lack of wide-character support" after "man wprintf" fails. He then goes on to say that if any distribution does have wide-character support, it shows "fragmentation", which, he says, is even worse.

I don't consider what he's experiencing fragmentation, but "picked the bad apple" syndrome. There's a reason Caldera basically has nil market share now. The "man wprintf" page and the associated headers are in place and working in the default install of even ancient Mandrake, RedHat, and Debian systems. I doubt he'd do a DOS port on DR-DOS rather than the real, working DOS. Why he doesn't extend the same effort to Linux is beyond me. Porting to a failing, market-trailing, mostly broken implementation of an OS is not the best way to get started, especially if you have no experience with the platform.
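For reference, the wide-character formatting functions live in the standard C header wchar.h and are not tied to any one distribution. The following is a small sketch using swprintf, a sibling of wprintf that writes into a buffer; the helper name format_wide_label and the "name/bits" format are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Formats "<name>/<bits>" into a caller-supplied wide-character buffer.
 * Returns the number of wide characters written (excluding the
 * terminator), or a negative value if the buffer is too small. */
int format_wide_label(wchar_t *buf, size_t cap, const wchar_t *name, int bits)
{
    assert(buf != NULL && name != NULL);
    return swprintf(buf, cap, L"%ls/%d", name, bits);
}
```

Calling format_wide_label(buf, 16, L"HEC", 64) fills buf with the wide string L"HEC/64".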

He then went on and on about how Linux does not support loading arbitrary libraries at runtime, which is a lie at worst, or "wrong" at best. dlopen(), dlclose(), and dlsym() work perfectly for me, and are the direct equivalents of Windows' LoadLibrary(), FreeLibrary() and GetProcAddress() calls. These functions are neither new nor Linux-specific; they are part of the Unix98 standard, and come from Solaris. I found it odd that an "ex-Unix programmer" would be so naive as to think that Linux, or any modern Unix, doesn't support what boils down to something as simple as plugins. He then goes on to show how to build a static library, claiming it's a dynamic library, and how it fails.
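As a rough sketch of what runtime loading looks like with the dl* functions named above: the helper call_unary_from_library and the UnaryFn type are hypothetical names, and any library path or symbol you pass in is your own assumption about what is installed on the system.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stddef.h>

typedef double (*UnaryFn)(double);

/* Loads lib_path at runtime, looks up `symbol` as a double(double)
 * function, calls it with `arg`, and stores the result in *out.
 * Returns 0 on success, -1 if the library or symbol cannot be found. */
int call_unary_from_library(const char *lib_path, const char *symbol,
                            double arg, double *out)
{
    assert(lib_path != NULL && symbol != NULL && out != NULL);

    void *handle = dlopen(lib_path, RTLD_NOW);   /* cf. LoadLibrary() */
    if (handle == NULL)
        return -1;

    UnaryFn fn;
    /* POSIX idiom for turning dlsym()'s void* into a function pointer. */
    *(void **)(&fn) = dlsym(handle, symbol);     /* cf. GetProcAddress() */
    if (fn == NULL) {
        dlclose(handle);
        return -1;
    }

    *out = fn(arg);
    dlclose(handle);                             /* cf. FreeLibrary() */
    return 0;
}
```

On a typical glibc system, something like call_unary_from_library("libm.so.6", "cos", 0.0, &r) should succeed, though the exact library file name varies by platform. Older glibc versions need -ldl at link time.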

The best attribute I can give Mr. Blunden on this statement is "unresearched". I knew about the dl* functions years before I even saw a Linux desktop, while pondering porting a Win32 app to Linux. I don't expect authors to know every nuance of every platform available, but the book says "Includes ports of the HEC Virtual Machine for Windows and Linux" directly on the cover. In reality, you have a (possibly) working Windows implementation, and a half-assed, feature-stripped, but somewhat working Linux version, only because the author didn't put forth the effort to research the platform.

What's even more amazing is that I have implemented the missing Linux features in less than an hour (after removing more than a few ^M's from the source and build scripts), and considered sending him the changes, but his only point of contact for corrections is a physical mailing address. This is pretty unorthodox in the modern technical publishing environment. I'm not about to pay postage to ship him a CDR with a 2KB diff on it. And for some reason I doubt he'd eat his strong words and implement it anyway.

Also, I found it completely inappropriate and inflammatory to include the following passage, in a book about virtual machines, no less:

"Here is a canonical usability test. Take someone who has had minimal exposure to computers (i.e. your mother). Allow them to play with Microsoft WordPad and then give them a chance to fiddle with vi. WordPad will win every time. Normal people realize that having to memorize dozens of obscure commands is a waste of time, and they would rather interact with a tool that is intuitive and easy to work with. Naturally, there will be those members of the audience who think otherwise. They would say, "But, but, but ... vi has far more powerful features than WordPad." These are the same people that yearn for the gold old days of the 25x80 dummy terminals. The vi editor is nothing more than a relic from the pre-GUI era."

The absurdity of this remark alone led me to question the merits of the previous chapters. The first question I had was "Why not compare vi to DOS' edit?", and then I saw the real problem: "Why compare WordPad, a friendly GUI editor for Windows, to vi, an unfriendly command-oriented editor, instead of Linux's friendly GUI editors, such as KWrite, Kate, AbiWord or OpenOffice?".

This review isn't written from a rabid Linux user's perspective, but from that of a guy who wanted to learn about virtual machines, and bought a book with "Virtual Machine Design" in the title. I was expecting a little more information about VMs, and a lot less frothing at the mouth about false Linux problems and stupid apples-to-oranges comparisons, stemming from the author's fear of having to switch platforms again.

If you can put up with the opinionated, super-ego writing style (amazingly, confined to the last few chapters), a few blatant technical errors every now and then, and more than a few unjustified (in my opinion) design decisions, I would still recommend this book, not for its actual code or implementation, but for the thought process and the overall big picture of how virtual machines work and how they can be implemented.

I know this review has come off as pretty negative, almost overly so. I guess that is what happens when you end an otherwise fine book with flamebait that insults a decent percentage of your readers, who paid real money for a book about implementing virtual machines. I assume he does realize that his primary audience includes the technically-oriented crowd, many of whom may be running Linux in some form or another.

The HEC virtual machine is a pretty primitive animal, but it gave me some invaluable information in areas I was sorta "iffy" about before. I mostly got validation of the ideas I am using in Klea, rather than ground-breaking concepts, so I don't know how it'd treat someone who hasn't been studying virtual machines for a while. That warm, fuzzy feeling of knowing I was on the right path was worth the $45 US I paid for it, anyway.

Review: Virtual Machine Design and Implementation in C/C++ | 26 comments (12 topical, 14 editorial, 0 hidden)
BTW - I like the article (3.00 / 1) (#3)
by morkeleb on Sun Jun 30, 2002 at 08:16:27 AM EST

Although it brings to my mind that K5 could use a Book Review Section. I love book reviews. Sounds like this one is a dog, though. Ah well....


"If I read a book and it makes my whole body so cold no fire can ever warm me, I know that is poetry." - Emily Dickinson
Topic: Books (5.00 / 2) (#7)
by kisielk on Sun Jun 30, 2002 at 02:11:13 PM EST

Isn't that topic essentially what you are asking for? It's the one this story is posted under...

--
Talk, talk, it's only talk. Arguments, agreements, advice, answers, articulate announcements. It's all just talk."
- Elephant Talk, King Crimson


[ Parent ]
It's posted under Technology (none / 0) (#18)
by morkeleb on Sun Jun 30, 2002 at 10:10:02 PM EST

Correct? Unless I am seeing things. I don't see a Books section.
"If I read a book and it makes my whole body so cold no fire can ever warm me, I know that is poetry." - Emily Dickinson
[ Parent ]
No... (none / 0) (#26)
by kisielk on Thu Aug 08, 2002 at 09:40:59 PM EST

The section is Technology, the topic is book reviews... if I am not mistaken. Notice the little book icon next to the story title?

--
Talk, talk, it's only talk. Arguments, agreements, advice, answers, articulate announcements. It's all just talk."
- Elephant Talk, King Crimson


[ Parent ]
coincidence (3.00 / 1) (#4)
by zephc on Sun Jun 30, 2002 at 08:48:00 AM EST

I'm working on a virtual machine meant to (theoretically) distribute indefinitely across a network.  I'm writing it in a mix of C and C++ (the STL is SO handy!)  It's not meant to be exactly FAST on a single machine (it parses through an AST rather than bytecode etc) but it's supposed to more than make up for that (eventually) by said distributed computing.

Bad Signs (none / 0) (#17)
by dadams on Sun Jun 30, 2002 at 09:27:15 PM EST

It's always a bad sign when a book devolves into petty attacks on something essentially unrelated to its topic. I think my favorite case of this is William Taylor's What every Engineer needs to know about Artificial Intelligence, which has a lengthy footnote detailing the author's hatred of MacOS software development.

I really wish there was an organization that would police books relating to software construction and tear out any wheel re-invention. It just reinforces this idea that it's completely acceptable to write another string class.



overly negative... (4.00 / 1) (#19)
by bhouston on Sun Jun 30, 2002 at 10:12:33 PM EST

Sfivo:
"I know this review has come off as pretty negative, almost overly so."

I think that you went into too much detail in regards to the problems you have with the book.  Do you think that the second half of the article, in which you detail many of the author's faults, is necessary?  I was interested in whether the book was good or not in regards to VM design and not in your detailed and supported errata for the book.

It was also strange that you got so personal in your attacks.  I do not know why you added these flourishes; they seem to distract from the article: "And for some reason I doubt he'd eat his strong words and implement it anyway."

Overall I find it difficult to come away with a clear picture of this book.  You recommend it but you also give me the impression the author is arrogant and prone to error.

..but mostly justified. (4.50 / 2) (#21)
by Sfivo on Sun Jun 30, 2002 at 11:01:33 PM EST

I think that you went into too much detail in regards to the problems you have with the book.

As with most technical things, I tend to dwell on the weak spots; this is a fault on my part. I was, mostly, trying to point out that this book is, obviously, not flawless.

Do you think that the second half of the article, in which you detail many of the author's faults, is necessary?

Certainly. He makes some /technically/ wrong statements with authority, and I thought it'd be important to point this out. I tried not to personally attack the author, only some of the more off-beat parts of the book. All I have to judge this book by is its text, which is pretty attached to the author's person. It's sometimes hard to separate the two.

My review really reflects my own experiences of the book. Someone who is not interested in the Linux port would probably not have a problem at all with the moaning and inaccuracies of the author.

I was interested in whether the book was good or not in regards to VM design and not in your detailed and supported errata for the book.

I thought I gave enough specifics of the text, which were mostly good points. VM design can be a very personal matter, much like the C vs. C++ vs. Java arguments. I've decided that because this is a review, the only thing outside the text I'd have to offer would be personal opinions that may seem as unfounded as some examples in this book appeared to me. I decided to stay away from that.

It's hard to give a good-or-bad answer to a lot of the issues brought up in the book, and I have tried to be unbiased and not complain when the author used methods different from those I would have, as his design goals were for the most part explained and deliberate.

So is it a good book about virtual machine implementation? Yes, with some caveats, which I thought I made clear.

You recommend it but you also give me the impression the author is arrogant and prone to error.

This is exactly the impression I got from reading the book, and was trying to express. This could be affected by the lack of over-the-counter, easy-reading books on the subject. I believe the positive qualities outweighed the negative, which is why you were presented with such a mixed review.

[ Parent ]

thanks for the response (none / 0) (#22)
by bhouston on Mon Jul 01, 2002 at 03:04:48 AM EST

I understand your point of view.  

[ Parent ]
I appreciate the review (5.00 / 1) (#23)
by curien on Mon Jul 01, 2002 at 07:53:24 AM EST

The review seemed very thoughtful, and if I were looking for a book on implementing VMs, this would certainly have helped in my decision. I do have one issue, though (pet peeve :)
The first problem is its title. The HEC virtual machine is written entirely in C. I found the title of "Implementation in C/C++" very misleading.
Couldn't agree with you more! The phrase "C/C++" is often very misleading.
This is true, however, because standard C is, by definition, valid C++.
Ummm.... no. C99 and C++98 diverge *considerably*. C++98 even diverges from C90 (which is the version of C most similar to C++98) in many ways. Most of the differences are trivial (type conversion rules (C++98 is far stricter than C90), sizeof a character literal (==sizeof int in C90, ==sizeof char in C++98), etc). There are, however, a few major differences that would require a significant rewrite in order to migrate from C90 to C++98. Please see this Usenet post for a concrete example.
The HEC assembler was, however truly written in "C/C++", meaning "C++ using unsafe C standard library functions".
The C library functions may be "unsafe", but some of them also tend to be faster.
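A short sketch of the kind of divergence described in this comment. The code below is valid C, but a C++ compiler would either reject it or give a different answer. The function names are made up for illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* In C a character literal has type int, so this returns 1.
 * In C++ the literal has type char, so sizeof('a') would be 1 and the
 * comparison would normally be false. */
int char_literal_has_int_size(void)
{
    return sizeof('a') == sizeof(int);
}

/* Two things C accepts and C++ rejects: `new` used as an ordinary
 * identifier, and malloc()'s void* converted implicitly to int*. */
int *make_counter(int start)
{
    int *new = malloc(sizeof *new);
    if (new != NULL)
        *new = start;
    return new;
}

/* Small self-check tying the two together. */
void c_only_demo(void)
{
    int *p = make_counter(5);
    assert(char_literal_has_int_size());
    if (p != NULL) {
        assert(*p == 5);
        free(p);
    }
}
```

Compiling this file as C++ fails at the `new` declaration before the sizeof difference even comes into play, which is why "standard C is valid C++" does not hold in general.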

--
Murder your babies. -- R Mutt
amazing (none / 0) (#24)
by tps12 on Mon Jul 01, 2002 at 09:02:02 AM EST

I'm repeatedly astounded at the stuff that gets published. From this review, I estimate that 20% of k5's readership and 5% of posters on comp.lang.* would be more qualified to write this book than the author.

OTOH, he has some killer troll material in there (vi vs. WordPad, wide character support).

Yup (5.00 / 3) (#25)
by ucblockhead on Tue Jul 02, 2002 at 12:35:22 AM EST

Speaking from experience...I wrote two books on C myself and my only qualifications were a high school degree and a willingness to say "I can write a book on C".

The amount I didn't know about C when I wrote those books is fucking amazing.
-----------------------
This is k5. We're all tools - duxup
[ Parent ]

All trademarks and copyrights on this page are owned by their respective companies. The Rest 2000 - Present Kuro5hin.org Inc.