The recent brouhaha over Toyota's 2 billion dollar, 8 million vehicle recall raises some difficult questions about the underlying causes for apparently obvious problems like interfering floor mats, sticky pedal assemblies and buggy software. For me, the most interesting issue is the relationship between the value engineering of mechanical systems and the reliability of control software.
I first learned about value engineering in the automotive industry from an electronics engineer employed by one of the Big Three in Detroit. We were chatting between sessions at a computer industry conference and I asked him what his job entailed. He said his main task every year was to reduce the cost of the car radio by 5 cents. It was not a simple matter of cost cutting. The finished product still had to meet its specifications, which included mean time to failure. According to him and other automotive engineers I've talked to since, the concept of setting performance specifications and then incrementally reducing the cost of each component until it just barely met the specs is standard practice for every part of all mass produced cars.
The benefits of value engineering are clear: measured against household income, cars cost less today than ever before. Based on U.S. Census Bureau data, in 2006 a family with the U.S. median annual income of $58,407 would spend 22% of that to buy a new, $13,000 Chevrolet Cobalt. The same median income family in 1960 would have had to spend 36% of their $5,620 annual income to buy a new Chevrolet Corvair for $2,000. Arguably, the Cobalt is a better vehicle than the Corvair by almost any metric one might choose. In fact, I can't think of a single feature of the Corvair that is superior to the Cobalt unless it's styling, and that's subjective.
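For readers who want to check the arithmetic, here is a minimal sketch that recomputes the two percentages from the figures quoted above. The function name is my own, and the prices and incomes are simply the numbers in the paragraph, not an independent data set.

```python
def share_of_income(price, median_income):
    """Return a purchase price as a percentage of annual income."""
    return 100.0 * price / median_income


# Figures quoted in the paragraph above.
corvair_1960 = share_of_income(2_000, 5_620)
cobalt_2006 = share_of_income(13_000, 58_407)

print(f"1960 Corvair: {corvair_1960:.0f}% of median income")
print(f"2006 Cobalt:  {cobalt_2006:.0f}% of median income")
```

Rounded to whole numbers, the two shares come out to 36% and 22%, matching the figures in the text.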
However, there are at least two drawbacks to value engineering: the margin between operational and failed is reduced, and quality assurance requirements are increased. In the case of the Toyota gas pedal assembly, there are two parts that could easily be adversely affected by value engineering: the pedal return spring, and the pin hinge that the pedal arm rotates around. We can examine it on the CTS Specifications page, here...
The spring appears to be a six-turn coil of steel wire. The time the pedal takes to return to its idle position is governed by federal specifications, which you can read here...
The relevant part of the specification states that the pedal must return to its idle position in one second or less. No doubt, the pedals submitted for certification by CTS met that requirement. According to industry sources, CTS was under pressure from Toyota to value engineer the components of the pedal assembly such as the materials, spring and the hinge pin.
Of course, one way to save money on components is to look offshore for a supplier. The supplier of springs and/or pins might start out as American and end up as Chinese or Brazilian. Those suppliers would have to certify that the parts they propose meet the original materials and performance specifications. Unfortunately, as McDonnell Douglas and Boeing found out the hard way, offshore suppliers are sometimes all too happy to fill out the compliance paperwork without much actual testing of the parts. The specification for the CTS pedal spring was for 3 million cycles. A cheap spring might not break after 100,000 cycles, but it could get very tired and slow.
The simple hinge pin at the heart of the CTS pedal assembly will probably never break. But what if the plastic around it isn't dimensionally stable? Toyota has admitted that the plastic guide fins and grooves on the pedal arm behind and above the hinge can develop an interference problem if the plastic has absorbed water. Does that indicate the specs for the plastic were weak or that the supplier didn't meet them? Whatever the reason, the increased friction slows the pedal's return to its idle position and adds stress to the spring, possibly shortening its useful life and making it more likely the pedal will stick.
At first glance, it would seem that these mechanical gremlins should have nothing to do with control software. The drive-by-wire system shouldn't be affected by a stuck pedal. The computer connected to it would consider the depressed pedal a valid go-fast command. But real-time control software is never that simple. The rate of change of the pedal position, both up and down, is also factored in. The pedal position is sampled continuously and then processed to determine what the fuel system for the engine should do next.
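To make the idea concrete, here is a rough sketch of how pedal position and its rate of change might be combined into a single throttle command. Everything in it is my own assumption for illustration: the throttle map values, the function names and the rate_gain parameter are hypothetical, and none of it comes from Toyota's or CTS's actual software.

```python
from bisect import bisect_right

# Hypothetical throttle map: pedal position (0.0-1.0) -> throttle opening (0.0-1.0).
PEDAL_POINTS = [0.0, 0.25, 0.5, 0.75, 1.0]
THROTTLE_POINTS = [0.0, 0.15, 0.40, 0.75, 1.0]


def lookup_throttle(position):
    """Linearly interpolate the throttle map at a pedal position."""
    position = min(max(position, 0.0), 1.0)
    # Find the map segment that contains the position.
    i = min(bisect_right(PEDAL_POINTS, position), len(PEDAL_POINTS) - 1)
    i = max(i, 1)
    x0, x1 = PEDAL_POINTS[i - 1], PEDAL_POINTS[i]
    y0, y1 = THROTTLE_POINTS[i - 1], THROTTLE_POINTS[i]
    return y0 + (y1 - y0) * (position - x0) / (x1 - x0)


def throttle_command(prev_position, position, dt, rate_gain=0.02):
    """Combine pedal position and its rate of change into one command."""
    rate = (position - prev_position) / dt  # pedal travel per second
    command = lookup_throttle(position) + rate_gain * rate
    return min(max(command, 0.0), 1.0)  # clamp to the valid range
```

In this sketch, a pedal that is being released produces a negative rate and a lower command than a pedal held still at the same position. A pedal that returns sluggishly produces a smaller negative rate, so the mechanical condition of the assembly feeds directly into what the software computes.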
Take samples, calculate a vector, look up a table, apply a control algorithm - it's pretty straightforward coding as software goes. But there are pitfalls. For one thing, automotive systems that rely on a computer to manage the brakes and accelerator must meet a very high standard of reliability; ideally, an error rate of one in one trillion operations. That is not an easy goal with software. The problem is verification. While there are strict, published test regimes defined for computer controlled weapons systems, digital media systems, telecom equipment and medical devices, the transportation industry keeps its methods a secret. The Society of Automotive Engineers has published a number of papers on this topic, but the extent of any manufacturer's implementation and their test results are not available.
My own experience with a subtle bug illustrates what the automotive software engineers are up against. In one of the digital audio effects programs I wrote, the left and right stereo channels would occasionally, less than once a day, swap sides. The bug was random and unprovoked. Why would real-time high-speed software driven by slow-speed human input swap audio channels for no reason? After a week of tedious debugging with a logic analyzer I had my answer: a software induced "race condition" in which two signals were so close in clock timing that one or the other might be in the lead at any given point, depending on random system noise or jitter.
The software I'd written used buffers to store sets of audio samples for processing left and right raw data. The digital signal processor could operate on both buffers in parallel, but it had to load its internal coefficient memories sequentially, not in parallel. The timing was so close that one in about 15 billion times the right channel audio sample buffer would be ready to feed the engine first, causing a swap. Though I'd studied the manual to learn the chip, I didn't realize I had to allow for that slight asymmetry.
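The effect is easy to reproduce in a toy simulation. The sketch below is not the original DSP code; it is a simplified model with made-up numbers, in which the left channel has a small nominal head start and each channel's ready time is perturbed by random jitter. The function name, the nanosecond units and the Gaussian noise model are all assumptions chosen for illustration.

```python
import random


def count_swaps(trials, lead_ns, jitter_ns, seed=0):
    """Count trials in which the right buffer is ready before the left.

    lead_ns   - nominal head start of the left channel (hypothetical value)
    jitter_ns - standard deviation of the timing noise on each channel
    """
    rng = random.Random(seed)  # seeded so repeated runs give the same count
    swaps = 0
    for _ in range(trials):
        left_ready = rng.gauss(0.0, jitter_ns)
        right_ready = lead_ns + rng.gauss(0.0, jitter_ns)
        if right_ready < left_ready:
            swaps += 1  # the channels would come out reversed
    return swaps


# No jitter: the left channel always wins and nothing swaps.
print(count_swaps(10_000, lead_ns=1.0, jitter_ns=0.0))
# Jitter comparable to the head start: swaps become common.
print(count_swaps(10_000, lead_ns=1.0, jitter_ns=1.0))
```

Widening the head start relative to the jitter drives the swap count toward zero, which is the general shape of the fix: make the ordering explicit rather than leaving it to timing.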
Perhaps Toyota's on-board computers have hardware idiosyncrasies, too. Equally likely, changes in mechanical parameters resulting from value engineering were never communicated to the software engineers. What if a gas pedal that operates slowly due to mechanical problems exposes a bug in the control algorithm, causing it to fail under certain conditions? Or, Toyota's software engineers may simply have overlooked a rare, random and difficult to test condition. It wouldn't surprise me a bit if that were the case, and it shouldn't surprise the National Highway Traffic Safety Administration either.
Soon, all of the controls in our cars will be inputs to computers, including the steering. To ensure that the ongoing value engineering process and inevitable programming errors don't reduce safety, regulatory agencies should change the rules for certification. Instead of relying entirely on testing cars as mechanical subsystems and finished objects, NHTSA should require manufacturers to comply with ISO/IEC 90003 procedures for verifying software quality in all real-time control systems. The car manufacturers' means and methods as well as test results would become visible to experts who could comment before a new system was released for production. Although no quality assurance method can guarantee failure-proof software and value engineering can go astray, there's a good chance fail-safe systems can be achieved.