Kuro5hin.org: technology and culture, from the trenches

A Modest Proposal for Code Restructuring

By czth in Op-Ed
Sun Nov 11, 2001 at 09:32:32 AM EST
Tags: Software

There comes a time in the life cycle of a program when, perhaps through no fault of the designers, who under constraints of time and scope have allowed the sources to become hopelessly entangled and ugly, it is no longer worth maintaining. In fact, maintenance becomes a burden: the program silts up most horrendously, and only one person, or a small handful of in-house experts, knows how it works.

It's time to take the knowledge gained from building the first system and start over; it's time to rebuild, and let the new system arise, phoenix-like, from the ashes.


To be or not to be, that is the question:
Whether 'tis nobler in the mind to suffer
The slings and arrows of outrageous fortune,
Or to take arms against a sea of troubles,
And, by opposing, end them.

   -- Hamlet, Act 3, scene i[0].

It is my contention that a particular project where I work has reached this point.[1] It has taken a team of four people over a week to come to understand the flow of data through the program, and even now we do not fully understand it. Nor is it at all easy to answer the questions we have, such as those about the life cycle of objects (hard to trace here because there are many unrestrained global variables and functions), which we must answer for our `port' to a three-tier system (for the curious, we want to go from client-database to client-CORBA objects-database to allow object sharing and update notification).

So far we have only considered the inductive approach, that is, going from the specific to the general: determine how the system works and what its abstractions are, then work out how to change the existing system to make it behave as desired. We look for the change that minimizes effort and the chance of introducing new errors, while still delivering the required solution with some modicum of efficiency.

The problem with this is that it is still a band-aid; we merely add another impenetrable layer of cruft to layers that need not be added to, but rather rebuilt.

[0] yes, I know the original context, but it fits here too
[1] see below under Justification for Rebuild


A Deductive Approach

Behold, I make all things new.
   --Revelation 21:5

Certainly, if one is to begin again, one cannot begin from the very beginning and discard the many man-years of work that have gone into a project; nothing that dramatic is suggested. Instead, we wish to salvage as much as possible.

The deductive solution considers the particular needs of the client (`client' here meaning the users of the interface); in this case, our clients are two: graphical user interface objects, and a calculation engine. It is thus that objects are created: determine what interfaces are needed and build them. In so doing, remove the current dependencies on global variables and refactor as much as possible.

For this project, as an example, we know that we need a class to represent a river (let us call it river_c, following the existing convention) within a hydrological system; in fact, we need a whole list of rivers, because a river is often queried by its ID. From that, we also know that we need to be able to do an ID-based lookup, which dictates a class with the lookup properties of the Standard Template Library (STL) std::map.

The existing code that (somewhere, somehow) loads the current version of river_c must be pulled out, keeping the raw database access classes unchanged, and its functionality duplicated: rewriting where necessary, merely moving code otherwise. In our middle-tier (CORBA server) initialization, we realize we need to initialize (for example) a std::map<int,river_c> mapping river IDs to actual river objects. Furthermore, also based on usage--the specific in our deductive logic--we see that we need to expose a function to retrieve a river_c& by ID, and that this river_c object needs to have the properties ID and name. Note that we base these decisions on client usage requirements, not on the database table structure, although of course one must be able to derive an (improper) subset of our object and its attributes from the database or we cannot create the objects. Next, we would recurse into the requirements of the classes contained by river_c, such as a list of (power) plants, and from there a list of generating units, and so on.
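
To make this concrete, here is a minimal sketch of what the client-facing piece might look like. The accessor name, the error handling, and the reduced river_c below are illustrative assumptions, not existing code:

    #include <map>
    #include <stdexcept>
    #include <string>

    // Sketch only: river_c reduced to the two properties the clients ask for.
    class river_c {
    public:
        river_c(int id, const std::string& name) : id_(id), name_(name) {}
        int id() const { return id_; }
        const std::string& name() const { return name_; }
    private:
        int id_;
        std::string name_;
    };

    // Built once during middle-tier (CORBA server) initialization.
    typedef std::map<int, river_c> river_map_t;
    static river_map_t rivers;

    // The lookup the clients actually need: a river_c& by ID.
    river_c& get_river(int id)
    {
        river_map_t::iterator it = rivers.find(id);
        if (it == rivers.end())
            throw std::runtime_error("no river with the requested ID");
        return it->second;
    }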

river_c should contain its own code to create itself from a row in the raw database table object (e.g. via a constructor), and a static method to create the aforementioned map; currently this code is split among a variety of unrelated global functions. In many cases, multiple tables and conversions are required, and naturally this code would be brought in as necessary.
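
Again, only as a sketch: raw_row_c below stands in for whatever one row from the existing raw database access classes looks like; those classes themselves stay unchanged, and the field names are invented.

    #include <map>
    #include <string>
    #include <utility>
    #include <vector>

    struct raw_row_c {              // stand-in for a raw table row
        int         river_id;
        std::string river_name;
    };

    class river_c {
    public:
        explicit river_c(const raw_row_c& row)   // river_c builds itself from a row
            : id_(row.river_id), name_(row.river_name) {}

        // Static factory: builds the ID -> river map from a loaded table,
        // replacing the unrelated global functions that do this today.
        static std::map<int, river_c> load_all(const std::vector<raw_row_c>& table)
        {
            std::map<int, river_c> result;
            for (std::vector<raw_row_c>::const_iterator it = table.begin();
                 it != table.end(); ++it)
                result.insert(std::make_pair(it->river_id, river_c(*it)));
            return result;
        }

        int id() const { return id_; }
        const std::string& name() const { return name_; }

    private:
        int         id_;
        std::string name_;
    };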

This constructive approach has the advantage of eliminating any excess baggage, in that there will never be any reason to write any more code than will be used.

Management here already acknowledges that this project must be rearchitected--the only question is when. The short-term cost of adding the third tier as a `bag on the side' should be close to the cost of rearchitecting as we go (with a better structure we save a lot of time when writing the new code), and the gains in clarity pay off in future maintenance many times over in the long term.

Any approach as radical (to management) as this is naturally viewed with suspicion and even incredulity at first. But given the current state of the program, the desired changes, and the fact that this method can deliver the desired result in the same time as the alternatives or better, it is clearly the best of the available choices. As Brooks[2] said, "Plan to throw one away; you will anyhow." The time is now.

Chemical engineers learned long ago that a process that works in the laboratory cannot be implemented in a factory in one step. An intermediate step called the pilot plant is necessary.... In most [software] projects, the first system is barely usable. It may be too slow, too big, awkward to use, or all three. There is no alternative but to start again, smarting but smarter, and build a redesigned version in which these problems are solved.... Delivering the throwaway to customers buys time, but it does so only at the cost of agony for the user, distraction for the builders while they do the redesign, and a bad reputation for the product that the best redesign will find hard to live down. Hence, plan to throw one away; you will, anyhow.

[2] Brooks, Frederick P. Jr. The Mythical Man-Month: Essays on Software Engineering. Reading, Mass. : Addison-Wesley, 1975 (revised 1995). ISBN 0-201-83595-9. Should be required reading for anyone going anywhere near code.


Justification for Rebuild

Some of this is specific to my company's project, but much of it probably holds for many other projects as well.

This project (name withheld to protect the guilty) is a clear example of code that has succumbed to what Brooks ([2] above) calls the "Second-System effect."

Global variables and functions make it very difficult to trace the flow of control, especially when these variables are used as default parameters to constructors (for example), which makes tracking down their usage--since they are aliased--a veritable nightmare. In many cases, merely creating an object automatically adds it to a global list, but this is not at all obvious, since the list arrives as a default constructor parameter and the adding is done in the base class.

The code is verbose in the extreme, because often it is copied pedantically over and over, e.g. for error checking, when it would have been better to encapsulate these checks in (then) a macro that could return a string description of the error or (now) an inline function that can throw an exception if an error is encountered. There are often ten or so (large) classes that are exact copies of each other with literally four lines that differ. Refactoring is a necessity.
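
For illustration, the inline-function version could be as simple as the following; the status convention and the wording of the message are my own assumptions, not the project's actual error scheme:

    #include <sstream>
    #include <stdexcept>

    // One checker replaces the blocks of copy-and-pasted error handling.
    inline void check_status(int status, const char* operation)
    {
        if (status != 0) {                          // assume 0 means success
            std::ostringstream msg;
            msg << operation << " failed with status " << status;
            throw std::runtime_error(msg.str());
        }
    }

    // Usage: one line at each call site instead of ten copied ones, e.g.
    //     check_status(table.load(), "loading river table");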

Much of the data is loaded multiple times in different ways. A deductive "from the ground up" loading of needed data would eliminate a lot of redundancy.

We depend on many libraries that are no longer needed (e.g. any use of MFC strings or containers can be replaced with the STL), are not supported any more (either because the supporting company has gone out of business or has simply discontinued the library in question), or are unintuitive (e.g. using operator() to advance a list iterator). The dependencies on these libraries could be removed; the STL is (a) widely taught, (b) well documented, and (c) standardized as part of any conforming[3] C++ implementation.
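
As a hedged before-and-after sketch (the `legacy' container in the comments below is a stand-in for the unsupported library, not its real API):

    // Before (paraphrased from memory, names invented):
    //     legacy_list_c names;            // proprietary container
    //     legacy_iter_c it(names);
    //     while (!it.done()) {
    //         use(it.value());
    //         it();                       // operator() advances -- unintuitive
    //     }

    #include <cstdio>
    #include <list>
    #include <string>

    // After: standard, documented, portable iteration.
    void print_names(const std::list<std::string>& names)
    {
        for (std::list<std::string>::const_iterator it = names.begin();
             it != names.end(); ++it)
            std::printf("%s\n", it->c_str());
    }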

[3] International Standard ISO/IEC 14882:1998, Programming Languages--C++.


Conclusion

IT managers won't like it, but often it makes far more sense to scrap a bad design than to attempt to maintain it and hope it keeps working. The time it takes for someone new to get `up to speed' on a project with a million lines of code (excluding libraries) is long enough without that project being a maze of brain-damaged cruft.

How do projects get this way? In part I have tried to lay the blame on the many years (over ten, in the case of the project I have been describing) that the application has been under development, or on the time constraints that made a better fix for various problems impossible. But frankly, many of the problems also arise from plain badly written code, born of the ignorance and cluelessness of its writers. In my project's case, many of them were engineers seconded into the role of programmers (the term `engineer code' isn't spoken with fear and loathing for no reason). Some code was written by co-op students and never audited. Some of the mess is, again, due to sheer ignorance.

Writing solid code is not an easy task. It comes from a combination of a sound theoretical background and experience reading and writing code. And not everyone can do it; with luck, those people that can't will be fired (or promoted to management, or have an unfortunate run-in with a Mack truck), before they can do too much damage.

How can we stop bad code from getting into projects, commercial or not? One supposes that open source projects are free from bad code because many eyes can see it, but this is only the case if many eyes actually do see it, if adding such code becomes taboo, and if someone can and does say "No, this code is horrible and isn't going into this project"--a phenomenon more common in larger projects (KDE, the Linux kernel, Mozilla, etc.) than in the morass of smaller endeavours. As I see it, the solution is the same for commercial and open projects alike: code must be reviewed, and changes must be tracked (source code control--which we don't have here, even now, although I've been pushing for it--is an absolute necessity). The tenets of Extreme Programming (XP) and other such methods go even further, to pair programming and sequential integration. To a software company, though, pair programming appears to halve the employee base without tangible gain, so code reviews are a good compromise that managers should find acceptable. As to the type of review, this can vary: two or three people (one of them the programmer) gather around a machine to talk about the code, perhaps after being sent the changed files to look at individually first; or a more formal `panel' sits around a table looking at printouts. Whatever works for you.

Programming may be art and may be science and is often both, but the quality of the code we write is a reflection on us and our profession, just as the design, robustness, lifespan, and load-bearing ability of a bridge reflect on the civil engineers and crews that build it. Let us take pride in our work.

(Plug for a future article: should software developers require certification, like the P.Eng. designation that engineers usually need--a P.Dev. designation? Right now any idiot churned out by DeVry or ITT Tech [one-year college programs] can call himself a programmer, and companies aren't all that adept at telling the difference.)

A Modest Proposal for Code Restructuring | 41 comments (32 topical, 9 editorial, 0 hidden)
A Really Gross Example (1.00 / 1) (#4)
by Gutza on Fri Nov 09, 2001 at 10:33:36 PM EST

You're talking about the facts of life - we sometimes need to ditch old code just to write new, more elegant code. How about new code which needs to be changed? I'm a web designer (I'm also an engineer, thank you for the nice words :) ) and I found this in some PHP code written by one of our non-engineers: he connected to a database to get the list of possible years in a web application. So this dude created a table in a database with these entries: "2001", "2002", "2003", "2004", "2005" and "2006". He then retrieved the data from that table to build a select box for deadlines. I went mad when I saw this - and I'm not even his boss... So after 2006 you'd have to debug his code in order to add more years to your project. Now, how about that?

Who's your vendor, who's your vendor? — Scott Adams
time is K5
Why not just add another year to the table? (none / 0) (#6)
by kaemaril on Fri Nov 09, 2001 at 11:33:51 PM EST

Says it all, really.

Why, yes, I am being sarcastic. Why do you ask?


[ Parent ]
Because that's obviously wrong (none / 0) (#18)
by Gutza on Sat Nov 10, 2001 at 06:45:06 AM EST

I mean, doesn't it make more sense to retrieve the current year via a date function and count to that plus five instead?

Who's your vendor, who's your vendor? — Scott Adams
time is K5
[ Parent ]
Calendar Tables (4.00 / 1) (#22)
by SPrintF on Sat Nov 10, 2001 at 11:22:01 AM EST

Just using the formula CURRENT_YEAR + 5 creates the problem that you now have a "magic number" (+5) hard-coded into the program. For what you're doing, that may be OK. But consider the possible problems with this:

The most obvious is that, should the "magic number" ever change, someone is going to have to go in and modify the code with the new interval value.

Less obvious is that there are times when you will want to run the program for years prior to the "magic number" change. In this case, you either need to keep two versions of the program (before the change and after) and hope you remember to run the right one, or you need to add some code like:

if (CURRENT_YEAR >= MAGIC_REVISION_YEAR_1) return CURRENT_MAGIC_NUMBER; else return OLD_MAGIC_NUMBER;

With either method, you are now effectively maintaining a calendar "table," either by keeping a date-spanned list of program releases, or by embedding the table in the code.

Again, for what you're doing, CURRENT_YEAR + 5 may be sufficient. But it's always worthwhile to remember that "all constants are variables," and to balance simplicity against possible maintenance problems 2 to 5 years out.

[ Parent ]

No. (none / 0) (#23)
by kaemaril on Sat Nov 10, 2001 at 11:58:22 AM EST

Sprintf beat me to it for why... :)

Why, yes, I am being sarcastic. Why do you ask?


[ Parent ]
Maybe he did it on purpose. (4.00 / 1) (#21)
by DavidTC on Sat Nov 10, 2001 at 11:05:22 AM EST

Maybe he doesn't want people using the same code after 2006? Or at least he wants someone to have to look at it in 2007 to find out what's wrong.

Leaving code unmodified because it was running for 25+ years was what caused the Y2K panic. I don't think any code should hang around for more than five years without being gone over.

-David T. C.
Yes, my email address is real.
[ Parent ]

Reviewing running code after 5 years (none / 0) (#38)
by miller on Mon Nov 12, 2001 at 02:48:00 PM EST

I should think that if code's been running for 5 years in a production environment, it's a pretty good bet it'll continue to do so for another 5 years. The only variable that's required to change here is the date, so a periodic review is only of use to capture date related bugs (like the y2k issue).

What caused the y2k issue was not that the code wasn't reviewed every five years, but that the code design was rarely vetted before being coded, anyone who understood the design no longer worked for the company and most significantly, the design was never documented anywhere so that new maintainers could get up to speed quickly.

Personally I think that if code is operating in a manner fit for purpose, don't try to fix it. Spend your time fixing the things that are broken. I don't care if your algorithm is nasty so long as it's not too slow for anyone and the larger design is sound. A bogus algorithm can be rewritten in a matter of hours if the design specifies it properly, but a bogus design takes much longer to rewrite and reimplement.

--
It's too bad I don't take drugs, I think it would be even better. -- Lagged2Death
[ Parent ]

I didn't mean 'rewrite'. (none / 0) (#39)
by DavidTC on Mon Nov 12, 2001 at 05:14:28 PM EST

I meant look at. It's one thing to leave code working, but you should know it's there, know how it's working, what it does, etc.

Given the rapid turnover in the IT field, I think 5 years is probably a long time to take between going back over the code. Everyone's forgotten about everything by then.

-David T. C.
Yes, my email address is real.
[ Parent ]

Neither did I (none / 0) (#40)
by miller on Tue Nov 13, 2001 at 12:15:34 PM EST

I'll grant that my last paragraph went off at something of a tangent, but the rest of my reply was referring to code reviewing.

If design specs existed in the first place then that's all you need to look at to get up to speed from any starting point. You don't need to get your hands messy with the actual guts (the code) until it falls over.

When I find legacy code that I need to modify I'll write notes on it that I can publish alongside the source. It's not a full design, but it'll help the next guy get up to speed a little quicker without having to understand all the code that I had to. Actually going over the code again every few years from scratch is the wrong way to go about it IMO, and the good thing about design docs is that they can be digested reactively, not proactively.

--
It's too bad I don't take drugs, I think it would be even better. -- Lagged2Death
[ Parent ]

Heh... (none / 0) (#41)
by DavidTC on Thu Nov 15, 2001 at 07:03:01 PM EST

You're one of those wackos with 'design specs'. ;)

Yeah, in theory, if the design is well... designed, you shouldn't have to dip into the actual code, and should be able to treat it as a black box.

But while that may work in theory, it may not work that well in practice. ;)

And, of course, that assumes you don't touch the box at all. You never know if something is accessing some far out directory or file, so sometimes 'getting rid of all this junk' can touch a perfectly fine program that's been running silently for years until you deleted its data files out from under it.

And that is why, in practice, you should at least glance at the actual code every blue moon. ;)

-David T. C.
Yes, my email address is real.
[ Parent ]

Dealing with cruft (3.00 / 1) (#8)
by wiml on Sat Nov 10, 2001 at 12:18:37 AM EST

Another good book that talks about the evolution and design of software systems is The UNIX Philosophy by Mike Gancarz.

The proposal here doesn't convince me that a total rewrite is better than an incremental cleanup. The problems you cite can usually be fixed without tossing the existing code. Dependence on multiple, redundant libraries? Search for all references to the libraries you want to get rid of, and rewrite those parts of the code. Global variables? Sometimes requires rearchitecting, but can often be dealt with just by moving the variables into an appropriate object which is accessible from all the places the global was used. Copy-and-paste coding? When you find any instances, combine them. Whether the appropriate way to clean up this code is a rewrite (glamorous) or many incremental changes (boring scutwork) isn't really clear from the points presented in the article. It could be either.
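
(A minimal sketch of that "move the globals into an object" step, with invented names, might be:)

    #include <map>

    class river_c;                              // defined elsewhere

    // Invented context object; the point is a single well-known access path.
    class app_context_c {
    public:
        static app_context_c& instance()
        {
            static app_context_c ctx;
            return ctx;
        }
        std::map<int, river_c*>& rivers() { return rivers_; }
    private:
        app_context_c() {}
        std::map<int, river_c*> rivers_;        // formerly a bare global
    };

    // Call sites change from `g_rivers[id]' to:
    //     app_context_c::instance().rivers()[id]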

Either way, however, you need to convince management that this effort deserves some of the company's programmer-time, and reading between the lines I think this is the real root of your problem: your management, and perhaps the general company culture, doesn't believe in code maintenance. You need to convince them that maintenance is a necessary and profitable use of time --- not just when a program has reached a crisis point, as this one apparently has, but also as an ongoing activity. If you live in a house, periodically you need to move all the furniture, pick up the rugs, and clean them and the floor. During this time, that room isn't very usable. But that isn't an indication that there's something wrong with the house. It's just part of keeping the house in order. Likewise, as you work on a piece of code, you will sometimes notice an unnecessary global variable, or a handful of routines all incompletely solving one problem in different ways. Unless there's a deadline or something, you should be free to fix it right then. You shouldn't have to worry about your boss, or your coworker, thinking that you've wasted time doing something today that you could have put off until next quarter. Unless you're planning to go out of business soon, you'll have to fix it eventually --- and the sooner you fix it, the less time you'll waste dealing with cruft in the meantime. If your boss can't understand this, I recommend that you get a new boss...

Been There Done That (5.00 / 2) (#12)
by bgalehouse on Sat Nov 10, 2001 at 12:48:42 AM EST

It is very simple, what you need to do. Find a section of the system, a section with a defined purpose and boundaries. Originally (even if only at the design stage) the boundaries existed and were clean. Find them as best you can.

Then choose an area, and rewrite it from scratch. Reference the original specs more than the original code. You might need to write patch code to communicate with the rest of the system. If, for example, they have a truly awe-inspiringly bad approach to error handling, you might have to go through special tricks to convert their errors into something more sane, and vice versa. This is fine, but keep that code separate; don't intermingle it with the business logic.

Rinse and repeat.

The hard part isn't finding the cleaner design for an area. The hard part isn't even writing the patch code. The hard part is spotting the boundaries.

Invoking Joel Spolsky (3.66 / 3) (#13)
by HereticMessiah on Sat Nov 10, 2001 at 01:29:09 AM EST

I have to, really. Have you read what he's written on such topics on his website? According to him, `Completely rewriting code is a big-time mistake common of immature developers with no real software experience'.

--
Disagree with me? Post a reply.
Think my post's poor or trolling? Rate me down.
Rewriting code mistake? (5.00 / 1) (#14)
by sigwinch on Sat Nov 10, 2001 at 02:57:25 AM EST

Rewriting code with terminal cut-n-paste-itis is never a mistake.

Sure, reading code is more difficult than writing it, but it is entirely possible for code to need a complete rewrite. E.g., when the team spends months studying < 50 kloc and *still* can't understand it (happened on a project I witnessed, and no, they didn't rewrite it, and yes, it was a continuing unmitigated disaster).

--
I don't want the world, I just want your half.
[ Parent ]

at times it makes sense... (none / 0) (#15)
by rebelcool on Sat Nov 10, 2001 at 03:01:10 AM EST

Sometimes a platform passes a piece of software by enough that, while it's still useful, it needs a rewrite to work properly.

Case in point: my own COG seems to take issue with IE 6.0's handling of cookies. The default install of IE 6.0 makes it simply not work for some reason (they tinkered with security a bit) on some sites. Solution? A badly needed rewrite is in order anyway, to take advantage of more modern concepts such as sessions to tackle that problem...

COG. Build your own community. Free, easy, powerful. Demo site
[ Parent ]

Well, Sort Of. (4.50 / 2) (#24)
by zephiros on Sun Nov 11, 2001 at 12:46:47 AM EST

Most of Joel's diatribe seems to be directed at the idea of hurling the entire contents of the old source into the dustbin, then saying "so, what do we want the application to do again?"

Which is common sense. Even if the old code is ugly and nasty and crufty, it's still the best documentation for how the system works now. It would be absurd to ashcan it outright. That said, it is quite possible (and sometimes desirable) to make significant changes to the architecture of existing code. Refusing, on principle, to revisit and rethink badly aging systems is a big-time mistake common of experienced developers who have cultivated an excessively risk-averse mindset.
 
Kuro5hin is full of mostly freaks and hostile lunatics - KTB
[ Parent ]

Not to mention Internet time (4.00 / 1) (#16)
by I am Jack's username on Sat Nov 10, 2001 at 04:32:40 AM EST

"If a system is of sufficient complexity, it will be written before it's designed, implemented before it's tested and obsolete before it's debugged." - unknown. A pretty good description of the projects I work on :).
--
Inoshiro for president!
"War does not determine who is right - only who is left." - Bertrand Russell
The Big Ball of Mud (5.00 / 2) (#17)
by jesterzog on Sat Nov 10, 2001 at 05:47:29 AM EST

Have you read Big Ball of Mud, by Brian Foote and Joseph Yoder?

It's an excellent paper that looks at the patterns that emerge in software development, especially in a corporate environment. The main patterns that it looks at are:

  • Big ball of mud - Speaks for itself. It's aliased as shanty towns and spaghetti code.
  • Throwaway code - Designed for temporary use, but nobody gets around to replacing it.
  • Piecemeal growth - Application grows bit by bit as new functionality is added to it.
  • Keep it working - Keeping it going is given priority over making it more reliable in general.
  • Shearing layers - Code that changes at the same rate should be kept together (but often isn't).
  • Sweeping under the rug - Temporarily putting a wrapper around disgusting code so it can be worried about later.
  • Reconstruction - Throwing it out and starting again.

Of course, Foote and Yoder manage to describe these patterns over about 40 printed pages, and it's an interesting read. I'd recommend it to any software developer, whether you consider yourself a good developer or completely hopeless, even if there's only time to flip through it.


jesterzog Fight the light


You're forgetting to address one thing (none / 0) (#20)
by finial on Sat Nov 10, 2001 at 10:53:10 AM EST

What do you do between the time you decide to start a new effort and when that new effort is finished? You can not ignore or "orphan" a system that is in the field and being used during the year (or however long) it takes to get the new one out the door. People are using the old system and it, apparently, has enough flaws that it will continue to require maintenance during the development period. This quite large effort needs to be accounted for in time, money and personnel.



[OT] STL documentation? (none / 0) (#25)
by _Quinn on Sun Nov 11, 2001 at 03:14:00 AM EST

It's been a while since I looked at the STL, because the last time I did, I couldn't find any good docs for it, and the header files were impenetrable messes. It really sounds like a nice idea, so if anyone could point out good web references, especially 'getting started'-type references, I'd appreciate it.

-_Quinn
Reality Maintenance Group, Silver City Construction Co., Ltd.
STL docs (5.00 / 1) (#26)
by PresJPolk on Sun Nov 11, 2001 at 10:35:57 AM EST

http://www.sgi.com/tech/stl/

[ Parent ]
read the standard (none / 0) (#28)
by pfaffben on Sun Nov 11, 2001 at 01:15:38 PM EST

Buy a copy of the standard for $18 at webstore. It is pretty readable.

[ Parent ]
docs (none / 0) (#32)
by kataklyst on Sun Nov 11, 2001 at 10:15:18 PM EST

I looked around and found a free online C++ book that includes chapters on the STL. It should be a lot easier to get into than the references already mentioned in this thread.

If you have trouble compiling their examples, you may need to add the following line after the includes:
using namespace std;
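
(For example, a minimal program using that fix might look like this:)

    #include <iostream>
    #include <vector>
    using namespace std;        // brings vector, cout, endl into scope

    int main()
    {
        vector<int> v(3, 7);    // three elements, each 7
        cout << v.size() << endl;
        return 0;
    }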

[ Parent ]

STL Books (5.00 / 1) (#35)
by avdi on Mon Nov 12, 2001 at 01:05:45 PM EST

I don't know about web references; but there are some excellent printed resources.

First of all, Stroustrup makes extensive use of the STL in the third edition of The C++ Programming Language. He takes the view (rightly) that as the STL is now an integral part of C++, it should be used wherever a homebrew data structure or algorithm would otherwise have been used.

For a more comprehensive guide, try The C++ Standard Library, by Nicolai Josuttis. It's one of the few books I've read that works well as both a tutorial and a reference. It's complete, well researched, well written, and it has a permanent place on the shelf next to my workstation. Other books I've seen recommended are Plauger's The C++ Standard Template Library and Scott Meyers' Effective STL.

--
Now leave us, and take your fish with you. - Faramir
[ Parent ]

Re: STL Books (none / 0) (#36)
by RocketJeff on Mon Nov 12, 2001 at 02:09:20 PM EST

Your book suggestions are the ones I would have made (not that that means anything...).

My boss read the book by Josuttis and immediately bought three for our group. I just wish I could find one of them - other groups in the company have been 'borrowing' them.

As I get more experience with the STL, I keep looking through Meyers' Effective STL. Although a newcomer can learn things from this book, it's more useful to developers with some STL experience (like his Effective C++ books). I really like the fact that he keeps the errata lists for his books online.

[ Parent ]

Matt Austern's STL reference. (none / 0) (#37)
by your_desired_username on Mon Nov 12, 2001 at 02:14:15 PM EST

If you are a beginner, check out Andrew Koenig's Accelerated C++, and start out right.

[ Parent ]
Response to comments (5.00 / 1) (#27)
by czth on Sun Nov 11, 2001 at 01:06:22 PM EST

Thanks for all the comments, constructive criticism, and stories. I almost wish I could resubmit with the suggested changes; it would indeed make for a better article. Now that the article is posted, I'll respond to the comments (I don't know what the policy is on posting comments to an article while it's in the queue, but I decided to refrain until it was decided).

[jesterzog] re: Big Ball of Mud (http://www.laputan.org/mud/mud.html); thanks for the pointer, an interesting article and a good parallel to (satire of?) DP. The "Reconstruction" section is especially appropriate; I'll include it if I rewrite.

[HereticMessiah] points out a completely opposite view. The important thing in a rewrite, however, as my article pointed out, is to salvage what you can and write the new code (a) as clean interfaces, (b) learning from the old. But the linked article makes good points: the old code embodies years of tested bug fixes, and we don't want to lose that. That isn't the problem here, though, because the old design really is as bad as I make it out to be. And since I'm not really throwing away everything, you could argue that my plan is somewhat the "incremental redesign" suggested by [wiml] and [bgalehouse]. If it becomes necessary to resubmit this article, I would dwell more on duplicating existing functionality while refactoring, removing globals, cleaning up interfaces, etc. It would be insane to start with a tabla rasa and hope to magick up a similarly functional program out of the void; I don't propose that at all.

More detail on the example system (also [bgalehouse]). Thanks for the suggestion, it'll be there, perhaps a separate section, in the rewrite.

[wiml] re: getting a new boss. Working on it, my contract here expires at the end of the year, after that I may be hired FT, maybe not.

[wiml] again, re: the article reads more like a proposal to management. Good eye, man, that's how it started out :). It actually got to my boss, too, although it wasn't supposed to; but he's a good chap, though still bound by the constraints of needing to get tangible work done rather than refactor code.

[kwsNI] The essay "A Modest Proposal" is a famous satire by Jonathan Swift wherein he proposes that poor Irish babies be used for food. My reason for including it in my title is (in part) because my suggestion is similarly dramatic to managers and to the many programmers that regard "their" code as their "baby" and cling to it, not wanting to see it destroyed or even changed.... Or maybe I just included it because I liked the sound of it (or as [atreides] said, to announce that "I read in high school" :).

Good point regarding Justification for Rebuild not starting with a quote; maybe I should have written it in parentheses, but still, since the italicized sentence isn't attributed, I think it's clear enough that I wrote it. OTOH, consistency is a good thing.

I will add reference links, maybe in the rewrite bringing in some given by comments here. But in particular, the ISO C++ standard is not available freely, although I could link to the page where you can buy a copy....

[finial] mentions maintaining and supporting the previous release(s). Certainly a valid point: there are many users of our system and there would indeed be a transition period. So have the developers allocate some of their time to the old version, of course fixing any bugs in both "branches" (of the CVS tree, one hopes). Business as usual except that the restructuring is done on a branched version of the source.

[forgotten gentleman] says "This article does not argue something controversial; there are very few large systems that did not undergo complete rewrites." Sure, you know that, and I know that, and most coders do, but management doesn't, and that was the original focus of the essay, although I did change it fairly significantly to present it here.

Integration strategies are not intended to be the focus of my discussion, although they could be; I could home in on our project, and look at other major system rewrites as case studies, if I wanted to. Likewise the necessity of avoiding the "second system effect" and instead _removing_ extraneous debris.

To [_Quinn]: The Dinkum C++ library reference is downloadable and also available online here, as well as the SGI reference already mentioned in another comment; there are also numerous books (e.g. Deitel and Deitel, C++ How To Program, and of course the standard referenced in the article if you want all the sordid details).

czth (4031)

refactoring (none / 0) (#29)
by kubalaa on Sun Nov 11, 2001 at 02:03:12 PM EST

Judging by the way you seem to consistently misuse the term refactoring, I think a clarification is in order. It is essential to the nature of refactoring that changes be small, incremental, and semantics-preserving. Talking of refactoring in the context of a rewrite is like talking of remodeling a house you're about to tear down.

[ Parent ]
Refactoring and rewriting (none / 0) (#31)
by czth on Sun Nov 11, 2001 at 04:03:36 PM EST

I see refactoring as removing duplicate code by moving it either to a separate function or to a common class.

Perhaps the reason I seem to use it alongside rewriting is that much of the code in my "case study" project consists of sets of (usually around 20) classes, each about 1000 lines long (maybe more, going from memory), which literally differ in only four lines (and those lines are of the same form, just e.g. different text or invoking a different class to obtain data, but all related).

So, here refactoring constitutes a fairly major change throughout (wrt loc, at least, but also in the class structure). It is still gradual in that it changes neither behaviour nor interfaces for these classes.

OTOH, rewriting is of course bigger, because it actually changes how things work and possibly interfaces too. Since - as has been pointed out by many - changes must be gradual (a working version must exist at all times), I see refactoring, at least in this case, as a first step, and then rewriting (because in this case too much raw data access code is in GUI objects) as a second.
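
To sketch what I mean (names invented, and the real classes are of course far larger), the twenty near-identical classes could collapse into something like:

    #include <string>

    template <class Loader>                     // the four differing lines live here
    class data_view_c {
    public:
        explicit data_view_c(const std::string& title) : title_(title) {}
        void refresh()
        {
            data_ = Loader::load();             // the only per-class variation
            // ... the ~1000 shared lines of formatting/display logic ...
        }
    private:
        std::string title_;
        typename Loader::result_type data_;
    };

    struct river_loader {                       // one small policy per old class
        typedef std::string result_type;        // placeholder for the real data type
        static result_type load() { return "river data"; }
    };

    typedef data_view_c<river_loader> river_view_c;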

[ Parent ]

re: refactoring (none / 0) (#33)
by kubalaa on Mon Nov 12, 2001 at 08:51:32 AM EST

Refactoring is any change which preserves semantics. This change can be of any size (although it's hard to prove a large change is semantics-preserving without breaking it into smaller ones first), and it can include changing ``how things work and possibly interfaces too.''

That's because ``semantics preserving'' means with respect to the behaviour of the black box within which we are refactoring. Take a simple example: within a function we change the name of a lexically-scoped local variable. This is not semantics-preserving within the function, because it is easy to invent a line that would work with the old function but not with the new (it would just use the old variable name). However, it is semantics-preserving outside the function because any calling code that worked with the old function will work with the new.

I explain that really obvious case to extend it to the activity you said is not part of refactoring: namely, changing interfaces. The only thing that's different is that our ``black box'' of semantics-preservation now extends to all users of that interface. Indeed, ANY semantics-preserving change you make to code can be seen as changing an interface and vice versa: in the first example, the ``interface'' was the get-reference invoked implicitly by the object name, and the user of the interface was the function.

Hopefully that's all clear now, and we can see that there are two kinds of changes: those that don't affect the end-user and those that do, that all architectural changes fall into the first category, and furthermore that all architectural changes can be achieved by refactoring. I don't know of any formal definition for ``rewriting'', and can't think of a particularly good one. Perhaps you mean ``any change which ought to preserve semantics but because it's so large and untested it very well may not.'' Or maybe ``refactoring and adding new features at the same time.''

The latter seems to work; we often refactor in order to add new features. But the question is whether we gain anything by doing both at the same time, as you seem to be advocating. The idea behind refactoring is to reduce changes to the smallest possible units so that you always know exactly what change introduced the newest bug.

As an aside: we can place adding new features in the context of refactoring as well, except in this case the semantics-preserving box is the universe, and the interface to be changed is the contract between your program and the universe. So we alter the contract, alter the program to adhere to the new contract, and test the universe to be sure semantics were preserved. Semantics being preserved in this case means the logical rules of the universe still function normally under the assumption that the new contract accurately reflects the relationship between universe and program. If your program has a bug, that means that there is a logical inconsistency in the universe: either your program is true or the contract is true but not both. :)

Anyways, the point of that philosophical diversion is that adding new features can be done in the same spirit as refactoring, by making small changes and testing. So back to explaining why we shouldn't mix the two:

If you combine a refactoring-step and a feature-step together, you now have twice as big a change unit with the possibility of bugs created not just by external inconsistencies but internal ones as well: in other words, something like the number of bugs squared, and harder to find. :) Since, as I explained, adding new features is refactoring the universe, we can make the extension that this is no different than applying two refactorings simultaneously within your program, say, changing a variable name within a function and changing the interface to that function. Sure, you can do it. But it's dangerous.

Since you can achieve the same end result by interleaving refactoring and feature-adding (aka doing refactorings in sequence rather than simultaneously), I'm dubious that ``rewriting'' has any advantages other than that it's less tedious and more fun.

[ Parent ]

rewrite vs refactoring (none / 0) (#30)
by kubalaa on Sun Nov 11, 2001 at 02:22:49 PM EST

One more thing. :)

You concede the point that throwing out everything is a bad idea, but you're obviously talking about something beyond refactoring. Can you describe exactly what the difference is, and why it's beneficial?

(Aside: is there any reason to use the phrase `tabla rasa' instead of `clean slate' other than to sound more intelligent? This problem seems to pervade your writing style.)

[ Parent ]

How to convince management (3.00 / 1) (#34)
by edwin on Mon Nov 12, 2001 at 12:35:10 PM EST

I shan't comment on the merits of your proposal, but here's a suggestion to make it sound more modest, and hence more likely to be accepted. It's a general principle called the "three out of four rule".

You need to prepare a presentation outlining four different proposals. The first two should be very minor changes. The third one is your current suggestion. Finally, you put at the end a really wild proposal, like rewriting the whole thing in INTERCAL or something. This immediately makes your proposal sound more conservative and acceptable. Try it - it really works!
