The Myth of the Software Rewrite
Editorial note: this post was originally written for the NDepend blog, and you can read the original here. If you like the topics of static analysis and code metrics, there’s a lot you’ll love over there.
Editorial update: due to the unanticipated popularity of this post and the fact that I’m buried in work the first half of this week, I’m planning to write a detailed follow-up addressing some of the sentiments in the comments that I’m seeing. That will be on the NDepend blog, where the original appeared. Stay tuned if you’re interested in the follow-up, and thanks for reading/commenting!
“We can’t go on like this. We need to rewrite this thing from scratch.”
The Writing is on the Wall
These words infuriate CIOs and terrify managers and directors of software engineering. They’re uttered haltingly, reluctantly, by architects and team leads. The developers working on the projects on a day to day basis, however, often make these statements emphatically and heatedly.
All of these positions are understandable. The CIO views a standing code base as an asset with sunk cost, much the way that you’d view a car that you’ve paid off. It’s not pretty, but it gets the job done. So you don’t want to hear a mechanic telling you that it’s a total and that you need to spend a lot of money on a new one. Managers reporting to these CIOs are afraid of being that mechanic and delivering the bad news.
Those are folks whose lives are meetings, PowerPoints, and spreadsheets, though. If you’re a developer, you live the day-to-day reality of your code base. And, to soldier on with the metaphor a bit, it’s pretty awful if your day-to-day reality is driving around a clunker that leaves car parts on the road after every pothole. You don’t just start to daydream about how nice it would be to ride around in a reliable, new car. You become justifiably convinced that doing anything less is a hazard to your well-being.
And so it comes to pass that hordes of developers storm the castle with torches and pitchforks, demanding a rewrite. What do we want? A rewrite! When do we want it? Now!
At first, management tries to ignore them, but after a while that’s not possible. The next step is usually bribery — bringing in ping pong tables or having a bunch of morale-building company lunches. If the carrot doesn’t work, sometimes the stick is trotted out and developers are ordered to stop complaining about the code. But, sooner or later, as milestones slip further and further and the defect count starts to mount, management gives in. If the problem doesn’t go away on its own, and neither carrots nor sticks seem to work, there’s no choice, right? And, after all, aren’t you just trusting the experts and shouldn’t you, maybe, have been doing that all along?
There’s just one nagging problem. Is there any reason to think the rewrite will turn out better than the current system?
Would the Rewrite Go Well?
Let’s do a dispassionate play by play of the situation. A software group starts writing a piece of software and they’re productive at it. Over the course of time, as they hustle to get features out the door, they make a mess, always vowing to clean it up later, when they have the time. But, they never have the time because with every delivery cycle, they’re able to ship fewer things because of all the problems that have developed in the code. Eventually, features slow to a crawl, developers are increasingly miserable, and the group suffers attrition as people start heading to other groups or companies for greener (field) pastures. Things are in a downward spiral and something must be done. The developers want that something to be a total rewrite. “This time,” they say, “we know so many things we didn’t know when we started the current system, so this time we’ll get it right.”
Insanity: doing the same thing over and over again and expecting different results.
— Albert Einstein
Sure, they know things now that they didn’t know when they started on this code 3 years ago. But won’t the same thing be true in 3 years? Won’t the developers then be looking at the code and saying, “this is a mess — if only we knew in 2015 what we now know in 2018!” And, beyond that, what makes you think that giving the same group of people the same marching orders won’t result in the same kind of code?
The “big rewrite from scratch because this is a mess” is a losing strategy.
Don’t get me wrong. There are certainly times when old software needs to be phased out in favor of more modern stuff. If you have code specific to hardware that is no longer manufactured, you’re better served building new software than scrounging eBay for resellers of the old server model. If you have a line-of-business application written in a defunct language that no one knows anymore, you’ll probably need to bite the bullet and commission something modern. But you don’t need to rewrite software because developers made a mess of it while hurrying to meet deadlines.
It’s a long road back from a mess, but the road exists. You can use automated tooling to identify and start working to improve the most dangerous parts of the code. Automated tests are your friend — characterize the system’s current behavior with lots of automated tests and then work on refactoring. Bring in coaches or developers that are used to legacy rescues. Shift the team’s priorities and help the business understand that it’s time to pay the piper on the accumulated technical debt. They’re going to have to deal with a slowdown in the short term to go faster, sustainably over the long term. And they’re in no position to complain — that’s exactly what a rewrite would mean too. It’s just that this approach is an actual game changer and not just more of the same.
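To make the characterization idea concrete, here is a minimal sketch in Python. Everything in it is hypothetical: the shipping-cost routine, its rates, and the odd heavy-package discount are invented stand-ins for whatever tangled function you actually have. The point is only the shape of the technique: record what the code does today, quirks included, and then check any cleaned-up version against that record.

```python
# Hypothetical legacy routine: the name, rates, and the odd discount are
# invented for illustration and do not come from any real system.
def legacy_shipping_cost_cents(weight_kg, express=False):
    cost = 500 + 120 * weight_kg
    if express:
        cost = cost * 3 // 2
    if weight_kg > 20:
        cost -= 300  # undocumented quirk that callers may now rely on
    return cost


# Expected values were recorded by running the current code, not taken
# from a specification. They pin down what the system does today.
RECORDED_BEHAVIOR = {
    (0, False): 500,
    (10, False): 1700,
    (10, True): 2550,
    (20, False): 2900,  # boundary: the discount does not apply at exactly 20
    (25, False): 3200,
    (25, True): 4950,
}


def find_behavior_changes(candidate):
    """Return the recorded cases where `candidate` disagrees with the record."""
    changes = []
    for (weight_kg, express), expected in RECORDED_BEHAVIOR.items():
        actual = candidate(weight_kg, express)
        if actual != expected:
            changes.append(((weight_kg, express), expected, actual))
    return changes


def refactored_shipping_cost_cents(weight_kg, express=False):
    """A cleaned-up version that is supposed to behave identically."""
    base = 500 + 120 * weight_kg
    multiplied = base * 3 // 2 if express else base
    heavy_discount = 300 if weight_kg > 20 else 0
    return multiplied - heavy_discount


if __name__ == "__main__":
    assert find_behavior_changes(legacy_shipping_cost_cents) == []
    assert find_behavior_changes(refactored_shipping_cost_cents) == []
    print("refactoring preserved all recorded behavior")
```

Note that the recorded table deliberately includes the boundary case at 20 kg. A characterization test does not judge whether the quirk is right; it just makes sure a refactoring cannot change it by accident.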
When everyone on the project is at wits’ end and people are finally past the shock of the notion of a write-off, the rewrite is tempting. It’s like making peace with having a car payment and starting to get excited about the newfangled dashboard computer and leather seats of the luxury thing you’re going to buy next. But software isn’t a car. The software is a mess because the group made it a mess, and it’ll only get and stay clean if the group cleans it.
“Automated tests are your friend — characterize the system’s current behavior with lots of automated tests and then work on refactoring.” This is great, but assumes that the existing codebase is amenable to implementing such tests in order to give confidence to that refactoring effort. Many legacy codebases have their code so deeply intertwined that adding such tests is incredibly difficult, if not close to impossible. Even in legacy codebases that have all of their “layers” of code so tightly coupled, there are a number of ways in which developers can attempt to implement such tests, as detailed by Michael…
“Very frequently, the original codebase is in such a poor state that the refactoring effort is just as large an undertaking as the complete re-write.”
Even if, pound for pound, the rewrite effort costs the same as the refactoring effort, the legacy code base has the benefit of remaining feature complete while refactoring is done.
There’s no guarantee of it being feature-complete once a rewrite is done, because in a legacy codebase, bugs cease to be bugs, and are “features.” When you re-write, you have the leeway to remove bugs or explicitly codify the behaviour; when you’re simply re-factoring, you can’t eliminate bugs without expensive analysis. The “switch to more modern frameworks” point is also important. Just look at the healthcare sector in the US: 50 states, each with a different insurance system, each written in a different language and framework, none of which can communicate with each other. The problem? They were written between…
Unfortunately, legacy systems are seldom feature complete and often full of serious bugs. Work on the code base has been reduced to absolutely critical feature additions because everyone in management is scared it will fall apart if anyone touches it. By the time the old cryptic mess has been understood enough to refactor, it is as much work as a rewrite. If legacy systems were written with sound practice more often, this would not be an issue. But often they are written with every concern entangled using global variables instead of parameters. They are often method-oriented rather than object-oriented. They…
In the context of that last sentence, I think I’m mainly in agreement with your sentiment. Philosophically, if I went into each and every method in a codebase and cleaned it up, I would have more or less “rewritten” it, kind of the way a human is, on a cellular level, continuously “rewritten” while remaining the same person. My main contention in this line of reasoning in my post is to look skeptically at the developers effectively declaring bankruptcy under the weight of tech debt that they created. But if the developers look at the system and say, “we’ll start…
All well and good until you find out that every single module is so tightly coupled into every other module that it is impossible to rewrite one without refactoring them all!
I wonder if this discussion is complicated by the fact that some people think a codebase from 2013 is “legacy,” when most of us are thinking “30 year old codebase written in a language not updated for 15 years”
“But if the developers look at the system and say, “we’ll start by isolating the worst module, X, and rewriting that,” I’m a lot more inclined to think things can improve.”
But again, Erik, that heavily implies that Module X can indeed be isolated from other code. For many legacy code bases with global (or near global) state, isolating *any* specific module or piece of functionality can be a huge undertaking.
I’m planning to write a follow-up post to address some of the feedback in the comments here. I’ll speak more at length on this and a few other topics in that venue. But for now, I have some comparably brief thoughts. I definitely agree that it’s easier to characterize some legacy code bases than others. But I’d submit that sizing “the refactoring effort” is hard to pin down. Do you mean the effort it would take to take a nasty legacy mess and turn it into a paragon of clean, decoupled code? Or do you mean taking the legacy mess…
“Do you tell all existing users that you’re sun-setting the app for X months/years until the rewrite is finished? I’m just wondering what the market strategy looks like if you make this call.” You have a moratorium on new features/functionality within the existing product’s code base, reducing all development effort to critical bug fixes only, whilst the team performs the re-write. The team’s time is thus split such that a vast majority of their time (perhaps 90-95% of the working week) is dedicated to activities relating to the new rewrite whilst the other 5-10% of the time is the bug…
About five years ago I was suckered into working for a company that sold it as an opportunity to write cutting edge new software using the latest technologies, only to be stuck for over a year maintaining a codebase written in VB6 that they were afraid to rewrite because of all these articles saying you shouldn’t rewrite software. But they weren’t too afraid to try to make the VB6 software do something fundamentally different from what it was designed to do. Regardless, the lesson I learned is if the company is unwilling to spend the time attacking technical debt or…
I agree and had a similar experience before I finally decided to quit working in corporate environments for good. At my last position all of the applications were in terrible condition as a result of a technical manager that was so incompetent one wonders how he maintained his position. I recommended that a number of applications be completely re-written, including the most critical, their pricing application, which was the core of a lot of their pricing. No one wanted to be bothered so when I left only the new applications I had developed kept humming along without any issues. As…
I’ve never understood why companies do that. The only thing I can assume is that it’s an aspirational commitment like someone who buys a gym membership, assuming that this commitment device will motivate them to exercise. I’ve heard it less in the context of old vs. new techs and more in the context of “oh, we’re totally agile and do unit tests and stuff. Totally.” So if this codebase was a VB6 one circa 2010, that means it has to have been around at least a decade. I’m curious as to what the app did, and why they never ported…
As someone that was in this situation (trying to do a vb6 to .net port) the true problem can arise when you try to take functionality out of the vb6 app. Most of these programs were so tightly cobbled together that pulling even small bits of functionality out really risked destroying multiple other sections of the code. Doubly so with legacy languages that are mildly hostile to object oriented practices and other attempts to keep global state corralled. Also (more in my experience) do you know what that code does? Is it doing what it looks like it’s doing? Because…
Ah, yes, there we go.
In 2015, currently working in a codebase from 2005, which was merely an IDE-upgrade for a codebase from the late 1980s. Now THAT’S legacy. But don’t worry, we’re dropping the product in
1 + EncodeDate(StrToInt(FormatDateTime('yyyy',Now)),1,1);
Funnily enough, last week the Signal vs Noise blog posted a talk about rewriting your software (and why they rewrote Basecamp 3 from scratch):
https://signalvnoise.com/posts/3959-rewrite-why-basecamp-3-is-a-brand-new-code
Would love to hear your opinion on that!
I watched about the first half of that video and found it interesting (I’d have kept watching, but I actually had to run). My take from this is that DHH is talking about a different situation than I am. He’s saying, “software isn’t infinitely malleable, and it doesn’t make sense to evolve your desktop based CRUD app into an Android video game” (exaggerating a bit for effect). That’s what I took from the “turn your table into a chair” metaphor. And I agree with him. What I’m saying, to use his terms, is “if you have a crew of people…
In my personal experience, a rewrite is usually easier and faster (===cheaper) than fixing a messed up code base, provided, of course, that the people doing it are the people that built or maintain it. For once you have fairly static specifications and certainly a deeper understanding of what you need and where the weaknesses of the system are, so the rewrite is much quicker than the original. Following the car example, there is a moment when maintaining it is just not worth the cost. It certainly depends on the codebase you have to work with, and the inflection point is what…
When you’re computing “cheaper,” are the determining factors in this equation simply developer salary hours? Are you also factoring in the cost of spending a significant amount of time bringing nothing to market/delivering no capabilities? Do break/fix activities and patching still happen?
Let me clear up something: I consider refactoring to be rewriting.
I’ve just bailed out of a blue chip that was porting plug-ins for an in-house mechanical engineering code to a COTS code. The existing code was in Fortran 77, and there was insistence on re-writing it in C++. My view was to re-use code and associated test-suites. Their view was to write specifications (badly). Eventually I succumbed to pressure (i.e. my machine didn’t have Fortran installed) and I ran my own Fortran to C++ translator on the code before refactoring.
My view is never do a full re-write as the code is the specification.
What was the motivation for the rewrite, out of pure curiosity?
It was felt that the average plugin, being written by a full time CAD jockey or stress analyst, was of low code quality. So much better to have it re-written by a software engineer with, typically, no domain experience as a clean sheet re-write.
I arrived on the project when it had stalled due to incorrect staffing. It was felt that the mechanical engineers could write full specifications, which proved to be wrong for even the simplest plug-in. Also, the test data was re-created instead of translating the existing finite element meshes, etc.
As someone who has recently been working on a code base from the 80’s (VBScript & Classic ASP) where the preferred architecture was somewhere between spaghetti and ball of mud, I can say that on occasion a total rewrite is completely justified.
Having said that, most of the developers, programmers, and engineers I have worked with never take into account what a rewrite will cost financially. Should we rewrite a product from the 80’s? Absolutely. What if doing so bankrupts the company? Most technical people never consider that possibility.
This is one of the things I’ll cover in more detail in the post I’m planning, but I think “rewrite something from the 80’s” might fall into a slightly different category than what I’m talking about. As an extreme example, let’s say that you have some old application that deploys to Blackberries and helps sales people in the field check in on clients by jacking into the corporate CRM. If a Blackberry is then pried from the cold, dead hands of the last exec to carry one, rendering the application defunct, I wouldn’t consider a port of the application to…
It’s not as simple as just fix it. I worked at a company that had invested in Java/J2EE and MySQL and decided to switch to Ruby on Rails and MongoDB. When the project started, these NoSQL databases simply didn’t exist, but the application really really needed that kind of flexibility. The end result wasn’t so much crappy spaghetti code, as poorly performing code doing things SQL just plain sucks at. You can store anything in SQL, but querying it is another matter entirely! That’s why these other databases were invented. In cases where it really is spaghetti code, a rewrite…
This is another thing that I’ll address later, in more detail, but there’s an interesting distinction here. In my original post, I was writing assuming a situation where developers were trying to drive a rewrite because of a mess they’d made. You’re talking about (I think) a situation where management has whip-cracked to meet deadlines, forcing the developers to take on tech debt and make bad tradeoffs. In this situation, I think a rewrite will go poorly as well, but I think in this situation that any approach will go poorly. Refusing to acknowledge the results of tradeoffs that have…
The problem isn’t so much the re-write itself (at least not always). Bad code is often a result of multiple factors. Yes there is bad design, but often bad design is a result of managers who sing the mantras of “YAGNI” and “premature optimization is the root of all evil” when what they really mean is “get ‘er done – I want my bonus”. To be fair these mantras aren’t incorrect, just incorrectly used by many. Other offenders are “could you prototype this for me” followed by “go ahead and deploy that [prototype]” and that technical debt is never paid…
I definitely agree with this. I suppose I should consider the eventuality that managers short-sightedly chasing poorly designed incentives might be reading what I have to say and failing to understand that it is they, and not the developers, that are the problem. But then again, I’m not sure any caveats I offer here are likely to help the poor folks working on this software.
I think another aspect of rewriting using new cutting edge technology is that developers don’t want to maintain and write code from 1992. They want to get their hands on the latest and greatest.
If the company does not move on, good developers will leave, and the company will only have “not great” developers.
That’s definitely true. Personally, if I were (and I have been) confronted with a situation in which I was managing a team with a nasty legacy code base, I’d challenge the team to figure out how to modernize strategic parts of the app without a rewrite. Turn the situation into the kind of challenge that talented developers can go chase, with the reward for figuring it out being a more pleasant tooling arrangement.
A ground-up rewrite would take fewer total hours. But incremental refactoring lets management see what it is buying sooner in terms of retiring technical debt. It’s a waterfall versus agile thing. It should be way easier to sell refactoring, but my experience is that management won’t buy either one.
https://www.psychologytoday.com/blog/in-therapy/200907/the-definition-insanity-is
Ever heard of Netscape? Oh, not recently? They bought the farm thanks to the siren song of a clean re-write. http://www.joelonsoftware.com/articles/fog0000000069.html
Biggest mistake you can make (right after the mistake of accumulating technical debt until your codebase is in an “upside down” mortgage).
If you want to rewrite old code, just answer one question:
“Do you really know what the code does?”
I’m just now debugging code in a real-time system that I wrote two years ago.
I did not really know what the code should do, and the reviewers weren’t any better.
Now I know, but time is running out.
The author writes: ‘The “big rewrite from scratch because this is a mess” is a losing strategy.’ I tend to disagree. One example: Microsoft rewrote Exchange Server from scratch using C#. I think this was a wise decision. If your code is a mess, I would advise you to rewrite it from scratch, with the intent for the code to be well organised, modular, and well understood. I believe that working on and servicing messy code is a losing strategy. Another example from a different area: Architects nowadays build houses in a way that they can be cleaned easily. I…
[citation needed] – I couldn’t find any write-up about that Exchange Server rewrite. How did you learn of it?
I cannot remember where and when I first read it. Was it WindowsITPro? I am not sure.
Anyway, here is a good reference:
https://technet.microsoft.com/en-us/magazine/jj851175.aspx
Here are some more:
https://en.wikipedia.org/wiki/Microsoft_Exchange_Server
https://www.quora.com/What-is-the-technology-stack-behind-outlook-com
If you perform the following Google search:
Exchange Server rewritten in c#
you will also get links to relevant pages of the two books that write about this fact:
Pro Exchange Server 2013 Administration
Microsoft Exchange Server 2013 Inside Out Mailbox and High Availability
(the links from Google Books may be too long to include here).
Hey, sorry, I’ve had limited bandwidth for comment responses, and I’ve drafted another post on the NDepend blog to elaborate on my thoughts. But if you want to post a link and Disqus won’t let you, email me or submit it through the “Ask Me a Question” form, and I’ll add an edit at the bottom of the post with the link.
Thank you so much, Erik! I think I will not be needing the extra space, unless Kevin Mote wants more info. Thanks, again!
I had one project where the complete rewrite was actually the way to REDUCE risk. The 3D application I had inherited was written with an engine designed to take advantage of modern video game hardware (for multiple high-performance cores and high-end graphics hardware). It barely ran at an acceptable frame-rate on Core 2 Quad machines with modern GPUs, and we were to add features and make it run on a single-core Celeron machine with on-board graphics, a target with less than 5% of the horsepower of the then-current target… The engine was no longer supported, and no one on…