Why Production Monitoring Can Come Too Late
Editorial Note: I originally wrote this post for the Stackify blog. You can check out the original here, at their site. While you’re there, have a look around at how their offering can help you hunt down issues from development to production.
I’ve spent a number of years, now, writing software. At the risk of dating myself, I worked on software in the early 2000s. Back then, you couldn’t take quite as much for granted. For example, while organizations considered source control a good practice, forgoing it wouldn’t have constituted lunacy the way it does today.
As a result of the difference in standards, my life shipping software looked different back then. Only avant-garde organizations adopted agile methodologies, so software releases happened on the order of months or years. We thus reasoned about the life of software in discrete phases. But I’m not talking about the regimented phases of the so-called “waterfall” methodology. Rather, I generalize it to these phases: build, prep, run.
During build, you mainly solved the problem of cranking through the requirements as quickly as possible. Next up, during prep, you took this gigantic sprawl of code that only worked on dev machines, and started to package it into some kind of deployable product. This might have meant early web servers or even CDs at the time. And, finally, came run. During the run phase, you’d maintain vigilance, waiting for customer issues to come streaming in.
Bear in mind that we would, of course, work to minimize bugs and issues during all of these phases. But at that time, in most organizations, having issues during the “run phase” constituted a good problem to have. After all, it meant you had reached the run phase. A shocking amount of software never made it that far.
Monitoring and Software Maturity
We’ve come a long way. As I alluded to earlier, you’d get some pretty incredulous looks these days for not using source control. And you would likewise receive incredulous looks for a release cycle spanning years, divided into completely disjoint phases. Relatively few shops view their applications’ production behavior as a hypothetical problem for a far-off date anymore.
We’ve arrived at this point via some gradual, hard-won victories over the years. These have addressed the phases I mentioned and merged them together. Organizations have increasingly tightened the feedback loop with the adoption of agile methodologies. Alongside that, vastly improved build and deployment tooling has transformed “the build” from “that thing we do for weeks at the end” to “that thing that happens with every commit.” And, of course, we’ve gotten much, much better at supporting software in production.
Back in the days of shrink-wrap software and shipping CDs, users reported problems via phone call. For a solution, they developed workarounds and waited for a patch CD in the mail. These days, always-connected devices let us deliver patches almost as quickly as we can write them. And we have software that gets out in front of production issues, often finding them even before users do.
Specifically, we now have sophisticated production monitoring software. In some cases, this means simply watching for outages and supplying alerts. But we also have sophisticated application performance monitoring (APM) capabilities. As I said, we’ve come a long way.