How to Write Code that Operations Will Like
Editorial Note: I originally wrote this post for the Monitis blog. You can check out the original here, at their site. While you’re there, take a look around at the options you have for monitoring your production site and all of its supporting infrastructure.
In recent years, the DevOps movement has gained a significant amount of steam. Historically, organizations approached software creation and maintenance in the same way that they might approach physical, mechanical processes such as manufacturing. This meant carving the work into discrete components and then asking people to specialize in those components.
Picture a factory floor and the idea becomes concrete. One person screws two components together over and over, while another operates a drill press over and over. When you specialize this way, the entire process gains efficiency. At least, so it goes with mechanical processes.
With knowledge work, we haven’t realized the same kind of benefit. In retrospect, this makes sense. It makes sense because, unlike mechanical work, knowledge work involves little true repeatability. It continually presents us with problems that differ at least somewhat.
As we’ve come to recognize that fact, we’ve begun to blend concerns together. Scrum teams blur the lines among QA, developers, and business analysts. And DevOps does the same with initial development and production operation of software. If you can, I wholeheartedly encourage you to embrace DevOps and to blend these concerns. Your entire work product will improve.
Unfortunately, not every shop can do this, at least not immediately. Regulatory and compliance concerns may hold these organizations back, or perhaps change simply comes slowly in the enterprise. But that shouldn't stop you from making strides.
Even if development and operations remain separate, they can at least communicate and help each other. Today, I’d like to talk about how developers can write code that makes life easier for ops folks.
Ensure Predictable Performance
When you have a piece of software in front of users, you want it to perform well. If you have a website or offer a SaaS product, you want snappy page loads and quick request servicing. Perhaps you have a mobile app or a desktop product. In that case, you want responsiveness and no instances where the app “hangs.”
Everyone in the organization wants this, and operations is no exception. But operations has an additional concern that most others don't share quite as acutely: they want predictable performance.
If you have responsibility for monitoring and responding to user issues, you can, in a sense, live with sluggishness. “While we realize page load is a bit slow, our engineers know about the problem and have a fix in the works.” Not ideal, but workable.
Now imagine how they respond to an app that randomly slows to a crawl, or to a site that intermittently returns an HTTP 500 error after running out of memory. For operations folks, both explanation and troubleshooting become difficult.
So help them out. Watch for deadlocks and race conditions, and use performance profilers to help you detect memory leaks and other such insidious issues.
Generate Robust Logging
For folks in operations, log files can represent a lifeline. Amid a sea of questionable user claims and anecdotal evidence, logs provide a welcome source of objectivity. The log file does not lie.
Therefore, the more logging you do, the better. It gives them more to work with and they will thank you for it.
But they'll only thank you for it if the information actually helps. Fill the log files with detail, but make sure that detail is useful. Each entry should provide basics like context, a timestamp, and a log level. On top of that, take care to make the logs parseable so that operations folks can use automated tools to search and filter them.
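One common way to cover context, timestamps, levels, and parseability at once is to emit one JSON object per line. The sketch below uses only Python's standard logging module; the logger name "orders" and the convention of passing request details through a `context` field in `extra` are hypothetical choices, not a standard.

```python
import json
import logging
import sys
from datetime import datetime, timezone
from typing import TextIO


class JsonFormatter(logging.Formatter):
    """Render each record as a single JSON object per line (JSON Lines)."""

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "timestamp": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Hypothetical convention: callers pass extra={"context": {...}} for request details.
        context = getattr(record, "context", None)
        if context:
            entry["context"] = context
        if record.exc_info:
            entry["exception"] = self.formatException(record.exc_info)
        # default=str keeps logging from crashing on values json can't serialize.
        return json.dumps(entry, default=str)


def make_logger(stream: TextIO = sys.stdout) -> logging.Logger:
    logger = logging.getLogger("orders")  # hypothetical logger name
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.handlers.clear()
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # avoid duplicate, unstructured copies via the root logger
    return logger
```

A call such as `make_logger().info("order placed", extra={"context": {"order_id": 42}})` produces a single line that tools like jq, or a log aggregator, can filter by level, logger, or order ID without fragile regular expressions.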
Put some good thought into how you log in your application. The ops people will thank you for it.
Provide Good Hooks for Alerts and Escalation
Speaking of log files, you can show empathy for ops team members by being mindful of how they work. Operations people need ways to structure their workflows, which means notifications of key events and escalation processes.
While they can accomplish some of this with purely external tools, such as website monitoring, you can aid them in this cause. The log files offer an important means for doing this. If you do everything mentioned in the last section, you have a good start. But you can and should go further.
Ask the operations folks about their workflows and what they need to know. Incorporate their responses into outputs that come from your program. This might mean adding specific log entries that they can key off of. It might also come in the form of error codes returned from utilities, HTTP response codes, exception messages, events, and anything else you can think of. Make sure they have easy ways to inspect and respond to the running code.
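For a batch utility, that might mean distinct, documented exit codes plus a stable, greppable event name in each log line. In the sketch below, the job name, the `event=` keys, the exit-code values, and the retry/escalation notes are all hypothetical; the real ones should come out of that conversation with your operations team.

```python
import enum
import logging
from collections.abc import Callable

logger = logging.getLogger("nightly_import")  # hypothetical job name


class ExitCode(enum.IntEnum):
    """Documented exit codes that a scheduler or alerting rule can key off."""
    OK = 0
    CONFIG_ERROR = 2          # fix the config; retrying won't help
    UPSTREAM_UNAVAILABLE = 3  # likely transient; retry, escalate if it persists
    DATA_INTEGRITY = 4        # escalate to the owning team


def run_import(config: dict, fetch: Callable[[str], list[dict]]) -> ExitCode:
    source = config.get("source_url")
    if not source:
        # Stable "event=" keys give log-based alerts something precise to match.
        logger.error("event=IMPORT_CONFIG_MISSING key=source_url")
        return ExitCode.CONFIG_ERROR
    try:
        rows = fetch(source)
    except ConnectionError as exc:
        logger.error("event=IMPORT_UPSTREAM_DOWN source=%s error=%s", source, exc)
        return ExitCode.UPSTREAM_UNAVAILABLE
    if any(row.get("id") is None for row in rows):
        logger.error("event=IMPORT_BAD_ROWS source=%s", source)
        return ExitCode.DATA_INTEGRITY
    logger.info("event=IMPORT_OK source=%s rows=%d", source, len(rows))
    return ExitCode.OK
```

A real entry point would end with something like `sys.exit(run_import(config, fetch))`, so the process exit status itself tells the scheduler which escalation path to follow. Injecting `fetch` also makes each failure path easy to exercise in tests.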
Clean Up Your Resources
Earlier, I mentioned using tools to keep code performance predictable. You want crisp and predictable performance. In a similar vein, you want to make sure your code cleans up after itself.
If you write code that leaves a file handle open or that opens database connections without closing them, you hand operations a ticking time bomb. They’ll start the thing and then, at some point, hours, days, or even weeks later, it will blow up. Because effect comes so much later than cause, troubleshooting code like this creates serious headaches.
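In Python, the usual defense is a `with` block, which releases the resource even when an exception escapes. One subtlety is worth knowing: using a `sqlite3` connection as a context manager only commits or rolls back the transaction; it does not close the connection, so `contextlib.closing` is needed for that. The function and table names below are hypothetical.

```python
import sqlite3
from contextlib import closing


def read_config(path: str) -> str:
    # The file handle is closed when the block exits, even if read() raises.
    with open(path, encoding="utf-8") as fh:
        return fh.read()


def count_orders(db_path: str) -> int:
    # closing() guarantees conn.close() on every exit path.
    with closing(sqlite3.connect(db_path)) as conn:
        # `with conn` alone only commits on success or rolls back on error;
        # it does NOT close the connection.
        with conn:
            conn.execute("CREATE TABLE IF NOT EXISTS orders (id INTEGER PRIMARY KEY)")
            (count,) = conn.execute("SELECT COUNT(*) FROM orders").fetchone()
    return count
```

Because cleanup is tied to scope rather than to a `close()` call someone has to remember on every error path, the slow accumulation of open handles never gets the chance to start.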
Obviously, good software practice dictates that you clean up after yourself. But the operations folks have more than professional pride at stake here: they're the ones left holding the time bomb when it finally goes off.
Cohesive, Modular Code
As the last consideration, I’ll offer a slightly more philosophical suggestion. You want to write code with a high degree of cohesion and a low degree of bad coupling. As a software developer, this has the nice effect of making your code easier to maintain.
But for operations folks, it affects their lives in a slightly different way. Non-cohesive, tightly coupled code tends to exhibit weird runtime behaviors. For an easier-to-visualize example, consider taking your car in to have the brakes fixed. Imagine if, when you left, the brakes worked but the headlights wouldn't turn on. Now imagine that the shop fixes the headlights, but now the stereo won't play AM radio stations. After a few incidents like this, you'd probably stop visiting this particular shop.
Highly coupled, non-cohesive code causes your application to behave this way. Fixing one bug causes something to break in an unrelated module. It begins to feel like trying to clean your floor by pushing dirty mop water around.
Now, frustrating as it may be for you to chase these things down, imagine being on the front lines. Imagine having to explain to users why changing the font of the login button somehow broke checking their account balance. Don't subject the operations people to this kind of nonsense.
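To make the login-button example concrete, here is a hypothetical sketch in which the balance calculation and the button rendering live in separate, cohesive units. The balance code depends only on a narrow `LedgerStore` interface, and the rendering code has no access to ledger data at all, so a styling change has no path by which to break balances. Every name here is illustrative, not taken from a real codebase.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Protocol


class LedgerStore(Protocol):
    """The only thing balance logic is allowed to know about storage."""

    def transactions_for(self, account_id: str) -> list[Decimal]: ...


def account_balance(store: LedgerStore, account_id: str) -> Decimal:
    # Cohesive: sums ledger entries and nothing else; no UI, fonts, or login.
    return sum(store.transactions_for(account_id), Decimal("0"))


@dataclass(frozen=True)
class ButtonStyle:
    font: str = "Helvetica"
    size_pt: int = 12


def render_login_button(style: ButtonStyle) -> str:
    # Presentation concern, isolated: takes only styling input.
    return (
        f'<button style="font-family:{style.font};font-size:{style.size_pt}pt">'
        "Log in</button>"
    )


class InMemoryLedger:
    """Hypothetical stand-in for a real ledger database."""

    def __init__(self, entries: dict[str, list[Decimal]]) -> None:
        self._entries = entries

    def transactions_for(self, account_id: str) -> list[Decimal]:
        return list(self._entries.get(account_id, []))
```

The design choice is simply that each unit takes the narrowest input it needs, so the blast radius of any one fix stays inside that unit.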
Earlier in the post, I mentioned having empathy for the operations workers. Everything I’ve said here really falls under that heading, at the broadest level.
You’d have a hard time going wrong by developing and exhibiting empathy for your coworkers. But, as the DevOps movement shows us, operations and development have a uniquely intertwined and interdependent relationship. You all have responsibility for the successful creation, deployment, and operation of the software. Do your part to make their lives easier, and they’ll do their part not to wake you up at 3 AM on a Sunday.