Integrating APM into Your Testing Strategy
Editorial note: I originally wrote this post for the Stackify blog. You can check out the original here, at their site. While you’re there, have a look at their tooling to help you with your APM needs.
Does your team have a testing strategy? In 2017, I have a hard time imagining that you wouldn't at least have some kind of strategy, however rudimentary. Unlike a couple of decades ago, you hear less and less about people just changing code on the production server and hoping for the best.
At the very least, you probably have a QA group, or at least someone who serves in that role prior to shipping your software. You write the code, do something to test it, and then ship it once the testers bless it (or at least notate “known issues”).
From there, things probably run the gamut among those of you reading. Some of you probably do what I've described and little more. Some of you probably have multiple pre-production environments to which a continuous integration setup automatically deploys builds. Of course, it only deploys those builds assuming all automated unit, integration, and smoke tests pass and assuming that your static analysis doesn't flag any show-stopper issues. Once deployed, a team of highly skilled testers performs exploratory testing. Or, maybe, you do something somewhere in between.
But, whatever you do, you can always do more. In fact, I encourage you always to look for new ways to test. And today I’d like to talk about an idea for just such a thing. Specifically, I think you can leverage application performance management (APM) software to help your testing efforts. I say this in spite of the fact that most shops have traditionally taken advantage of these tools only in production.
What You Need to Get Started
First things first. You can't just download or purchase a tool and head off to the races. You need at least a certain level of sophistication in your existing setup. And, if you lack this sophistication, I suggest that you acquire it.
You’re going to need a non-production environment in which to test your application. Generally speaking, this means a server (or multiple servers) on which you simulate your production environment. You don’t need to simulate all production conditions, necessarily. But you can’t reasonably do this on Steve’s development box because he has the only desktop with 16 gigs of memory. You’re going to need to up your game a little beyond that.
You'll also need this environment to include reasonable facsimiles of externalities to your application. Most commonly, this means a database, but it might also involve things like (non-production) endpoints for web service calls. You need to recreate production, both in terms of the deployed environment and the things it depends on.
What You Really Ought to Have
From there, you probably want to do a few more things to make this work. I strongly suggest automating the process of taking committed code to deployment in your testing environment(s). You can get by without it, but a manual process dissuades people from executing it frequently. And, without frequent execution, you lose early feedback and a good bit of value.
The same reasoning applies to actually exercising the application. You could deploy it to the test environment and then do something like having a whole bunch of people log in and start banging around at things. But you’d do a lot better to automate this in order to simulate traffic. Generally, you can think of this as an automated smoke test. And doing it will put stress on the system in a way that manual testers probably can’t and certainly won’t. It will also ensure a consistent and methodical approach that you can repeat to troubleshoot.
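To make that concrete, here's a minimal sketch of such a traffic simulator in C#. The host, routes, and user count are all assumptions for illustration; a real version would walk your application's actual workflows.

```csharp
// A minimal traffic-simulation sketch. The test host and routes below are
// hypothetical; it fires concurrent requests at a few key endpoints so the
// APM tool has real, repeatable activity to measure.
using System;
using System.Linq;
using System.Net.Http;
using System.Threading.Tasks;

public static class SmokeTraffic
{
    private static readonly HttpClient Client = new HttpClient
    {
        BaseAddress = new Uri("https://test-env.example.com") // assumed test host
    };

    public static async Task Main()
    {
        string[] routes = { "/customers", "/orders", "/search?q=widget" };

        // Fifty simulated "users," each walking the routes ten times.
        var users = Enumerable.Range(0, 50).Select(async _ =>
        {
            for (int i = 0; i < 10; i++)
            {
                foreach (var route in routes)
                {
                    using var response = await Client.GetAsync(route);
                    // Fail loudly on anything that isn't a success status code.
                    response.EnsureSuccessStatusCode();
                }
            }
        });

        await Task.WhenAll(users);
        Console.WriteLine("Smoke traffic completed.");
    }
}
```

Run from the pipeline after each deployment, something like this gives you the consistent, methodical stress I mentioned, and gives the APM tool something worth measuring.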
With all of that squared away, you can start to take advantage of APM in your testing strategy.
First Steps and Getting a Feel for Things
In my consulting, I advocate that people do what they can to tighten feedback loops. Plan a small slice of value in what you're doing, execute on it, notch a quick win, and go from there. I advocate this same approach here.
Go get an APM tool and set it up in your testing environment. Play with it and get a feel for its features and how it works. See what sorts of data it furnishes and how that changes under different circumstances.
You're basically just seeing what it will tell you about your application at this point. If you've never really profiled your code before, I can almost promise you that it will find things that astonish you. Sure, you knew the call to get all customer records was slow, but you just assumed it was a network issue or something. You had no idea that the Customer data transfer object had a Thread.Sleep(1000) in its constructor. (Okay, you probably won't find yourself quite that astonished, but you never know.)
Your manual efforts alone will likely result in a robust set of technical debt backlog items to address. You’ll have queries to improve, modules that spew swallowed exceptions to clean up, and a good bit of general housekeeping.
Move On to Automation
Once you've worked through the initial backlog the tool generated by fixing things, you want to make sure you don't backslide. Take what you've learned and turn the captured metrics into thresholds.
Did some kind of rogue query result in an order-of-magnitude delay? By all means, fix the query, but carry that learning forward. Set a threshold a bit above the proper time for the operation, and instrument your build pipeline to notify you if performance exceeds the threshold. Better yet, create some kind of failure that the team can't ignore.
Assuming you've taken my advice, you now have a pretty powerful end-to-end setup. Your commits trigger automated builds. These, in turn, trigger deployments to a testing environment and subject the application to all manner of testing, including smoke tests. These smoke tests push the application into situations where APM will detect issues. When it does, you'll become aware of them with a significantly tighter feedback loop than a user calling the help desk to ask why things are so slow.
Keep the Pressure On
Once you’ve operationalized this whole approach, don’t let yourself get complacent. In situations where performance means competitive advantage, keep dialing the APM tool thresholds to better and better figures. Set goals with each release for improvement.
And that's just a single example. If you've set goals for other concerns, such as resource usage, keep poking at those for improvement as well. And keep dialing up the volume on the smoke tests if you can. The closer you can run to a stressed production environment, the better you can sleep at night when you actually ship.
Keep Things in Perspective
I'll close by offering a note of caution based on field experience. When you start to incorporate sophisticated techniques into your testing strategy, you'll experience a heady feeling. You'll acquire an unprecedented sense of confidence in your code and its performance. And justifiably so.
But take care that you don’t allow that confidence to slide into overconfidence. Using APM can expose issues other forms of testing can’t easily expose. It lets you simulate the vagaries of production and then take steps to guard against them. But note that I said simulate. It simulates production, but it doesn’t recreate it.
So as you incorporate APM into your testing strategy, bear two things in mind. First, you also definitely need to monitor production itself. And second, no testing strategy, however sophisticated, can give you 100% confidence. Incorporate APM into your strategy, but never stop looking for other ways to improve it.