Logging for Continuous Integration
Editorial Note: I originally wrote this post for the LogEntries blog. Check out the original here, at their site. While you’re there, take a look at the product offering, which includes storage, aggregation, and sophisticated search of your log information.
If you look at the title of this post, you’re probably thinking to yourself, “huh, that’s never really come up.” Of course, it’s possible that you’re not. But, in my travels as a consultant helping dev teams with practice and gap analysis, I’ve never had anyone ask me, “what do you recommend in terms of a logging solution for continuous integration?”
But hey, this is an easily solved problem, right? After all, continuous integration means Jenkins, and Jenkins has an application log. Perhaps that’s why no one is asking about it! Now, all that’s left is to sit back and bask in the glow of every compiler warning your application has ever generated since the dawn of time.
What Actually Is Continuous Integration?
Now, I know what you’re thinking. TeamCity is another continuous integration tool, and it also has logs. Or what about TFS or Bamboo? Jenkins doesn’t have sole possession of the continuous integration mind share. There are any number of products designed for this purpose.
And thus we arrive at a popular misconception.
Continuous integration is not Jenkins. It’s not TeamCity. It’s not TFS or Bamboo. And it’s also not the non-empty set that results from choosing one of the tools. Continuous integration is a practice, not a tool. And it’s actually a simple practice at that.
If you go back to basics via Wikipedia, you’ll find this definition.
Continuous integration (CI) is the practice, in software engineering, of merging all developer working copies to a shared mainline several times a day.
Notice it does not say, “CI is where you hook your GitHub account up to Jenkins.” There is no mention of any particular tool; it just describes the idea of developers’ source code never getting very far out of sync. Cringe (appropriately), but you could just as easily achieve this by having developers collaborate using Notepad to edit source files housed on a shared Dropbox account.
Why Continuous Integration?
Once we’re back at basics, the “why” becomes interesting because it’s not simply a matter of, “Jenkins is better than not Jenkins.” And, even if you allow for the actual business case, “Jenkins eliminates time-consuming manual preparation of software for production,” you’re still missing out on the original, historical idea here.
Back in the 1990s, team software development was generally structured the way that one might expect a factory to be structured. Developers went off into their cubicles, worked for months on their code, and then all met in the middle to assemble the pieces the way one would assemble something from Ikea. That was the theory, anyway.
The reality was much more grim, and it would result in extended “integration phases” where the developers would do nothing but try to mash the parts together into some kind of working product. They might spend weeks getting it to compile, let alone build and run properly.
So continuous integration emerged, alongside the pithy suggestion that “if it hurts, you should do it more.” Forward-thinking folks asked, “could we avoid all of this if we started with working software that would build, and we made sure that it continued to be true every day?” The answer to that turned out to be “yes,” and so emphatically “yes” that later processes, tools, and businesses like Jenkins et al. would be built on top of it.
But in the beginning, it was just an idea aimed at removing the sting from collaboration.
Why Log Any of This Stuff?
We’ve certainly digressed a bit from the idea of logging and into the history of and motivation for continuous integration. But there’s a method to the madness. Understanding the method and origin lets us understand what information is valuable and thus worthy of tracking.
Think back to those early days and to what might have been interesting to know. CI being a good idea was not settled case law, so you might be asking yourself, “am I wasting time by getting a fresh copy of the source code and compiling it a few times per day?” (Bear in mind too that neither obtaining group code nor compiling was nearly as quick as it is today.) You’d know that the “merge party” preceding a release typically lasted a month, so you’d want to know that you were spending a good bit less time than that, cumulatively, obtaining and building the source code.
Logging would be a great way to accomplish this, if the pull frequency and build times were tracked.
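To make that concrete, here is a minimal sketch in Python of what such tracking could look like. It assumes a hypothetical log with one record per pull or build, each carrying a timestamp and a duration in seconds. The field names (at, step, seconds) and the sample values are invented for illustration and do not come from any real tool.

```python
from collections import Counter
from datetime import datetime

# Hypothetical log records: one per pull or build, with how long it took.
# Field names and values are assumptions made up for this sketch.
SAMPLE_LOG = [
    {"at": "1999-03-01T09:05:00", "step": "pull", "seconds": 240},
    {"at": "1999-03-01T09:10:00", "step": "build", "seconds": 900},
    {"at": "1999-03-01T14:30:00", "step": "pull", "seconds": 180},
    {"at": "1999-03-01T14:34:00", "step": "build", "seconds": 840},
    {"at": "1999-03-02T10:00:00", "step": "pull", "seconds": 300},
    {"at": "1999-03-02T10:06:00", "step": "build", "seconds": 960},
]


def summarize(records):
    """Return (pulls per day, cumulative seconds per step) for the records."""
    pulls_per_day = Counter()
    seconds_per_step = Counter()
    for record in records:
        day = datetime.fromisoformat(record["at"]).date().isoformat()
        if record["step"] == "pull":
            pulls_per_day[day] += 1
        seconds_per_step[record["step"]] += record["seconds"]
    return dict(pulls_per_day), dict(seconds_per_step)


pulls_per_day, seconds_per_step = summarize(SAMPLE_LOG)
total_hours = sum(seconds_per_step.values()) / 3600

print(pulls_per_day)
print(seconds_per_step)
print(f"cumulative time: {total_hours:.2f} hours")
```

With totals like these in hand, you could set the cumulative cost of pulling and building against the month that a merge party used to consume.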
And the interesting things that one might measure only grew from there as more and more of the collaboration and build process became automated. Think of the questions that logs might have answered during this evolution.
- How frequently do we have merge conflicts?
- How often is a pull of a modified file succeeded by a push of that same edited file (and thus saved a merge)?
- What is the average number of compiler warnings on the main branch?
- Who most often delivers build-breaking code?
- What day of the week tends to feature the most activity?
All sorts of questions that many have wondered about idly, but that have real business value, could be answered by capturing the appropriate information about the team’s shared code.
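As a rough sketch of how a few of those questions might be answered, suppose each build left behind a structured record. The Python below assumes a hypothetical event format (at, author, result, warnings) with invented sample data; a real build tool would need its own parsing step to get logs into this shape.

```python
from collections import Counter
from datetime import datetime

DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
        "Friday", "Saturday", "Sunday"]

# Hypothetical build events; field names and values are assumptions
# invented for this sketch, not the output of any particular CI tool.
EVENTS = [
    {"at": "2016-05-02T10:15:00", "author": "alice", "result": "success", "warnings": 12},
    {"at": "2016-05-02T16:40:00", "author": "bob", "result": "failure", "warnings": 15},
    {"at": "2016-05-03T09:05:00", "author": "bob", "result": "failure", "warnings": 14},
    {"at": "2016-05-03T11:30:00", "author": "carol", "result": "success", "warnings": 10},
    {"at": "2016-05-03T15:20:00", "author": "alice", "result": "failure", "warnings": 9},
]


def build_breakers(events):
    """Count failed builds per author."""
    return Counter(e["author"] for e in events if e["result"] == "failure")


def activity_by_weekday(events):
    """Count build events per day of the week."""
    return Counter(
        DAYS[datetime.fromisoformat(e["at"]).weekday()] for e in events
    )


def average_warnings(events):
    """Mean number of compiler warnings per build."""
    return sum(e["warnings"] for e in events) / len(events)


print(build_breakers(EVENTS).most_common(1))       # who breaks the build most
print(activity_by_weekday(EVENTS).most_common(1))  # busiest day of the week
print(average_warnings(EVENTS))                    # average warnings per build
```

Each question becomes a few lines of counting once the raw information has been captured.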
Modern Motivations for Logging
And, indeed, a lot of these questions have been answered, and in very accessible ways. It’s no longer 1990, and in 2016, there is nice tooling to parse your source control tool’s version history. Commit, build, test, and deploy are all tacked on top of the continuous integration process and baked into powerful tools. Tools like, yes, Jenkins. And speaking of Jenkins and its ilk, those tools feature impressive dashboards and statistics aggregation — they have answers to the questions that people used to ask.
But you should stay ahead of dashboards. They don’t have answers to all of the questions that you might dream up, and they can’t support all of your hypotheses about how to improve your team’s collaboration and work product management.
To stay on the forefront, you have to go back to the logs. You could, of course, send the application logs to LogEntries for storage and querying. You could also use the tooling that has emerged around making sense of these application logs, such as this Jenkins Log Parser. Or, if you have time to kill, you could even roll your own.
But whatever you do, make sure you capture this information and give yourself access to it. Your build process is the lifeblood of your operational efficiency and collaboration. Shouldn’t you have answers to questions as quickly as you can think of them?