
Creating a Word Document from Code with Spire

I’d like to tell you a harrowing, cautionary tale of my experience with the MS Office Interop libraries and then turn it into a story of redemption. To set the stage, these Interop libraries are a way of programmatically creating and modifying MS Office files such as Word documents and Excel spreadsheets. They are intended for use in a desktop environment, from a given user account, in the user space. The reason for this is that what they actually do is launch MS Word and start piping commands to it, telling it what to do to the current document. This legacy approach works reasonably well, albeit pretty awkwardly, from a user account. But what happens when you want to take this from a legacy Winforms app to a legacy Webforms app and do it on a web server?

Microsoft has the following to say:

Microsoft does not currently recommend, and does not support, Automation of Microsoft Office applications from any unattended, non-interactive client application or component (including ASP, ASP.NET, DCOM, and NT Services), because Office may exhibit unstable behavior and/or deadlock when Office is run in this environment.

Microsoft says, “yikes, don’t do that, and if you do, caveat emptor.” And, that makes sense. It’s not a great idea to allow service processes to communicate directly with Office documents anyway because of the ability to embed executable code in them.

Some time back, I inherited an ecosystem of legacy Winforms and Webforms applications, and one common thread was the use of these Interop libraries in both places. Presumably the original author wasn’t aware of Microsoft’s stance on this topic and had gone ahead with using Interop on the web server, getting it working for the moment. I didn’t touch this legacy code since it wasn’t causing any issues, but one day a server update came down the pipeline and *poof*, no more functioning Interop. This functionality was fairly important to people, so my team was left scrambling to re-implement it using PDF instead of MS Word. It was all good after a few weeks, but it was a stressful few weeks, and I developed battle scars not only around those Interop libraries and their clunky API (see below) but around automating anything with Office at all. Use SSRS or generate a PDF or something. Anything but Word!

[Screenshot: a sample of the clunky Interop API]
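
To give a flavor of the paradigm (an illustrative sketch of my own, not the original application’s code), here is roughly what the legacy approach looks like. Every call below gets marshaled over COM to an actual running instance of Word, which is why it has no business running unattended on a server:

using Word = Microsoft.Office.Interop.Word;

class InteropSketch
{
    static void Main()
    {
        // This line actually launches a WINWORD.EXE process behind the scenes.
        var wordApp = new Word.Application();
        try
        {
            // Each of these calls is piped to that running Word instance.
            Word.Document document = wordApp.Documents.Add();
            document.Content.Text = "Hello from the Interop libraries.";
            document.SaveAs2(@"C:\Temp\InteropDemo.docx");
            document.Close();
        }
        finally
        {
            // Forget this and an orphaned Word process lingers, which is
            // one of many reasons this approach is a bad fit for a server.
            wordApp.Quit();
        }
    }
}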

But recently I was contacted by E-iceblue, which makes document management and conversion software in the .NET space. They asked if I’d take a look at their offering and write up my thoughts on it. I agreed, as I do with requests like this from time to time, but always with the caveat that I’ll write about my experience in earnest and not serve as a platform for a print-based commercial. Given my Interop horror story, the first thing I asked was whether the libraries could work on a server or not, and I was told that they could (presumably, although I haven’t verified it, because they use the Open XML format rather than the legacy Interop paradigm). So, that was already a win.

I put this request in my back pocket for a bit because I’m already pretty backlogged with post requests and other assorted work, but I wound up having a great chance to try it out. I have a project up on GitHub that I’ve been pushing code to in order to help with my Pluralsight course authorship. The gist of it is that I create a directory and file structure for my courses as I work on them, and then another for submission, and I want to automate the busy-work. And one thing that I do for every module of every course is create a PowerPoint document and a Word document for scripting. So, serendipity: I had a reason to generate Word documents and thus to try out E-iceblue’s product, Spire.

Rather than a long post with screenshots and all of that, I did a video capture. I want to stress that what you’re seeing here is me, with no prior knowledge of the product at all, armed only with a link to tutorials that my contact there sent to me. Take a look at the video and see what it’s like. I spend about 4-5 minutes getting set up and, at the end of it, I’m using a nice, clean API to successfully generate a Word document.
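
To give you a sense of what “nice, clean API” means here, the following is a minimal sketch along the lines of the tutorial code I followed (the file name and text are mine; consult Spire’s tutorials for the authoritative version):

using Spire.Doc;
using Spire.Doc.Documents;

class SpireSketch
{
    static void Main()
    {
        // No running copy of Word required; the document is built in-process.
        var document = new Document();
        Section section = document.AddSection();
        Paragraph paragraph = section.AddParagraph();
        paragraph.AppendText("Module 1 Script");

        // Write the result out as a .docx file.
        document.SaveToFile("Module1Script.docx", FileFormat.Docx);
    }
}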

I’ll probably have some more posts in the hopper with this as I start doing more things with it (PowerPoint, more complex Word interaction, conversion, etc.). Early returns suggest it’s worth checking out; as you can see, the barriers to entry are quite low, and I’ve barely scratched the surface of just one line of offerings.


TDD Chess Game Part 7: Cleaning up the Bishop

I bet you thought I’d forgotten this series and the video-cast leg of my burgeoning multimedia empire, but fear not, because I’m back following a long vacation, bookended by content publication announcements. I took previous feedback to heart and increased the font size of the spreadsheet as I worked, and decided to make this video a little shorter. Trending closer to the 10-minute mark than 20 is partly selfish, because the 20-minute videos take an absolutely staggering amount of time to produce in my video editing software, but I also think it might be more consumable this way. So, here’s what I accomplish:

  • Extended test coverage on Bishop.GetMovesFrom() a bit.
  • Cleaned up Bishop.GetMovesFrom().
  • Moved Bishop to its own file.

And here are lessons to take away:

  • It’s fun to be cute with your to-do list, but the effect wears off after a break; better, perhaps, to keep it simple and descriptive.
  • When you see duplication or repetition of patterns, ask yourself what the variable or variables in the repetition are, and see if you can parameterize them for an extracted method (see the sketch after this list).
  • Use the refactor phase to ask yourself, “how would this method read to someone who’d never seen this code before,” and to focus on how to make it clearer.
  • This is pretty heavy on personal opinion, but I think favoring declarative semantics (functional, Linq-y stuff) over imperative ones (loops and control flow) promotes readability.
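
To illustrate the parameterization and declarative-over-imperative points above, here’s a hypothetical sketch (mine, not the actual code from the video) of what extracting the repeated diagonal logic into one Linq-based method might look like; the variable in the repetition turns out to be the distance traveled:

using System.Collections.Generic;
using System.Linq;

public struct BoardCoordinate
{
    public int X { get; }
    public int Y { get; }
    public BoardCoordinate(int x, int y) { X = x; Y = y; }

    public bool IsValidForBoardSize(int boardSize) =>
        X >= 1 && X <= boardSize && Y >= 1 && Y <= boardSize;
}

public class Bishop
{
    public IEnumerable<BoardCoordinate> GetMovesFrom(BoardCoordinate start, int boardSize = 8)
    {
        // The repeated pattern is "move some distance along a diagonal";
        // each diagonal is just a different combination of signs.
        return Enumerable.Range(1, boardSize)
            .SelectMany(distance => new[]
            {
                new BoardCoordinate(start.X + distance, start.Y + distance),
                new BoardCoordinate(start.X + distance, start.Y - distance),
                new BoardCoordinate(start.X - distance, start.Y + distance),
                new BoardCoordinate(start.X - distance, start.Y - distance)
            })
            .Where(move => move.IsValidForBoardSize(boardSize));
    }
}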


TDD Chess Game Part 6: Starting with More Pieces

I recorded this clip at home, where I have a pretty old dual-core processor that’s really starting to show its age when I have the video capture software running alongside VS2013 with all of my plugins. So, if the recording looks a little choppy in places, it’s because I edited out stretches where I had finished writing a test and was just drumming my fingers waiting for it to go green. No need to show you completely dead time. Note to self: probably time for a new machine here soon.

Here’s what I accomplish in this clip:

  • Prettied up Rook.GetMovesFrom() a bit with Linq via refactoring.
  • Added a Bishop class.
  • Implemented Bishop.GetMovesFrom().

Here are some lessons to take away:

  • You can write a test that’s green from the get-go if you want another data point prior to a refactoring. Strict TDD says you need a failing test before you modify production code, but you can write green tests to your heart’s content (see the sketch after this list).
  • Sometimes inheriting from a base class or implementing an interface is the quickest way to get from non-compiling because of an undefined method to having that method defined.
  • Just stubbing out a test method doesn’t constitute a failing test; as long as everything is green, refactoring is fine.
  • I personally find that there are occasions when extracting a method or defining a common method is actually the simplest way to get something to pass.  So, even though that seems really like a refactoring, it can be your simplest path to get a red test green because it saves re-coding the same thing repetitively.
  • It’s perfectly fine to leave a working, but ugly, method and come back the next day/session to refactor. Oftentimes a much more elegant implementation will hit you in the shower or on the drive to your office, and you’ll make short work of cleaning up the method with that fresh perspective.
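
To make the first point above concrete, here’s a hypothetical example (NUnit syntax; the class and method names approximate the series’ code but are mine) of a test that passes from the moment it’s written, added purely to pin down behavior before a refactoring:

using System.Linq;
using NUnit.Framework;

[TestFixture]
public class RookTest
{
    [Test]
    public void GetMovesFrom_1_1_Includes_The_Square_Directly_Above()
    {
        var rook = new Rook();

        var moves = rook.GetMovesFrom(new BoardCoordinate(1, 1));

        // Green immediately against the existing implementation; its job
        // is to guard current behavior while the method gets refactored.
        Assert.That(moves.Any(m => m.X == 1 && m.Y == 2));
    }
}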

And, the video:


Introduction to Static Analysis (A Teaser for NDepend)

Rather than the traditional lecture approach of providing an official definition and then discussing the subject in more detail, I’m going to show you what static analysis is and then define it. Take a look at the following code and think for a second about what you see. What’s going to happen when we run this code?

private void SomeMethod()
{
	int x = 1;
	if(x == 1)
		throw new Exception();
}

Well, let’s take a look:

[Screenshot: unhandled exception dialog at runtime]

I bet you saw this coming. In a program that does nothing but set x to 1, and then throw an exception if x is 1, it isn’t hard to figure out that the result of running it will be an unhandled exception. What you just did there was static analysis.

Static analysis comes in many shapes and sizes. When you simply inspect your code and reason about what it will do, you are performing static analysis. When you submit your code to a peer for review, she does the same thing. Like you and your peer, compilers perform static analysis, though theirs is automated rather than manual. They check the code for syntax or linking errors that would guarantee failures, and they also provide warnings about potential problems such as unreachable code or assignment instead of evaluation. Products exist that check your source code for certain characteristics and stylistic guideline conformance rather than worrying about what happens at runtime, and, in managed languages, products exist that analyze your compiled IL or byte code and check for certain characteristics. The common thread is that all of these examples of static analysis involve analyzing your code without actually executing it.
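
As a quick example of the “assignment instead of evaluation” case, the C# compiler flags code like the following with warning CS0665, because the condition assigns rather than compares:

private bool IsReady(bool initialized)
{
    // Warning CS0665: assignment in conditional expression is always
    // constant; did you mean to use == instead of = ?
    if (initialized = true)
        return true;
    return false;
}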

Analysis vs Reactionary Inspection

People’s interactions with their code tend to gravitate away from up-front analysis. Whether it’s unit tests and TDD, integration tests, or simply running the application to see what happens, programmers tend to run experiments with their code and see what happens. This is known as a feedback loop, and programmers use the feedback to guide what they’re going to do next. While some thought is obviously given to what impact changes to the code will have, the natural tendency is to adopt an “I’ll believe it when I see it” mentality.

private void SomeMethod()
{
	var randomGenerator = new Random();
	int x = randomGenerator.Next(1, 10);
	Console.WriteLine(x);
}

We tend to ask “what happened?” and we tend to orient our code in such ways as to give ourselves answers to that question. In this code sample, if we want to know what happened, we execute the program and see what prints. This is the opposite of static analysis in that nobody is trying to reason about what will happen ahead of time; rather, the goal is to do it, see what the outcome is, and then react as needed to continue.

Reactionary inspection comes in a variety of forms, such as debugging, examining log files, observing the behavior of a GUI, etc.

Static vs Dynamic Analysis

The conclusions and decisions that arise from the reactionary inspection question of “what happened?” are known as dynamic analysis. Dynamic analysis is, more formally, inspection of the behavior of a running system. This means analyzing characteristics of the program such as how much memory it consumes, how reliably it runs, how much data it pulls from the database, and generally whether it correctly satisfies the requirements or not.

Assuming that static analysis of a system is taking place at all, dynamic analysis takes over where static analysis is not sufficient. This includes situations where unpredictable externalities such as user inputs or hardware interrupts are involved. It also involves situations where static analysis is simply not computationally feasible, such as in any system of real complexity.

As a result, the interplay between static analysis and dynamic analysis tends to be that static analysis is a first line of defense designed to catch obvious problems early. Beyond that, it also functions as a canary in the coal mine to detect so-called “code smells.” A code smell is a piece of code that is often, but not necessarily, indicative of a problem. Static analysis can thus be used as an early detection system for obvious or likely problems, and dynamic analysis has to be sufficient for the rest.


Source Code Parsing vs. Compile-Time Analysis

As I alluded to above, not all static analysis is created equal. There are types of static analysis that rely on simple inspection of the source code. These include manual techniques such as reasoning about your own code or doing code review activities. They also include tools such as StyleCop that simply parse the source code and make simple assertions about it to provide feedback. For instance, such a tool might read a code file containing the word “class,” see that the next word after it is not capitalized, and return a warning that class names should be capitalized.
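
For instance, a source-parsing rule needs nothing more than the raw text to flag a class name like the one below (this snippet is mine, but StyleCop’s naming rules, e.g. SA1300, catch this kind of thing):

// A purely textual rule can see that the token after "class"
// does not begin with an uppercase letter and issue a warning.
public class invoiceProcessor
{
}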

This stands in contrast to what I’ll call compile-time analysis. The difference is that this form of analysis requires either an encyclopedic understanding of how the compiler behaves or the ability to analyze the compiled product. This set of options obviously includes the compiler itself, which will fail on show-stopper problems and generate helpful warning information as well. It also includes enhanced rules engines that understand the rules of the compiler and can use them to infer a larger set of warnings and potential problems than those that come out of the box with the compiler. Beyond that is a set of IDE plugins that perform asynchronous compilation and offer real-time feedback about possible problems; examples in the .NET world include ReSharper and CodeRush. And finally, there are analysis tools that look at the compiled assembly outputs and give feedback based on them. NDepend is an example of this, though it includes other approaches mentioned here as well.

The important compare-contrast point to understand here is that source analysis is conceptually simpler and generally faster, while compile-time analysis is more resource intensive and generally more thorough.

The Types of Static Analysis

So far I’ve compared static analysis to dynamic and ex post facto analysis and I’ve compared mechanisms for how static analysis is conducted. Let’s now take a look at some different kinds of static analysis from the perspective of their goals. This list is not necessarily exhaustive, but rather a general categorization of the different types of static analysis with which I’ve worked.

  • Style checking is examining source code to see if it conforms to cosmetic code standards.
  • Best practices checking is examining the code to see if it conforms to commonly accepted coding practices. This might include things like not using goto statements or not having empty catch blocks.
  • Contract programming is the enforcement of preconditions, invariants, and postconditions (see the sketch after this list).
  • Issue/bug alerting is static analysis designed to detect likely mistakes or error conditions.
  • Verification is an attempt to prove that the program is behaving according to specifications.
  • Fact finding is analysis that lets you retrieve statistical information about your application’s code and architecture.
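
As an example of the contract programming category, here’s a minimal sketch using .NET’s Code Contracts (the System.Diagnostics.Contracts namespace), whose static checker attempts to prove at build time that these conditions always hold:

using System.Diagnostics.Contracts;

public static class StringFormatter
{
    public static string Capitalize(string input)
    {
        // Preconditions: the static checker flags call sites that
        // might pass null or an empty string.
        Contract.Requires(input != null);
        Contract.Requires(input.Length > 0);

        // Postcondition: this method promises never to return null.
        Contract.Ensures(Contract.Result<string>() != null);

        return char.ToUpper(input[0]) + input.Substring(1);
    }
}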

There are many tools that provide functionality for one or more of these, but NDepend provides perhaps the most comprehensive support across these different static analysis goals of any .NET tool out there. You will thus get to see in-depth examples of many of them, particularly the fact finding and issue alerting types of analysis.

A Quick Overview of Some Example Metrics

Up to this point, I’ve talked a lot in generalities, so let’s look at some actual examples of things that you might learn from static analysis about your code base. The actual questions you could ask and answer are pretty much endless, so this is intended just to give you a sample of what you can know.

  • Is every class and method in the code base in Pascal case?
  • Are there any potential null dereferences of parameters in the code?
  • Are there instances of copy and paste programming?
  • What is the average number of lines of code per class? Per method?
  • How loosely or tightly coupled is the architecture?
  • What classes would be the most risky to change?

Believe it or not, it is quite possible to answer all of these questions without compiling or manually inspecting your code in time-consuming fashion. There are plenty of tools that can answer some of these questions for you, but in my experience, none can answer as many, in as much depth, and with as much customizability as NDepend.
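
To make the fact-finding flavor concrete, NDepend exposes this sort of thing through CQLinq, a Linq-based query language over your code base. A query in the spirit of the following (a sketch; exact property names can vary by version) answers the “lines of code per class” style of question:

// CQLinq: find the largest types in the application by lines of code.
from t in Application.Types
where t.NbLinesOfCode > 200
orderby t.NbLinesOfCode descending
select new { t, t.NbLinesOfCode }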

Why Do This?

So, all that being said, is this worth doing? Why should you keep reading if you aren’t convinced that this is something that’s even worth learning? It’s a valid concern, but I assure you that it is most definitely worth doing.

  • The later you find an issue, typically, the more expensive it is to fix. Catching a mistake seconds after you make it, as with a typo, is as cheap as it gets. Having QA catch it a few weeks after the fact means that you have to remember what was going on, find it in the debugger, and then figure out how to fix it, which means more time and cost. Fixing an issue that’s blowing up in production costs time and effort, but also business and reputation. So anything that exposes issues earlier saves the business money, and static analysis is all about helping you find issues, or at least potential issues, as early as possible.
  • But beyond just allowing you to catch mistakes earlier, static analysis actually reduces the number of mistakes that happen in the first place. The reason for this is that static analysis helps developers discover mistakes right after making them, which reinforces cause and effect a lot better. The end result? They learn faster not to make the mistakes they’d been making, causing fewer errors overall.
  • Another important benefit is that maintenance of code becomes easier. By alerting you to the presence of “code smells,” static analysis tools are giving you feedback as to which areas of your code are difficult to maintain, brittle, and generally problematic. With this information laid bare and easily accessible, developers naturally learn to avoid writing code that is hard to maintain.
  • Exploratory static analysis turns out to be a pretty good way to learn about a code base as well. Instead of the typical approach of opening the code base in an IDE and poking around or stepping through it, developers can approach the code base instead by saying “show me the most heavily used classes and which classes use them.” Some tools also provide visual representations of the flow of an application and its dependencies, further reducing the learning curve developers face with a large code base.
  • And a final and important benefit is that static analysis improves developers’ skills and makes them better at their craft. Developers don’t just learn to avoid mistakes, as mentioned in the mistake-reduction point above; they also learn which coding practices are generally considered good ideas by the industry at large and which are not. The compiler will tell you that some things are illegal and warn you that others are probably errors, but static analysis tools often answer the question, “is this a good idea?” Over time, developers start to understand the subtle nuances of software engineering.

There are a couple of criticisms of static analysis. The main ones are that the tools can be expensive and that they can create a lot of “noise” or “false positives.” The former is a problem for obvious reasons, and the latter can counteract the time savings by forcing developers to weed through non-issues in order to find real ones. However, good static analysis tools mitigate the false positives in various ways, an important one being to allow warnings to be turned off and the information you receive to be customized. NDepend turns out to mitigate both criticisms: it is highly customizable and not very expensive.

Reference

The contents of this post were mostly taken from a Pluralsight course I did on static analysis with NDepend. Here is a link to that course. If you’re not a Pluralsight subscriber but are interested in taking a look at the course or at the library in general, send me an email to erik at daedtech and I can give you a 7 day trial subscription.


Chess TDD 5: Bounded Collections of Moves

Really no housekeeping or frivolous notes for this entry.  It appears as though I’ve kind of settled into a groove with the production, IDE settings, etc.  So from here on in, it’s just a matter of me coding in 15-20 minute clips and explaining myself, notwithstanding any additional feedback, suggestions or questions.

Here’s what I accomplish in this clip:

  • Changed all piece GetMovesFrom() methods to use the BoardCoordinate type.
  • Got rid of the stupid Rook implementation of GetMovesFrom().
  • Made a design decision to have GetMovesFrom() take a board size parameter.
  • Rook.GetMovesFrom() is now correct for arbitrary board sizes.
  • Updated Rook.GetMovesFrom() to use 1-indexing instead of 0-indexing to accommodate the problem space.
  • Removed redundant instantiation logic in RookTest.
  • Got rid of redundant looping logic in Rook.GetMovesFrom().

Here are some lessons to take away:

  • Sometimes you create a failing test and getting it to pass leads you to changing production code which, in turn, leads you back into test code.  That’s okay.
  • Sometimes you’re going to make design decisions that aren’t perfect, but that you feel constitute improvements in order to keep going.  Embrace that.  Your tests will ensure that it’s easy to change your design later, when you understand the problem better.  Just focus on constant improvements and don’t worry about perfection.
  • “Simplest to get tests passing” is a subjective heuristic to keep you moving.  If you feel comfortable writing a loop instead of a single line or something because that seems simplest to you, you have license to do that…
  • But, as happened to me, getting too clever all in one shot can lead to extended debug times that cause flow interruptions.
  • Duplication in tests is bad, even just a little.
  • It can sometimes be helpful to create constants or readonly properties with descriptive names for common test inputs.  This eliminates duplication while promoting readability of tests (see the sketch below).
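
Here’s a hypothetical illustration of that last point (NUnit syntax; the names are mine rather than the series’):

using System.Linq;
using NUnit.Framework;

[TestFixture]
public class RookTest
{
    // Named once, used everywhere: no magic values sprinkled through the tests.
    private const int StandardBoardSize = 8;
    private static readonly BoardCoordinate CornerSquare = new BoardCoordinate(1, 1);

    [Test]
    public void GetMovesFrom_Corner_Includes_The_Opposite_Edge()
    {
        var rook = new Rook();

        var moves = rook.GetMovesFrom(CornerSquare, StandardBoardSize);

        Assert.That(moves.Any(m => m.X == 1 && m.Y == StandardBoardSize));
    }
}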