There’s no doubt about it: I’m locked in a mortal standoff with verbosity, and I’m not winning. Every now and then I go meta with this and vow to make some smaller posts to keep the cadence at around 3 per week, and mostly I don’t. But I like to think that these travel sized posts are coming at least slightly more often.
I was poking around for some OCR software that would take scanned PDFs and spit out text, so I took to google. I wound up here. Ugh, next. Then I found this, which I downloaded. The code sample wasn’t helpful, but I poked around with Intellisense until I had something that should have worked. I got a weird exception, which turned out to be because I need to target x64 (jeez, my .NET is getting rusty), and then I got another exception because I needed a license, which required going to their site, filling out a… ugh, screw it, uninstall, next. Then I wound up here, got what I wanted in the small, and decided to call it a night.
Now, I’d been looking for an API that I could hit and preferably control (i.e. as a compiled library and not a service), but you know what? The first thing I’m going to research if I pick this up again is whether I could pay the good folks at that last site for an API somewhere, even as a web service. That’s how powerful it is to offer simplicity. They’re not even offering what I want, exactly, but they’re offering it in such dead simple fashion that I get a win in 30 seconds.
The RTFM mentality is simply dead. You can stamp your foot and scream that at me all you want.
My response will be to blink at you in amusement, shrug, and walk away to look for someone that wants my money and my business.
I have a relatively uncomplicated thought for this Wednesday and, as such, it should be a reasonably sized post. I’d like to address what I consider to be perhaps the most relatable and understandable anti-pattern that I’ve encountered in this field, and I call it the “Synthesize the Experts” anti-pattern. I’m able to empathize so intensely with its commission because (1) I’ve done it and (2) the mentality that drives it is the polar opposite of the mentality that results in the Expert Beginner. It arises from seeking to better yourself in a a vacuum of technical mentorship.
Let’s say that you hire on somewhere to write software for a small business and you’re pretty much a one-person show. From a skill development perspective, the worst thing you could possibly do is make it up as you go and assume that what you’re doing makes sense. In other words, the fact that you’re the least bad at software in your company doesn’t mean that you’re good at it or that you’re really delivering value to the business; it only means that it so happens that no one around can do a better job. Regardless of your skill level, developing software in a vacuum is going to hurt you because you’re not getting input and feedback and you’re learning only at the pace of evolutionary trial and error. If you assume that reaching some sort of equilibrium in this sense is mastery, you’re only compounding the problem.
But what if you’re aware of this limitation and you throw open the floodgates for external input? In other words, you’re alone and you know that better ways to do it exist, so you seek them out to the best of your ability. You read blog posts, follow prominent developers, watch videos from conferences, take online learning courses, etc. Absent a localized mentor or even partner, you seek unidirectional wisdom from outside sources. Certainly this is better than concluding that you’ve nailed it on your own, but there’s another problem brewing here.
Specifically, the risk you’re running is that you may be trying to synthesize from too many disparate and perhaps contradictory sources. Most of us go through secondary and primary education with a constant stream of textbooks that are more or less cut and dry matters of fact and cannon. The more Algebra and Geometry textbooks you read and complete exercises for, the better you’re going to get at Algebra and Geometry. But when you apply this same approach to lessons gleaned from the movers and shakers in the technosphere, things start to get weird, quickly.
To put a specific hypothetical to it, imagine you’ve just read an impassioned treatise from one person on why it’s a complete violation of OOP to have “property bag” classes (classes that have only accessors and mutators for fields and no substantive methods). That seems to make sense, but you’re having trouble reconciling it with a staunch proponent of layered architecture who says that you should only communicate between layers using interfaces and/or “DTO” classes, which are property bags. And, hey, don’t web services do most of their stuff with property bags… or something? Ugh…
But why ugh? If you’re trying to synthesize these opinions, you’re probably saying, “ugh” because you’ve figured out that your feeble brain can’t reconcile these concepts. They seem contradictory to you, but the experts must have them figured out and you’re just failing to incorporate them all elegantly. So what do you do? Well, perhaps you define some elaborate way to get all of your checked in classes to have data and behavior and, at runtime to communicate, they use an elaborate reflection scheme to build and emit, on the fly, property bag DTO classes for communication between layers and to satisfy any web service consumers. Maybe they do this by writing a text file to disk, invoking a compiler on it and calling into the resulting compiled product at runtime. Proud of this solution, you take to a Q&A forum to show how you’ve reconciled all of this in the most elegant way imaginable, and you’re met with virtual heckling and shaming from resident experts. Ugh… You failed again to make everyone happy, and in the process you’ve written code that seems to violate all common sense.
The problem here is the underlying assumption that you can treat any industry software developer’s opinion on architecture, even those of widely respected developers, as canonical. Statements like, “you shouldn’t have property bags” or “application layers should only communicate with property bags” aren’t the Pythagorean Theorem; they’re more comparable to accomplished artists saying, “right triangles suck and you shouldn’t use them when painting still lifes.” Imagine trying to learn to be a great artist by rounding up some of the most opinionated, passionate, and disruptive artists in the industry and trying to mash all of their “how to make great, provocative art” advice into one giant chimera of an approach. It’d be a disaster.
I’m not attempting a fatalistic/relativistic “no approach is better than any other” argument. Rather I’m saying that there is healthy disagreement among those considered experts in the industry about the finer points of how best to approach creating software. Trying to synthesize their opinionated guidance on how to approach software into one, single approach will tie you in knots because they are not operating from a place of universal agreement. So when you read what they’re saying and ask their advice, don’t read for Gospel truth. Read instead to cherry pick ideas that make sense to you, resonate with you, and advance your understanding.
So don’t try to synthesize the experts and extract a meeting that is the sum of all of the parts. You’re not a vacuum cleaner trying to suck every last particle off of the carpet; you’re a shopper at a bazaar, collecting ideas that will complement one another and look good, put together and arranged, when you get home. But whatever you do, don’t stop shopping — everything goes to Expert-Beginner land when you say, “nah, I’m good with what’s in my house.”
I think that I’m probably going to take a good bit of flack for this post, but you can’t win ’em all. I’m interested in contrary opinions and arguments because my mind could be changed. Nevertheless, I’ve been unable to shake the feeling for months that code generation is just a basic and fundamental design failure. I’ve tried. I’ve thought about it in the shower and on the drive to work. I’ve thought about it while considering design approaches and even while using it (in the form of Entity Framework). And it just feels fundamentally icky. I can’t shake the feeling.
Let me start out with a small example that everyone can probably agree on. Let’s say that you’re writing some kind of GUI application with a bunch of rather similar windows. And let’s say that mostly what you do is take all of presentation logic for the previous window, copy, paste and adjust to taste for the next window. Oh noes! We’re violating the DRY principle with all of that repetition, right?
What we should be doing instead, obviously, is writing a program that duplicates the code more quickly. That way you can crank out more windows much faster and without the periodic fat-fingering that was happening when you did it manually. Duplication problem solved, right? Er, well, no. Duplication problem automated and made worse. After all, the problem with duplicate code is a problem of maintenance more than initial push. The thing that hurts is later when something about all of that duplicated code has to be changed and you have to go find and do it everywhere. I think most reading would agree that code generation is a poor solution to the problem of copy and paste programming. The good solution is a design that eliminates repetition and duplication of knowledge.
I feel as though a lot of code generation that I see is a prohibitive micro-optimization. The problem is “I have to do a lot of repetitive coding” and code generation solves this problem by saying, “we’ll automate that coding for you.” I’d rather see it solved by saying, “let’s step back and figure out a better approach — one in which repetition is unnecessary.” The automation approach puts a band-aid on the wound and charges ahead, risking infection.
For instance, take the concept of List in C#. List is conceptually similar to an array, but it automatically resizes, thus abstracting away an annoying detail of managing collections in languages from days gone by. I’m writing a program and I think I want an IntList, which is a list of integers. That’s going along swimmingly until I realize that I need to store some averages in there that might not be round numbers, so I copy the source code IntList to DoubleList and I do a “Find-And-Replace” with Int and Double. Maybe later I also do that with string, and then I think, “geez — someone should write a program that you just tell it a type and it generates a list type for it.” Someone does, and then life is good. And then, later, someone comes along with the concept of generics/templates and everyone feels pretty sheepish about their “ListGenerator” programs. Why? Because someone actually solved the core problem instead of coming up with obtuse, brute-force ways to alleviate the symptoms.
And when you pull back and think about the whole idea of code generation, it’s fairly Rube-Goldbergian. Let’s write some code that writes code. It makes me think of some stoner ‘brainstorming’ a money making idea:
I realize that’s a touch of hyperbole, but think of what code generation involves. You’re going to feed code to a compiler and then run the compiled program which will generate code that you feed to the compiler, again, that will output a program. If you were to diagram that out with a flow chart and optimize it, what would you do? Would you get rid of the part where it went to the compiler twice and just write the program in the first place? (I should note that throughout this post I’ve been talking about this circular concept rather than, say, the way ASP or PHP generate HTML or the way Java compiles to bytecode — I’m talking about generating code at the same level of abstraction.)
The most obvious example I can think of is the aforementioned Entity Framework that I use right now. This is a framework utility that uses C# in conjunction with a markup language (T4) to generate text files that happen to be C# code. It does this because you have 100 tables in your database and you don’t want to write data transfer objects for all of them. So EF uses reflection and IQuerable with its EDMX to handle the querying aspect (which saves you from the fate we had for years of writing DAOs) while using code generation to give you OOP objects to represent your data tables. But really, isn’t this just another band-aid? Aren’t we really paying the price for not having a good solution to the Impedance Mismatch Problem?
I feel a whole host of code gen solutions is also born out of the desire to be more performant. We could write something that would look at a database table and generate, on the fly, using reflection, a CRUD form at runtime for that table. The performance would be poor, but we could do it. However, confronted with that performance, people often say, “if only there were a way to automate the stuff we want but to have the details sorted out at compile time rather than runtime.” At that point the battle is already won and the war already lost, because it’s only a matter of time until someone writes a program whose output is source code.
I’m not advocating a move away from code generating, nor am I impugning anyone for using it. This is post more in the same vein as ones that I’ve written before (about not using files for source code and avoiding using casts in object oriented languages). Code generation isn’t going anywhere anytime soon, and I know that I’m not even in a position to quit my reliance on it. I just think it’s time to recognize it as an inherently flawed band-aid rather than to celebrate it as a feat of engineering ingenuity.
By the way, if you liked this post and you're new here, check out this page as a good place to start for more content that you might enjoy.
The other day, someone came to me and told me that a component of one of the web applications that my team maintains seemed to have a problem. I sat with her, and she showed me what was going on. Sure enough, I saw the issue too. It was the kind of integration thing for which we were able to muster up some historical record and see approximately when the problem had started. Apparently, the problem first occurred around the last time we pushed a version of the website into production. Ruh roh.
Given that this was a pretty critical issue, I got right down to debugging. Pretty quickly, I found the place in the code where the integration callout was happening, and I stepped through it in the debugger. (As an aside, I realize I’ve often made the case against much debugger use. But when legacy code has no unit, component or integration tests, you really don’t have a lot of options.) No exceptions were thrown, and no obvious problems were occurring. It just hit a third party API, went through it uneventfully, but quietly failed.
At this point, it was time for some tinkering and reverse engineering. When I looked at the method call that seemed to be the meat of the issue, I noticed that it returned an integer that the code I was debugging just ignored. Hovering over it, XML doc comment engine told me that it was returning an error code. I would have preferred an exception, but whatever — this was progress. Unfortunately, that was all the help I got, and there was no indication what any returned error code meant. I ran the code and saw that I was getting a “2,” so presumably there was an error occurring.
Maddeningly, there was no online documentation of this API. The only way I was able to proceed was to install a trial version of this utility locally, on my desktop, and read the CHM file. They offered one through their website, but it was broken. After digging, I found that this error code meant “call failure or license file missing.” Hmm… license file, eh? I started to get that tiny adrenaline rush that you get when a solution seems like it might be just around the corner. I had just replaced the previous deployment process of “copy over the files that are different to the server” with a slightly less icky “wipe the server’s directory and put our stuff there.” It had taken some time to iron out failures and bring all of the dependencies under source control, but I viewed this as antiseptic on a festering sore. And, apparently, I had missed one. Upon diving further into the documentation, I saw that it required some weirdly-named license file with some kind of key in it to be in the web application root’s “bin” folder on the server, or it would just quietly fail. Awesome.
This was confirmed by going back to a historical archive of the site, finding that weird file, putting it into production and observing that the problem was resolved. So time to call it a day, right?
Fixing the Deeper Issue
Well, if you call it a day now, there’s a good chance this will happen again later. After all, the only thing that will prevent this after the next deployment is someone remembering, “oh, yeah, we have to copy over that weird license file thing into that directory from the previous deploy.” I don’t know about you, but I don’t really want important system functionality hinging on “oh, yeah, that thing!”
What about a big, fat comment in the code? Something like “this method call will fail if license file xyz isn’t in abc directory?” Well, in a year when everyone has forgotten this and there’s a new architect in town, that’ll at least save a headache next time this issue occurs. But this is reactionary. It has the advantage of not being purely tribal knowledge, but it doesn’t preemptively solve the problem. Another idea might be to trap error codes and throw an exception with a descriptive message, but this is just another step in making the gap between failure and resolution a shorter one. I think we should try avoid failing at all, though having comments and better error trapping is certainly a good idea in addition to whatever better solution comes next.
What about checking the license file into source control and designing the build to copy it to that directory? Win, right? Well, right — it solves the problem. With the next deploy, the license file will be on the server, and that means this particular issue won’t occur in the future. So now it must be time to call it day, right?
Still no, I’d argue. There’s work to be done, and it’s not easy or quick work. Because what needs to happen now is a move from a “delete the contents of the server directory and unzip the new deliverable” deployment to an automated build and deployment. What also needs to happen is a series of automated acceptance tests in a staging environment and possibly regression tests in a production environment. In this situation, not only are developers reading the code aware of the dependency on that license file, not only do failures happen quickly and obviously, and not only is the license file deployed to where it needs to go, but if anything ever goes wrong, automated notifications will occur immediately and allow the situation to be corrected.
It may seem pretty intense to set all that up, but it’s the responsible thing to do. Along the spectrum of some “deployment maturity model,” tweaking things on the server and forgetting about it is whatever score is “least mature.” What I’m talking about is “most mature.” Will it take some doing to get there and probably some time? Absolutely. Does that mean that anything less can be good enough? Nope. Not in my opinion, anyway.
To start off the week, I have a satirical post about projects developed using the waterfall ‘methodology.’ (To understand the quotes, please see my post on why I don’t think waterfall is actually a methodology at all). I figured that since groups that use agile approaches and industry best practices have a whole set of xDD acronyms, such as TDD, BDD, and DDD, waterfall deserved a few of its own. So keep in mind that while this post is intended to be funny, I think there is a bit of relevant commentary to it.
Steinbeck-Driven Development (SDD)
For those of you who’ve never had the pleasure to read John Steinbeck’s “Of Mice and Men,” any SDD practitioner will tell you that it’s a heartwarming tale of two friends who overcome all odds during the Great Depression, making it cross-country to California to start a rabbit petting zoo. And it’s that outlook on life that they bring to the team when it comes to setting deadlines, tracking milestones, and general planning.
Relentlessly optimistic, the SDD project manager reacts to a missed milestone by reporting to his superiors that everything is a-OK because the team will just make it up by the time they hit the next one. When the next milestone is missed by an even wider margin, same logic applies. Like a shopping addict or degenerate gambler blithely saying, “you gotta spend money to make money,” this project manager will continue to assume on-time delivery right up until the final deadline passes with no end in site. When that happens, it’s no big deal–they just need a week to tie up a few loose ends. When that week is up, it’ll just be one more week to tie up a few loose ends. When that week expires, they face reality. No, just kidding. It’ll just be one more week to tie up a few loose ends. After enough time goes by, members of the team humor him with indulgent baby talk when he says this: “sure it will, man, sure it will. In a week, everything will be great, this will all be behind us, and we’ll celebrate with steaks and lobster at the finest restaurant in town.”
Spoiler alert. At the end of Steinbeck’s novel, the idyllic rabbit farm exists only in the mind of one of the friends, shortly before he’s shot in the back of the head by the other, in an act that is part merciful euthanasia and part self-preservation. The corporate equivalent of this is what eventually happens to our project manager. Every week he insists that everything will be fine and that they’re pretty close to the promised land until someone puts the project out of its misery.
Shooting-Star-Driven Development (SSDD)
Steinbeck-Driven Development is not for everyone. It requires a healthy ability to live in deluded fantasy land (or, in the case of the novel, to be a half-wit). SSDD project managers are not the relentless optimists that their SDD counterparts are. In fact, they’re often pretty maudlin, having arrived at a PM post on a project that everyone knows is headed for failure and basically running out the clock until company bankruptcy or retirement or termination or something. These are the equivalents of gamblers that have exhausted their money and credit and are playing at the penny tables in the hopes that their last few bucks will take them on an unprecedented win streak. Or, perhaps more aptly, they’re like a lonely old toy-maker, sitting in his woodshop, hoping for a toy to come to life and keep them company.
This PM and his project are doomed to failure, so he rarely bothers with status meetings, creates a bare minimum of power points, and rarely ever talks about milestones. Even his Gantt charts have a maximum of three nested dependencies. It’s clear to all that he’s phoning it in. He knows it’s unlikely, but he pins his slim hope to a shooting star: maybe one of his developers will turn out to be the mythical 100x developer that single-handedly writes the customer information portal in the amount of time that someone, while struggling to keep a straight face, estimated it would take to do.
As projects go along and fall further and further behind schedule and the odds of a shooting star developer become more and more remote, the SSDD project manager increasingly withdraws. Eventually, he just kind of fades away. If Geppetto were a real life guy, carving puppets and asking stars to make them real children, he’d likely have punched out in an 19th century sanitarium. There are no happy endings on SSDD projects–just lifeless, wooden developers and missed deadlines.
Fear-Driven Development (FDD)
There is no great mystery to FDD projects. The fate of the business is in your hands, developers. Sorry if that’s a lot of pressure, but really, it’s in your hands.
The most important part of a FDD project is to make it clear that there will be consequences–dire consequences–to the business if the software isn’t delivered by such and such date. And, of course, dire consequences for the business are pretty darned likely to affect the software group. So, now that everyone knows what’s at stake, it’s time to go out and get that complex, multi-tiered, poorly-defined application built in the next month. Or else.
Unlike most waterfall projects, FDD enters the death march phase pretty much right from the start of coding. (Other waterfall projects typically only start the death march phase once the testing phase is cancelled and the inevitability of missing the deadline is clear.) The developers immediately enter a primal state of working fourteen hours per day because their very livelihoods hang in the balance. And, of course, fear definitely has the effect of getting them to work faster and harder than they otherwise would, but it also has the side effect of making the quality lower. Depending on the nature of the FDD project and the tolerance level of the customers for shoddy or non-functional software, this may be acceptable. But if it isn’t, time for more fear. Consequences become more dire, days become longer, and weekends are dedicated to the cause.
The weak have nervous breakdowns and quit, so only the strong survive to quit after the project ends.
Passive-Aggressive-Driven Development (PADD)
One of the most fun parts of waterfall development is the the estimation from ignorance that takes place during either the requirements or design days. This is where someone looks at a series of Visio diagrams and says, “I think this project will take 17,388.12 man-hours in the low risk scenario and 18,221.48 in the high-risk scenario.” The reason I describe this as fun is because it’s sort of like that game you play where everyone guesses the number of gumballs in a giant jar of gumballs and whoever is closest without going over wins a prize. For anything that’s liable to take longer than a week, estimation in a waterfall context is a ludicrous activity that basically amounts to making things up and trying to keep a straight face as you convince yourself and others that you did something besides picking a random number.
Well, I broke this task up into 3,422 tasks and estimated each of those, so if they each take four hours, and everything goes smoothly when we try to put them all together with an estimate for integration of… ha! Just kidding! My guess is 10,528 hours–ten because I was thinking that it’d have to be five digits, the fve because it’s been that many days that we’ve been looking at these Gantt charts and sequence diagrams, and twenty-eight because that was my number in junior high football. And you can’t bid one hour over me because I’m last to guess!
But PADD PMs suck all of the fun out of this style of estimation by pressuring the hours guessers (software developers) into retracting and claiming less time. But they don’t do it by showing anger–the aggression is indirect. When the developer says that task 1,024, writing the batch file import routine, will take approximately five hours, the PADD PM says, “Oh, wow. Must be pretty complicated. Jeez, I just assumed that a senior level developer could bang that out in no more than two. My bad.” Shamed, the developer retracts: “No, no–you’re right. I figured the EDI would be more complicated than it was, so I just realized that my estimate is actually two hours.”
Repeated in aggregate, the PADD PM is some kind of spectacular black belt/level 20/guru/whatever metric is used to measure PM productivity, because he just reduced the time to market by 60% before a single line of code was ever written. Amazing! Of course, talk at the beginning of the project is cheap. The real measure of waterfall project success is figuring out who to blame and getting others to absorb the cost when the project gets way behind schedule. And this is where the PADD master really shines.
To his bosses, he says, “man, I guess I just had too much faith in our guys–I mean, I know you hire the best.” To the developers, he says, “boy, your estimates seemed pretty reasonable to me, so I would have assumed that everything would be going on time if you were just putting in the hours and elbow grease… weird.” To the end-users/stakeholders, he says, “it’s strange, all of our other stakeholders who get us all of the requirements clearly and on time get their software on time–I wonder what happened here.”
There’s plenty of blame to go around, and PADD PMs make sure everyone partakes equally and is equally dissatisfied with the project.