DaedTech

Stories about Software


Making The Bad Impossible

Don’t Generate Compiler Warnings

I was doing some work this morning when I received a general email to a group of developers for a large project that sort of pleaded with and sort of demanded that members of the group not check in code that generated compiler warnings. And, I agree with that. Although, with the way that particular project is structured, this is easier said than done in some cases. There are dozens and dozens of assemblies in it, not all of which are re-compiled with a build, so someone may generate a warning, not notice it, and then not see it in subsequent runs if that assembly isn’t recompiled. Doing a rebuild instead of a build as habit is impractical here.

Something bothered me about this request and I couldn’t put my finger on it. I mean, the request is reasonable and correct. But, the email seemed sort of inappropriate in the way that talking about one’s salary or personal politics seems sort of inappropriate. It made me uncomfortable. The “why” of this kicked in after a few minutes. The email made me uncomfortable because it should have no need to exist. No email like that should ever be sent.

Don’t Do X

Some time back, I blogged about tribal knowledge, and why I view it as a bad thing. Tribal knowledge is the body of facts that accrues over time in a project or group, is written down nowhere, and can only be learned from experienced team members:

You ask for a code review, and a department hero takes one look at your code and freaks out at you. “You can never call MakeABar before MakeABaz!!! If you do that, the application will crash, the computer will probably turn off, and you might just flood the Eastern Seaboard!”

Duly alarmed, you make a note of this and underline it, vowing never to create Bars before Bazes. You’ve been humbled, and you don’t know why. Thank goodness the Keeper of The Tribal Knowledge was there to prevent a disaster. Maybe someday you’ll be the Keeper of The Tribal Knowledge.

We’ve all experienced things like this when it comes to code. Sometimes, they’re particulars of the application in question, such as in the tribal knowledge examples. Sometimes, they’re matters of organizational or other aesthetics, such as alphabetizing variable declarations, or maybe some other scheme. Sometimes, they’re coding conventions or standards in general. But, they’re always things that a person or group decides upon and then enforces by asking, telling, demanding, cajoling, threatening, begging, shaming, etc.

As a project or collaborative effort in general proceeds, the number of these items tends to increase, and the attendant maintenance effort increases along with it. One rule is simple and easy to remember. When the number gets up to five, people sometimes start to forget. As it balloons from there, team members become increasingly incapable of using a mental checklist, and the checklist finds its way to a spreadsheet or, perhaps, a wall.

When this happens, the time spent conforming to it starts to add up. There is the development time, and then the post-development checklist validation time. Now, theoretically, a few times through the checklist and developers start tending to write code that conforms to it, so the lost time is, perhaps, somewhat mitigated, but the check itself is still more or less required. If someone gets very good at compliance, but makes one typo or careless error, a public shaming will occur and re-up the amount of time being spent on The List.

And, The List probably grows and changes over time, so no one is ever really comfortable with it. The List isn’t just tribal knowledge – it’s “meta-tribal knowledge”.

Stop the Madness

Is The List really necessary? People who enjoy dealing with the DMV or other bureaucracies may think so, and most developers may resign themselves to this sort of thing as an occupational hazard, but why are we doing this? Why do we have to remember to do a whole bunch of things and not do a whole bunch of other things? We’re programmers – can’t we automate this?!?

I believe that the answer to that question is “absolutely, and you should.” To circle back to my introductory example, rather than send out emails about checking for compiler warnings, why not set the build to treat warnings as errors? Then, nobody has to remember anything – generating a compiler warning will impede immediate progress and thus be immediately rectified. No emails, no post-its, and no entry in The List.
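
For a .NET project like the one in my example, a minimal sketch of that idea is a single property in the project file. This assumes a standard MSBuild/csproj setup, and the specific warning codes shown in the comment are purely illustrative:

```xml
<!-- Sketch of a .csproj fragment: promote all compiler warnings to errors,
     so a warning-generating check-in cannot produce a successful build. -->
<PropertyGroup>
  <TreatWarningsAsErrors>true</TreatWarningsAsErrors>
  <!-- Or, promote only specific warnings, e.g. unused variables: -->
  <!-- <WarningsAsErrors>CS0168;CS0219</WarningsAsErrors> -->
</PropertyGroup>
```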

Alphabetized variables? Write a script on the build machine that parses your checked-in files. Casing conventions? Write another parsing script or use something like DevExpress to automate it. Dependency management? Organize code into assemblies/libraries that make it impossible for developers to include the wrong namespaces or use the wrong classes.
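
To illustrate what I mean by a build-machine script, here is a naive sketch I’m inventing purely for illustration. It assumes a convention of underscore-prefixed private fields and uses a deliberately simplistic regex; a real checker would use something like Roslyn rather than pattern matching on text:

```csharp
using System;
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;

class ConventionChecker
{
    // Matches simple private field declarations and captures the field name.
    private static readonly Regex PrivateField = new Regex(
        @"^\s*private\s+(readonly\s+)?\w+(<[\w,\s]+>)?\s+(?<name>\w+)\s*(=|;)",
        RegexOptions.Compiled);

    static int Main(string[] args)
    {
        // args[0] is the root of the checked-in source tree.
        var violations = Directory
            .EnumerateFiles(args[0], "*.cs", SearchOption.AllDirectories)
            .SelectMany(file => File.ReadLines(file)
                .Select((line, index) => new { file, line, lineNumber = index + 1 }))
            .Where(x =>
            {
                var match = PrivateField.Match(x.line);
                return match.Success && !match.Groups["name"].Value.StartsWith("_");
            })
            .ToList();

        foreach (var v in violations)
            Console.WriteLine("{0}({1}): private field should start with '_'", v.file, v.lineNumber);

        return violations.Any() ? 1 : 0; // non-zero exit code fails the build
    }
}
```

Hook a script like this into the build, and nobody ever has to be reminded of the convention again; the feedback is immediate and impersonal.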

And that’s the meat of the issue. Don’t tell people not to do bad things – make it impossible via immediate feedback. If I’m expected not to generate compiler warnings, make it so that I get the feedback of not being able to run my tests or application until I clean up my mess. That sort of scheme retains the learning and feedback paradigm, but without draining productivity or requiring others to be involved. Constructor injection is based on this principle, when you really think about it. Using that injection scheme, I don’t need to wonder what I have to do before instantiating a Foo – the compiler tells me by refusing to build if I don’t give Foo what it wants. I don’t have to go ask anyone. I don’t have to find out at a code review. I can just see it for myself and learn on my own.
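
To sketch what I mean (the class and interface names here are invented for illustration):

```csharp
using System;

public interface IOrderRepository { void Save(Order order); }
public interface INotifier { void NotifyShipping(Order order); }
public class Order { }

// Because OrderProcessor demands its collaborators in the constructor, the
// compiler itself tells me what a valid OrderProcessor needs. There is no
// tribal rule like "always call MakeABaz before MakeABar."
public class OrderProcessor
{
    private readonly IOrderRepository _repository;
    private readonly INotifier _notifier;

    public OrderProcessor(IOrderRepository repository, INotifier notifier)
    {
        if (repository == null) throw new ArgumentNullException("repository");
        if (notifier == null) throw new ArgumentNullException("notifier");
        _repository = repository;
        _notifier = notifier;
    }

    public void Process(Order order)
    {
        _repository.Save(order);         // dependencies are guaranteed to exist
        _notifier.NotifyShipping(order);
    }
}

// "new OrderProcessor()" with no arguments simply does not compile.
```

The design choice is that the constructor signature is the documentation, and the compiler enforces it immediately.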

Obviously, it’s not possible to make all bad things impossible. If you can do that, you probably have better things to do than code, like running the world. But, I would advocate that we go after the low-hanging fruit. This is really just an incarnation of the fail-early concept, which doesn’t just describe code behavior at runtime; it can also describe the feedback loop for development standards and preferences. Static analysis tools, auto-checkers, rules engines, etc. are our friends. They automate code checking so that it need not be done by asking, telling, demanding, cajoling, threatening, begging, shaming, etc.


Offering Constructive Criticism without Breaking the Creative Spirit

Wet Behind the Ears

As a child in secondary school, you are instructed to do work and have very little choice in the whys and hows of the matter. This tends to be true in college education as well, although as one progresses, some choices are left to the student. Generally, these are vetted heavily by professors or teaching assistants. This trend generally continues right on into the business world, perhaps even with a bit of regression. New, entry-level hires are usually given pretty specific instructions and trusted very little – all work is checked heavily in the majority of cases.

This comes to be commonplace enough over the course of a young life that you don’t think much of it. I know that when I came into the work force, I took it as a matter of course that this would happen and that, in the majority of cases, the people vetting my work and giving me instructions knew what they were doing and I didn’t. I was right about this.

This isn’t to say I didn’t have the childhood tendency to be constantly convinced that I was right about things. The cliche of your parents knowing nothing when you’re a high school kid and getting a lot smarter somehow over the next ten years was true for me. But, this was really only the case in my personal life – in business and academia, I took it as a matter of course that those with more experience than me knew more than me and that I could learn from them. Again, I was right about this.

The Bulb Comes On A Little

After a while, an interesting thing started to happen. I gained enough experience and confidence after a bit of time in the workforce to actually form my own opinions. Instead of always taking it as a matter of course that reviewers and managers were right about everything, I’d start to think to myself, “I’m not sure that’s the best way to do it.” But, I’d do it anyway, and often this would result in me learning how wrong I had been. But, every now and then, it would result in my suspicions being corroborated, and I discovered that being the junior team member, person whose code was being reviewed, employee, etc, didn’t mean, ipso facto, that I was wrong and they were right.

I tend to learn from my peers very quickly and soak up knowledge from them like a sponge. Perhaps this is stubbornness and I don’t like being wrong, so I burn the midnight oil to fix my weaknesses. Perhaps it’s just that I have a good work ethic and lots of natural curiosity. Perhaps I’m a perfectionist. I don’t really know, and I don’t think it much matters. The end result is that the times when I was right but told to do the wrong thing grew more frequent over the course of time, and that became frustrating to me. I wanted the benefit of the input I was getting, when that input was right, but not to be saddled with it when it was wrong.

The Wrong Approach

A cynical trick that I learned pretty quickly was that meetings to evaluate proposals or review work were generally of a fixed length duration. If I brought in a beautiful, well thought out proposal, some of my precious items would be critiqued, and I’d have to compromise on them. But, if I brought in the same proposal with a few obvious mistakes, the duration of the meeting would be occupied by suggestions to ‘fix’ my red herrings, and my proposal/design/code would remain intact as I had originally envisioned it.

I’m ashamed to say that this worked exactly as I intended at the time. Meetings went exactly as I planned, and I proceeded without any serious review of my concepts. I had, effectively, swung from naive assumption that others were always right to an equally naive assumption that I needed no input from anyone else. In the end, I made some mistakes that I had to put in long hours fixing and realized later that it was entirely possible someone could have anticipated the blunder that I had missed. Now that I’m too old to know everything, I understand exactly how silly this was.

How Did It Come To This?

Reflecting back on my hubris years later, I understand something explicitly now that I understood only intuitively back then. What I’ve come to understand is that some people feel that offering feedback/criticism is obligatory in certain environments, whether or not feedback is necessary or helpful. That is, if someone were to present a light switch as a simple, effective solution for turning the overhead lights on and off, these sorts of people would suggest a push button switch or The Clapper, not because it was a better solution but simply because they wanted to put in their two cents.

I suspect that there are two main causes of this. The first is that some people very much enjoy expressing opinions, and all matters are subjects for debate. It’s a game of sorts. I have friends that have this attitude, and it can be a great one for driving debates and spurring innovation. It can also be tiresome at times, but it keeps you thinking and on your toes. The other cause I see is less innate and more a manifestation of collaborative environments – people perceive offering feedback as obligatory because it subtly confirms status. Making corrections to the work of ‘inferiors’ reminds everyone present of their place in the food chain.

Consider a meeting where a more senior peer or a manager reviews work of a less senior peer or subordinate. If the reviewer offers nothing except a “great job”, he may feel as if he’s acknowledging superiority and thus undermining his own role. The feedback thus becomes obligatory, an end rather than a means, which creates a situation similar in concept to a scheme where police have quotas for traffic tickets given out — it’s assumed that something is wrong, and if nothing is wrong, something must be invented.

This is complicated somewhat by the fact that there is probably something that can be improved about any suggested process, piece of code, idea or anything else, just as it’s safe to assume that there’s always somebody speeding somewhere. As such, there is always a justification for suggestions for improvement or constructive criticism. But, the fact that just about anything could be improved does not necessarily imply that these obligatory suggestions amount to actual improvements. Often, they’re simply time wasters and can even diminish the idea. This is because if the criticism is obligatory, it may not be well thought out or justified – the criticizer may be grasping at straws to assert his authority.

I believe that there are some “tells” as to when this is occurring. If a 10 page document is submitted, and the reviewer offers 2 or 3 criticisms of the first page and then says the rest is fine, this is probably an obligatory criticism. What most likely happened is that the reviewer read until he or she found a few things to express opinions about and then stopped, since the mission of reinforcing authority was accomplished. Another such tell is criticism that misses the point or some obvious fact, for obvious reasons. And then, there is vague or ad hominem criticism — suggestions that the presenter is not up to the task or hasn’t been around long enough to understand things.

To go back to my narrative, I began to see this in action intuitively, pick up on the tells, and manipulate the situation to my advantage. Perfunctory criticism can be ferreted out by inserting a few false flag mistakes early on and allowing them to be corrected so that the idea can be discussed in earnest while placating those seeking reassurance.

So What?

Having read about my own story (and probably either empathizing, as a young developer or remembering back, as a more experienced one), it is worth asking if this is a problem, or if it is simply the way of the world. To that, my answer is “both”. The practice might have some nominal use in keeping people honest and humble, but it has a side effect that outweighs that benefit, in my opinion. It promotes a culture of “tenure over merit” and amplifies the “Dead Sea Effect”, wherein talented new developers tend to leave a company quickly and less enthusiastic and ambitious developers stay around and grandfather into roles of authority.

Alex Papadimoulis also blogged about this effect and said:

The reason that skilled employees quit, however, is a bit more complicated. In virtually every job, there is a peak in the overall value (the ratio of productivity to cost) that an employee brings to his company. I call this the Value Apex.

On the first minute of the first day, an employee’s value is effectively zero. As that employee becomes acquainted with his new environment and begins to apply his skills and past experiences, his value quickly grows. This growth continues exponentially while the employee masters the business domain and shares his ideas with coworkers and management.

However, once an employee shares all of his external knowledge, learns all that there is to know about the business, and applies all of his past experiences, the growth stops. That employee, in that particular job, has become all that he can be. He has reached the value apex.

If that employee continues to work in the same job, his value will start to decline. What was once “fresh new ideas that we can’t implement today” become “the same old boring suggestions that we’re never going to do”. Prior solutions to similar problems are greeted with “yeah, we worked on that project, too” or simply dismissed as “that was five years ago, and we’ve all heard the story.” This leads towards a loss of self actualization which ends up chipping away at motivation.

Skilled developers understand this. Crossing the value apex often triggers an innate “probably time for me to move on” feeling and, after a while, leads towards inevitable resentment and an overall dislike of the job. Nothing – not even a team of on-site masseuses – can assuage this loss.

Bruce Webster talks about the role of a frustrating, apparently external, management structure in bringing this phenomenon about and Alex talks about the sense of no longer being able to offer value. Obligatory reviewers short circuit and amplify the Dead Sea Effect by making the frustrating management internal to the group and by setting the “value apex” low right from the start. Talented new hires are more likely to be quickly discouraged with obligatory criticism being directed toward them to reinforce status that reviewers have and they don’t.

What to Do?

I’ve spent a considerable number of words explaining a problem as I see it and an ill-advised ‘solution’ I came up with years back and subsequently discarded, so I’ll offer some thoughts on solutions to this problem. For the shorter tenured, presenter/review-ee worker, there is no entirely self-deterministic solution, but you can improve your odds of success and minimize frustration.

The most important thing to do, in my opinion, and the solution that I’ve ultimately adopted, is to be extremely well organized and prepared. As you work on your proposal/design/code/etc, document not only what you do, but what other approaches you considered and discarded. If you’re writing actual code, make a list of each class and method and write down its purpose for existence. Organize and format these notes and email them out ahead of the meeting or bring them with you.

The effect that this has is to counteract the obligatory objections by showing that you’ve considered and dismissed them as solutions. Obligatory, as opposed to serious or well-reasoned, objections tend to be off the cuff and thus likely to be the same thing that you’ve considered and dismissed, so having an immediate, well-defended answer serves not only to move the review along more quickly to relevant matters, but also to discourage status-seeking reviewers from testing these waters in the future. After all, making obligatory suggestions for modification only works to reinforce status if the attendees of the meeting perceive the suggester as right. Getting burned by a few volleys knocked quickly and decisively back into his court will make the status-seeker lose interest in this as an opportunity for grandstanding.

This practice has other indirect benefits as well. Getting in the habit of always justifying your decisions tends to make you better at what you do. If you’re mentally justifying every class and method that you write, I would argue that you’re a lot more likely to have a good design. And, your preparation makes others more effective as well. Thoughtful reviewers will be able to focus during the allotted time on things you hadn’t considered already and make the meeting more productive, and even the status-seekers will at least learn to come more prepared and probe deeper, if only for the purpose of not being shown up (in their minds).

If you’re running a team or a department and have the authority to dictate review policy, I would suggest a few different ideas. First and foremost, I would suggest offering a forum where more junior team members can present without any feedback or criticism at all. This need not be on a production piece of software – but give them some forum to present and be proud of their work without constantly needing to defend it against criticism. People feel pride in their work, for the most part, and getting a chance to showcase that work and feel good about it is important for team morale.

Another strategy would be to have a pool of reviewers and to let the team members choose who conducts the reviews. Keeping track of who they choose provides valuable feedback. I’d wager dollars to donuts that the status-seeker reviewers are rarely, if ever, chosen, with developers selecting instead those from whom they can learn and who will make their work better. Of course, they might choose a reviewer who rubber stamps everything, but even that is valuable information to have, as you can talk to that reviewer or else choose a different reviewer.

A third option would be to create some sort of appeals process involving proofs of concept or other forms of ‘proof’ of ideas. If a developer submits a solution using a design pattern, and the reviewer happens not to like that pattern, don’t let that be the end of the discussion. The developer should be given a chance to create a demo or POC to show, concretely, the benefits of using the pattern, or perhaps a chance to cite sources or examples. This serves to create more of a meritocracy than a dictatorship when it comes to ideas.

Is It Worth the Bother?

In the end, one might make a good case that the new developers are wrong frequently enough that stunting their enthusiasm might be good for them, particularly if they’re cocksure about everything. A bite of humble pie is good for everyone from time to time. However, with that line of argument, a fine line emerges between appropriate, periodic humbling and the systematic stifling of new ideas and jockeying for influence. The more that obligatory criticisms and decisions by fiat take hold, the more they are reinforced, to the detriment of the department. I think that it’s of vital importance to eliminate contributors to this type of environment in order to have an efficient team staffed with self-reliant and capable developers, whatever their experience levels may be.



Resumes and Alphabet Soup

I was reading a blog post by Scott Hanselman this morning. The main premise was a bemused lampooning of the ubiquitous tendency of techies to create a resume “alphabet soup” and a suggestion that Stack Overflow Careers is introducing a better alternative.

I was going to comment on this post inline, but I discovered that I had enough to say to create my own post about it. It seems as though this state of affairs for techies is, ironically, the product of living in a world created by techies. What I mean is that over the last several decades, we have made it our life’s goal to automate processes and improve efficiency. Those chickens have come home to roost in the form of automated resume searching, processing, and recommending.

Alphabet Soup Belongs in a Bowl

Having an item at the bottom of a resume that reads “XML, HTML, JSON, PHP, XSLT, HTTP, WPF, WCF, CSS… etc” is actually quite efficient toward its purpose in the same way that sticking a bunch of keywords in a web page’s hidden meta-data is efficient. It violates the spirit of the activity by virtue of the fact that it’s so good at gaming it as to be “too good”. As a job seeker, if I want to create the largest opportunity pool, it would stand to reason that I should include every three and four character combination of letters in existence somewhere in the text of my resume (“Strongly proficient in AAA, AAB, AAC, …ZZZZ”). And, while most of these ‘technologies’ don’t exist and I probably haven’t used most of the ones that do, this will cast a wider net for my search than omitting the alphabet soup. I can always sort out the details later once my foot is in the door. Or, so the thinking probably goes (I’m not actually endorsing, in any way, resume exaggeration or SPAMing the resume machine — just using first person to illustrate a point).

We, the techies of the world, have automated the process of matching employers with employees, and now, we are forced to live in a world where people attempt to game the system in order to get the edge. So, what’s the answer? A declaration that companies should stop hiring this way? Maybe, but that seems unlikely. A declaration that people should stop creating their resumes this way because it’s silly? That seems even more unlikely because the ‘silly’ thing is not doing something that works, and the alphabet soup SPAM works.

I think that this programmer-created problem ought to be solved with better programming. What we’ve got here is a simple text matching algorithm. As a hiring authority, I program my engine (or have the headhunters and sites program it for me) to do a procedure like “give me all the resumes that contain XML, HTML, CSS, and WPF”. I then get a stack of matching resumes from people whose experience in those technologies may range from “expert” to “I think I read about that once on some website” and it’s up to me to figure out which resume belongs to which experience level, generally via one or more interviews.
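
In code, that procedure amounts to little more than a substring check. Here is a sketch of the “dumb” matching approach (the names are invented for illustration):

```csharp
using System.Collections.Generic;
using System.Linq;

public static class ResumeSearch
{
    // The naive text-matching engine: a resume "matches" if it merely
    // contains every keyword, regardless of the candidate's actual depth
    // of experience with any of them.
    public static IEnumerable<string> FindMatches(
        IEnumerable<string> resumeTexts, params string[] keywords)
    {
        return resumeTexts.Where(text =>
            keywords.All(k =>
                text.IndexOf(k, System.StringComparison.OrdinalIgnoreCase) >= 0));
    }
}

// Usage: FindMatches(resumes, "XML", "HTML", "CSS", "WPF") returns the
// genuine expert and the "read about it once" candidate alike.
```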

So, maybe the solution here is to create a search algorithm that does better work. If I were gathering requirements for such a thing, I would consider that I have two stakeholders: company and employee. These two stakeholders share a common goal, “finding a good match”, and also have some mildly conflicting goals — company wants lots of applicants and few other companies and employee wants lots of companies and few other prospective employees. It is also worth considering that the stakeholders may attempt to deceive one another in pursuit of the goal (resume padding on one end or misrepresenting the position on the other end).

With that (oversimplified) goal enumeration in mind, I see the following design goals:

  1. Accuracy for employers. The search engine returns candidates that are better than average at meeting needs.
  2. Accuracy for employees. The engine does not pair them with employers looking for something other than what they offer, thus setting them up to fail interviews and waste time.
  3. Easy to use, narrative inputs for employees. You type up a summary of your career, interests, experience, projects, etc, and that serves as input to the algorithm – you are not reduced to a series of acronyms.
  4. Easy to use, narrative inputs for employers. You type up a list of the job’s duties at present, anticipated duties in the future, career development path, success criteria, etc and that serves as input to the algorithm.
  5. Opacity/Double blind. Neither side understands transparently the results from its inputs. Putting the text “XML” on a resume has an indeterminate effect on the likelihood of getting a job with an employer that wants employees with XML knowledge. This mitigates ‘gaming’ of the system (similar in concept to how search engines guard their algorithms).

Now, in between the narrative input of both sides, the magic happens and pairings are made. That is where we as the techies come in. This is an ambitious project and not an easy one, but I think it can and should happen. Prospective employees tell a story about their career and prospective employers tell a story about a hypothetical employee, and matches are created. Nowhere in this does a dumb straight-match of acronym to acronym occur, though the system would take into account needs and skills (i.e., a position primarily developing in C++ would yield candidates primarily with a good C++ background).
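
Just to make the “magic” slightly less hand-wavy, one naive first pass might score the overlap between the two narratives. Everything below is invented for illustration; a real engine would need to be far smarter about weighting terms, handling synonyms, and resisting keyword stuffing:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class NarrativeMatcher
{
    // Crude similarity score: the fraction of the job narrative's distinct
    // words that also appear in the candidate's career narrative.
    public static double Score(string careerNarrative, string jobNarrative)
    {
        var separators = new[] { ' ', ',', '.', ';', ':', '\n', '\r', '\t' };

        var careerWords = new HashSet<string>(
            careerNarrative.ToLowerInvariant()
                .Split(separators, StringSplitOptions.RemoveEmptyEntries));

        var jobWords = jobNarrative.ToLowerInvariant()
            .Split(separators, StringSplitOptions.RemoveEmptyEntries)
            .Distinct()
            .ToList();

        return jobWords.Count == 0
            ? 0.0
            : jobWords.Count(word => careerWords.Contains(word)) / (double)jobWords.Count;
    }
}
```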

Anyway, that’s just what occurred to me in considering the subject, and it’s clearly far too long for a blog comment. I’m spitballing this here, so comments and opinions are welcome. Also, if this already exists somewhere, please point me at it, as I’d be curious to see it.



A Better Metric than Code Coverage

My Chase of Code Coverage

Perhaps it’s because fall is upon us and this is the first year in a while that I haven’t been enrolled in a Master’s of CS program (I graduated in May), but I’m feeling a little academic. As I mentioned in my last post, I’ve been plowing through, following TDD to the letter, and if nothing else, I’m pleased that my code coverage sits effortlessly at 100%. I try to keep my code coverage around 100% whether or not I do TDD, so the main difference I’ve noticed is that TDD, versus retrofitted tests, seems to hit my use cases a lot harder, instead of just going through the code at least once.

Now, it’s important to me to get close to or hit that 100% mark, because I know that I’m at least touching everything going into production, meaning that I don’t have code that would blow up if execution ever reached it, saved only by another bug preventing it from running. But, there is a difference between covering code and exercising it.

More than 100% Code Coverage?

As I was contemplating this last night, I realized that some lines of my TDD code, especially control flow statements, were really getting pounded. There are lines in there that are covered by dozens of tests. So, the first flicker of an idea popped into my head — what if there were two factors at play when contemplating coverage: LOC Covered/Total LOC (i.e. our current code coverage metric) and Covering tests/LOC (I’ll call this coverage density).

High coverage is a breadth-oriented thing, while high density is depth — casting a wide net versus a narrow one deeply. And so, the ultimate solution would be to cast a wide net, deeply (assuming unlimited development time and lack of design constraints).
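
To pin down the two numbers, here is a sketch of how they might be computed, assuming (purely for illustration) that a coverage tool hands us a map from each executable line to the number of tests that hit it:

```csharp
using System.Collections.Generic;
using System.Linq;

public static class CoverageMetrics
{
    // hitsPerLine: for every executable line, how many tests execute it.
    public static double Coverage(IDictionary<int, int> hitsPerLine)
    {
        // Classic code coverage: covered LOC / total LOC.
        return hitsPerLine.Count == 0
            ? 0.0
            : hitsPerLine.Values.Count(hits => hits > 0) / (double)hitsPerLine.Count;
    }

    public static double Density(IDictionary<int, int> hitsPerLine)
    {
        // Proposed "coverage density": covering tests / LOC, i.e. the
        // average number of tests exercising each line.
        return hitsPerLine.Count == 0
            ? 0.0
            : hitsPerLine.Values.Sum() / (double)hitsPerLine.Count;
    }
}
```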

Are We Just Shifting the Goalposts?

So, Code Density sounded like sort of a heady concept, and I thought I might be onto something until I realized that this suffered the same potential for false positive feedback as code coverage. Specifically, I could achieve an extremely high density by making 50 copies of all of my unit tests. All of my LOC would get hit a lot more but my test suite would be no better (in fact, it’d be worse since it’s now clearly less efficient). So code coverage is weaker as a metric when you cheat by having weak asserts, and density is weaker when you cheat by hitting the same code with identical (or near identical) asserts.

Is there a way to use these two metrics in combination without the potential for cheating? It’s an interesting question, and it’s easy enough to see that “higher is better” for both is generally, but not always, true, and can be perverted by developers working under some kind of management edict demanding X coverage or, now, Y density.

Stepping Back a Bit

Well, it seems that Density is really no better than Code Coverage, and it’s arguably more obtuse, or at least it has the potential to be more obtuse, so maybe that’s not the route to go. After all, what we’re really after here is how many different scenarios exercise a line of code. For instance, hitting the line double result = x/y is only interesting when y is zero. If I hit it 45,000 times and achieve high density, I might as well just hit it once, unless I try y at zero.

Now, we have something interesting. This isn’t a control flow statement, so code coverage doesn’t tell the whole story. You can cover that line easily without generating the problematic condition. Density is a slightly (but not much) better metric. We’re really driving at program correctness here, but since that’s a bit of a difficult problem, what we’ll generally settle for is notable, or interesting, scenarios.
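
To put the example in test form, here is a sketch. The Calculator class, the MSTest attributes, and the test names are assumptions for illustration; with integer operands, the division throws rather than returning infinity:

```csharp
using System;
using Microsoft.VisualStudio.TestTools.UnitTesting;

public static class Calculator
{
    // The line under discussion: trivially covered, interesting only when y == 0.
    public static double Quotient(int x, int y)
    {
        double result = x / y;
        return result;
    }
}

[TestClass]
public class QuotientTests
{
    [TestMethod]
    public void Quotient_Returns_Two_For_Ten_Over_Five()
    {
        // Covers the line, but never exercises the scenario that matters.
        Assert.AreEqual(2.0, Calculator.Quotient(10, 5));
    }

    [TestMethod]
    [ExpectedException(typeof(DivideByZeroException))]
    public void Quotient_Blows_Up_When_Denominator_Is_Zero()
    {
        // The "interesting" input that raw coverage never demands.
        Calculator.Quotient(10, 0);
    }
}
```

Both tests hit the same line, so coverage (and even density) cannot distinguish between a suite with only the first test and a suite with both.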

A Look at Pex

Microsoft Research made a utility called Pex (which I’ve blogged about here). Pex is an automated test generation utility that “finds interesting input-output values of your methods”. What this means, in practice, is that Pex pokes through your code looking for edge cases and anything that might be considered ‘interesting’. Often, this means conditions that cause control flow branching, but it also means things like finding the divide-by-zero case with our “y” from earlier.

What Pex does when it finds these interesting paths is auto-generate unit tests that you can add to your suite. Since it finds hard-to-find edge cases and specializes in branching through your code, it boasts a high degree of coverage. But, what I’d really be interested in seeing is stats on how many interesting paths your test suite covers versus how many there are or may be (we’d likely need a good approximation, as this problem quickly becomes computationally infeasible to answer for certain).

I’m thinking that this has the makings of an excellent metric. Forget code coverage or my erstwhile “Density” metric. At this point, you’re no longer hoping that your metric reflects something good — you’re relatively confident that it must. While this isn’t as good as some kind of formal method that proves your code correct, you can at least be confident that critical things are being exercised by your test suite – manual, automated or both. And, while you can achieve this to some degree by regularly using Pex, I don’t know that you can quantify it other than to say, “well, I ran Pex a whole bunch of times and it stopped finding new issues, so I think we’re good.” I’d like a real, numerical metric.

Anyway, perhaps that’s in the offing at some point. It’d certainly be nice to see, and I think it would be an advancement in the field of static analysis.