DaedTech

Stories about Software


Faking An Interface Mapping in Entity Framework

Entity Framework Backstory

In the last year or so, I’ve been using Entity Framework in the new web-based .NET solutions for which I am the architect. I’ve had some experience with different ORMs on different technology stacks, including some roll-your-own efforts, but I have to say that Entity Framework and its integration with Linq in C# is pretty incredible. Configuration is minimal, and the choice of code-first, model-first, or database-first development is certainly appealing.

One gripe that I’ve had, however, is that Entity Framework wants to bleed its way up past your infrastructure layer and permeate your application. The path of least resistance, by far, is to let Entity Framework generate entities and then use them throughout the application. This is often described, collectively, as “the model,” and it impedes your ability to layer the application. The most common form of this that I’ve seen is simply to have an Entity Framework context and allow it to be accessed directly in controllers, code-behind, or view models. In a way, this is strangely reminiscent of Active Record, except that the in-memory joins and navigation operations are a lot more sophisticated. But the overarching point is still the same — not much separation of concerns, and an avoidance of domain modeling in favor of letting the database ooze its way into your code.


I’m not a fan of this approach, and I very much prefer applications that are layered or otherwise separated in terms of concerns. I like there to be a presentation layer for taking care of presenting information to the user, a service layer to serve as the application’s API (for flexibility and for acceptance testing), a domain layer to handle business rules, and a data access layer to manage persistence (Entity Framework actually takes care of this layer as-is). I also like the concept of “persistence ignorance,” where the rest of the application doesn’t have to concern itself with where persistent data is stored — it could be SQL Server, Oracle, a file, a web service… whatever. This renders the persistence model an ancillary implementation detail which, in my opinion, is what it should be.

A way to accomplish this is to use the “Repository Pattern,” in which higher layers of the application are aware of a “repository” which is an abstraction that makes entities available. Where they come from to be available isn’t any concern of those layers — they’re just there when they’re needed. But in a lot of ways with Entity Framework, this is sort of pointless. After all, if you hide the EF-generated entities inside of a layer, you don’t get the nice query semantics. If you want the automation of Entity Framework and the goodness of converting Linq expressions to SQL queries, you’re stuck passing the EF entities around everywhere without abstraction. You’re stuck leaking EF throughout your code base… or are you?
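Before answering that, it’s worth being concrete about the abstraction in question. A minimal sketch of such a repository interface might look like this (ICustomer is a hypothetical entity interface; this is the idea, not the implementation from later in this post):

public interface ICustomerRepository
{
    // Higher layers depend on this abstraction and never see Entity
    // Framework types; where the entities come from is not their concern.
    IEnumerable<ICustomer> Get(Expression<Func<ICustomer, bool>> predicate = null);
    void Add(ICustomer customer);
    void Remove(ICustomer customer);
}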

Motivations

Here’s what I want. I want a service layer (and, of course, presentation layer components) that is in no way whatsoever aware of Entity Framework. In the project in question, we’re going to have infrastructure for serializing to files and calling out to web services, and we’re likely to do some database migration and make use of NoSQL technologies. It is a given that we need multiple persistence models. I also want at least two different versions of the DTOs: domain objects in the domain layer, hidden under the repositories, and model objects for binding in the presentation layer. In MVC, I’ll decorate the models with validation attributes and do other uniquely presentation layer things. In the WPF world, these things would implement INotifyPropertyChanged.

Now, it’s wildly inappropriate to do this to the things generated by Entity Framework and to have the domain layer (or any layer but the presentation layer) know about these presentation layer concepts: MVC validation, WPF GUI events, etc. So this means that some kind of mapping from EF entities to models and vice-versa is a given. I also want rich domain objects in the domain layer for modeling business logic. So that means that I have two different representations of any entity in two different places, which is a classic case for polymorphism. The question then becomes “interface implementation or inheritance?” And I choose interface.

My reasoning here is certainly subject to critique, but I’m a fan of creating an assembly that contains nothing but interfaces and having all of my layer assemblies take a dependency on that. So, the service layer, for instance, knows nothing about the presentation layer, but the presentation layer also knows nothing about the service layer. Neither one has an assembly reference to the other. A glaring exception to this is DTOs (and, well, custom exceptions). I can live with the exceptions, but if I can eliminate the passing around of vacuous property bags, then why not? Favoring interfaces also helps with some of the weirdness of having classes in various layers inherit from things generated by Entity Framework, which seems conspicuously fragile. If I want to decorate properties with attributes for validation and binding, I have to use EF to make sure that these things are virtual and then make sure to override them in the derived classes. Interfaces it is.

Get Ur Dun!

So that’s great. I’ve decided that I’m going to use ICustomer instead of Customer throughout my application, auto-generating domain objects that implement an interface. That interface will be generated automatically and used by the rest of the application, including with full-blown query expression support that gets translated into SQL. The only issue with this plan is that every Google search I did seemed to suggest this was impossible, or at least implausible: “EF doesn’t support that, Erik, so give it up.” Well, I’m nothing if not inappropriately stubborn when it comes to bending projects to my will. So here’s what I did to make this happen.

I have three projects: Poc.Console, Poc.Domain, and Poc.Types. In Domain, I pointed an EDMX at my database and let ‘er rip, generating the T4 for the entities and also the context. I then copied the entity T4 template to the Types assembly, where I modified it. In Types, I added an “I” to the name of each class, changed it to be an interface instead of a class, removed all constructor logic, removed all complex properties and navigation properties, and removed all visibility modifiers. In Domain, I modified the entities to get rid of complex/navigation properties and added an implementation of the interface of the same name. So at this point, all Foo entities now implement an identical IFoo interface. I made sure to leave Foo as a partial class because these things will become my domain objects.
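To illustrate with a hypothetical Customer table, the output of the two modified templates ends up looking something like this (the property names are illustrative):

// In Poc.Types: the generated interface — no constructor logic, no
// navigation properties, no visibility modifiers on members.
public interface ICustomer
{
    int Id { get; set; }
    string Name { get; set; }
}

// In Poc.Domain: the generated entity, left partial so it can grow into
// a rich domain object, implementing the matching interface.
public partial class Customer : ICustomer
{
    public int Id { get; set; }
    public string Name { get; set; }
}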

With this building, I wrote a quick repository POC. To do this, I installed the NuGet package for System.Linq.Dynamic, which is a really cool utility that lets you turn arbitrary strings into Linq query expressions. Here’s the repository implementation:

public class Repository<TEntity, TInterface> where TEntity : class, new() where TInterface : class
{
    private PlaypenDatabaseEntities _context;

    /// <summary>
    /// Initializes a new instance of the Repository class.
    /// </summary>
    public Repository(PlaypenDatabaseEntities context)
    {
        _context = context;
    }

    public IEnumerable<TInterface> Get(Expression<Func<TInterface, bool>> predicate = null)
    {
        IQueryable<TEntity> entities = _context.Set<TEntity>();
        if (predicate != null)
        {
            var predicateAsString = predicate.Body.ToString();
            var parameterName = predicate.Parameters.First().ToString();
            var parameter = Expression.Parameter(typeof(TInterface), predicate.Parameters.First().ToString());
            string stringForParseLambda = predicateAsString.Replace(parameterName + ".", string.Empty).Replace("AndAlso", "&&").Replace("OrElse", "||");
            var newExpression = System.Linq.Dynamic.DynamicExpression.ParseLambda<TEntity, bool>(stringForParseLambda, new[] { parameter });
            entities = entities.Where(newExpression);
        }

        foreach (var entity in entities)
            yield return entity as TInterface;
    }
}

Here’s the gist of what’s going on. I take an expression of IFoo and turn it into a string. I then figure out the parameter’s name so that I can strip it out of the string, since this is the form that will make ParseLambda happy. Along these same lines, I also need to replace “AndAlso” and “OrElse” with “&&” and “||” respectively. The former format is how expressions are compiled, but ParseLambda looks for the more traditional expression combiners. Once it’s in a pleasing form, I parse it as a lambda, but with type Foo instead of IFoo. That becomes the expression that EF will use. I then query EF and cast the results back to IFoos.
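To make that flow concrete, here’s a hedged sketch of calling code, assuming the hypothetical Customer/ICustomer pair from above with Id and Name properties:

// The caller works entirely in terms of ICustomer; the predicate is
// re-parsed against Customer behind the scenes so EF can translate it
// into SQL.
using (var context = new PlaypenDatabaseEntities())
{
    var repository = new Repository<Customer, ICustomer>(context);

    foreach (ICustomer customer in repository.Get(c => c.Name == "Smith" && c.Id > 10))
        Console.WriteLine(customer.Name);
}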

Now, I’ve previously blogged that casting is a failure of polymorphism. And this is like casting on steroids and some hallucinogenic drugs for good measure. I’m not saying, “I have something the compiler thinks is an IFoo but I know is a Foo,” but rather, “I have what the compiler thinks of as a non-compiled code scheme for finding IFoos, but I’m going to mash that into a non-compiled scheme for finding Foos in a database, force it to happen and hope for the best.” I’d be pretty alarmed if not for the fact that I’m generating interface and implementation at the same time, and that any other implementation I might define and pass in must have all of the properties that Entity Framework is going to want.

This is a proof of concept, and I haven’t lived with it yet. But I’m certainly going to try it out and possibly follow up with how it goes. If you’re like me and were searching for the Holy Grail of how to have DbSet<IFoo> or how to use interfaces instead of POCOs with EF, hopefully this helps you. If you want to see the T4 templates, drop a comment and I’ll put up a gist on GitHub.

One last thing to note is that I’ve only tried this with a handful of lambda expressions for querying, so it’s entirely possible that I’ll need to do more tweaking for some edge case scenarios. I’ve tried this with a handful of permutations of conjunction, disjunction, negation and numerical expressions, but what I’ve done is hardly exhaustive.

Happy abstracting!

Edit: Below are the gists:

  • Entities.Context.tt for the context. Modified to have DbSets of IWhatever instead of Whatever.
  • Entities.tt for the actual, rich domain objects that reside beneath the repositories. These guys have to implement IWhatever.
  • DTO.tt contains the actual interfaces. I modified this T4 template not to generate navigation properties at all because I don’t want that kind of rich relationship definition to be part of an interface between layers.


My Initial Postsharp Setup — Logging at Assembly Granularity

PostSharp and Log4Net

I tweeted a bit about this, and now I’m going to post about my initial experience with it. This comes from the perspective of someone new at this particular setup but familiar with the concepts in general. So, what are PostSharp and log4net?

First up, log4net. This is extremely easy to understand, as it’s a logging framework for .NET. It was ported to .NET from Java’s log4j a long time ago. It’s a tried and true way to instrument your applications for logging. I’m not going to go into a lot of detail on it (here’s a Pluralsight course by Jim Christopher), but you can install it for your project with NuGet and get started pretty quickly and easily.
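For a sense of what direct, non-aspect-oriented usage looks like, here’s a minimal, hedged example (BasicConfigurator is used so it runs without any XML configuration):

using log4net;
using log4net.Config;

public static class Program
{
    // One logger per type is the typical log4net convention.
    private static readonly ILog Log = LogManager.GetLogger(typeof(Program));

    public static void Main()
    {
        // Wires up a default appender so no config file is needed.
        BasicConfigurator.Configure();
        Log.Info("Hello from log4net.");
    }
}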

PostSharp is a little more complex to explain (there’s also a Pluralsight course on this, by Donald Belcham). It’s a tool for a technique called “aspect-oriented programming” (AOP), which addresses what are known as cross-cutting concerns. These are things that are intrinsically non-localized in an application. What I mean is, you might have a module that processes EDI feeds and another one that stores data to a local file, and these modules may be completely isolated from one another in your system. In a nicely modular architecture, concerns like those are localized. Something like, oh, I dunno, logging, is not. You do that everywhere. Logging is said to be an aspect of your system. Security is another stock example of an aspect.

PostSharp employs a technique called “IL weaving” to address AOP in a clean and remarkably decoupled way. If you’re a .NET programmer, whether you code in VB, C#, F#, etc., all of your code gets compiled down to what’s known as intermediate language (IL). Then, when the code is actually being executed, this IL is translated on the fly into machine/executable code. So there are two stages of compiling, in essence. In theory, you can write IL code directly. PostSharp takes advantage of this fact, and when you’re building your C# code into IL code, it interposes and injects a bit of its own stuff into the resultant IL. The upshot of all this is that you can have logging in every method in your code base without writing a single call to Logger.Log(something) in any method, anywhere. Let me be clear — you can get all of the benefits of comprehensive logging with none of the boilerplate, clutter, and intensely high coupling that typically comes with implementing an aspect.

Great, But How?

Due to a lack of time in general, I’ve sort of gotten away from detailed how-to posts, for the most part, with screenshots and steps. It’s really time consuming to make posts like that. What I’ll do instead is describe the process and, if anyone has questions, perhaps clarify with an addendum or links or something. Trying to get more agile everywhere and avoid gold-plating 🙂

And really, getting these things into your project is quite simple. In both cases, I just added a NuGet package to a project. For log4net, this is trivial to do. For PostSharp, this actually triggers an install of PostSharp as a Visual Studio plugin. PostSharp offers a few different license types. When you install it in VS, it will prompt you to enter a license key or do a 45-day trial. You can sign up for an express version on their site, and you’ll get a license key that you can plug in. From there, it gets installed, and it’s actually really polished. It even gives you a window in Studio that keeps track of progress in some tutorials they offer for getting started.

With that in place, you’re ready to write your first aspect. These are generally implemented as attributes that you can use to decorate methods, types, and assemblies so that you can be as granular with the aspects as you like. If you implement an attribute that inherits from OnMethodBoundaryAspect, you get a hook into having code executed on events in the application like “Method Enter,” “Method Leave,” and “Exception.” So you can write C# code that will get executed upon entry to every method.

Here’s a look at an example with some method details elided:

[Serializable]
[AttributeUsage(AttributeTargets.Assembly)]
public sealed class LogAttribute : OnMethodBoundaryAspect
{
    private static readonly ILog _logger;

    static LogAttribute()
    {
        SetupLogger();
        _logger = LogManager.GetLogger(typeof(LogAttribute));
    }

    public override void OnException(MethodExecutionArgs args)
    {
        if(_logger != null)
            _logger.Error("An exception occurred: ", args.Exception); 
    }
...

Leaving aside the logging implementation details, what I’ve done here is define an attribute. Any type or method decorated with this attribute will automatically log any exception that occurs without the code of that method being altered in the slightest. The “MethodExecutionArgs” parameter gives you information that lets you inspect various relevant details about the method in question: its name, its parameters, its return value, etc.
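For instance, this hedged addition to the aspect (not part of my actual code) would record every entry into a decorated method:

    // A sketch of inspecting MethodExecutionArgs on method entry.
    public override void OnEntry(MethodExecutionArgs args)
    {
        if (_logger != null)
            _logger.DebugFormat("Entering {0} with {1} argument(s).",
                args.Method.Name, args.Arguments.Count);
    }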

Getting Modular

Okay, so great. We can apply this at various levels, and I decided that I wanted to apply it per assembly. I’m currently working at times in a legacy code base where a series of Winforms and Webforms applications make use of a common assembly called “Library.” This code had previously been duplicated, but I made it common and unified it as a step toward architecture improvement. Library is where I put my aspect attribute. Initially, I want some assemblies logging exceptions, but not others, so I put the following in the AssemblyInfo.cs of each assembly for which I wanted logging.

[assembly: Log()]

This is awesome because even though PostSharp and the aspect are heavily coupled to the assemblies on the whole (every assembly uses Library, and Library depends on PostSharp, so every assembly depends on PostSharp), it isn’t coupled in the actual code. In fact, I could just remove that line of code and the library dependency and not touch a single other thing (except, of course, the references to library utilities).

But now another interesting problem arises, which is naming the log files generated. I want them to go in AppData, but I want them named after the respective deliverable in this code base.

And then, in the library project, I have this method inside of the LogAttribute class:

private static string GetLogFileFullPath()
{
    string friendlyName = AppDomain.CurrentDomain.FriendlyName;
    string executableName = friendlyName.Replace(".vshost", string.Empty);
    string appdataPath = Environment.GetFolderPath(Environment.SpecialFolder.ApplicationData);
    string logPath = Path.Combine(appdataPath, "{CompanyNameHere}");
    return Path.Combine(logPath, String.Format("{0}.log", executableName));
}

I’ve made use of the Monostate Pattern to ensure that a single logger instance is configured and initialized and then used by the attribute instances. This is an implementation that I’ll probably refine over time, but it’s alright for a skunkworks. So, what happens is that when the application fires up, I figure out the name of the entry executable and use it to name the log file that’s created/appended in AppData under the company name folder.

This was great until I noticed weird files getting created in that folder. It turns out that NCrunch and other plugins were triggering the code to be invoked in this way, meaning that unit test runners, both realtime and on-demand, were generating logging. Duh. Oops. And… yikes!

My first thought was that I’d see if I was being run from a unit test and no-op out of logging if that were the case. I found a Stack Overflow post where Jon Skeet suggested an approach and mentioned that he “[held his] nose” while doing it because it was a pragmatic solution to his problem. Well, since I wasn’t in a pinch, I decided against that.

Maybe it would make sense, instead of figuring out whether I was in a unit test assembly and what other sorts of things I didn’t want to have the logging turned on for, to take a whitelist approach. That way, I have to turn logging on explicitly if I want it to happen. I liked that, but it seemed a little clunky. I thought about what I’d do to enable it on another one of the projects in the solution, and that would be to go into the assembly file and add the attribute for the assembly, and then go into the logger to add the assembly to the whitelist. But why do two steps when I could do one?

private static bool IsAspectLoggingEnabled()
{
    try
    {
        // Only log if there's an entry assembly and it has opted in by
        // declaring [assembly: Log()].
        var entryAssembly = Assembly.GetEntryAssembly();
        return entryAssembly != null && Attribute.GetCustomAttributes(entryAssembly, typeof(LogAttribute)).Any();
    }
    catch
    { return false; }
}

I added this method, which actually figures out whether the attribute has been declared for the entry assembly, and I only enable the logger if it has. I’ve tested this out and it works pretty well, though I’ve only been living with it for a couple of days, so it’s likely to continue evolving. But the spurious log files are gone, and the MS Test runner no longer randomly bombs out because the “friendly name” sometimes has a colon in it. This is almost certainly not the most elegant approach to my situation, but it’s iteratively more elegant, and that’s really all I’m ever going for.

Ideas, suggestions, and shared experiences are welcome. And here’s the code for the aspect in its entirety right now:

[Serializable]
[AttributeUsage(AttributeTargets.Assembly)]
public sealed class LogAttribute : OnMethodBoundaryAspect
{
    private static readonly ILog _logger;

    static LogAttribute()
    {
        if (IsAspectLoggingEnabled())
        {
            SetupLogger();
            _logger = LogManager.GetLogger(typeof(LogAttribute));
        }
    }

    public override void OnException(MethodExecutionArgs args)
    {
        if(_logger != null)
            _logger.Error("An exception occurred: ", args.Exception); 
    }


    private static bool IsAspectLoggingEnabled()
    {
        try
        {
            // Only log if there's an entry assembly and it has opted in by
            // declaring [assembly: Log()].
            var entryAssembly = Assembly.GetEntryAssembly();
            return entryAssembly != null && Attribute.GetCustomAttributes(entryAssembly, typeof(LogAttribute)).Any();
        }
        catch
        { return false; }
    }

    private static void SetupLogger()
    {
        var appender = BuildAppender();

        var hierarchy = (Hierarchy)LogManager.GetRepository();
        hierarchy.Root.AddAppender(appender);

        hierarchy.Configured = true;
        BasicConfigurator.Configure(appender);
    }

    private static FileAppender BuildAppender()
    {
        var appender = new RollingFileAppender()
        {
            File = GetLogFileFullPath(),
            AppendToFile = true,
            Layout = new PatternLayout() { ConversionPattern = "%m%n" }
        };
        appender.ActivateOptions();
        return appender;
    }

    private static string GetLogFileFullPath()
    {
        string friendlyName = AppDomain.CurrentDomain.FriendlyName;
        string executableName = friendlyName.Replace(".vshost", string.Empty);
        string appdataPath = Environment.GetFolderPath(Environment.SpecialFolder.ApplicationData);
        string logPath = Path.Combine(appdataPath, "{CompanyNameHere}");
        return Path.Combine(logPath, String.Format("{0}.log", executableName));
    }
}


Introduction to Unit Testing Part 2: Let’s Write a Test

In the last post in this series, I covered the essentials of unit testing without diving into the topic in any real detail. The idea was to offer sort of a guerrilla guide to what unit testing is for those who don’t know but feel they should. Continuing on that path and generating material for my upcoming presentation, I’m going to continue with the introduction.

In the previous post, I talked about what unit testing is and isn’t, what its actual purpose is, and what some best practices are. This is like explaining the Grand Canyon to someone who isn’t familiar with it. You now know enough to say that it’s a big hole in the earth that provides some spectacular views and that you can hike down into it, seeing incredible shifts in flora and fauna as you go. You can probably convince someone you’ve been there in a casual conversation, even without having seen it. But, you won’t really get it until you’re standing there, speechless. With unit testing, you won’t really get it until you do it and get some benefit out of it.

So, Let’s Write that Test

Let’s say that we have a class called PrimeFinder. (Anyone who has watched my Pluralsight course will probably recognize that I’m recycling the example class from there.) This class’s job is to determine whether or not numbers are prime, and it looks like this:

public class PrimeFinder
{
    public bool IsPrime(int possiblePrime)
    {
        return possiblePrime != 1 && !Enumerable.Range(2, (int)Math.Sqrt(possiblePrime) - 1).Any(i => possiblePrime % i == 0);
    }       
}

Wow, that’s pretty dense looking code. If we take the method at face value, it should tell us whether a number is prime or not. Do we believe the method? Do we have any way of knowing that it works reliably, apart from running an entire application, finding the part that uses it, and poking at it to see if anything blows up? Probably not, if this is your code and you needed my last post to understand what a unit test was. But this is a pretty simple method in a pretty simple class. Doesn’t it seem like there ought to be a way to make sure it works?

I know what you’re thinking. You have a scratchpad and you copy and paste code into it when you want to experiment and see how things work. Fine and good, but that’s a throw-away effort that means nothing. It might even mislead when your production code starts changing. And checking might not be possible if you have a lot of dependencies that come along for the ride.

But never fear. We can write a unit test. Now, you aren’t going to write a unit test just anywhere. In Visual Studio, what you want to do is create a unit test project and have it refer to your code. So if the PrimeFinder class is in a project called DaedTech.Production, you would create a new unit test project called DaedTech.Production.Test and add a project reference to DaedTech.Production. (In Java, the convention isn’t quite as cut-and-dried, but I’m going to stick with .NET since that’s the audience for my talk.) You want to keep your tests out of your production code so that you can deploy without also deploying a bunch of unit test code.

Once the test class is in place, you write something like this, keeping in mind the “Arrange, Act, Assert” paradigm described in my previous post:

[TestMethod]
public void Returns_False_For_One()
{
    var primeFinder = new PrimeFinder(); //Arrange

    bool result = primeFinder.IsPrime(1); //Act

    Assert.IsFalse(result); //Assert
}

The TestMethod attribute is something that I described in my last post. This tells the test runner that the method should be executed as a unit test. The rest of the method is pretty straightforward. The arranging is just declaring an instance of the class under test (commonly abbreviated CUT). Sometimes this will be multiple statements if your CUTs are more complex and require state manipulation prior to what you’re testing. The acting is where we test to see what the method returns when we pass it a value of 1. The asserting is the Assert.IsFalse() line where we instruct the unit test runner that a value of false for result means the test should pass, but true means that it should fail since 1 is not prime.

Now, we can run this unit test and see what happens. If it passes, that means that it’s working correctly, at least for the case of 1. Maybe once we’re convinced of that, we can write a series of unit tests for a smattering of other cases in order to convince ourselves that this code works. And here’s the best part: when you’re done exploring the code with your unit tests to see what it does and convince yourself that it works (or perhaps you find a bug during your testing and fix the code), you can check the unit tests into source control and run them whenever you want to make sure the class is still working.

Why would you do that? Well, it might be that you or someone else later starts playing around with the implementation of IsPrime(). Maybe you want to make it faster. Maybe you realize it doesn’t handle negative numbers properly and aim to correct it. Maybe you realize that the method is written in a way that’s clear as mud and you want to refactor toward readability. Whatever the case may be, you now have a safety net. No matter what happens, 1 will never be prime, so the unit test above will be good for as long as your production code is around–and longer. With this test, you’ve not only verified that your production code works now; you’ve also set the stage for making sure it works later.

Resist the Urge to Write Kitchen Sink Tests


When I talked about a smattering of tests, I bet you had an idea. I bet your idea was this:

[TestMethod]
public void Test_A_Bunch_Of_Primes()
{
    var primes = new List<int>() { 2, 3, 5, 7, 11, 13, 17 };
    var primeFinder = new PrimeFinder();

    foreach(var prime in primes)
        Assert.IsTrue(primeFinder.IsPrime(prime));
}

After all, it’s wasteful and inefficient to write a method for each case that you want to test when you could write a loop and iterate more succinctly. It’ll run faster, and it’s more concise from a coding perspective. It has every advantage that you’ve learned about in your programming career. This must be good. Right?

Well, not so much, counterintuitive as that may seem. In the first place, when you’re running a bunch of unit tests, you’re generally going to see their results in a unit test runner grid that looks something like a spreadsheet. Or perhaps you’ll see them in a report. If, when you’re looking at that, you see a failure next to “IsPrime_Returns_False_For_12,” then you immediately know, at a glance, that something went wrong for the case of 12. If, instead, you see a failure for “Test_A_Bunch_Of_Primes,” you have no idea what happened without further investigation. Another problem with the looping approach is that you’re masking potential failures. In the method above, what information do you get if the method is wrong for both 2 and 17? Well, you just know that it failed for something. So you step through in the debugger, see that it failed for 2, fix that, and move on. But then you wind up right back there because there were actually two failures, though only one was being reported.
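So rather than the loop, I’d split that into individual tests along these lines:

[TestMethod]
public void Returns_True_For_Two()
{
    var primeFinder = new PrimeFinder();

    Assert.IsTrue(primeFinder.IsPrime(2));
}

[TestMethod]
public void Returns_True_For_Seventeen()
{
    var primeFinder = new PrimeFinder();

    Assert.IsTrue(primeFinder.IsPrime(17));
}

Each test covers exactly one case, so a failure in the runner grid points directly at the offending input.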

Unit test code is different from regular code in that you’re valuing clarity and the capture of intent and requirements over brevity and compactness. As you get comfortable with unit tests, you’ll start to give them titles that describe correct functionality of the system and you’ll use them as kind of a checklist for getting code right. It’s like your to-do list. If every box is checked, you feel pretty good. And you can put checkboxes next to statements like “Throws_Exception_When_Passed_A_Null_Value” but not next to “Test_Null”.

There are some very common things that new unit testers tend to do for a while before things click. Naming test methods things like “Test_01” and having dozens of asserts in them is very high on the list. This comes from heavily procedural thinking. You’ll need to break out of that to realize the benefit of unit testing, which is inherently modular because it requires a complete breakdown into components to work. If nothing else, remember that it’s, “Arrange, Act, Assert,” not, “Arrange, Act, Assert, Act, Assert, Assert, Assert, Act, Act, Assert, Assert, Act, Assert, etc.”

Wrap-Up

The gist of this installment is that unit tests can be used to explore a system, understand what it does, and then guard to make sure the system continues to work as expected even when you or others are in it making changes later. This helps prevent unexpected side effects during later modification (i.e., regressions). We’ve also covered that unit tests are generally small, simple, and focused, following the “Arrange, Act, Assert” pattern. No one unit test is expected to cover much ground–that’s why you build a large suite of them.


Why Are Public Fields No Good, Again?

Today, a developer on my team, relatively new to C# and some of the finer points of OO, asked me to take a look at his code. I saw this:

public class Address
{
    public string Street1;
    public string Street2;
    public string City;
    public string State;
    public string Zip;
}

How would you react to this in my shoes? I bet like I did–with a quick “ooh, no good…try this:”

public class Address
{
    public string Street1 { get; set; }
    public string Street2 { get; set; }
    public string City { get; set; }
    public string State { get; set; }
    public string Zip { get; set; }
}

Busy and pleased with myself for imparting this bit of wisdom, I started to move on. But then I realized that I was failing at mentoring. Nothing annoyed me more as a junior developer than hearing something like, “That’s just the way you do it.” And there I was saying it.

The creatures outside looked from pig to man, and from man to pig, and from pig to man again; but already it was impossible to say which was which.


I stopped at this point and forced myself to talk about this–to come up with a reason. Obviously, I hadn’t been thinking about the subject of why public fields are icky, so slick terms like “encapsulation” and “breaking change” weren’t on the tip of my tongue. Instead, I found myself talking sort of obliquely and clumsily explaining encapsulation in terms of an example. Strangely, the first thing that popped into my head was threading and locking the field to make it thread-safe. So I said something along the lines of this:

With a public field, you simply have a variable that can be retrieved and set. With a property, you’re really hiding that variable behind a little method, and you have the ability to centralize operations on that variable as needed rather than forcing them on everyone who uses your code. So imagine that you want to make these fields thread-safe. In your version, you force all client code that uses your class to deal with this problem. In the property version, you can handle that for your clients so that they needn’t bother.
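In code, that off-the-cuff explanation translates to something like this (a hedged sketch using City as the example):

public class Address
{
    private readonly object _cityLock = new object();
    private string _city;

    // The locking is centralized inside the class; callers neither know
    // nor care that it's happening.
    public string City
    {
        get { lock (_cityLock) { return _city; } }
        set { lock (_cityLock) { _city = value; } }
    }
}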

Yeah, things are always more awkward off the cuff. So I ruminated on this a bit and thought I’d offer it up in blog post format. Why not have public fields (aside from random thoughts about threading)?

The Basic Consideration: Encapsulation

As amused by my off-the-cuff threading example as I am, it does drive at a fundamental tenet of OOP, which is class-level encapsulation. If I’m writing a class, I want to expose a set of public functionality (methods and properties) that describes what an instance of the class will do while hiding how it will do it. If the Address class is in some other assembly and I don’t decompile it or anything, as a client, I have no idea if the City property is auto-implemented, if it wraps a simple backing field, or if all kinds of magic happens behind the scenes.

I know (hopefully) that if I set the City property to some value and then read it back, I get that value. But beyond that, I dunno. Maybe it implements lazy loading from some persistence store. Maybe it tracks all of the things you’ve set City to throughout its entire lifetime. I don’t know, and I don’t want to know if it isn’t part of the public API.

If you use public fields, you can’t offer consumers of your code that blissful ignorance. They know, ipso facto. And, worse, if they want any of the cool stuff I mentioned, like lazy loading or tracking previous values, they have to implement it themselves. When you don’t encapsulate, you fail to provide any kind of useful abstraction to me. If all you’re giving me is a field-bag, what use do I have for an instance of your class in mine? I can just have my own string variables, thank you very much, and I will hide them from my clients.

The Intermediate Consideration: Breaking Changes

There’s a more subtle problem at play here as well. Specifically, public properties and public fields look and seem pretty similar, if not identical, but they’re really not. Under the hood, a property is a method. A field is, well, a field. If you have a property and you decide you want to change its internal representation (going from something complex to a simple or automatic property), no harm done. If you want to switch between a field and a property, though, even the trivial switch I showed above, there be dragons–especially going from field to property.
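To see the difference, consider roughly what the compiler does with an auto-property (a sketch of the generated shape, not literal compiler output):

public class AddressCompiled
{
    // What you wrote: public string City { get; set; }
    // What gets emitted, roughly:
    private string _city;                                   // hidden backing field
    public string get_City() { return _city; }              // the "getter" method
    public void set_City(string value) { _city = value; }   // the "setter" method
}

A field, by contrast, is just a storage location, with no methods involved.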

It seems like a pretty obvious case of YAGNI to start out with a field and move to a property if you need it, but things start to go wrong. Weird and subtle things. And they go wrong on you when you least expect them. Maybe you’re using the public members of the Address class as part of some kind of reflection. Well, now things are broken because properties and fields are completely different here. Perhaps you have a field that you were using as an out parameter somewhere. Oops. And, perhaps most insidiously of all, just changing a field to a property will cause runtime exceptions in dependent assemblies unless recompiled.
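As a concrete illustration of the out parameter point (with a hypothetical helper), this compiles against the field version of Address above but not against the property version:

public static class AddressNormalizer
{
    // Fine while Street1 is a public field, because a field is a real
    // storage location that can be passed by reference...
    public static void ClearStreet(Address address)
    {
        Blank(out address.Street1);
    }

    // ...but once Street1 becomes a property, the call above no longer
    // compiles: a property may not be passed as an out or ref parameter.
    private static void Blank(out string value)
    {
        value = string.Empty;
    }
}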

Let me put that another, scarier way. Let’s say that I write something called Erik’s Awesome DLL and I distribute it to the world, where it gains wide adoption. Let’s also say that I created a bunch of public fields that the world uses as part of my DLL’s API. And finally, let’s say that in vNext, I decide that I want some encapsulation after all. Maybe I want to implement thread-safety. I implement and publish vNext to Nuget, and you update accordingly. The next time you hit F5, your application will blow up, and it will continue to blow up until you recompile. It may not sound like a big deal, but you’re definitely going to annoy users of your code with stuff like this.

So beware that you’re casting the die here more quickly and strongly than you might think. In most cases, YAGNI applies, but in this case, I’d say that you should assume you are going to need it. (And anything that stops people from using out parameters is good in my book)

The Advanced Consideration: Tell, Don’t Ask

I’ll leave off with a more controversial and definitely the most philosophical point. Here’s some interesting reading from the Pragmatic Bookshelf, and the quote I’m most interested in here describes the concept of “Tell, Don’t Ask.”

[You] should endeavor to tell objects what you want them to do; do not ask them questions about their state, make a decision, and then tell them what to do.

I’m calling this controversial because the context in which I’m mentioning it here is a bit strained. Address, above, is essentially a data transfer object whose only purpose is to store data. It will probably be bubbled up from somewhere like a database and make its way onto something like a form, with perhaps no actual plumbing code that ever does ask it anything. But then again, maybe not, so I’m mentioning it here.

Public fields are the epitome of “ask.” There’s no telling but not asking–they’re all “ask and tell.” Auto-properties are hardly different, but they’re a slight and important step in the right direction. Because with a property, awkward as it may be, we could get rid of the getter and leave only the setter. This would be a step toward factoring to a (less awkward) implementation in which we had a method that was setting something on the object, and suddenly we’re telling only and asking nothing.

“Tell, don’t ask” is really not about setting data properties and never reading them (which is inherently useless), but about a shift in thinking. So in this example, we’d probably need one more step to get from telling Address what its various fields are only to telling some other object to do something with that information. This is getting a little indirect, so I won’t belabor the point. But suffice it to say that code riddled with public fields tends to be about as far from “tell, don’t ask” as you can get. It’s a system design in which one piece of code sets things so that another can read it rather than a system design in which components communicate via instructions and commands.
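To make the shift in thinking concrete, here’s a hedged, hypothetical sketch contrasting the two styles (the tax rates are pure illustration):

// "Ask": interrogate the object's state, decide externally, then act:
//     if (order.Address.State == "IL")
//         order.Total = order.Subtotal * 1.1025m;

// "Tell": hand the object what it needs and let it decide.
public class Order
{
    public decimal Subtotal { get; set; }
    public decimal Total { get; private set; }

    public void ApplyTaxesFor(Address address)
    {
        Total = Subtotal * (address.State == "IL" ? 1.1025m : 1.05m);
    }
}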

So I feel clean and refreshed now. I didn’t shove off with “I’m in charge and I say so” or “because that’s just a best practice.” I took time to construct a carefully reasoned argument, which served double duty of keeping me in the practice of justifying my designs and decisions and staying sharp with my understanding of language and design principles.


Exception Handling Basics

The other day, I was reviewing some code, and I saw a series of methods conforming to the following (anti) ‘pattern’:

public class CustomerProcessor
{
    public void ProcessCustomer(Customer customer)
    {
        try
        {
            if (customer.IsActive)
                ProcessActiveCustomer(customer);
        }
        catch (Exception ex)
        {
            throw ex;
        }
    }

    private void ProcessActiveCustomer(Customer customer)
    {
        try
        {
            CheckCustomerName(customer);
            WriteCustomerToFile(customer);
        }
        catch (Exception ex)
        {
            throw ex;
        }
    }

    public void CheckCustomerName(Customer customer)
    {
        try
        {
            if (customer.Name == null)
                customer.Name = string.Empty;
        }
        catch (Exception ex)
        {
            throw ex;
        }
    }

    private void WriteCustomerToFile(Customer customer)
    {
        try
        {
            using (StreamWriter writer = new StreamWriter(@"C:\temp\customer.txt"))
            {
                writer.WriteLine(customer.Name);
            }
        }
        catch (Exception ex)
        {
            throw ex;
        }
    }
}

Every method consisted of a try with the actual code of interest inside of it and then a caught general exception that was then thrown. As I looked more through the code base, it became apparent to me that this was some kind of ‘standard’ (and thus perhaps exhibit A of how we get standards wrong). Every method in the project did this.

If you’re reading and you don’t know why this merits a facepalm, please read on. If you’re well-versed in C# exceptions, this will probably be review for you.

Preserve the Stack Trace

First things (problems) first. When you throw exceptions in C# by using the keyword “throw” with some exception type, you rip a hole in the fabric of your application’s space-time–essentially declaring that if no code that’s called knows how to handle the singularity you’re belching out, the application will crash. I use hyperbolic metaphor to prove a point. Throwing an exception is an action that jolts you out of the normal operation of your program by using a glorified “GOTO,” except that you don’t actually know where it’s going because that’s up to the code that called you.

When you do this, the .NET framework is helpful enough to package up a bunch of runtime information for troubleshooting purposes, including something called the “stack trace.” If you’ve ever seen a .NET (or Java) site really bomb out, you’ve probably seen one of these–it’s a bunch of code with line numbers that basically tells you, “A called B, which called C, which called D … which called Y, which called Z, which threw up and crashed your program.” When you throw an exception in C# the framework saves the stack trace that got you to the method in question. This is true whether the exception happens in your code or deep, deep within some piece of code that you rely on.

So, in the code above, let’s consider what happens when the code is executed on a machine with no C:\temp folder. The StreamWriter constructor is going to throw an exception indicating that the path in question is not found. When it does, you will have a nice exception that tells you ProcessCustomer called ProcessActiveCustomer, which called WriteCustomerToFile, which called new StreamWriter(), which threw an exception because you gave it an invalid path. Pretty neat, huh? You just have to drill into the exception object in the debugger to see all of this (or have your application configured to display this information in a log file, on a web page, etc.).

But what happens next is kind of a bummer. Instead of letting this exception percolate up somewhere, we trap it right there in the method in our catch block. At that point, we throw an exception. Now remember, when you throw an exception object, the stack trace is recorded at the point that you throw, and any previous stack trace is blown away. Instead of it being obvious that the exception originated in the StreamWriter constructor, it appears to have originated in WriteCustomerToFile. But wait, it gets worse. From there, the exception is trapped in ProcessActiveCustomer and then again in ProcessCustomer. Since every method in the code base has this boilerplate, every exception generated will percolate back up to main and appear to have been generated there.

To put this in perspective, you will never be able to see or record the stack trace for the point at which the exception was thrown. Now, in development that’s not the end of the world since you can set the debugger to break where thrown instead of handled, but for production logging, this is awful. You’ll never have the foggiest idea where anything is coming from.

How to fix this? It’s as simple as getting rid of the “throw ex;” in favor of just “throw;”. This preserves the stack trace while passing the exception on to the next handler. Another alternative, should you wish to add more information when you throw, would be to do “throw new Exception("some message", ex)”, where you pass the exception you’ve caught as the inner exception of a new one that you’re creating. The caught exception will be preserved, intact, and can be accessed in debugging via the “InnerException” property of the one you’re now throwing.

public class CustomerProcessor
{
    public void ProcessCustomer(Customer customer)
    {
        try
        {
            if (customer.IsActive)
                ProcessActiveCustomer(customer);
        }
        catch (Exception ex)
        {
            throw;
        }
    }

    private void ProcessActiveCustomer(Customer customer)
    {
        try
        {
            CheckCustomerName(customer);
            WriteCustomerToFile(customer);
        }
        catch (Exception ex)
        {
            throw;
        }
    }

    public void CheckCustomerName(Customer customer)
    {
        try
        {
            if (customer.Name == null)
                customer.Name = string.Empty;
        }
        catch (Exception ex)
        {
            throw;
        }
    }

    private void WriteCustomerToFile(Customer customer)
    {
        try
        {
            using (StreamWriter writer = new StreamWriter(@"C:\temp\customer.txt"))
            {
                writer.WriteLine(customer.Name);
            }
        }
        catch (Exception ex)
        {
            throw;
        }
    }
}

(It would actually be better here to remove the Exception ex altogether in favor of just “catch {” but I’m leaving it in for illustration purposes)

Minimize Exception-Aware Code

Now that the stack trace is going to be preserved, the pattern here isn’t actively hurting anything in terms of program flow or output. But that doesn’t mean we’re done cleaning up. There’s still a lot of code here that doesn’t need to be.

In this example, consider that there are only two methods that can generate exceptions: ProcessCustomer (if passed a null reference) and WriteCustomerToFile (various things that can go wrong with file I/O). And yet, we have exception handling in every method, even methods that are literally incapable of generating them on their own. Exception throwing and handling is extremely disruptive and it makes your code very hard to reason about. This is because exceptions, as mentioned earlier, are like GOTO statements that whip the context of your program from wherever the exception is generated to whatever place ultimately handles exceptions. Oh, and the boilerplate for handling them makes methods hard to read.

The approach shown above is a kind of needlessly defensive approach that makes the code incredibly dense and confusing. Rather than a strafing, shock-and-awe show of force for dealing with exceptions, the best approach is to reason carefully about where they might be generated and how one might handle them. Consider the following rewrite:

public class CustomerProcessor
{
    public void ProcessCustomer(Customer customer)
    {
        if (customer == null)
        {
            Console.WriteLine("You can't give me a null customer!");
            return; // guard clause: bail out rather than dereference null below
        }
        try
        {
            ProcessActiveCustomer(customer);
        }
        catch (SomethingWentWrongWritingCustomerFileException)
        {
            Console.WriteLine("There was a problem writing the customer to disk.");
        }
    }

    private void ProcessActiveCustomer(Customer customer)
    {
        CheckCustomerName(customer);
        WriteCustomerToFile(customer);
    }

    public void CheckCustomerName(Customer customer)
    {
        if (customer.Name == null)
            customer.Name = string.Empty;
    }

    private void WriteCustomerToFile(Customer customer)
    {
        try
        {
            using (var writer = new StreamWriter(@"C:\temp\customer.txt"))
            {
                writer.WriteLine(customer.Name);
            }
        }
        catch (Exception ex)
        {
            throw new SomethingWentWrongWritingCustomerFileException("Ruh-roh", ex);
        }
    }
}

Notice that we only think about exceptions at the ‘endpoints’ of the little application. At the entry point, we guard against a null argument instead of handling it with an exception. As a rule of thumb, it’s better to handle validation via querying objects than by trying things and catching exceptions, both from a performance and from a readability standpoint. The other point of external interaction where we think about exceptions is where we’re calling out to the filesystem. For this example, I handle any exception generated by stuffing it into a custom exception type and throwing that back to my caller. This is a practice that I’ve adopted so that I know at a glance when debugging if it’s an exception I’ve previously reasoned about and am trapping or if some new problem is leaking through that I didn’t anticipate. YMMV on this approach, but the thing to take away is that I deal with exceptions as soon as they come to me from something beyond my control, and then not again until I’m somewhere in the program that I want to report things to the user. (In an actual application, I would handle things more granularly than simply catching Exception, opting instead to go as fine-grained as I needed to in order to provide meaningful reporting on the problem)
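For completeness, the custom exception type used above isn’t defined in the post; a minimal sketch of it might look like this:

// A hedged sketch; implementation details beyond the
// (message, innerException) constructor are assumed.
[Serializable]
public class SomethingWentWrongWritingCustomerFileException : Exception
{
    public SomethingWentWrongWritingCustomerFileException() { }

    public SomethingWentWrongWritingCustomerFileException(string message)
        : base(message) { }

    // The constructor used above: it preserves the original I/O exception
    // as InnerException so the original stack trace isn't lost.
    public SomethingWentWrongWritingCustomerFileException(string message, Exception innerException)
        : base(message, innerException) { }
}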

Here it doesn’t seem to make a ton of difference, but in a large application it will–believe me. You’ll be much happier if your exception handling logic is relegated to the places in the app where you provide feedback to the user and where you call external stuff. In the guts of your program, this logic isn’t necessary if you simply take care to write code that doesn’t contain mistakes like null dereferences.

What about things like out of memory exceptions? Don’t you want to trap those when they happen? Nope. Those are catastrophic exceptions beyond your control, and all of the logging and granular worrying about exceptions in the world isn’t going to un-ring that bell. When these happen, you don’t want your process to limp along unpredictably in some weird state–you want it to die.

On the Lookout for Code Smells

One other meta-consideration worth mentioning here is that if you find it painful to code because you’re putting the same few lines of code in every class or every method, stop and smell what your code is trying to tell you. Having the same thing over and over is very much not DRY and not advised. You can spray deodorant on it with something like a code snippet, but I’d liken this to addressing a body odor problem by spraying yourself with cologne and then putting on a full body sweatsuit–code snippets for redundant code make things worse while hiding the symptoms.

If you really feel that you must have exception handling in every method, there are IL Weaving tools such as PostSharp that free you from the boilerplate while letting you retain the functionality and granularity you want. As a general rule of thumb, if you’re cranking out a lot of code and thinking, “there’s got to be a better way to do this,” stop and do some googling because there almost certainly is.