The Way We Write Code is Stupid: Source Code Files Considered Harmful
Order Doesn’t Matter
Please pardon the loaded phrasing in the title, but that’s how the message came to me from my subconscious brain: bluntly and without ceremony. I was doing a bit of work in Apex, the object-oriented language specific to Salesforce.com, and it occurred to me that I had no idea what idiomatic Apex looked like. (I still don’t.) In C++, the convention (last time I was using it much, anyway) is to first define public members in class headers and then the private members at the bottom. In C#, this is inverted. I’ve seen arguments of all sorts as to which approach is better and why. Declaring the private fields at the top makes sense since the first thing you want to see in the class is what its state will consist of, but declaring the public stuff at the top makes sense since that’s what consumers will interact with and it’s like the above-water part of your code iceberg.
When programming in any of the various programming languages I know, I have this mental cache of what’s preferred in what language. I attempt to ‘speak’ it without an accent. But with Apex, I have no idea what the natives ‘sound’ like, not having seen it in use before. Do I declare instance variables at the bottom or the top? Which is the right way to eat bread: butter side up or butter side down? I started googling to see what the ‘best practice’ was for Apex when the buzzing in my subconscious reached some kind of protesting critical mass and morphed into a loud, clear message: “this is completely stupid.”
I went home for the day at that point–it was late anyway–and wondered what had prompted this visceral objection. I mean, it obviously didn’t matter from a compiled code perspective whether instance variables or public methods come first, but it’s pretty well established and encouraged by people as accomplished and prominent as “Uncle” Bob Martin that consistency of source code layout matters, if not the layout specifics (paraphrased from my memory of his video series on Clean Coders). I get it. You don’t want members of your team writing code that looks completely different from class to class because that creates maintenance headaches and obscures understanding. So what was my problem?
I didn’t know until the next morning in the shower, where I seem to do my most abstract thinking. I didn’t think it was stupid to try to make my Apex code look like ‘standard’ Apex. I thought it was stupid that I needed to do so at all. I thought it was stupid to waste any time thinking about how to order code elements in this file when the only one whose opinion really matters–the compiler–says, “I don’t care.” Your compiler is trying to tell you something. Order doesn’t matter to it, and you shouldn’t care either.
Use Cases: What OOP Developers Want
But the scope of my sudden, towering indignation wasn’t limited to the fact that I shouldn’t have to care about the order of methods and fields. I also shouldn’t have to care about camel or Pascal casing. I shouldn’t have to care about underscores in front of field names or inside of method names. It shouldn’t matter to me if public methods come before private or how much indentation is the right amount of indentation. Should methods be alphabetized or should they be in some other order? I don’t care! I don’t care about any of this.
Let’s get a little more orderly about this. Here are some questions that I ask frequently when I’m writing source code in an OOP language:
- What is the public API of this type?
- What private methods are in the ‘tree’ of this public method?
- What methods of this type mutate or reference this field?
- What are the types in this namespace?
- What are the implementations of this interface in this code base?
- Let’s see this method and any methods that it overrides.
- What calls this method?
Here are some questions that I never ask out of actual interest when writing source code. These I either don’t ask at all or ask in exasperation:
- What’s the next method in this file?
- How many line feed characters come before the declaration of this variable?
- Should I use tabs or spaces?
- In what region is this field’s declaration?
- Did the author of this file alphabetize anything in it?
- Does this source file have Windows or *NIX line break characters?
- Is this a field or a method or what?
With the first set of questions, I ask them because they’re pieces of information that I want while reasoning about code. With the second set of questions, they’re things I don’t care about. I view asking these questions as an annoyance or failure. Do you notice a common pattern? I certainly do. All of the questions whose answers interest me are about code constructs and all the ones that I don’t care about have to do with the storage medium for the code: the file.
But there’s more to the equation here than this simple pattern. Consider the first set of questions again and ask yourself how many of the conventions that we establish and follow are simply ham-fisted attempts to answer them at a glance because the file layout itself is incapable of doing so. Organizing public and private separately is a work-around to answer the first question, for example. Regions in C#, games with variable and method naming, “file” vs “type” view, etc. are all attempts to overcome the fact that files are actually really poor communication media for object-oriented concepts. Even though compilers are an awful lot different now than they were forty years ago, we still cling to the storage medium for source code best suited to those old compilers.
Not Taking our own Advice
If you think of an ‘application’ written in MS Access, what comes to mind? How about when you open up an ASP web application and find wizard-generated data sources in the markup, or when you open up a desktop application and find SQL queries right in your code behind? I bet you think “amateurs wrote this.” You are filled with contempt for the situation–didn’t anyone stop to think about what would happen if data later comes in some different form? And what about some kind of validation? And, what the–ugh… the users are just directly looking at the tables and changing the column order and default sorting every time they look at the data. Is everyone here daft? Don’t they realize how ridiculous it is to alter the structure of the actual data store every time someone wants a different ordering of the data?
And you should see some of the crazy work-arounds and process hacks they have in place. They actually have a scheme where the database records the name of everyone who opens up a table and makes any kind of change so that they can go ask that person why they did it. And–get this–they actually have this big document that says what the order of columns in the table should be. And–you can’t make this stuff up–they fight about it regularly and passionately. Can you believe the developers that made this system and the suckers that use it? I mean, how backward are they?
In case you hadn’t followed along with my not-so-subtle parallel, I’m pointing out that we work this way ourselves even as we look with scorn upon developers who foist this sort of thing on users and users who tolerate it. This is like when you finally see both women in the painting for the first time–it’s so clear that you’ll never un-see it again. Why do we argue about where to put fields and methods and how to order things in code files when we refuse to write code that sends users directly into databases, compelling them to bicker over the order of column definition in the same? RDBMS (or any persistence store) is not an appropriate abstraction for an end user–any end user–whether he understands the abstraction or not. We don’t demand that users fight, decide that there is some ‘right’ way to order invoices to be printed, and then lock the Invoice table in place accordingly for all time, on pain of shaming for violations of an eighty-page invoice standard guideline document. So why do that to ourselves? When we’re creating object-oriented code, sequential files and all of the particular orderings, traversings and renderings thereof are wildly inappropriate abstractions for us.
What’s the Alternative?
Frankly, I don’t know exactly what the alternative is yet, but I think it’s going to be a weird and fun ride trying to figure that out. My initial, rudimentary thoughts on the matter are that we should use some sort of scheme in which the Code DOM is serialized to disk for storage purposes. In other words, the domain model of code is that there is something called Project, and it has a collection of Namespace. Namespace has a collection of Type, which may be Interface, Enum, Struct, Class (for C# anyway–for other OOP languages, it’s not hard to make this leap). Class has one collection each of Field, Method, Property, Event. The exact details aren’t overly important, but do you see the potential here? We’re creating a hierarchical model of code that could be expressed in nested object or relational format.
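To make the shape of that model a little more concrete, here is a minimal sketch in Python (chosen only for brevity; the idea is not tied to any one language). Every class name and attribute below is hypothetical, properties and events are left out, and JSON merely stands in for "some storage format." Treat it as an illustration of the hierarchy, not a proposed design.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class Field:
    name: str
    type_name: str
    visibility: str = "private"


@dataclass
class Method:
    name: str
    return_type: str = "void"
    visibility: str = "public"
    body: str = ""


@dataclass
class CodeType:
    name: str
    kind: str = "class"  # could also be "interface", "enum", "struct"
    fields: List[Field] = field(default_factory=list)
    methods: List[Method] = field(default_factory=list)


@dataclass
class Namespace:
    name: str
    types: List[CodeType] = field(default_factory=list)


@dataclass
class Project:
    name: str
    namespaces: List[Namespace] = field(default_factory=list)


# A tiny, made-up project: one namespace, one class, one field, one method.
project = Project(
    name="Billing",
    namespaces=[
        Namespace(
            name="Billing.Invoices",
            types=[
                CodeType(
                    name="Invoice",
                    fields=[Field("total", "decimal")],
                    methods=[Method("Print")],
                )
            ],
        )
    ],
)

# The nested model serializes to a nested document; a relational layout
# would work equally well for this sketch.
serialized = json.dumps(asdict(project), indent=2)
restored = json.loads(serialized)
```

Nothing in the stored form says whether the field comes before the method; both simply belong to the type.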
In other words, we’re creating a domain model entirely independent of any persistence strategy. Can it be stored in files? Sure. Bob’s your uncle. You can serialize these things however you want. And it’ll need to be written to file in some form or another for the happiness of the compiler (at least at first). But those files handed over to the compiler are output transforms rather than the lifeblood of development.
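As a rough illustration of what "output transform" could mean, the Python sketch below renders compiler-ready text from a list of members. The `Member` record and `render_class` function are hypothetical names invented for this example, and the emitted text is only C#-flavored. The point is that member order becomes an argument to the transform rather than something stored.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Member:
    name: str
    visibility: str   # "public" or "private" in this sketch
    declaration: str  # source text for this one member


def render_class(
    name: str,
    members: List[Member],
    sort_key: Callable[[Member], object],
) -> str:
    """Emit text for the compiler; ordering is a parameter, not stored state."""
    lines = [f"class {name}", "{"]
    for member in sorted(members, key=sort_key):
        lines.append(f"    {member.visibility} {member.declaration}")
    lines.append("}")
    return "\n".join(lines)


members = [
    Member("_total", "private", "decimal _total;"),
    Member("Print", "public", "void Print() { }"),
]

# Same members, two different renderings. False sorts before True, so the
# members matching the chosen visibility come first.
public_first = render_class("Invoice", members, lambda m: m.visibility != "public")
private_first = render_class("Invoice", members, lambda m: m.visibility != "private")
```

Both strings contain exactly the same lines; only their order differs, and the compiler would accept either.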
Think for a minute of us programmers as users of a system with a proper domain, one or more persistence models, and a service layer. Really, stop and mull that over for a moment. Now, go back to the use cases I mentioned earlier and think what this could mean. Here are some properties of this system:
- The basic unit of interaction is going to be the method, and you can request methods with arbitrary properties, with any filtering and any ordering.
- What appears on your screen will probably be one or more methods (though this would be extremely flexible).
- It’s unlikely that you’d ever be interested in “show me everything in this type.” Why would you? The only reason we do this now is that editing text files is what we’re accustomed to doing.
- Tracing execution paths through code would be much easier and more visual, and schemes that look like the “Code Bubbles” environment for Java would be trivial to create and manipulate.
- Most arguments over code standards simply disappear as users can configure IDE preferences such as “prepend underscores to all field variables you show me,” “show me everything in camel casing,” and, “always sort results in reverse alphabetical order.”
- Arbitrary methods from the same or different types could be grouped together in ad-hoc fashion on the screen for analysis or debugging purposes.
- Version/change control could occur at the method or even statement level, allowing expression of “let’s see all changes to this method” or “let’s see who made a change to this namespace” rather than “let’s see who changed this file.”
- Relying on IDE plugins to “hop” to places in the code automatically for things like “show all references” goes away in favor of an expressive querying syntax à la NDepend’s “code query language.”
- New domain model allows baked-in refactoring concepts and makes operations like “get rid of dead code” easier or trivial, in some cases.
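A few of the properties above can be sketched in a handful of lines once methods are records rather than text. The Python below is an assumption-laden toy: `MethodInfo`, `query`, `callers_of` and `display_field` are hypothetical names, the call graph is typed in by hand rather than derived from real code, and none of this is meant to resemble NDepend’s actual query language.

```python
from dataclasses import dataclass
from typing import Any, Callable, FrozenSet, Iterable, List


@dataclass(frozen=True)
class MethodInfo:
    type_name: str
    name: str
    visibility: str
    calls: FrozenSet[str] = frozenset()  # qualified names of methods it calls
    line_count: int = 1


def query(
    methods: Iterable[MethodInfo],
    where: Callable[[MethodInfo], bool] = lambda m: True,
    order_by: Callable[[MethodInfo], Any] = lambda m: m.name,
) -> List[MethodInfo]:
    """Request methods with any filter and any ordering."""
    return sorted((m for m in methods if where(m)), key=order_by)


def callers_of(methods: Iterable[MethodInfo], target: str) -> List[MethodInfo]:
    """Answer 'what calls this method?' from the stored call data."""
    return [m for m in methods if target in m.calls]


def display_field(stored_name: str, underscore_prefix: bool) -> str:
    """Naming convention as a viewer preference rather than stored text."""
    return "_" + stored_name if underscore_prefix else stored_name


methods = [
    MethodInfo("Invoice", "Print", "public", frozenset({"Invoice.FormatTotal"}), 12),
    MethodInfo("Invoice", "FormatTotal", "private", frozenset(), 4),
    MethodInfo("Report", "Export", "public", frozenset({"Invoice.Print"}), 30),
]

# "What is the public API of this type?"
public_api = query(
    methods,
    where=lambda m: m.type_name == "Invoice" and m.visibility == "public",
)

# "What calls this method?"
callers = callers_of(methods, "Invoice.FormatTotal")

# Any ordering you like, for example longest methods first.
largest_first = query(methods, order_by=lambda m: -m.line_count)
```

Two developers could run `display_field("total", True)` and `display_field("total", False)` against the same stored field and never need to argue about the underscore.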
Longer Reaching Impact
If things were to go in this direction, I believe that it would have a profound impact not just on development process but also on the character and quality of object-oriented code that is written in general. The inherently sequential nature of files and the way that people reason about file parsing, I believe, lends itself to, or at least favors, the dogged persistence of procedural approaches to object-oriented programming (static methods, global state, casting, etc.). I think that the following trends would take shape:
- Smaller methods. If popping up methods one at a time or in small groups becomes the norm, having to scroll to see and understand a method will become an anomaly, and people will optimize to avoid it.
- Less complexity in classes. With code operations subject to a validation of sorts, it’d be fairly easy to incorporate a setting that warns users if they’re adding the tenth or twentieth or whatever method to a class. In extreme cases, it could even be disallowed (and not through the honor system or ex post facto at review or check in–you couldn’t do it in the first place).
- Better conformance to Single Responsibility Principle (SRP). Eliminating the natural barrier of “I don’t want to add a new file to source control” makes people less likely to wedge methods awkwardly into classes in which they do not belong.
- Better cohesion. It becomes easy to look for fields hardly used in a type or clusters of use within a type that could be separated easily into multiple types.
- Better test coverage. Not only is this a natural consequence of the other items in this list, but it would also be possible to define “meta-data” to allow linking of code items and tests.
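The "validation of sorts" mentioned under class complexity might look something like the Python sketch below. The thresholds, the `ClassModel` record, the `add_method` function and the `TooManyMethodsError` exception are all hypothetical; the only point is that when every edit passes through the model, a rule can warn or refuse at the moment of the edit instead of at review time.

```python
from dataclasses import dataclass, field
from typing import List, Optional


class TooManyMethodsError(Exception):
    """Raised when an edit would push a class past the hard limit."""


@dataclass
class ClassModel:
    name: str
    methods: List[str] = field(default_factory=list)


def add_method(
    cls: ClassModel,
    method_name: str,
    warn_at: int = 10,
    max_methods: int = 20,
) -> Optional[str]:
    """Add a method through the model; return a warning string or None."""
    if len(cls.methods) >= max_methods:
        # Disallowed outright: the edit never happens.
        raise TooManyMethodsError(
            f"{cls.name} already has {len(cls.methods)} methods"
        )
    cls.methods.append(method_name)
    if len(cls.methods) >= warn_at:
        return f"{cls.name} now has {len(cls.methods)} methods; consider splitting it."
    return None


# Small limits so the behavior is visible in a short run.
invoice = ClassModel("Invoice")
warnings = [
    add_method(invoice, f"Method{i}", warn_at=3, max_methods=4) for i in range(4)
]

try:
    add_method(invoice, "OneTooMany", warn_at=3, max_methods=4)
    rejected = False
except TooManyMethodsError:
    rejected = True
```

Here the first two additions pass silently, the third and fourth return warnings, and the fifth is refused.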
What’s Next?
Well, the first thing that I need to establish is that this doesn’t already exist somewhere in the works and that I’m not a complete lunatic malcontent. I’d like to get some feedback on this idea in general. The people to whom I’ve explained a bit so far seem to find the concept a bit far-fetched but somewhat intriguing.
I’d say the next step, assuming that this passes the sanity check, would be perhaps to draw up a white paper discussing some implementation/design strategies with pros and cons in a bit more detail. There are certainly threats to validity to be worked out such as the specifics of interaction with the compiler, the necessarily radical change to source control approaches, the performance overhead of performing code transforms instead of just reading a file directly into memory, etc. But off the top of my head, I view these things more as fascinating challenges than problems.
In parallel, I’d like to invite anyone who is at all interested in this idea to drop me an email or send me a tweet. If there are others that feel the way I do, I think it’d be really cool to get something up on Github and maybe start brainstorming some initial work tasks or exploratory POCs for feasibility studies. Also feel free to plus-like-tweet-whatever to others if you think they might be interested.
In conclusion, I’ll just say that I’d really like to see this gain traction and that I’d probably ratchet this right to the top of my side projects list if people are interested (this being a bit large in scope for me alone in my spare time). Now whenever I find myself editing source files in an IDE I feel like a bit of a barbarian, and I really don’t think we should have to tolerate this state of affairs anymore. Productivity tools designed to hide the file nature of our source code from us help, but they’re band-aids when we need disinfectants, antibiotics, and stitches. I don’t know about you, but I’m ready to start writing my object-oriented code using an IDE paradigm that doesn’t support GOTO Line as if we were banging out QBasic in 1986.
It’s been about 15 years since I used it in university, but check out Squeak (a Smalltalk implementation). If I remember correctly, you did not edit files in Squeak, you edited the ‘system’. It was very confusing for me at the time, compared to other programming languages.
Thanks for the tip — I spent some time looking at Pharo, which is apparently a fork off of Squeak, so I think I’m getting the general idea. I’m definitely going to spend some time studying how these tools work and maybe eavesdropping on a forum or discussion group somewhere if I can find one. I imagine that there’s got to be a good bit of wisdom and experience there, which I’m glad for.
I’ve been looking into Smalltalk recently myself and I think it offers a complete environment that solves many of the problems you mention as well as many problems other developers complain about. Ruby’s Smalltalk heritage and recent popularity, I think, have tipped people off to give Smalltalk a second look. If you’re interested, I’ve been reading a lot of posts from https://smalltalkrenaissance.wordpress.com/. This is a good article on Smalltalk: https://medium.com/smalltalk-talk/why-aren-t-people-using-smalltalk-80de31b6e3f4#.bf8rqdlr1. I particularly like this excerpt: “Smalltalk occupies a very successful niche for business clients who value short ‘time to market’ and low software maintenance costs, and developers who prize rapid…
That’s a great quote 🙂
I’m intrigued by this. Have you been writing any code in Smalltalk?
I’ll check out the videos when I get a chance. I just relocated for the winter and am temporarily on a limited-data mifi until I can get a cable person out here.
I’ve written little bits as I learned the language. Nothing serious yet. Once I learn to write tests I’ll probably experiment with a web app.
I think that, as long as it didn’t lock us into any one IDE (you must use Visual Whatsis 2014 to edit the system, or it won’t work!), the idea has considerable merit. Kind of like looking at albums in Windows Media Player – you don’t necessarily care what folder they’re stored in, only that these albums are by this artist, and so on. Having OCD, I tend to waste a lot of time organizing source code in many of the ways you described above, but if I didn’t have to worry about any of that, I could spend…
Being able to focus exclusively on reasoning about the code and its quality is definitely my aim, and I certainly wouldn’t want to tightly couple the modeling and persistence to any particular IDE. It seems like if something like this got going (outside of Smalltalk and Clojure anyway — say C# or Java), it’d be nice to define a minimal standalone IDE for it and also have plugins for popular IDEs. But definitely a good point about the dangers of tool lock-in — something to be wary of if an IDE vendor were to sponsor such a project.
This problem certainly does apply to more than just code. The folders and files abstraction really needs to be replaced at the OS level with something more user friendly. Microsoft has been working on that for years with nothing much to show for it though.
I totally agree. I’ve just had to change from Visual Studio to some kind of proprietary language and tools for some project at work, and while learning the new stack, I’ve been annoyed a lot by the need to know “where are those kinds of files stored in the filesystem?” or “where the hell is that function called?”. I realized that I had gotten into the habit of navigating through code almost entirely by search and reference jumps. Fully embracing that by getting rid of the file level would be great. Would be a serious amount of work though. And I guess it…
My mind is similarly swimming with questions, and many of the same ones as I contemplate what would be involved. I think maybe the first step would be building something that reads the existing source into some domain model in memory and presents some limited subset of functionality to the user… but that’s just off the top. With a working domain model, while I still don’t have answers to most of those questions, I feel that they could at least be separated and tackled orthogonally.
I’m very much reminded of Smalltalk (Squeak, Pharo, VisualWorks, etc.) as I read this. I’m also very much reminded of what Chris Granger is doing with LightTable (which admittedly points back to Smalltalk in a way). You should check these out. They may not be exactly what you’re looking for, but they will definitely help you better understand what it is you’re describing. I felt the same way and I’ve loved every minute of working in Smalltalk (mostly Pharo) and LightTable (with Clojure).
LightTable looks pretty cool. I’m going to have to check that out.
Thanks for the leads. I got the quickest impression from Pharo since it had some screenshots up on its site, and the interaction with the code is definitely similar to what I’m picturing. I’m going to do some more poking around to see if I can get a high level view without actually needing to teach myself Smalltalk or Clojure (I’m not necessarily opposed — just not flush with free time).
I’m curious, now that this posting has aged a bit, if you’ve spent any time with the .NET EnvDTE libraries, which allow you to do much of what you’ve described (though I’ve found them to be poorly documented and somewhat difficult to use). As I’ve read your article, I keep thinking – “isn’t that what a DLL is, in effect?” And at least in a .NET world, when you add on the meta-data files which allow you to reflect, etc., you could almost think of a compiled DLL as a “serialized version” from which you could simply edit meta-data, and re-compile…
I have not ever looked at the libraries that you mention, though as I poked through a bit I find that I agree with your comment about the documentation. Since writing this, I haven’t really done much, unfortunately. Always so many side projects going on and not enough hours in the day. The one exception is that I have started to play a bit with Roslyn, and my goal with that is really (when I occasionally get around to it) to start storing methods, types, fields, and namespaces in some sort of relational fashion just as a proof of concept.…