Are there any first principles of programming? I am at least going to entertain that hypothesis and try to put forward what I think are the first principles when designing quality software systems and when selecting the tools to use.
The first principle I will discuss is immutability.
In their 2006 paper “Out of the Tar Pit”, Moseley and Marks state that complexity is the single major difficulty in the successful development of large-scale software systems. They believe that the major cause of this complexity is the handling of state and the burden it adds to our ability to analyze and reason about the system.
I think every senior software engineer can relate to the point made in that paper. If you have worked on fairly complex systems, you know how heavy the burden is of trying to reason about, and keep in your head, the current state of the application and the effect that any change will have.
Why do we do it then? Why is mutability the default way most of us have been taught to program? In my view, that is due to historical reasons. When machine resources were a very limiting factor on your design, it did not matter which design was the simpler one to reason about. You had to fit your program into the available resource constraints with “update in place” mutable structures. A trade-off between simplicity and resources.
Most software systems today do not have those limitations. Lines of code keep increasing and system requirements become more and more complex. To stay competitive in the market, the speed with which you can adapt, extend and maintain a system far outweighs the slight overhead that immutable data structures impose. The old trade-off should have shifted to a new default that favors simplicity over resources.
I join those who claim that immutable data structures are easier to reason about, and I do so for these reasons:
- Immutable data is inherently thread-safe. A highly concurrent system built on mutable state is a nightmare to change and maintain.
- Immutability solves ownership problems. In non-garbage-collected languages, the ownership problem showed itself in memory management: if you call a C/C++ library, who is responsible for deallocating memory, you or the library? GC languages solved that part but left us with questions such as: Is this object thread-safe? Do I need to lock it myself? Is it safe to clone it, or are values moving? Does it allocate resources that I need to take responsibility for? Programming with immutability solves these ownership problems, since it is just pure, unchangeable data.
- It makes your logic easier to understand. Logic must be based on non-moving premises. Imagine that you are asked to calculate “2 + 3”. You first read the “2”, interpret it as the amount of two things, and continue on to the “+” sign. While you read the “+” sign, I mutate the value of the symbol “2” to mean the amount of four things. We end up with the nonsensical result “2 + 3 = 7”. All experienced programmers have seen this kind of behavior in algorithms in concurrent systems. One accidental failure to lock a concurrently mutated object can bring the logic of your application to its knees. All algorithms and calculations should be based on stable values.
- In a distributed environment, having an image of the data as of a point in time increases your understanding of how the system behaves, in contrast to having data that changes constantly.
- It scales better in a distributed environment than remote representations of mutable data that give the impression of change over the wire. Take the Internet as a reference: when you do an HTTP GET, you get back immutable data as of that point in time. The server will not change it.
- Immutability increases security. For example, your validated input cannot be changed by a malicious user afterwards. Immutable strings are a classic example.
- Given the points above, immutable data is easier to test.
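The “moving premises” problem above can be sketched in a few lines of plain Python. The price-summing routine and the writer callback are hypothetical, used only for illustration: logic that reads shared mutable data mid-calculation can observe an inconsistent state, while an immutable snapshot guarantees the premises stay fixed.

```python
# Sketch: computing over shared mutable data vs. an immutable snapshot.
# The callback stands in for a concurrent writer that mutates the list
# while our calculation is still reading it.

def total(prices, concurrent_writer):
    result = 0
    for i, price in enumerate(prices):
        result += price
        if i == 0:
            concurrent_writer(prices)  # another party mutates mid-read
    return result

prices = [2, 3]
# The writer changes the "3" to a "5" after we have read the "2":
inconsistent = total(prices, lambda ps: ps.__setitem__(1, 5))
print(inconsistent)  # 7 -- the nonsense "2 + 3 = 7" from the text

snapshot = (2, 3)        # an immutable snapshot: premises cannot move
consistent = sum(snapshot)
print(consistent)        # always 5
```

Taking the snapshot before computing is exactly the point-in-time image the distributed-systems bullets describe.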
But surely there must be some upsides to mutable state? Other than the potential for more efficient use of resources (if you know what you are doing!), I can’t think of any. In most cases it will be a premature optimization. If you still go down that route for whatever reason, be careful with that loaded gun so you don’t shoot yourself in the foot.
In my opinion, persistent data structures should always be the default. Only use mutable data when the situation really calls for it and immutable data has been proven not to fit the problem being solved. This could be in constrained environments such as embedded systems.
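Even without a persistent-structure library, most languages let you approximate this default. A minimal Python sketch (the `Account` type is a made-up example): a frozen dataclass rejects in-place mutation, and `dataclasses.replace` derives a new value while every old snapshot stays intact.

```python
# Sketch: "immutable by default" with plain Python frozen dataclasses.
from dataclasses import FrozenInstanceError, dataclass, replace

@dataclass(frozen=True)
class Account:          # hypothetical type, for illustration only
    owner: str
    balance: int

a1 = Account("alice", 100)
a2 = replace(a1, balance=150)   # an "update" yields a new value

try:
    a1.balance = 0              # in-place mutation is rejected
    mutated = True
except FrozenInstanceError:
    mutated = False

print(a1)  # the old snapshot is intact: Account(owner='alice', balance=100)
print(a2)  # Account(owner='alice', balance=150)
```

In an environment designed for immutability, such as Clojure, this is the default behavior rather than an opt-in annotation.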
All these reasons put together are why I consider immutability a “first principle” of programming.
Persistent data structures can be implemented in most languages, but since this is such an important principle for building high-quality, robust systems, I would select a tool that defaults to immutability. If your selected tool does not enforce this, it only takes one mistake by one developer for the system to start to crumble. Also, environments designed for immutability are more efficient at it than environments where it is implemented as an add-on to a mutable core.
For a nice introduction to these concepts, check out, for example, Clojure. Even if you never write a single line of production Clojure, I promise you that learning the idea of immutability, and the simplicity it brings to your designs, will benefit you in whatever language you use.