Sunday, 15 May 2011

Software is not built. It is written.

This post compares the work of traditional engineers with that of software engineers, understanding where they are similar and where they are different helps me to understand the software development process. This post is a treatise on how to make waterfall work, and why Agile is here to stay.

"An author would not expect the first draft of their book to be published. Few if any publishers would accept the first draft of a book. So why then, is it so common for software projects to release first drafts of their programs?"

Summary of the problem

When first drafts of Software is released intentionally then the company making this decision will be more aware of its consequences, and so in a better position to manage the consequences than a company that is unaware that the product that they are releasing is effectively the first prototype of an engineering design. Usually the consequences of this will include the costs of stabilising the product and long delays, sometimes creating a bottomless pit of expense just to keep the system turned on; and it only follows that the cost of adding more features to such a system will be equally large and over time they will increase.

It is a large step to claim that a lot of projects release drafts of software, however it is common practice for the first design to a solution to be implemented and for the first implementation to be passed to testers; there is then often strong resistance to revise the solution to improve its behaviour and maintainability. The refactoring movement and the notion of code debt has helped to raise awareness. However it is still common for there to be a mismatch between senior management and developers in what state they believe the software product to be in. How can it be so common for smart people to release first drafts of a design without realising that it is the first draft? How effectively can such a mistake be managed if the senior management are not aware of the situation? In my experience the root cause of this mistake has often been a misunderstanding of what a Software Engineer actually does and thus as a consequence a misunderstanding in what they need in order to effectively coordinate them to deliver a successful project now AND to position said project ready for the success of the next project.

The building metaphor

I wonder whether part of the problem, part of the reason why some projects mistake a first draft as the final product is the overuse of the 'building' metaphor that is commonly used to describe software development. A weakness in this metaphor that software is built, is the comparison of the skills needed to manage a labourer involved in a task that is approximately deterministic and that of a non-deterministic task. Building sites, factories, any kind of manufacturing or building process is designed around managing tasks that are, to within some tolerance, deterministic. Consider a skilled builder who has perfected laying one brick after another with a small enough error rate that quality assurance is enough to ensure that the house will not fall down. Does it follow that the same process can be used for the author of a book? Or an Engineer designing a bridge? Ask two Engineers to design a bridge, given the same spec and you will have two different bridges. Ask two brick layers to build two walls to a spec, and you will have approximately two identical walls. Both skill sets call upon different management skills to manage; which of these examples is most like software development? A misunderstanding here will result in the wrong management tools being brought to bare.

Is the act of writing software deterministic, or is it non-deterministic? Ask two programmers to implement a timesheet program, given the same spec I guarantee to you that you will get two very different programs that will be different in just about every way that one can think to measure them. And I am not talking about the small tolerance introduced by the nature of the human hand, as in the brick laying example. I am talking about fundamental differences closer to the example of two different bridges described earlier.

In order to make this difference clearer lets take a look at the core of what a Software Engineer does, without using metaphors the very least that every programmer has to do is to write in a programming language (Java, C#, JavaScript, C, SQL etc) instructions to a computer telling it what to do, how and when. Knowledge of what instructions to give to the computer is a problem solving, design and modelling task, mapping that to one or more written languages is a translation and communication task.

The influence of Taylorism on software projects

So where is the building metaphor in this description of programming? In the 1980's there was every attempt to bring Taylorism to software development, in the same way that Taylorism removed the non-deterministic nature inherit in the craftsmanship of trades like wood and metal working on production lines, people attempted to remove it from programming too. Or more precisely the attempt was to move the non-deterministic nature of the crafts from the fabrication stage and to move it to the design stage. Thus allowing different skills, risk processes and management techniques to be brought to bare. This is the basis of the 'Big Upfront Design' project. Big Upfront Design (BUD) endeavours to raise the confidence of when a new software product will ship, and to give the clients signing the check more confidence on what they are going to receive. The rationale is that this confidence can be afforded because the uncertainties will have been ironed out during the design stage and the build stage can be reduced to an almost mechanical process.

If you have worked for a company that has employed the BUD approach, then you will have probably also witnessed a great amount of frustration as the 'build' stage never goes as smoothly as planned. The reason for this is that the design stage was not followed rigorously enough to remove all uncertainty from the 'build' stage. The rest of this post explores why the goal of separating all design from the build stage, as has been done in other engineering disciplines will never work for software development. And a BUD effort that is not aware of the difference, and expects to gain the benefits as seen in traditional engineering disciplines will, on complex projects fail before the project has even started.

Where Software Engineers have an advantage

In more traditional engineering the cost of manufacturing is extremely high, just the tooling costs alone can be significant; think bridges, cars, circuit boards and houses. The task of the traditional engineer is to design a physical solution to a problem and to then communicate how to build the thing that they have designed so a separate group of people can take those designs and fabricate them. The fabrication stage is very expensive, so a great amount of rigour is used to ensure that the plans are as accurate, expressive and correct as possible before building begins. The expectation of many software BUD efforts is to reach this level of maturity, but in order to reach this level engineering companies have to spend a lot of time and money in the design and verification of design stages before moving on to fabrication. This makes a lot of sense for engineering companies, the fabrication/build stage is extremely expensive. In fact just tooling up for production can be mega money, so they need to know that they are buying the right tools and that by following the plans to the letter that they will always build the same product. QA is then needed to control the last statistical variance of production seen in the real world. In order for BUD software projects to gain the same level of confidence as other engineering efforts, then they too must spend extra time verifying their designs with the last stage of verification is building the end design at least once before being confident to commit to tooling up for the mass production of the design. However how many software projects take verification to that extreme before committing to a build?

The idea of not committing to the build stage of a software product, until the designs have been verified by building it at least once probably sounds mad. It may be common when designing an aircraft, but software is not an aircraft. And I agree with you. Software products have a significant advantage over mechanical products, software products are extremely cheap to duplicate. There is no need to tool up for mass production within a software project in the same way as physical products. Mass production is not cheap and the great news is that software does not need it. This is a significant advantage for software projects. The reason for traditional engineers needing to use mathematical simulations, models, scaled down builds etc before committing to building the first prototype is that prototypes cost a lot of money and take a long time to build. The more expensive it is to build and test prototypes, the more rigorous the engineering disciplines become in order to verify their designs before committing to any kind of build. Software Engineers can take their designs and build the application within minutes at the cost of a few minutes electricity to drive the computers. Thats it. Reread this paragraph, at its heart is the key to understanding both the engineering side of waterfall and agile methodologies. Now reread it again and think about it.

The misunderstanding

A goal of a BUD in engineering is to make the construction stage predictable, control costs and reduce risks by knowing the outcome of the construction stage before starting. The big benefit here is for mass production. Taking the time to design everything upfront is expensive and time consuming, and because it is design, the final solution is not clear at the start of the process. The expense of BUD is offset against the savings created by reducing the costs of mass production. On a software project the mass production phase is as simple as a file copy, or a compact disk copy or a download off of the internet or the Apple store. As there are no mass production costs with software, where is the savings with which to offset a BUD effort? The cost of the first build? The first prototype? There is no build savings with which to offset the cost of BUD. No mass production stage means that software is often viewed as done when the first prototype is completed1.

Computer languages as the specification language

If a software architect could hand a developer a specification so complete that the programmer would be able to produce exactly what was requested, in the same way that a factory builds a part designed by an engineer then there would be no need for the programmer. Because the specification would be so complete, so unambiguous that a computer could follow the plans itself and build the application. This fact is worth repeating, a software architect cannot fully define the specification of a system to the point that a developer can fully build it with no ambiguity without them implementing the full solution. The software languages used by software experts is the specification language used by the software industry to describe exactly how to build an application. Software experts then use computers to perform the 'build' for us, they read the formal specifications written in languages like C and Java and they compile, verify and link them into machine code. Modern factories are working to fully automate their production lines, the software industry has had that since the fifties.

Realising that software has no build stage, in the traditional sense is extremely empowering. It explains why the industry efforts with Model Driven Design stalled, why the Agile movement gained traction in the nineties and while Agile was first considered only a fad, it explains why agile is still with us in 2011. It explains why the biggest cost savings in the software industry has been realised by improving the expressiveness of the programming languages (the specification languages) and it explains why companies that respond to software project problems (client dissatisfaction, lack of trust, failed projects etc) by strengthening their BUD efforts while at the same time being unprepared for the costs and lengthening in feedback cycles that come with it, will not experience the increase in certainty that they hope for until they have written the solution at least once2. That all important first prototype. This lesson is especially difficult to learn on BUD efforts, as the feedback cycles are so long on such projects. The failures can take a year or more before they are recognised and the first thought is often that the execution of the process was wrong. The blame game begins. The developers did not follow the plan, the developers were not good enough, the tests were sloppy or the architects and management were incompetent. The mindset then becomes that tightening up the BUD process further will solve those problems. They will not, that is not where the problem lies. Nobody was incompetent; we as an industry have been focusing on emulating other engineering disciplines without acknowledging the ways that we are different.

To make waterfall work, a company must be prepared to fully plan a solution to the point where a computer can execute it, they must know where the design of the solution really starts and ends, and accept that the nature of design is to not know what the final outcome will be. It would not need designing if we knew. Once the first prototype has been created then the company can verify it, and then decide their next steps with increased certainty; and not before. The risk is that with a long design phase, there are long periods with no feedback. If a project is cancelled, unless the first draft was completed then it is unlikely that there will be any benefit to gain from the cancelled project. Therefore to reduce project risk, reduce the feedback cycle; do not lengthen it.

In conclusion

Software, unlike materials in the real world is cheap to build and to rework. The cost is in the design, writing and verification phases, and that is the process that needs to be managed. Because of the nature of software, the craftsman who write the programs are part of the design phase and attempts to separate the design and the 'build' phases did not succeed in the eighties and they will never succeed because the build phase in software development has already been automated by relatively cheap machines.

The biggest savings in software development are realised by reducing exposure to risk by reducing feedback cycles, effectively managing labour during the design process, managing the information gained from the shorter feedback cycles, reducing labour costs by automating as much of the non-design work that occurs in order to support the main design work and improvements to the languages used to describe the plans to computers.

1 Given the analogy between releasing the first draft of a book and the first draft of a program it is worth taking a moment to compare how software engineering is and is not like writing a book. A book is usually written by one or two people, in a single language for humans to read, enjoy and learn from and after the publish date it is difficult to change again; it is difficult to see comparisons with engineering here as books written by authors are rarely rigorous instructions on how to build anything. In contrast the source code to a software system is nearly always written in multiple programming languages, by many programmers for computers and other programmers to understand and follow as literal, logical steps; instructions on how to attain a desired effect. And after release software enters a maintenance phase which can last many years and involves many changes by many more people often after the original programmers have left.

And so Software Engineering is no more like writing a novel than it is like building a house, however it is as similar to a Mechanical Engineer writing instructions on how to build their design. A Software Engineer writes instructions on how to generate a desired effect that they have designed. It is this similarity of writing, and the need for clear communication that the opening comparison of this post is designed to draw attention to. Software Engineering is as similar to writing a book as Mechanical Engineering is; and Mechanical Engineering is more like building a house than Software Engineering is. So for me, this calls into question the very premise that software is built and it is that premise and the consequences of it that are the topic of this post.

2 As a side note, once a company has paid with blood to attain the first working prototype the team that created it often disbands. All that knowledge of the problem, design and solution spaces are gone and any team that follows them have to backwards engineer the original intent or just hack it. This is how many companies with frustrating projects literally haemorrhage money.

No comments:

Post a Comment