2005: WHAT DO YOU BELIEVE IS TRUE EVEN THOUGH YOU CANNOT PROVE IT?

Charles Simonyi
Software Engineer, Computer Scientist, Entrepreneur, Philanthropist
Computer Scientist; Founder, Intentional Software; formerly Chief Architect, Microsoft

I believe that we are writing software the wrong way. There are sound evolutionary reasons for why we do what we do, which we can call the "programming the problem in a computer language" paradigm, but the incredible success of Moore's law has blinded us to the fact that we are stuck in what is probably an evolutionary backwater.

There are many warning signs. Computers are demonstrably ten thousand times better than they were not so long ago, yet we are not seeing their services improve at the same rate (with some exceptions, for example games and internet searches). On an absolute scale, a business or administration problem that would take maybe one hundred pages to describe precisely will take millions of dollars to program for a computer, and often the program will not work. Recently a smaller airline came to a standstill due to a problem in its crew scheduling software, raising the ire of Congress, not to mention its customers.

My laptop could store 200 pages of text (1/2 megabyte) for each and every crew member at this airline just in its fast memory, and a hundred times more (a veritable encyclopedia of 20,000 pages) for each person on its hard disk. Of course, for a schedule we would need maybe one or two, or at most ten, pages per person. Even with all the rules (the laws, the union contracts, the local, state, and federal taxes, the duty time limitations, the FAA regulations on crew certification), is there anyone who believes that the problem is not simple in terms of computing? We need to store and process at most 10 pages per person where we have capacity for two thousand times more in one cheap laptop! Of course the problem is complex in terms of the problem domain, but not shockingly so. I would estimate that all the rules possibly relevant to aircraft crew scheduling are expressible in fewer than a thousand pages, or 1/2 of one percent of the fast memory.
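The arithmetic behind these figures can be checked in a few lines. The sketch below uses only the numbers quoted in the paragraph above; the size of the crew and of the laptop's fast memory are not stated in the essay, so the last three values are merely what the "1/2 of one percent" estimate implies, and all variable names are illustrative.

```python
# Figures quoted in the essay.
PAGES_PER_PERSON_IN_MEMORY = 200       # "200 pages of text"
BYTES_PER_PERSON_IN_MEMORY = 500_000   # "1/2 megabyte", taking 1 MB = 1,000,000 bytes
DISK_TO_MEMORY_RATIO = 100             # "a hundred times more" on the hard disk
MAX_SCHEDULE_PAGES_PER_PERSON = 10     # "at most ten pages per person"
RULE_PAGES = 1000                      # "a thousand pages" of rules
RULES_SHARE_DENOMINATOR = 200          # "1/2 of one percent" is 1/200

# Direct consequences of the quoted figures.
bytes_per_page = BYTES_PER_PERSON_IN_MEMORY // PAGES_PER_PERSON_IN_MEMORY
disk_pages_per_person = PAGES_PER_PERSON_IN_MEMORY * DISK_TO_MEMORY_RATIO
headroom = disk_pages_per_person // MAX_SCHEDULE_PAGES_PER_PERSON

# Implied values (NOT stated in the essay): what the fast memory and the
# crew size would have to be for 1,000 pages to equal 1/200 of fast memory.
implied_memory_pages = RULE_PAGES * RULES_SHARE_DENOMINATOR
implied_memory_megabytes = implied_memory_pages * bytes_per_page / 1_000_000
implied_crew_count = implied_memory_pages // PAGES_PER_PERSON_IN_MEMORY

print("bytes per page:", bytes_per_page)
print("disk pages per person:", disk_pages_per_person)
print("headroom over a 10-page schedule:", headroom)
print("implied fast memory (MB):", implied_memory_megabytes)
print("implied crew count:", implied_crew_count)
```

Under these assumptions the "two thousand times more" figure is simply 20,000 disk pages divided by the 10 pages a schedule needs, and the estimates are consistent with an airline of about a thousand crew members and a laptop with roughly 500 megabytes of fast memory.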

Software is surely the bottleneck on the high-tech horn of plenty. The scheduling program for the airline takes many thousands of times more memory than I believe it should. Hence the software represents complexity that is many thousands of times greater than what I believe the problem is. No wonder that some planes are assigned three pilots by the software while others can't fly because the copilot is not scheduled. Note that the cost of the memory is not the issue; we could afford that waste. But the use of so much memory for software indicates a complexity inflation that occurs during programming, and that inflation is the real bottleneck.

What is going on? I like to use cryptography as the metaphor. As we know, in cryptography we take a message and combine it with a key using a difficult-to-invert function to get the code. Programmers using today's paradigm start from a problem statement (for example, that a Boeing 767 requires a pilot, a copilot, and seven cabin crew, with various certification requirements for each) and combine this with their knowledge of computer science and software engineering, that is, how this rule can be encoded in a computer language and turned into an algorithm. This act of combining is the programming process, the result of which is called the source code. Now, programming is well known to be a difficult-to-invert function, perhaps not to cryptography's standards, but one can joke that the airline could keep its proprietary scheduling rules secret by publishing the source code of the implementation. Source code can be so obscure that no one studying it could figure out what the rules were, or even whether the code had to do with scheduling or with spare parts inventory.

The amazing thing is that today it is the source code, that is, the encrypted problem, which is the artifact all of software engineering focuses on. To add insult to injury, the "encryption", that is, programming, is done manually, which means high costs, low throughput, and high error rates. Contrast software maintenance with real cryptography: when the General realizes that he is about to send a wrong encrypted message, no one would think of editing the code after the encryption or "fixing the code"; instead the clear text would first be edited, and then this improved message would be re-encrypted at computer speeds and with computer accuracy. In other words, the message may be wrong, but it won't be wrong because of the encryption, and it is easily fixed.

We see that the complexity inflation comes from encoding. The problem statement above is obviously oversimplified, but remember that we used just two lines from our realistic budget of a thousand pages, and we haven't even used the aviation jargon that can make these statements even more compact and more precise. But once these statements are viewed through the funhouse mirror of software coding, they become all but unrecognizable: a thousand times fatter, disjointed, foreign. And like any manual product, the code will have many flaws beyond the errors in the rules themselves.

What can be done? Follow the metaphor. First, refocus on recording the problem statement, the "cleartext" in our metaphor. This is not a program in any sense of the word; it is just a straightforward recording of the subject matter experts' contributions using their own terms of art, their jargon, their own notations. Next, empower the programmers not to program the problem itself, but to express their software engineering expertise and decisions as computer code for the encoder that takes the recorded problem statement and generates the code from it. This is called generative programming, and I believe it is the future of software.
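To make the two steps concrete, here is a deliberately tiny sketch of the idea, not a description of any real generative programming tool. The Boeing 767 crew numbers come from the example earlier in this essay; the dictionary layout, the function names, and the reading of "requires" as an exact count are all assumptions made for illustration.

```python
# The "cleartext": the rule as a subject matter expert might record it.
# The 767 numbers are from the essay; the dict layout is hypothetical.
CREW_RULES = {
    "Boeing 767": {"pilot": 1, "copilot": 1, "cabin crew": 7},
}


def generate_checker_source(rules):
    """The "encoder": turn the recorded rules into Python source code."""
    lines = [
        "def check_crew(aircraft, assigned):",
        "    problems = []",
    ]
    for aircraft, roles in rules.items():
        if not roles:
            continue  # nothing to check for this aircraft type
        lines.append(f"    if aircraft == {aircraft!r}:")
        for role, count in roles.items():
            # Assumption: "requires" means exactly this many, no more, no fewer.
            lines.append(f"        if assigned.get({role!r}, 0) != {count}:")
            lines.append(f"            problems.append({role!r})")
    lines.append("    return problems")
    return "\n".join(lines) + "\n"


def build_checker(rules):
    """Compile the generated source and return the resulting function."""
    namespace = {}
    exec(generate_checker_source(rules), namespace)
    return namespace["check_crew"]


check_crew = build_checker(CREW_RULES)

# Show the generated "encrypted" form, then check a faulty assignment.
print(generate_checker_source(CREW_RULES))
print(check_crew("Boeing 767", {"pilot": 3, "cabin crew": 7}))
```

The faulty assignment above (three pilots, no copilot) is reported as a problem with the roles `pilot` and `copilot`. If a rule turns out to be wrong, the fix is made in `CREW_RULES` and the checker is regenerated; nobody edits the generated source by hand, just as nobody edits a message after it has been encrypted.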