Books versus Covers

Back when I was a young scholar there were several things one learned that violated the “never judge a book by its cover” rule. One was that when you saw a disheveled fellow walking down the street talking to himself, you could reliably assume he was disturbed and probably not taking his medication. Another was that a nicely typeset and printed article was worth reading.

Things have changed.

Now when you see an unshaven fellow in rumpled clothes walking down the street conducting an animated conversation, you can’t assume that he’s off his chlorpromazine. He might just as easily be an investment banker working on a big deal.

Why did typesetting signify quality writing? Dating from the days of Aldus Manutius, typesetting a book or an article attractively in justified columns using proportionally spaced fonts was a time-consuming task involving expensive skilled labor. Because of that high up-front cost, publishers insisted on strong controls over what made it to press. Thus we had powerful editors making decisions about what got into commercial magazines and books. And we had legions of competent copy editors reviewing and refining the text, so that what did make it to press was spelled correctly, grammatically sound, and readable.

No one ever had to tell us explicitly that the nicely typeset stuff was generally the better stuff; we learned it subconsciously.

Some years ago, in the first blush of desktop publishing, someone handed me a beautifully typeset article. Shortly after starting to read it I realized that it was hopeless drivel. After a few repetitions of this experience I realized that with FrameMaker, Word, and similar systems, prettily typeset output could be produced with less effort than a draft manuscript took in the bad old days. An important cultural cue was lost. The book could no longer be judged by its cover.

Fixing a bug in the TreeTable2 example

This New Year I resolved to run backups of our computers regularly in 2007. My vague plan was to dump the data to DVDs, since both of our newest machines, a Dell PC running Windows XP Pro and a Mac, have DVD burners.

What, to my dismay, did I learn when I examined the Properties of my home directory on the PC? It weighs in at over 140 gigabytes. The DVDs hold about 6 gigabytes, so it would take at least 24 DVDs to run a backup. Aside from the cost, managing 24 DVDs sort of defeats the purpose.

Before going to plan B, getting an outboard disk drive to use as the backup device, I thought I’d investigate all of this growth in my home directory. Last time I looked, my home directory was less than 10 gigabytes.

In the past I’ve used du from the bash command line to investigate the file system. This is powerful, but it’s slow and very painful. What I really wanted was a tree browser that was smart enough to show me the size of each subtree.

On a project I’d worked on a couple of years ago, I learned that there’s a cool thing called a TreeTable that has just the right properties. The leftmost column is a hierarchical tree browser, while the columns to the right can hold data associated with the tree nodes on the left.
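For anyone who hasn’t met one: the treetable designs of that era (Sun’s TreeTable articles, for instance) revolve around a model interface roughly like the sketch below. The names here are from memory and purely illustrative, not the exact API of the library I ended up using:

    import javax.swing.tree.TreeModel;

    // Sketch of a treetable model: Swing's TreeModel supplies the hierarchy
    // shown in the leftmost column, and these methods supply the extra columns.
    public interface TreeTableModel extends TreeModel {
        int getColumnCount();                        // number of columns in the table
        String getColumnName(int column);            // header text, e.g. "Name", "Size"
        Class<?> getColumnClass(int column);         // lets the table pick a renderer
        Object getValueAt(Object node, int column);  // cell value for a given tree node
    }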

Thought I, “Let’s get a treetable class from somewhere and marry it with some code that can inspect the file system.” So I googled for ‘treetable’ and found not only a very nice treetable library available for free, but also an example built on it that did exactly what I wanted.
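The file-system half of that marriage is simple enough in principle. Here’s a minimal sketch of the idea – my own illustration, not the example’s actual code – that recursively totals the bytes under a directory:

    import java.io.File;

    public class DirSize {
        // Recursively total the bytes under a file or directory. The running
        // sum is a long, since totals easily exceed what an int can hold.
        static long totalSize(File f) {
            if (f.isFile())
                return f.length();
            long sum = 0;
            File[] children = f.listFiles();   // null if the directory is unreadable
            if (children != null)
                for (File child : children)
                    sum += totalSize(child);
            return sum;
        }

        public static void main(String[] args) {
            System.out.println(totalSize(new File(args[0])) + " bytes");
        }
    }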

After downloading the source code and creating a project in Eclipse, I ran it. It worked nicely and was just what I wanted. But there was one small problem.

It reported that my home directory had a negative size:

[Screenshot: the TreeTable reporting a negative size for my home directory]

That immediately told me that somewhere in the code a node size was being represented as an int, a 32-bit quantity that can’t represent more than 2 gigabytes before wrapping around to a negative number. What I really wanted was an unsigned 64-bit number, though I suspected I’d have to settle for a long, Java’s 64-bit signed type. That would be adequate for now, since my 140-gigabyte file system could be represented comfortably in a 38-bit integer.
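The wraparound is easy to reproduce in isolation. A toy example (the numbers are arbitrary, chosen just to show the effect):

    public class Wrap {
        public static void main(String[] args) {
            long size = 3L * 1024 * 1024 * 1024;  // 3 GB fits comfortably in a long
            int coerced = (int) size;             // the cast keeps only the low 32 bits
            System.out.println(size);             // prints 3221225472
            System.out.println(coerced);          // prints -1073741824: a "negative" size
        }
    }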

The next step was to find and fix the problem with the code. My fear was that the underlying system call was returning an integer, which would have made the fix potentially quite painful. Fortunately, however, the problem turned out to reside in a single line of code in FileSystemModel2.java:

    if (fn.isTotalSizeValid()) {
        // totalSize() returns a long; the (int) cast silently discards the high 32 bits
        return new Integer((int)((FileNode)node).totalSize());
    }

Here you can see that the long returned by totalSize() on the FileNode is being forcibly converted (don’t you love the word “coerce”?) to an int.

Replacing the coercion with an appropriate Long object was the work of moments:

    if (fn.isTotalSizeValid()) {
        // Wrap the full 64-bit value in a Long; no cast, no lost bits
        return new Long(((FileNode)node).totalSize());
    }

That had the desired result:

[Screenshot: the TreeTable showing the correct, positive sizes]

With this version I was able to navigate quickly to the directory where I had stored the video I’d made at the Bar Mitzvah of a friend’s son: files that I certainly didn’t need to back up, and that accounted for the vast bulk of the 140 gigabytes.

Source code and education

For a long time I’ve been interested in how good programmers get that way. Back in 2002 I posted a comment to a mailing list of hackers. This group is the original sort of hackers – people who program for love, not the modern sort who write viruses and try to crack systems. One of them was so taken with the comment that he posted it on his own website.

What it says is:

If we taught writing the way we try to teach programming …

Imagine if we tried to teach writing (in English or any other natural language) the way we try to teach programming.

We’d give students dictionaries and grammar books. We’d lecture them on the abstract structure of stories. We’d give them dreadful stuff to read – only things written by the most junior writers, like advanced underclassmen or young grad students (some of whom can indeed write well, but most of whom are dreadful). We’d keep the great literature secret. Shakespeare would be locked up in a corporate vault somewhere. Dickens would be classified Secret by the government. Twain’s work would have been burned by his literary executor to prevent it from competing with his own efforts.

And when people take jobs as writers (here the analogy begins to break down) their primary assignments for the first five to fifteen years of their working lives will be copy editing large works that they won’t have time to read end-to-end, for which there is no table of contents or index, and which they receive in a large pile of out-of-order, unnumbered pages, half of which are torn, crumpled, smudged, or otherwise damaged.

Is it any wonder that good programmers are so rare in the wild?

The thinking behind that statement developed back in the 1980s when I was a grad student at CMU. A group of grad students, me among them, met monthly to read code and drink wine. We all agreed that an important ingredient in learning to be a good programmer was reading good and bad code.

Unfortunately, in those days there was precious little code to read. Interestingly, all of the best software research and education institutions of the time were organized around repositories of software that all of the members contributed to and partook of. I include in this category organizations that I knew, or knew of, well enough to make this claim. They included MIT’s AI Lab, Stanford’s AI Lab, CERN, CMU’s Computer Science Department, IBM’s Research Lab, Princeton, Yale, and Berkeley. (There were certainly others, but I didn’t know people there or what sort of source code sharing went on there.) On reflection it’s interesting to realize that this is similar to how some of the earliest universities, Oxford and Cambridge in the UK, came about – a bunch of scholars pooling their most critical and precious resource, their books.

In the early days of software, being a programmer meant much more than writing code. Programming included working with users, designing user interfaces, and laying out the architecture, as well as writing the actual code. Nowadays we expect software people to be highly specialized and to work in large teams, but we continue to believe deeply that a software person must be broadly educated and experienced to be valuable.

Educating a programmer in the ’80s was a challenge because our models of what software was and how it should be built were only beginning to gel. Thus you couldn’t learn design patterns because they hadn’t been invented. Object-oriented programming was implicit in the Simula work that dated from the late 1960s, but the OO intellectual movement didn’t really form until some key ideas escaped from Xerox’s Palo Alto Research Center. And the adoption of those ideas was delayed by the limitations of Smalltalk until C++ and later Java reached maturity.

Things have changed.

The source code to many interesting systems is now broadly available to anyone. The result is that we can educate software people better today than we could twenty or more years ago. And the opportunity to create great software education is no longer limited to the small number of institutions that managed to combine wealth and vision in the right mixture to produce comprehensive source code repositories. Anyone with an Internet connection can get tons of source code to study. Now the challenge is how to focus attention, given how much is available.

In addition, we’ve come a really long way on building software that is composable. UNIX pipelines were a tantalizing hint of the power of composable modules, but they were never quite enough to cross the chasm. Now, however, the real action is in the composition of services into mashups, something that can be done rapidly and easily without any formal computer science training.

And with the source code sharing culture, it’s increasingly easy for great artists to compose instead of merely imitating.