The Most Wonderful Unix Programs

The author of the article, Douglas McIlroy, is an American mathematician, engineer, and programmer. He is best known for developing a pipeline in the Unix operating system, principles of component-oriented programming, and several original utilities: spell, diff, sort, join, speak, tr.

Sometimes you come across really wonderful programs. Rummaging through my memory, I have compiled a list of the real pearls of Unix for all the years. Basically, these are quite rare and not so necessary programs. But what sets them apart is originality. I can’t even imagine that I myself came up with the idea of ​​any of them.

Share, which programs have also hit you so much?

PDP-7 Unix


For starters, the PDP-7 Unix system itself. Its simplicity and power made me switch from a powerful mainframe to a tiny machine. Here is the quintessence of the hierarchical file system, a separate shell and user-level process control, which Multics could not implement on mainframes after hundreds of man-years of development. The disadvantages of Unix (for example, the structure of records in the file system) were as instructive and liberating as its innovations (for example, redirection of input-output in the shell).

dc


Robert Morris’s math library for a variable-precision desktop calculator used reverse error analysis to determine the accuracy needed at each step to achieve a user-specified accuracy. At a 1968 NATO software engineering conference, in my report on software components, I proposed reference procedures that can produce the result of any desired accuracy, but I did not know how to put them into practice. dc is still the only program known to me that can do this.

typo


Typo arranges the words in the text according to their similarity with the rest of the text. Eyeglasses like 'hte' tend to be at the end of the list. Robert Morris proudly said that the program would work equally well for any language. Although typo does not help to find phonetic errors, it became a real find for anyone typing, and did a lot of good before a much less interesting, but more accurate spell check on the dictionary appeared.

Typo is just as unexpectedly arranged inside as outside. The similarity measurement algorithm is based on the frequency of occurrence of trigrams, which are counted in an array of 26 × 26 × 26. In tiny memory, there was barely enough space for single-byte counters, so a scheme for compressing large numbers into small counters was implemented. To avoid overflow, the counters were updated on a probabilistic basis, supporting the estimation of the logarithm of the counter value.

eqn


With the advent of photocomposition, it became possible, but terribly tiring, to derive classical mathematical notation. Lorinda Cherry decided to develop a higher-level description language, and soon Brian Kernigan joined her. Their brilliant move was to express the oral tradition in writing, so eqn was surprisingly easy to learn. The first of its kind preprocessor of the language for describing mathematical expressions, eqn has hardly improved since then.

struct


Brenda Baker started the development of her Fortan-to-Ratfor converter, contrary to the advice of her boss - me. I thought that this could lead to a special reordering of the source text. It will be free of operator numbers, but otherwise no more readable than Fortran's well-structured code. Brenda proved that I'm wrong. She discovered that every Fortran program has a canonically structured form. Programmers preferred the canonical form rather than what they themselves originally wrote.

pascal


The syntax diagnostics in the compiler created by the Sue Graham group in Berkeley were the most useful of all that I have ever seen - and it was carried out automatically. With a syntax error, the compiler suggests inserting a token to continue parsing. No attempt to explain what is wrong. With this compiler, I learned Pascal in one evening, without any guidance at hand.

parts


Hidden inside the WWB (Writer's Workbench) partspackage, Lorinda Cherry 's module defines the parts of speech for words in English text based on only a small dictionary, spelling and grammar rules. According to this annotation, the WWB program displays stylometric indicators of the text, such as the predominance of adjectives, subordinate clauses and complex sentences. When Lorinda was interviewed on the NBC channel Today and talked about innovative grammar checking in WWB texts, this was the first mention of Unix on television.

egrep


Al Aho hoped his deterministic regular expression resolver would overtake Ken's classic non-deterministic resolver. Unfortunately, the latter was already completing a pass on complex regular expressions while egrepbuilding its deterministic automation. To win this race, Al Aho circumvented the curse of the exponential growth of the state table of the automaton, inventing a way to build on the fly only those records of the table that are actually visited during recognition.

crabs


Luca Cardelli's charming meta-program for the Blit window system produced virtual crabs that roamed the empty space of the screen, more and more biting off the edges of active windows.

Some general thoughts


Although this is not visible from the outside, the theory and algorithms played a decisive role in the creation of most of these programs: typo, dc, struct, pascal, egrep. In fact, the most surprising thing is the unusual application of the theory.

The original authors of almost half of the list - pascal, struct, parts, eqn - were women, which significantly exceeds the demographic share of women in the field of computer science.

Douglas McIlroy
March, 2020

All Articles