Discussion: standard UNIX utilities that few people used, then or now

A week ago, Douglas McIlroy, the developer of the UNIX pipeline and author of the concept of “component-oriented programming,” described interesting and unusual UNIX programs that never became widely used. The post sparked an active discussion on Hacker News. We have collected the most interesting points and would be glad if you joined the discussion.



Working with text


UNIX-like operating systems come with a standard set of tools for working with text. The typo utility scanned a document for typos and hapax legomena, words that occur in the text only once. Interestingly, the program does not use a dictionary to find typos. It relies only on the information in the file and performs a frequency analysis of trigrams (sequences of three characters). All the necessary counters are stored in a 26x26x26 array. According to Douglas McIlroy, memory was barely sufficient for that many single-byte counters, so to save space the counts were stored in logarithmic form.
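The trigram idea can be sketched in a few lines of shell. This is only a rough illustration, not a reconstruction of the original typo: it counts every three-letter sequence in the text and scores each word by its rarest trigram, so words with unusual letter combinations rise to the top. The function name rare_trigram_words and the scoring rule are assumptions made for this example, and plain integer counts stand in for the logarithmic counters.

```shell
# Hypothetical helper: rank the words of a text by their rarest trigram.
# Reads text on stdin, prints "min_count word" lines, rarest first.
rare_trigram_words() {
  tr -cs 'A-Za-z' '\n' | tr 'A-Z' 'a-z' | awk '
    length($0) >= 3 {
      # Remember each distinct word once, in order of first appearance.
      if (!($0 in seen)) { seen[$0] = 1; words[++n] = $0 }
      # Count every trigram occurrence in the whole text.
      for (i = 1; i <= length($0) - 2; i++) count[substr($0, i, 3)]++
    }
    END {
      for (w = 1; w <= n; w++) {
        word = words[w]; min = -1
        # The score of a word is the count of its least frequent trigram.
        for (i = 1; i <= length(word) - 2; i++) {
          c = count[substr(word, i, 3)]
          if (min < 0 || c < min) min = c
        }
        print min, word
      }
    }' | sort -k1,1n -k2,2
}

# The misspelled "banxna" contains trigrams seen nowhere else in the text.
printf 'banana bandana banana bandana banxna\n' | rare_trigram_words | head -n 1
```

On real documents a word that appears once would also score low, which fits the description of typo flagging both typos and one-off words.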

Today, typo has been replaced by more modern and accurate dictionary-based spell checkers. However, the tool has not been forgotten: a few years ago, an enthusiast published an implementation of typo in Go. The repository is still being updated.

Another document tool from the 1980s is the Writer's Workbench by Lorinda Cherry and Nina Macdonald of Bell Labs. It included tools for identifying parts of speech and document style, and for finding tautologies and overly complex sentences. The utilities were developed as an aid to students, and at one time they were used by students at Colorado State University in the USA. But by the early nineties, the Writer's Workbench had been forgotten because it was not included in Version 7 Unix. The tool did, however, live on through imitators such as Grammatik for the IBM PC.

UNIX also has standard tools that simplify working with formulas. One of them is eqn, a preprocessor language for typesetting mathematical expressions. Notably, to display a formula, the author only needs to describe it in simple words and symbols. Keywords let you shift mathematical symbols vertically and horizontally, change their size, and adjust other parameters. For example, you can pass the following line to the utility:

sum from { k = 1 } to N { k sup 2 }

The output is the following formula:

$$\sum_{k=1}^{N} k^{2}$$


In the 1980s and 1990s, eqn helped IT professionals write software manuals. Later it was largely supplanted by LaTeX, which even Habr uses. Still, eqn was the first tool of its kind, and it remains part of UNIX-like operating systems.

Working with files


In the thread, Hacker News users pointed out several rarely used utilities for working with files. One of them is comm, which compares two files. It is a simplified analogue of diff, tailored for use in scripts. The GNU implementation was written by Richard Stallman himself together with David MacKenzie.

The program's output consists of three columns. The first column contains lines unique to the first file, and the second contains lines unique to the second file. The third column contains lines common to both. For comm to work correctly, the files being compared must be lexically sorted. For this reason, one of the site's users suggested invoking the utility as follows:

comm <(sort fileA.txt) <(sort fileB.txt)
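The three-column layout is easiest to see on a tiny example. The sketch below uses two made-up, already sorted files in a temporary directory (the file names and contents are assumptions for illustration) and also shows the -1, -2 and -3 options, which suppress the corresponding column.

```shell
# Two small sample files (contents are made up for the example).
tmp=$(mktemp -d)
printf 'apple\nbanana\ncherry\n' > "$tmp/a.txt"
printf 'banana\ncherry\ndate\n'  > "$tmp/b.txt"

# LC_ALL=C keeps the collation order comm expects consistent with the data.
# Column 1: only in a.txt; column 2: only in b.txt; column 3: in both.
LC_ALL=C comm "$tmp/a.txt" "$tmp/b.txt"

# -12 suppresses columns 1 and 2, leaving only the lines common to both files.
common=$(LC_ALL=C comm -12 "$tmp/a.txt" "$tmp/b.txt")
printf '%s\n' "$common"

rm -r "$tmp"
```

Columns two and three are indented with one and two tab characters respectively, which is why suppressing columns is usually preferable when the result is fed to another program.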

comm is useful for checking the spelling of words: it is enough to compare them against a reference dictionary file. Given the subtleties of having to sort the files, it is believed that Stallman and MacKenzie wrote their utility exclusively for this use case.
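A minimal sketch of such a spelling check might look like this. The text, the four-word dictionary, and the file names are all invented for the example; a real check would use a full word list, whose location varies between systems.

```shell
# Hypothetical inputs: a short text and a tiny reference word list.
tmp=$(mktemp -d)
printf 'the quick brwon fox\n' > "$tmp/text.txt"
printf 'brown\nfox\nquick\nthe\n' > "$tmp/dict.txt"

# Split the text into one lowercase word per line, then sort and deduplicate.
tr -cs 'A-Za-z' '\n' < "$tmp/text.txt" | tr 'A-Z' 'a-z' | LC_ALL=C sort -u > "$tmp/words.txt"
LC_ALL=C sort -u "$tmp/dict.txt" > "$tmp/dict.sorted"

# -23 keeps only column 1: words in the text that are missing from the dictionary.
unknown=$(LC_ALL=C comm -23 "$tmp/words.txt" "$tmp/dict.sorted")
printf '%s\n' "$unknown"

rm -r "$tmp"
```

Both inputs are sorted under the same locale as the comm call, since a mismatch in collation order is a common source of spurious differences.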



A participant in the HN discussion also pointed out capabilities of the paste utility that had not been obvious to him. It lets you interleave data streams or split a single stream into two columns on output:

$ paste <( echo -e 'foo\nbar' ) <( echo -e 'baz\nqux' )
foo     baz
bar     qux
$ echo -e 'foo\nbar\nbaz\nqux' | paste - -
foo     bar
baz     qux
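Two more standard paste options are worth a quick sketch: -s, which joins the lines of an input serially into one line, and -d, which replaces the default tab delimiter. The sample data and variable names below are made up for illustration.

```shell
# -s joins all lines of one input into a single line;
# -d sets the delimiter used instead of the default tab.
joined=$(printf 'foo\nbar\nbaz\n' | paste -s -d, -)
printf '%s\n' "$joined"

# Three "-" operands consume three input lines per output row, here separated by ":".
rows=$(printf '1\n2\n3\n4\n5\n6\n' | paste -d: - - -)
printf '%s\n' "$rows"
```

The first command is a handy way to turn a column of values into a comma-separated list without reaching for a scripting language.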

One user noted that the solutions people reach for are often not the most optimal, citing utilities ranging from fmt and ex to mlr, jot, and rs.

Which standard features of UNIX-like operating systems were a discovery for you?
