wc to D: 712 characters without a single branch

After reading “ Beat With an 80-Line Program on Haskell,” which I found on Hacker News , I decided that D might be better. And I wrote wc in D.

Note.per. I suggested translate the above article0xd34df00d, but he preferred to make based on his “Winning With Twenty Lines of Haskell: Writing Your Wc” . And now the articles are multiplying like rehashings "minted coin."

Program


Consists of one file - 34 lines and 712 characters.

Source code
import std.stdio : writefln, File;
import std.algorithm : map, fold, splitter;
import std.range : walkLength;
import std.typecons : Yes;
import std.uni : byCodePoint;

struct Line {
	size_t chars;
	size_t words;
}

struct Output {
	size_t lines;
	size_t words;
	size_t chars;
}

Output combine(Output a, Line b) pure nothrow {
	return Output(a.lines + 1, a.words + b.words, a.chars + b.chars);
}

Line toLine(char[] l) pure {
	return Line(l.byCodePoint.walkLength, l.splitter.walkLength);
}

void main(string[] args) {
	auto f = File(args[1]);
	Output o = f
		.byLine(Yes.keepTerminator)
		.map!(l => toLine(l))
		.fold!(combine)(Output(0, 0, 0));

	writefln!"%u %u %u %s"(o.lines, o.words, o.chars, args[1]);
}


Of course, he uses Phobos, the D standard library , but why not? Phobos is beautiful, and comes with every D compiler. The program itself does not contain any if statements. And in the wc implementation on Haskell, several if are used. The D program, in addition to the main one, contains three more tiny functions. I could easily put all the functionality in one range chain, but then it would probably exceed 80 characters per line. This is the basic principle of code style.

Performance


Is wc on D faster than coreutils wc? No, but it took me 15 minutes to write my version (I had to look for walkLength because I forgot the name of the function).
data filelinesbytecoreutilshaskellD
app.d469063.5 ms ± 1.9 ms39.6 ms ± 7.8 ms8.9 ms ± 2.1 ms
big.txt86264k4.7 ms ± 2.0 ms39.6 ms ± 7.8 ms9.8 ms ± 2.1 ms
vbig.txt1.7M96M658.6ms ± 24.5ms226.4 ms ± 29.5 ms1.102 s ± 0.022 s
vbig2.txt12.1M671M4.4 s ± 0.058 s1.1 s ± 0.039 s7.4 s ± 0.085 s

Memory:
data filecoreutilshaskellD
app.d2052K7228K7708K
big.txt2112K7512K7616K
vbig.txt2288K42620K7712K
vbig2.txt2360K50860K7736K
wc on Haskell faster? For large files, certainly, but it uses multithreading. For small files, GNU coreutils still wins. At this stage, my version is most likely limited by IO, and in any case, it is quite fast.

I will not argue that one language is faster than another. If you spend time optimizing the micro benchmark, you will most likely win. But not in real life. But I will argue that functional programming in D almost catches up with the FP in Haskell.

A bit about range in D


range is an abstraction that can be iterated without touching the underlying collection (if any). Technically, range can be a structure or class that implements one or more Range interfaces. The simplest form, InputRange, requires a function

void popFront();

and two members or properties

T front;
bool empty;

T is a generic type of element that range iterates over.

D ranges are specific types. When range falls into the foreach statement, the compiler performs a small modification.

foreach (e; range) { ... }

turns into

for (auto __r = range; !__r.empty; __r.popFront()) {
    auto e = __r.front;
    ...
}

auto e = computes the type and is equivalent to T e =.
With this in mind, it's easy to create a range that can be used by foreach.

struct Iota {
	int front;
	int end;

	@property bool empty() const {
		return this.front == this.end;
	}

	void popFront() {
		++this.front;
	}
}

unittest {
	import std.stdio;
	foreach(it; Iota(0, 10)) {
		writeln(it);
	}
}

Iota is a very simple example of range. It functions as a generator, without having the underlying collection. It iterates through integers from start to finish in increments of one. This snippet reveals a bit of D syntax.

@property bool empty() const {

The @ property attribute allows you to use the empty function in the same way as a member variable (calling a function without parentheses). The const attribute at the end means that we are not modifying the instance data for which we call empty . The built-in unit test prints numbers from 0 to 10.

Another small feature is the lack of an explicit constructor. The Iota structure has two int member variables. In the foreach statement in the test, we create an Iota instance as if it has a constructor that accepts two ints. This is a structural literal. When the compiler D sees this, but the structure does not have a corresponding constructor, then in accordance with the order of declaration of variables - members of the structure will be assigned int.

The relationship between the three members is simple. If empty is false, front is guaranteed to return a new element following the iteration after calling popFront. After calling popFront, the empty value could change. If this is true, it means that there are no more elements to iterate and further calls to front are invalid. According to InputRange documentation :

  • front can be correctly calculated if and only if empty returns or would return false.
  • front can be computed several times without calling popFront or otherwise mutating range or underlying data, and it gives the same result for each calculation.

Using foreach expressions, or loops, is not very functional. Suppose we want to filter out all the odd numbers for Iota. We could put an if inside a foreach block, but that would only make it worse. It would be better if we had a range that takes a range and a predicate that decides whether an element is suitable or not.

struct Filter {
	Iota input;
	bool function(int) predicate;

	this(Iota input, bool function(int) predicate) {
		this.input = input;
		this.predicate = predicate;
		this.testAndIterate();
	}

	void testAndIterate() {
		while(!this.input.empty
				&& !this.predicate(this.input.front))
		{
			this.input.popFront();
		}
	}

	void popFront() {
		this.input.popFront();
		this.testAndIterate();
	}

	@property int front() {
		return this.input.front;
	}

	@property bool empty() const {
		return this.input.empty;
	}
}

bool isEven(int a) {
	return a % 2 == 0;
}

unittest {
	foreach(it; Filter(Iota(0,10), &isEven)) {
		writeln(it);
	}
}

Filter is again very simple: Iota and a function pointer are needed. When constructing the Filter, we call testAndIterate, which takes elements from Iota until it is empty or the predicate returns false. The idea is that the predicate decides what to filter and what to leave. The front and empty properties are simply translated to Iota. The only thing that actually does the job is popFront. It returns the current item and calls testAndIterate. That's all. This is a filter implementation.

Of course, testAndIterate has a while loop, but rewriting it using recursion, in my opinion, is just plain stupid. What makes D great is that you can use the appropriate method for each task. Functional programming is good, and often flaunts, but sometimes it is not. If a little inline assembler is needed or more enjoyable, use.

Calling Filter still doesn't look very good. Assuming you're used to reading from left to right, Filter appears before Iota, even if it runs after Iota. D does not provide a pipe statement, but uses the Unified Function Call Syntax (UFCS). If an expression can be implicitly converted to the first parameter of a function, then the function can be called just as if it were a member function of this expression. This is quite complicated, I understand. An example will help:

string foo(string a) {
	return a ~ "World";
}

unittest {
	string a = foo("Hello ");
	string b = "Hello ".foo();
	assert(a == b);
}

The above example shows two calls to the foo function. As assert implies, both calls are equivalent. What does this mean for our Iota Filter example? UFCS allows us to rewrite the unit test like this:

unittest {
	foreach(it; Iota(1,10).Filter(&isEven)) {
		writeln(it);
	}
}

The map / transform implementation for range should now be accessible to every reader. Of course, a filter can be made more abstract using templates, but these are just details, nothing conceptually new.

Of course, there are different types of range, for example, bidirectional. Guess what opportunities this gives you. A quick tip: there are two new primitives in the bidirectional range called back and popBack. There are other types of range, but after you understand the input range shown above twice, you almost all recognize them.

PS Just for clarity, do not implement your own filter, map or fold; the Phobos D standard library has everything you need. Take a look at std.algorithm and std.range .

about the author


Robert Schadeck earned a master's degree in computer science from the University of Oldenburg. His master's thesis was entitled "DMCD - Distributed Multithreaded D Compiler with Caching", where he worked on creating a D-compiler from scratch. He was a graduate student in computer science in 2012-2018. at the University of Oldenburg. His doctoral dissertation focuses on quorum systems in conjunction with graphs. Since 2018, he has been pleased to use D in his daily work, working for Symmetry Investments.

About Symmetry Investments


Symmetry Investments is a global investment company with offices in Hong Kong, Singapore and London. We have been doing business since 2014 after successfully separating from a major New York hedge fund.

At Symmetry, we strive to reasonably take risks to create value for our customers, partners, and employees. We benefit from our ability to generate win-win contracts - in the broadest sense of the word. Win-Win is our fundamental ethical and strategic principle. By generating win-win options, we can create unique solutions that combine perspectives that are usually perceived as incompatible or opposite, and covering all the best that each of the parties can offer. We are re-integrating fixed income arbitration with global macro strategies. We invent and develop a technology that focuses on the potential of human-machine integration. We create systems in which machines do what they do best, supporting people inso that they do what they do best. We are creating a meritocracy based on collaboration: a culture in which individual contribution serves both personal and collective goals - and is rewarded accordingly. We value both a sense of ownership and a team spirit of cooperation, self-realization and community.

Symmetry Investments people have been active members of Community D since 2014. We sponsored the development of excel-d, dpp, autowrap, libmir and other projects. We launched Symmetry Autumn of Code in 2018 and hosted DConf 2019 in London.

Approx. Since the translator, as usual, was drunk, I cannot vouch for the accuracy of some human terms, and I don’t understand anything in the last section. So send corrections in personal communications.

All Articles