Property-based testing for JavaScript and UI: an unusual approach to automated tests

Writing tests is boring, and boring work keeps getting put off. My name is Nazim Gafarov, I am an interface developer at Mail.ru Cloud Solutions, and in this article I will show you a different, slightly strange approach to automated testing.

What is wrong with conventional testing and what to do about it


So, imagine that you have a summing function like this:

function sum (a, b) {
   return a + b
}

We all understand the importance of unit tests. Let's write a test for this function:

const {equal} = require('assert')

const actual = sum(1, 2)
const expected = 3

equal(actual, expected)

We pass 1 and 2 as input and expect 3 as output. Nothing complicated: this is classic unit testing based on examples, so-called example-based testing. The test passes, everyone is happy, and you can ship to production. But then your colleague, a fabulous enterprise programmer, comes into play. One day he needs your summation function, but for some reason he decides to tweak it a bit:

function sum (a, b) {
 return 3
}

Something is clearly wrong with this code, but on the other hand all the tests pass, and TDD teaches us to write the minimal code that makes the tests pass. Fair enough. Suppressing your teenage anger, you write another test: pass 4 and 8, expect 12:

equal(
 sum(4, 8),
 12
)

But the enterprise programmer does not give up. He once again changes the summation function so that the tests do not fail:

function sum (a, b) {
 if (a == 4 && b == 8) return 12
 return 3
}

You could keep adding examples to the test suite, and this would go on ad infinitum. At this point you think: “Why did they even hire him?”, but there is nothing you can do about it. So you unleash your secret weapon, randomness:

const a = Math.random()
const b = Math.random()
const actual = sum(a, b)
const expected = a + b

equal(actual, expected)

Now everything is fine, but the big problem with such a test is that it duplicates the implementation of the function in the test code. That is, you use one implementation to test another:

  • now you have two implementations of the same function that you need to keep up to date;
  • the summation function is obviously primitive, but imagine that your code does something more complicated than summation.

Besides, when we test code with examples, we know we are cheating a little, because we have checked it on only two pairs of inputs:

equal( sum(1, 2), 3 )
equal( sum(4, 8), 12 )

This test shows only that the code works correctly in these two cases. The best of us realize that it is good to test boundary cases such as negative numbers, floating-point numbers, and so on. But this still reflects the bias of the developer.

And again we run into the Enterprise Developer From Hell. This term was introduced by Scott Wlaschin, a well-known popularizer of F#. You might think that the enterprise programmer is unrealistic. Obviously, in a healthy company no sane person will break functions on purpose, but in many cases we act this way ourselves.

We write functions much more complicated than A + B, and in the process we can write code that works in particular cases but not in general. We do it not out of malice but unintentionally, through inattention and blind spots.

So what can we do about it? Let's think.

A + B

It makes no sense to tie the tests to A or B; you need to test what is in the middle, the plus sign itself. That is, you need to write a test that focuses not on inputs and outputs but on properties. These properties must hold for any correct implementation. So let's think about what properties summation has.

Commutativity


We know this property from school: "changing the order of the addends does not change the sum." That is, addition is commutative. Let's write a test that verifies that our implementation has this property:

const actual = sum(1, 2)
const expected = sum(2, 1)

equal(actual, expected)

The good thing about this test is that it works with any input, not just special magic numbers. Nothing prevents us from doing something like this:

const rand = Math.random
const [n1, n2] = [ rand(), rand() ]

const actual = sum( n1, n2 )
const expected = sum( n2, n1 )

equal(actual, expected)

Adding two numbers is not something we do every day, but a similar approach can be used not only for mathematical operations but also for testing real web services, databases, and even interfaces. So the next example is the division of two numbers:

function div (dividend, divisor) {
 return dividend / divisor
}

We go to Wikipedia and find out that division is right-distributive. This means that dividing the sum of two numbers by some divisor is the same as dividing each of them separately and adding the results. Great, let's test this:

const [n1, n2, n3] = [rand(), rand(), rand()]

const left = div(n1 + n2, n3)
const right = div(n1, n3) + div(n2, n3)

equal(left, right)

Now we run this test in a loop many, many times, and with enough patience we get the following combination of inputs:

const [n1, n2, n3] = [0, 0, 0]

And the test fails, because dividing zero by zero gives NaN:

assert.js:85
 throw new AssertionError(obj);
 ^

AssertionError [ERR_ASSERTION]: NaN == NaN

And NaN, as you know, is not equal to NaN. This is normal JavaScript behavior, but now we understand that our division function needs to check for zeros. We keep running the test in a loop, generating a new batch of random data each time. At some point we get the following combination:

const [n1, n2, n3] = [2, 1, -347]

And the test fails again:

assert.js:85
 throw new AssertionError(obj);
 ^

AssertionError [ERR_ASSERTION]:
-0.008645533141210375 == -0.008645533141210374

This is floating-point rounding error. Again, it is a normal limitation of the arithmetic, but when we generate the input data, such limitations become explicit. Now we need to think about explicit rounding in our function, or about an algorithm that reduces the computational error, for example the Kahan-Babuška summation algorithm.

It might seem that we wrote a regular test, but there are no magic values taken from our imagination. We use arbitrary values and can run the test many, many times on different inputs, checking the specification itself, that is, what the function should do, rather than its behavior on individual examples.

This is property-based testing. It is a combination of the following things:

  1. First, we describe the input data - we tell the system what random data needs to be generated.
  2. Then we describe the expected properties - some conditions for passing the test.
  3. And then just run this test many, many times.

How to identify properties?


To identify a property of your specific function, you first need to formulate the requirements. This kind of testing does not even require random data: you can check properties on specific examples, for instance on very different values or on boundary cases.

But above all, we want to make sure that our property is identically true on a given set, that is, it holds for all valid values.

(∀x ∈ X) P(x)

Of course, nobody forbids running a test in a loop and substituting these x values manually, but we definitely need a reliable way to reproduce failed tests. Fortunately, there are ready-made frameworks for this.

Frameworks


As you know, all the best things in programming were originally invented in the Haskell world. 20 years ago, the idea of property testing was implemented in the QuickCheck framework.

Now this form of testing is actually dominant in the Haskell ecosystem. There are several libraries for JavaScript, but I will focus on two: JSVerify and fast-check.

const jsc = require('jsverify')

jsc.assertForall(
 jsc.integer, jsc.integer,
 (a, b) => a + b === b + a
)

This is a simple test for the commutative property of addition, which we talked about at the beginning. Since JavaScript is dynamically typed, we need to tell the framework what arguments we expect. Here we say that we need two integers and pass the predicate as the last argument. By default, JSVerify will run the test a hundred times, each time generating a new pair of input values.

Let's check the commutativity of subtraction. Of course, subtraction has no such property, so we get an object describing the failure and print it:

const subtractionIsCommutative = jsc.checkForall(
 jsc.integer, jsc.integer,
 (a, b) => a - b === b - a
)

console.log(subtractionIsCommutative)

{
 counterexample: [ 0, 1 ],
 tests: 1,
 shrinks: 4,
 rngState: '0e168f30eac572b94d'
}

The framework says that the property failed on the very first test, with the counterexample 0 and 1. rngState is the state of the random number generator. The test data is pseudo-random, so it is fully determined by this state. The generator reports its seed, which can be passed to the test runner to reproduce the failing case. This is convenient for debugging and helps with reproducibility in CI/CD.

mocha test.js --jsverifyRngState 0e168f30eac572b94d

JSVerify has a small DSL for types, which makes the notation slightly shorter. Sometimes this is convenient; for example, when custom types are needed, it is easier to write this:

jsc.assert(jsc.forall(
 '{ name: asciinestring; age: nat }',
 (obj) => {
     console.log(obj) // { name: '9lfpy', age: 34 }
     return true
 }
))

than this:

jsc.record({
 name: jsc.asciinestring,
 age: jsc.nat,
})

Choose whichever you prefer. This way we can generate any types of our own, for example objects that come from the backend. If the built-in generators are not enough, you can easily write your own. Let's say you need not just a string but a string with an email address. You can generate it this way:

const emailGenerator = jsc
 .asciinestring.generator
 .map(str => `${str}@example.com`)

In real life


Now let's see how this can be applied in real life. The query-string library has six million downloads per week. This package is listed in the dependencies of more than four thousand other packages.

Query-string does one simple thing - it parses a URL string into an object and, conversely, can generate a URL or part of it from an object:

queryString.parseUrl('https://foo.bar?foo=bar')
//=> {url: 'https://foo.bar', query: {foo: 'bar'}}

queryString.stringify({b: 1, c: 2, a: 3})
//=> 'b=1&c=2&a=3'

Naturally, this library is covered by a bunch of classic example-based tests: about 400 lines of test code in total.

But you cannot take into account all the options, no matter how many tests are written. Instead of inventing new examples, the author of the fast-check library wrote one single test focused on the properties of the library:

fastCheck.property(
 queryParamsArbitrary, optionsArbitrary,
 (object, options) => deepEqual(
   queryString.parse(queryString.stringify(object, options), options),
   object
 )
)

Query-string is a classic case of inversion: any object can be converted to a query string, and if that string is then parsed, the original object should come back.

As you might expect, the test immediately caught a bug.

He used the same approach to test the infamous left-pad library and discovered a bug with strings containing characters outside the Unicode Basic Multilingual Plane, for example emoji.

But how did he manage to come up with properties that find bugs so easily? It is because he is familiar with popular properties. Let's look at them too.

Inversion


This approach is also known as Bilbo testing, in honor of Tolkien’s tale “There and Back Again.”

Let's say we have an encryption function. If decryption is applied to the result, we should get the initial message:

const string = 'ANY_STRING'
const encrypted = encrypt(string)

expect( decrypt(encrypted) ).toBe( string )

It doesn’t matter which message we encrypt: it can be any string, so we can generate it. The same works for serialization and deserialization, encoding and decoding, lossless compression, and so on.

This is the property we saw when testing query-string:

const obj = {any: 'object'}

_.isEqual(
   JSON.parse( JSON.stringify(obj) ),
   obj,
)

Write / read, insert / search also match this pattern, even if they are not strict inversions.

Reversibility


Another special case of inversion is a function that undoes itself: we take such a reversible function and apply it twice:

_.isEqual(
 [...array].reverse().reverse(),
 array,
)

That is, if our function reverses an array, we can generate an array with any data inside, reverse it twice, and the result should match the original array.

Invariance


A search for invariants is a search for something that does not change when a function is applied. Let's say we have a sort function. If you apply sorting to any array, the length of this array should not change:

equal(
 [...array].sort().length,
 array.length,
)

In addition, the elements of the array should not be changed either: they can change their order, but they themselves do not change - each element of the original array must be found in the sorted array.

Size, length, and content are excellent things to check to verify that your function does not change what it should not change. Such a check is usually not sufficient on its own, but it often acts as a counter-check for other properties.

Idempotency


For example, for the idempotency property. An idempotent function gives the same result when applied again as when applied the first time. For example, if we have sorted an array once, sorting it again changes nothing:

_.isEqual(
 [...array].sort().sort(),
 [...array].sort(),
)

string.padStart(10) === string.padStart(10).padStart(10)

Formatting, searching for unique values, normalizing, adding an existing element to a set - all these operations should not change anything when re-applied.

Hard to prove, easy to check.


This property is usually illustrated by the example of a maze. Finding a way out of a maze is difficult, but if your function claims to know the way out, it is easy to test: you just need to follow its instructions, and they should lead us to the exit. We do not know whether this is the shortest path, but at least we are sure that it is a valid path that leads to the exit.

Another example is again the sorting function. Sorting an array correctly is quite difficult. But we can easily check the result of sorting - we need to sequentially take the elements of the array and compare them with the subsequent ones. If we sort in ascending order, then the current element should always be less than the next (or equal to it).

Reference implementation


This approach is also called a test oracle. Let's say we have two functions that do the same thing.

For example, it occurred to us to implement our own sorting function. At the same time, we have a sorting function built into JavaScript, which we take as a reference. The idea of the test oracle is to compare the results of our sort with the results of the reference.

Thus, we can generate random data, feed it into both functions and verify that the results match:

_.isEqual(
 [...array].sort(),
 fastestSortingAlgorithm(array),
)

This is a good pattern when you refactor old code. You can make sure that the new implementation produces the same results as its old counterpart. The test oracle in this case checks the property of matching the reference implementation.

Just don't crash


In this case, the property is that the function at least does not crash. This type of testing is called fuzzing: we feed incorrect or random data to the application and see whether the system crashes, hangs, violates its internal logic, or behaves unexpectedly.

Suppose we have an API. No matter which endpoints we call or what data we send, the server should never respond with a 500. On its own this property says little, but it will do as a starting point.

UI Testing


Imagine that you have an online store with a shopping basket and you need to test it. First you need to determine the available actions:

  • add goods to the basket;
  • remove goods;
  • empty the basket.

Now let's identify the properties. Offhand, you can say that the quantity of goods cannot be negative: Basket >= 0. At the same time, there cannot be more goods in the basket than in the catalog: Basket <= Catalog. And the total of the entire basket cannot be less than the price of the most expensive product in it: sum(Basket) >= max(Goods).

Then, in a loop, you generate a valid sequence of actions many times over and check all the properties after each step. Roughly speaking, you add something to the basket in random order many, many times, remove something, empty it, and so on, and after each step you check that all the properties hold.

With such testing, you can go through the user interface and click on buttons as in regular e2e tests. But if you use a frontend framework where the view is a pure function of the model, you can simply manipulate the model.

This approach was taken by Spotify developers to test playlists.

Pros and Cons of Property Based Testing


pros


  1. Property-based tests replace many example-based tests, that is, you write less code, and you get a lot more tests.
  2. Such tests can find edge cases on their own that you might not have thought about: division by zero, strings with emoji, and the like.


cons


  1. Each test needs to be run a hundred times, so the test execution time increases slightly.
  2. Such tests can give a false sense of security. Suppose you have identified several properties of a function, and this gives you confidence that the implementation is correct. However, a property may be necessary but not sufficient. For example, multiplication is commutative just like addition, but these functions do rather different things.

Conclusions


We should not abandon classic tests, but we can combine them with property-based testing.

For example, you can cover the basic functionality with classic example-based tests and additionally cover critical functions with property-based tests.

P.S.


This is a text version of the talk from HolyJS Piter 2019 and Panda Meetup #22.


