Coding / Programming Videos

Post your favorite coding videos and share them with others!

How To Stop Worrying And Start Writing Tests Based On Properties

Source link

Recently, references to a certain magic tool, property-based testing, are becoming more and more common. Most of the articles on this topic talk about what a cool approach this is, then they use an elementary example to show how to write such a test using a specific framework, at best, suggest several frequently encountered properties.

Further, the amazed and enthusiastic reader tries to put all this into practice and rests on the fact that the properties are somehow not invented. And unfortunately often gives up on this. In this article, I will try to prioritize a little differently.

I will begin all the same with a more or less specific example, in order to explain what is it exactly. Then I will try to make out some of the problems associated with this approach, and how they can be solved. But then — properties, with examples where they can be applied.

Test the key-value store in three short tests

So, let’s say for some reason we need to implement some kind of key-value storage. It can be a dictionary based on a hash table, or on the basis of a tree, it can be stored entirely in memory, or be able to work with the disk — we do not care. The main thing is that it should have an interface that allows you to:

  • Write value by key,
  • Check if there is an entry with the necessary key,
  • Read value by key,
  • Get a list of recorded items,
  • Get a copy of the vault.

With the classical approach based on examples, a typical test would look something like this:

storage = Storage()
storage['a'] = 42
assert len(storage) == 1
assert 'a' in storage
assert storage['a'] == 42

Or so:

storage = Storage()
storage['a'] = 42
storage['b'] = 73
assert len(storage) == 2
assert 'a' in storage
assert 'b' in storage
assert storage['a'] == 42
assert storage['b'] == 73

And in general, such tests can and will need to write a little more. Moreover, the more complex the internal implementation, the greater the chance of something to miss. In short, a long, tedious and often ungrateful job. How nice it would be to shove her at someone! For example, to force the computer to generate test cases for us. To begin, let’s try to do something like this:

storage = Storage()
key = arbitrary_key()
value = arbitrary_value()
storage[key] = value
assert len(storage) == 1
assert key in storage
assert storage[key] == value

Here is the first test based on the properties. It looks almost the same as the traditional one, although a small bonus is already striking — there are no values ​​taken from the ceiling, instead, we use functions that return arbitrary keys and values.

  • There is another, much more serious advantage — it can be performed many, many times and on different input data to check the contract, that if you try to add an element to empty storage, it will really be added there. Ok, this is all well and good, but so far it looks not very useful compared to the traditional approach.

Let’s try to add another test:

storage = arbitrary_storage()
storage_copy = storage.copy()
assert len(storage) == len(storage_copy)
assert all(storage_copy[key] == storage[key] for key in storage)
assert all(storage[key] == storage_copy[key] for key in storage_copy)

Here, instead of taking an empty repository, we generate an arbitrary one with some data and check that its copy is identical to the original. Yes, the generator must also be written using a potentially buggy public API, but as a rule, it is not such a difficult task. At the same time, if there are any serious bugs in the implementation, then the chances are high that the drops will start in the generation process, so this can also be considered a kind of bonus smoke-test. But now we can be sure that everything that could generate a generator can be correctly copied. And thanks to the first test, we know for sure that a generator can create storage with at least one element. Time for the next test! At the same time reuse the generator:

storage = arbitrary_storage()
backup = storage.copy()
key = arbitrary_key()
value = arbitrary_value()
if key in storage:
storage[key] = value
assert len(storage) == len(backup) + 1
assert key in storage
assert storage[key] == value
assert all(storage[key] == backup[key] for key in backup)

We take arbitrary storage and check that we can add one more element there. So the generator can create storage with two elements. And you can also add an element to it. And so on (I immediately recall such a thing as mathematical induction). As a result, the written three tests and the generator make it possible to reliably verify that an arbitrary number of different elements can be added to the storage. Only three short tests! Here, in general, is the whole idea of ​​tests based on properties:

  • Find properties,
  • Check properties on a bunch of different data,
  • Profit.

By the way, this approach does not contradict the principles of TDD in any way — tests can also be written in the same way before the code. Another thing is that getting such a test to become green can be much more difficult than the traditional one, but when it finally passes successfully, we will be sure that the code actually complies with a certain part of the contract.

This is all well and good, but …

With all the attractiveness of the property-based testing approach, there is a whole bunch of problems. In this part, I will try to make out the most common. And apart from the problem with the actual difficulty of finding useful properties, in my opinion, the biggest trouble for beginners is often false confidence in good coverage.

Indeed, we have written several tests that generate hundreds of test cases — what can go wrong? If you look at the example from the previous part, then, in fact, a lot of things. For a start, the written tests do not give any guarantee that storage.copy () will actually make a “deep” copy, and not just copies the pointer. Another hole is that there is no normal verification that key in storage will return False if the key you are looking for is not in the storage. And the list can still be continued. Well, one of my favorite examples is, let’s say, we write sorting, and for some reason, we consider that one test that checks the order of elements is enough:

input = arbitrary_list()
output = sort(input)
assert all(a <= b for a, b in zip(output, output[1:]))

And such an implementation of his excellent pass:

def sort(input):
return [1, 2, 3]

The next problem, which in some sense can be called a consequence of the previous two — using property-based testing is often very difficult to achieve truly complete coverage. But in my opinion, this is solved very simply — you do not need to write only tests based on properties, nobody has canceled traditional tests.

  • In addition, people are so arranged that it is much easier for them to understand things with concrete examples, which also speaks in favor of using both approaches. In general, I developed for myself something like the following algorithm — to write some very simple traditional tests, ideally, so that they could serve as an example of how you intend to use the API. As soon as there appeared a feeling that “for documentation” of tests is enough, but it’s still far from complete coverage to start adding tests based on properties.

Now to the question of frameworks, what to expect from them and why they are needed at all — after all, no one forbids them to run a test in a cycle, calling inside with random and enjoying life. In fact, the joy will be before the first drop of the test, and it is good if it is local, and not in some CI. First of all, since tests based on properties are randomized, we need a way to reliably reproduce the fallen case, and any self-respecting framework allows it.

The most popular approaches are to output a certain seed to the console, which you can manually slip into the test runner and reliably reproduce the fallen case (convenient for debugging), or create a cache on the disk with “bad” seeds that will be automatically checked first ( helps with repeatability in CI). Another important aspect is data minification (shrinking in foreign sources). Since the data is generated randomly, there is a completely non-illusory chance to get on a failing test case with a container of 1000 items, which is to debug something else “fun”. Therefore, good frameworks, after finding a felting case, use a number of heuristics to try to find a more compact set of input data, which nevertheless will continue to fail the test. Finally, often half the functionality of the test is an input data generator, so the presence of built-in generators and primitives that allow you to quickly assemble more complex simple generators also greatly help.

Also, there is a periodical criticism that tests based on properties are too much logic. However, this is usually accompanied by examples in the style.

data = totally_arbitrary_data()
perform_actions(sut, data)
if is_category_a(data):
assert property_a_holds(sut)
else if is is_category_b(data):
assert property_b_holds(sut)

In fact, this is a fairly frequent anti-pattern (among newbies), don’t do that! It is much better to break such a test into two different ones, and either skip unsuitable input data (in many frameworks there are even special means for this) if there is a small chance to get into them, or use more specialized generators that will immediately issue only suitable data. The result should be something like this:

data = totally_arbitrary_data()
perform_actions(sut, data)
assert property_a_holds(sut)


data = data_from_category_b()
perform_actions(sut, data)
assert property_b_holds(sut)

Useful properties, and their habitats

Okay, how useful testing based on properties seems to be clear, the main pitfalls have been disassembled … although not, the main thing is just unclear — where do you get these very properties from? Let’s try to search.

At least do not fall

The simplest option is to push arbitrary data into the system under test and check that it does not fall. In fact, this is a whole separate direction with the fashionable name fuzzing, for which specialized tools exist (for example, AFL aka American Fuzzy Lop), but with some stretch this can be considered a special case of property-based testing, and if there are no ideas at all Do not climb, then you can start with it. However, as a rule, such tests rarely make sense, since potential failures are usually very good at testing other properties as well. The main reasons why I mention this “property” are to direct the reader to the fuzzes and, in particular, AFL (there are a lot of English-language articles on this topic), well, for completeness.

Test oracle

One of the most boring properties, but in fact a very powerful thing that can be used much more often than it might seem. The idea is that sometimes there are two pieces of code that do the same thing but in different ways. And then you can especially without understanding generating arbitrary input data, shove them in both variants and check that the results are the same. The most frequently cited example of application is to leave a slow but simple option when writing an optimized version of a function and drive tests against it.

input = arbitrary_list()
assert quick_sort(input) == bubble_sort(input)

However, the applicability of this property is not limited to this. For example, it often turns out that the functionality implemented by the system we want to test is a superset of something already implemented, often even in the standard language library. In particular, usually, most of the functionality of some key-value storage (in memory or on disk, based on trees, hash tables or some more exotic data structures such as merkle patricia tree) can be tested with a standard dictionary. Testing any CRUDs — there too.

Another interesting application that I used personally — sometimes when implementing a numerical model of a system, some special cases can be calculated analytically, and the results of the simulation can be compared with them. In this case, as a rule, if you try to push arbitrary data to the input, even with the correct implementation, the tests will still fall due to the limited accuracy (and therefore applicability) of the numerical solutions, but during the repair process by imposing restrictions on the generated input data, these very limitations become known.

Requirements and Invariants

The basic idea here is that often the requirements themselves are formulated in such a way that they are easy to use as properties. In some articles, invariants are singled out separately on such topics, but in my opinion, the border is too fragile here, since most of these invariants are direct consequences of the requirements, so perhaps I’ll dump everything.

A small list of examples from a variety of areas that are convenient for checking properties:

  • A class field must have a previously assigned value (setters),
  • The repository should be able to read the previously recorded item,
  • The adding a previously non-existing item to the repository does not affect previously added items,
  • Many dictionaries can not store several different items with the same key,
  • The base64 encoding result must contain only characters from the base64 alphabet,
  • The route building algorithm must return a sequence of allowable movements that will lead from point A to point B,
  • For all points of the contour constructed should be performed f(x, y) = const,
  • The signature verification algorithm should return True if the signature is real and False otherwise,
  • As a result of orthonormalization, all vectors in the basis must have unit length and zero reciprocal scalar products,
  • The vector transfer and rotation operations should not change its length

In principle, one could say that everything is finished, use test oracles, or look for properties in the requirements, but there are several interesting “special cases” that I would like to mention separately.

Stateful testing and state testing

Sometimes you need to test something that has a state.

In this case, the easiest way:

  • Write a test that checks the correctness of the initial state (for example, that the container just created is empty),
  • Write a generator that, using a set of random operations, will bring the system into some arbitrary state,
  • Write tests for all operations using the result of the generator as the initial state.

Very similar to mathematical induction:

  • To prove claim one,
  • To prove the assertion N + 1, assuming that the assertion N is true.

Another method (sometimes giving a little more information about where it broke) is to generate a valid sequence of events, apply it to the system under test and check the properties after each step.

Back and forth

If suddenly there is a need to test a couple of functions for the direct and inverse transformation of some data, then consider yourself lucky:

input = arbitrary_data()
assert decode(encode(input)) == input

Great for testing:

  • Serialization-deserialization
  • Encrypt-decrypt
  • Encoding decode
  • Transformations of the basic matrix into quaternion and back
  • The direct and inverse coordinate transformation
  • A direct and inverse Fourier transform

A private, but the interesting case is the inversion:

input = arbitrary_data()
assert invert(invert(input)) == input

A striking example is the inversion or transposition of a matrix.


Some operations do not change the result when reapplied. Typical examples:

  • Sorting,
  • All sorts of vectors and bases,
  • Re-add an existing item to the set or dictionary,
  • Re-write the same data in some property of the object,
  • Casting data to canonical form (spaces in JSON lead to a single style for example).

You can also use idempotency to test serialization-deserialization if the usual decode (encode (input)) == input method is not suitable because of the different possible representations for equivalent input data (again, there are extra spaces in some JSON):

def normalize(input):
return decode(encode(input))
input = arbitrary_data()
assert normalize(normalize(input)) == normalize(input)

Different paths, one result

Here the idea boils down to exploiting the fact that sometimes there are several ways to do the same thing. It may seem that this is a special case of a test oracle, but in fact, it is not quite so. The simplest example is the use of commutativity of some operations:

a = arbitrary_value()
b = arbitrary_value()
assert a + b == b + a

It may seem trivial, but it’s a great way to test:

  • Addition and multiplication of numbers in a nonstandard representation (bigint, rational, that’s all),
  • “Adding” points on elliptic curves in finite fields (hello, cryptography!),
  • A union of sets (which can have completely non-trivial data structures inside)

In addition, adding the elements to the dictionary has the same property:

A = dict()
A[key_a] = value_a
A[key_b] = value_b
B = dict()
B[key_b] = value_b
B[key_a] = value_a
assert A == B

The variant is more complicated — I thought for a long time how to describe in words, but only a mathematical notation comes to mind. In general, such f (x) transformations are often found, for which the property f (x + y) = f (x) ⋅ f (y) holds, and both the argument and the result of the function are not necessarily a number, but + and ⋅ are just some binary operations on these objects. What can this test:

  • Addition and multiplication of any strange numbers, vectors, matrices, quaternions (a cdot (x+y) = a cdot x + a cdot y)
  • Linear operators, in particular, all integrals, differentials, convolutions, digital filters, Fourier transforms, etc ()
  • Operations on the same objects in different representations,

For example:

M(q_a cdot q_b) = M(q_a) cdot M(q_b)
  • these are unit quaternions, and M (q) is the operation of converting a quaternion into an equivalent basis matrix,
F[a circ b] = F[a] cdot F[b]
  • these are signals, — convolution, — multiplication, and — Fourier transform.

An example of a slightly more “routine” task is to test some tricky dictionary merge algorithm in some way:

a = arbitrary_list_of_kv_pairs()
b = arbitrary_list_of_kv_pairs()
result = as_dict(a)
assert result == as_dict(a + b)


That’s basically all that I wanted to tell in this article. I hope it was interesting, and a little more people will begin to put all this into practice. To make the task a little easier, I will give a list of frameworks of different degrees of fitness for different languages:

And, of course, a special thanks to people who once wrote wonderful articles, thanks to which I learned about this approach a couple of years ago, stopped worrying and started writing tests based on properties:

Source link


Leave a Reply

Please Login to comment
Notify of
Translate »