Expressive software tests

What are expressive software tests?

In one of his recent articles on the topic of kinds of software tests and the shapes used to describe them, Martin Fowler argues that there is no real convention for categorizing tests:

The take-away here is when anyone starts talking about various testing categories, dig deeper on what they mean by their words, as they probably don’t use them the same way as the last person you read did.

He goes on to quote Justin Searls:

[…] Nearly zero teams write expressive tests that establish clear boundaries, run quickly & reliably, and only fail for useful reasons. Focus on that instead.

So, what exactly is an expressive test? I would define it like this: the purpose of the test, its steps, preconditions and postconditions are obvious even to someone who did not write it. If you can’t picture what an expressive test looks like right now, you might find it easier to think about a non-expressive one. Smells of non-expressive tests include:

Purpose of the test not obvious (Missing Clarity)
Lengthy, hard to read code (Low Comprehensibility)
Heavy mock and/or simulator usage (High coupling, low cohesion, wide scope)
No clearly defined preconditions (Missing Clarity)
Missing or non-obvious assertions (Completeness)
Hard to change, extend (Extensibility, Maintainability)

I put the software quality attributes associated with the smells in parentheses. Although writing expressive tests may not resolve all of these quality shortcomings entirely, it may nevertheless improve some of them.
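To make the smells more tangible, here is a minimal sketch in Python that puts a non-expressive and an expressive version of the same test side by side. The calculate_primes function is a hypothetical stand-in for the function under test; treating the stop value as inclusive is an assumption of this sketch.

```python
import unittest


def calculate_primes(start, stop):
    """Hypothetical stand-in for the function under test.

    Returns the primes between start and stop (stop inclusive, an assumption).
    """
    return [n for n in range(max(start, 2), stop + 1)
            if all(n % d for d in range(2, int(n ** 0.5) + 1))]


class PrimesTests(unittest.TestCase):

    # Non-expressive: vague name, magic numbers, and an assertion that
    # hides which primes are actually expected.
    def test_1(self):
        r = calculate_primes(1, 10)
        self.assertTrue(len(r) == 4 and r[3] == 7)

    # Expressive: the name states the purpose, the expected values are
    # named, and the assertion compares the complete result.
    def test_primes_between_1_and_10_are_2_3_5_7(self):
        expected_primes = [2, 3, 5, 7]
        calculated_primes = calculate_primes(start=1, stop=10)
        self.assertEqual(calculated_primes, expected_primes)
```

Both tests pass, but only the second one tells a reader what the function is supposed to return, and only the second one reports the differing lists when it fails.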

Looking at some examples of expressiveness

In order to compare tests with regard to their expressiveness, I coded some examples in Go, Python and Groovy as part of my code katas. The test target was a function that calculates prime numbers.
The first example (accessible here) is a simple unit test in Go using the built-in testing package:

func TestCalculatePrimesTo10(t *testing.T) {

	expectedPrimes := []uint64{2, 3, 5, 7}
	calculatedPrimes := primenumbers.CalculatePrimes(1, 10)

	if !reflect.DeepEqual(expectedPrimes, calculatedPrimes) {
		t.Fatalf("expected: %v, got: %v", expectedPrimes, calculatedPrimes)
	}
}

Let’s examine the code with regard to its expressiveness. The meaningful function and variable names make it obvious what is tested. The preconditions, postconditions and test steps are comprehensible, although not structurally separated. You need to be somewhat proficient in coding (maybe not in Go, but in general) to deduce what the call to reflect.DeepEqual does. A for loop might have made that more obvious.
Let’s compare this to the same unit test I wrote in Python:

class CalculatePrimesTests(unittest.TestCase):

    def test_calculate_primes_to_10(self):
        expected_primes = [2, 3, 5, 7]
        calculated_primes = calculate_primes(start=1, stop=10)
        self.assertEqual(calculated_primes, expected_primes)

Whether snake case naming is more expressive than camel case remains an open question, but the beauty of Python’s gradual typing and the built-in assertEqual method make the test more concise and a little easier to understand. Overall, though, both tests are equally expressive because of their meaningful names and easy-to-read code.
I could go on writing similar tests in Java and other languages. The key point here is that such tests are tied to the programming language, so their expressiveness depends on the expressiveness of the language itself and on how they are written (e.g. naming conventions).
The next section looks at how to add more natural language to tests. I will also look at a combination of natural and programming language using some syntactic sugar offered by Groovy.

More natural language for more expressiveness

When it comes to expressing tests in a natural language, BDD with Gherkin is the most prominent choice.
BDD frameworks like godog for Go or behave for Python let you write test specifications in a natural language, while the glue code is written in a programming language. Here is the prime numbers test written in Gherkin:

Feature: Calculate prime numbers

  @calculate-primes
  Scenario Outline: Calculate primes
    When I calculate the prime numbers between <start> and <end>
    Then the calculated prime numbers should be <calculated-primes>
    Examples: Valid inputs
      | start | end  | calculated-primes |
      | 0     | 10   | 2, 3 ,5, 7        |

That test specification is easy to read, even for people who are not that familiar with programming. The only peculiarity is that I used a Scenario Outline, a template whose data is taken from the table titled Examples.
The glue code that connects the steps of the test specification to the implementation is written in Go and requires at least some programming skills. The complete glue code can be found here.
So while BDD tests are very expressive, the glue code may require in-depth programming knowledge.
Last but not least, I would like to show another example using a hybrid approach with Groovy and the Spock framework:
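To give an impression of what glue code has to do, here is a minimal sketch in plain Python. It is not the API of godog or behave: the step decorator, the STEPS registry and run_scenario are hypothetical helpers invented for this illustration, and calculate_primes is a stand-in for the function under test. The idea is simply that each step sentence is matched against a pattern, and the captured values are handed to a function.

```python
import re

STEPS = []  # hypothetical registry of (compiled pattern, function) pairs


def step(pattern):
    """Register a function as glue code for step texts matching the pattern."""
    def register(func):
        STEPS.append((re.compile(pattern), func))
        return func
    return register


def calculate_primes(start, stop):
    """Hypothetical stand-in for the function under test (stop inclusive)."""
    return [n for n in range(max(start, 2), stop + 1)
            if all(n % d for d in range(2, int(n ** 0.5) + 1))]


@step(r"I calculate the prime numbers between (\d+) and (\d+)")
def calculate(context, start, end):
    context["primes"] = calculate_primes(int(start), int(end))


@step(r"the calculated prime numbers should be (.+)")
def primes_should_be(context, expected):
    # Turn the comma separated table cell into a list of integers.
    expected_primes = [int(p) for p in expected.split(",")]
    assert context["primes"] == expected_primes, (
        f"expected: {expected_primes}, got: {context['primes']}")


def run_scenario(lines):
    """Run each step with the first glue function whose pattern matches."""
    context = {}
    for line in lines:
        text = line.split(" ", 1)[1]  # drop the When/Then keyword
        for pattern, func in STEPS:
            match = pattern.fullmatch(text)
            if match:
                func(context, *match.groups())
                break
        else:
            raise LookupError(f"no glue code for step: {line}")
    return context


run_scenario([
    "When I calculate the prime numbers between 0 and 10",
    "Then the calculated prime numbers should be 2, 3 ,5, 7",
])
```

Real BDD frameworks add feature file parsing, scenario outlines and reporting on top, but the parsing and type conversion shown here is the part that still has to be programmed.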

def "calculate prime numbers between start and stop"(int start, int stop) {
    expect:
    expectedPrimes == new primenumbers().calculatePrimes(start, stop)

    where:
    start | stop || expectedPrimes
    0     | 10   || [2, 3, 5, 7]
    0     | 100  || [2, 3, 5, 7, 11, 13, 17, 19, ..., 73, 79, 83, 89, 97]
}

This syntax has a bit of everything: natural language for the test specification, programming parts in the expect block, and a table that contains the test data. The syntactic sugar lies in the blocks, as they clearly structure the test. Not only is the test specification expressive, but so are the failure reports, as the following example shows:

Condition not satisfied:

expectedPrimes == new primenumbers().calculatePrimes(start, stop)
|              |  |                  |               |      |
[1, 3, 5, 7]   |  |                  [2, 3, 5, 7]    0      10
               |  <primenumbers.primenumbers@70c534cb binding=groovy.lang...>
               false

It’s easy to spot errors without having to program meaningful test error messages.
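For comparison, a data table similar to Spock's where block can be approximated in Python with unittest and subTest. This is only a sketch: calculate_primes is again a hypothetical stand-in, and the second row uses the range 0 to 20 as an illustrative value of my own rather than the 0 to 100 row from the Groovy example.

```python
import unittest


def calculate_primes(start, stop):
    """Hypothetical stand-in for the function under test (stop inclusive)."""
    return [n for n in range(max(start, 2), stop + 1)
            if all(n % d for d in range(2, int(n ** 0.5) + 1))]


class CalculatePrimesTableTests(unittest.TestCase):

    # Each tuple plays the role of one row in the data table.
    CASES = [
        # (start, stop, expected_primes)
        (0, 10, [2, 3, 5, 7]),
        (0, 20, [2, 3, 5, 7, 11, 13, 17, 19]),
    ]

    def test_calculate_primes_between_start_and_stop(self):
        for start, stop, expected_primes in self.CASES:
            # subTest reports start and stop of a failing row and
            # continues with the remaining rows.
            with self.subTest(start=start, stop=stop):
                self.assertEqual(calculate_primes(start, stop), expected_primes)
```

The table is less readable than the Spock version because it is an ordinary list of tuples, which illustrates how much the blocks and the table syntax contribute to the expressiveness of the Groovy test.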
Now that we have looked at the expressiveness of tests, it’s time to remind ourselves that other attributes matter as well when it comes to tests.

The need for speed

Coming back to Justin’s quote, I found that one of the most important attributes of tests is speed. Fast-running tests support incremental development: small changes can be validated conveniently, which gives us timely feedback.
During my exercises I found that Go runs very fast, even with the BDD framework godog. So using BDD even for unit tests is possible.
Python is only marginally slower. Depending on the number of tests, this speed disadvantage might matter though.
Groovy, Spock and Gradle, on the other hand, take more time to initialize. Once initialized, the tests also ran rather quickly.
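If you want to put a number on such impressions, the run time of a test suite can be measured from within Python. The following is a rough sketch under stated assumptions: timed_run is a hypothetical helper, calculate_primes is a stand-in for the function under test, and the measurement covers only the test execution, not interpreter or build tool start-up, which is exactly the part that dominated in the Gradle case.

```python
import io
import time
import unittest


def calculate_primes(start, stop):
    """Hypothetical stand-in for the function under test (stop inclusive)."""
    return [n for n in range(max(start, 2), stop + 1)
            if all(n % d for d in range(2, int(n ** 0.5) + 1))]


class CalculatePrimesTests(unittest.TestCase):

    def test_calculate_primes_to_10(self):
        self.assertEqual(calculate_primes(start=1, stop=10), [2, 3, 5, 7])


def timed_run(test_case):
    """Run all tests of a TestCase class and return (result, seconds)."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(test_case)
    runner = unittest.TextTestRunner(stream=io.StringIO())  # keep output quiet
    started = time.perf_counter()
    result = runner.run(suite)
    return result, time.perf_counter() - started


result, seconds = timed_run(CalculatePrimesTests)
print(f"{result.testsRun} test(s) in {seconds:.4f} s")
```

To compare whole tool chains, including start-up, timing the complete command from the shell is the more honest measurement.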

Different languages for implementation and tests?

The idea of the Polyglot Developer is that

“the practice of writing code in multiple languages to capture additional functionality and efficiency not available in a single language”

is beneficial for anyone developing software. The question, then, is when it makes sense to use a different language for implementation and tests. The answer is that it depends on the type of application, its interfaces and the similarity of the languages.
Testing Java applications with Groovy and Spock can be done seamlessly and is a common technique. Groovy offers language features such as dynamic typing, optional types and meta-programming, which aid in writing expressive tests and allow even non-Java gurus to write tests.
If the application under test offers implementation-independent interfaces, like a REST interface or a command line interface, choosing a more suitable language for testing can increase overall test quality. A different language does not seem suitable for unit tests, though, mostly because of the reduced interoperability.
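As an illustration of testing through an implementation-independent interface, the following sketch tests a command line program from Python by starting it as a separate process. The program itself is a hypothetical stand-in (a small inline script started with the current interpreter); in practice the command would point to a binary written in Go, Java or any other language, and the output format shown here is an assumption.

```python
import subprocess
import sys
import unittest

# Hypothetical stand-in for a real command line application: prints the
# primes between its two arguments as a comma separated list.
CLI_SOURCE = """
import sys
start, stop = int(sys.argv[1]), int(sys.argv[2])
primes = [n for n in range(max(start, 2), stop + 1)
          if all(n % d for d in range(2, int(n ** 0.5) + 1))]
print(",".join(map(str, primes)))
"""


def run_cli(start, stop):
    """Start the application as a separate process and return its output."""
    completed = subprocess.run(
        [sys.executable, "-c", CLI_SOURCE, str(start), str(stop)],
        capture_output=True, text=True, check=True)
    return completed.stdout.strip()


class PrimesCliTests(unittest.TestCase):

    def test_cli_prints_primes_between_0_and_10(self):
        self.assertEqual(run_cli(0, 10), "2,3,5,7")
```

The test knows nothing about the implementation language of the application. That independence is what makes a different test language practical at this level, and it is what is missing at the unit test level.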

Conclusion

When it comes to writing expressive tests, BDD seems to be the best option. Go and Python offer fast frameworks for it. Groovy and Spock follow closely behind. Although the test specifications are not written entirely in natural language, the syntactic sugar of Groovy is a very good compromise.
Using a different language for implementation and tests may be a good idea for tests at a higher level than unit tests, if the benefits outweigh the additional effort.