This is usually (though not always) the second most important thing about a piece of software. So what does it mean, in ACRUMEN terms, for a piece of software to be Correct?
This one is pretty much just what it sounds like. Whatever the job of the software is, it should do that job correctly; in other words, it should do the job right. (Contrast this with Appropriateness, which is about doing the right job.)
So how do we ensure that our software is Correct? Mainly, we test it! We should have enough test coverage to be confident in its correctness, drawing on:
- unit tests
- integration tests
- feature tests
- end-to-end tests
But that’s not all. These sorts of tests, usually done as “example-based” tests, only cover the cases that we thought to test. However, there are some more advanced techniques that help find edge cases and other holes in our test suite.
Property-based testing lets you specify some “properties” of your functions, what formal computer scientists would call “invariants”. The tool makes up lots of semi-random test data, based on what types of data you’ve told it your function takes, trying to find cases where the property does not hold. If it finds any, that signifies an edge case you didn’t take into account.
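To make that concrete, here is a minimal hand-rolled sketch in Python using only the standard library. It is not how any particular property-based testing tool works internally, and real tools add much more (smarter data generation, shrinking failing cases down to minimal ones). The function `clamp`, its deliberately broken sibling `buggy_clamp`, and the helper names are all hypothetical, made up for illustration. The property being checked is that the result always lies between `low` and `high`.

```python
import random

def clamp(value, low, high):
    """Hypothetical function under test: restrict value to the range [low, high]."""
    return max(low, min(value, high))

def buggy_clamp(value, low, high):
    """Deliberately broken variant that forgets the lower bound."""
    return min(value, high)

def gen_args(rng):
    """Generate semi-random (value, low, high) arguments with low <= high."""
    low = rng.randint(-100, 100)
    high = rng.randint(low, 100)
    value = rng.randint(-1000, 1000)
    return value, low, high

def stays_in_range(fn):
    """Build the property: fn's result always lies between low and high."""
    def prop(value, low, high):
        return low <= fn(value, low, high) <= high
    return prop

def find_counterexample(prop, runs=500, seed=0):
    """Return the first generated input that breaks prop, or None if none does."""
    rng = random.Random(seed)
    for _ in range(runs):
        args = gen_args(rng)
        if not prop(*args):
            return args
    return None

print(find_counterexample(stays_in_range(clamp)))        # None: the property held on every input tried
print(find_counterexample(stays_in_range(buggy_clamp)))  # some (value, low, high) with value < low
```

An example-based test that only tried values above the range would pass for both versions. The generated data stumbles onto the case below the range without anyone having to think of it first.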
Mutation testing makes a bunch of slightly changed versions of your function, called “mutants”. Then it runs the relevant tests, using the mutants in place of the original function. Each mutant should cause at least one test to fail. If it doesn’t, then there are three main possibilities:
- The code that got mutated might be what I call “meaningless”. This means that it is unreachable, redundant, or otherwise without any noticeable effect.
- The test suite might have a hole in it, where there should be a test that would account for the difference in behavior that the mutation made. (If the mutation didn’t make a noticeable difference, see above and below.)
- Or maybe it’s just a false alarm: the mutant is semantically equivalent to the original code, so no test could tell them apart. State-of-the-art mutation testing tools avoid generating many such equivalent mutants, but they still can’t quite catch them all.
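Here is a rough sketch of the second possibility, a hole in the test suite, again in plain Python. Real mutation testing tools generate the mutants automatically from your source code; in this sketch they are written out by hand, and `is_adult`, the two suites, and the mutant descriptions are all hypothetical names chosen for illustration.

```python
def is_adult(age):
    """Hypothetical function under test."""
    return age >= 18

# Hand-written mutants; a real tool would derive these from the source code.
MUTANTS = {
    ">= changed to >": lambda age: age > 18,
    ">= changed to <=": lambda age: age <= 18,
    "18 changed to 19": lambda age: age >= 19,
}

def weak_suite(fn):
    """Example-based tests that never probe the boundary at 18."""
    return fn(30) is True and fn(5) is False

def strong_suite(fn):
    """The same tests plus one boundary case."""
    return weak_suite(fn) and fn(18) is True

def surviving_mutants(suite):
    """Names of the mutants that the suite fails to detect ("kill")."""
    return [name for name, mutant in MUTANTS.items() if suite(mutant)]

print(surviving_mutants(weak_suite))    # ['>= changed to >', '18 changed to 19']
print(surviving_mutants(strong_suite))  # []
```

The two survivors of the weak suite both point at the same missing test: nothing checks what happens at exactly 18. Adding that one boundary case kills them.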
Mutation testing is one of my usual topics for conference speaking.
So, I usually set a rather high bar, at least 80%, for statement coverage with the regular “example-based” kinds of tests. (This may sound difficult, but if you’re doing TDD, you should get nearly 100% quite easily.) Then I often do mutation testing to see if there are holes to plug or meaningless code to eliminate.