PIT testing is useful. It basically tests how effective your tests are and points out conditions that aren't actually being tested. It's for Java: https://pitest.org
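Here is a minimal, hand-rolled sketch of the idea behind mutation testing. It is in Java because that is what PIT targets. Real tools like PIT generate mutants automatically from compiled bytecode. Here the code under test (`age >= 18`), the three mutants, and the two test suites are all hypothetical and written out by hand as lambdas. A mutant is "killed" when the suite fails against it, and the mutation score is the fraction of mutants killed.

```java
import java.util.List;
import java.util.function.IntPredicate;
import java.util.function.Predicate;

// Illustration only: real mutation tools create mutants automatically;
// here every mutant and test suite is hypothetical and written by hand.
final class MutationSketch {

    // Original code under test: is someone old enough to vote?
    static final IntPredicate ORIGINAL = age -> age >= 18;

    // Hand-written mutants, similar in spirit to common mutation operators.
    static final List<IntPredicate> MUTANTS = List.of(
        age -> age > 18,   // conditional boundary: >= becomes >
        age -> age < 18,   // negated conditional
        age -> true        // condition replaced with a constant
    );

    // A test suite is modelled as "does it pass against this implementation?".
    // Weak suite: only checks a clearly adult and a clearly minor age.
    static final Predicate<IntPredicate> WEAK_SUITE =
        impl -> impl.test(30) && !impl.test(5);

    // Stronger suite: also checks both sides of the boundary.
    static final Predicate<IntPredicate> STRONG_SUITE =
        impl -> impl.test(30) && !impl.test(5) && impl.test(18) && !impl.test(17);

    // Mutation score = killed mutants / total mutants.
    static double mutationScore(Predicate<IntPredicate> suite) {
        if (!suite.test(ORIGINAL)) {
            throw new IllegalStateException("suite must pass on the original code");
        }
        long killed = MUTANTS.stream().filter(m -> !suite.test(m)).count();
        return (double) killed / MUTANTS.size();
    }

    public static void main(String[] args) {
        System.out.printf("weak suite:   %.2f%n", mutationScore(WEAK_SUITE));
        System.out.printf("strong suite: %.2f%n", mutationScore(STRONG_SUITE));
    }
}
```

Both suites give the same line coverage, but the weak one never checks the boundary value 18, so the `>=` to `>` mutant survives (score 2/3). Adding the `18`/`17` checks kills it.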
The most extreme examples of the problem are tests with no assertions. Fortunately these are uncommon in most code bases.
Every enterprise I’ve consulted for that had code coverage requirements was full of elaborate mock-heavy tests with a single Assert.NotNull at the end. Basically just testing that you wrote the right mocks!
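Here is a hypothetical Java illustration of that pattern. `Assert.NotNull` suggests a .NET codebase, but the same thing happens in Java. The repository interface, the stub standing in for a mock, and the discount logic are all made up. The not-null check passes for both the real method and a mutant with `-` flipped to `+`, so a mutation tool would report that mutant as surviving. Only a test that checks the actual value kills it.

```java
import java.util.function.BiFunction;

// Hypothetical example: all names and logic are invented for illustration.
final class NotNullSketch {

    // Collaborator that a real test would mock.
    interface PriceRepository {
        int basePriceCents(String sku);
    }

    // Code under test: apply a 10% discount to the stored price.
    static Integer discountedPrice(PriceRepository repo, String sku) {
        int base = repo.basePriceCents(sku);
        return base - base / 10;
    }

    // A mutant of the method above: subtraction flipped to addition.
    static Integer discountedPriceMutant(PriceRepository repo, String sku) {
        int base = repo.basePriceCents(sku);
        return base + base / 10;
    }

    // Stub standing in for a mock: every SKU costs 1000 cents.
    static final PriceRepository STUB = sku -> 1000;

    // The weak test: only checks that something came back.
    static boolean weakTestPasses(BiFunction<PriceRepository, String, Integer> impl) {
        return impl.apply(STUB, "ABC-1") != null;
    }

    // A stronger test: checks the computed value (1000 - 100 = 900).
    static boolean strongTestPasses(BiFunction<PriceRepository, String, Integer> impl) {
        return Integer.valueOf(900).equals(impl.apply(STUB, "ABC-1"));
    }

    public static void main(String[] args) {
        // Passes for both, so the mutant survives the weak test.
        System.out.println("weak test, original:   " + weakTestPasses(NotNullSketch::discountedPrice));
        System.out.println("weak test, mutant:     " + weakTestPasses(NotNullSketch::discountedPriceMutant));
        // The value-checking test kills the mutant.
        System.out.println("strong test, original: " + strongTestPasses(NotNullSketch::discountedPrice));
        System.out.println("strong test, mutant:   " + strongTestPasses(NotNullSketch::discountedPriceMutant));
    }
}
```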
That’s exactly the sort of shit tests mutation testing is designed to address. Believe me, it sucks when Sonar requires a 90% PIT mutation score. Sometimes the tests get extremely elaborate, which should be a red flag for the design (not necessarily bad code).
Anyway, I love what PIT testing does. I hate being required to do it, but it’s a good thing.
Yeah, it's always the same: create a lazy metric, get lazy and useless results.
This is really interesting, I've never heard of such an approach before; clearly I need to spend more time reading up on testing methodologies. Thank you!
I'd never heard of mutation testing before either, and it seems really interesting. It reminds me of fuzzing, except for the code instead of the input. Maybe a little impractical for some codebases with long build times, though. Still, I'll have to give it a try for a future project. It looks like there are several tools for mutation testing C/C++.
The most useful tests I write are generally regression tests. Every time I find a bug, I replicate it in a test case, then fix the bug. I think this is just basic Test-Driven Development practice, but it's very useful to verify that your tests actually fail when they should. Mutation/PIT testing seems like it addresses that nicely.
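Here is a small sketch of that workflow, using a hypothetical bug: the classic `(lo + hi) / 2` midpoint, which overflows for large `int` values. The regression test reproduces the bug report and is checked against both versions. This confirms that it fails on the buggy code and passes on the fix. In a real project this would be a JUnit test rather than a method returning a boolean.

```java
import java.util.function.IntBinaryOperator;

// Hypothetical bug and fix, used only to illustrate the regression-test workflow.
final class RegressionSketch {

    // Buggy version: lo + hi overflows when both are close to Integer.MAX_VALUE.
    static int midpointBuggy(int lo, int hi) {
        return (lo + hi) / 2;
    }

    // Fixed version: no overflow as long as 0 <= lo <= hi.
    static int midpoint(int lo, int hi) {
        return lo + (hi - lo) / 2;
    }

    // Regression test written from the bug report before fixing it.
    static boolean regressionTestPasses(IntBinaryOperator mid) {
        int lo = Integer.MAX_VALUE - 2;
        int hi = Integer.MAX_VALUE;
        return mid.applyAsInt(lo, hi) == Integer.MAX_VALUE - 1;
    }

    public static void main(String[] args) {
        // The test should fail on the buggy code...
        System.out.println("buggy version passes: " + regressionTestPasses(RegressionSketch::midpointBuggy));
        // ...and pass once the fix is in.
        System.out.println("fixed version passes: " + regressionTestPasses(RegressionSketch::midpoint));
    }
}
```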
We are running the above PIT tests with an extra (Gradle-based) build plugin so that mutations are only run for the lines changed in the pull request. That drastically reduces runtime and still ensures that new code is covered to the mutation-test level we want. Maybe something similar can be done for C or C++ projects.
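The commenter's plugin isn't named, so this is only a hypothetical sketch of the filtering idea. Given mutants tagged with a file and a line, it keeps only those that fall on lines touched by the diff. The `Mutant` record and the changed-lines map are made up for illustration. In practice the changed lines would come from the VCS diff and the mutants from the mutation tool's report. Records require Java 16 or later.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical sketch: restrict mutation analysis to lines changed in a pull request.
final class ChangedLinesFilter {

    // Illustrative stand-in for one mutant reported by a mutation tool.
    record Mutant(String file, int line, String description) {}

    // Keep only mutants whose (file, line) appears in the changed-lines map.
    static List<Mutant> onlyChangedLines(List<Mutant> all, Map<String, Set<Integer>> changedLines) {
        return all.stream()
            .filter(m -> changedLines.getOrDefault(m.file(), Set.of()).contains(m.line()))
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Mutant> mutants = List.of(
            new Mutant("Pricing.java", 12, "negated conditional"),
            new Mutant("Pricing.java", 40, "replaced return value"),
            new Mutant("Legacy.java", 7, "removed call"));
        // Pretend the diff touched lines 10 to 15 of Pricing.java only.
        Map<String, Set<Integer>> changed = Map.of("Pricing.java", Set.of(10, 11, 12, 13, 14, 15));
        onlyChangedLines(mutants, changed).forEach(System.out::println);
    }
}
```

Only the mutant on `Pricing.java` line 12 is kept, so the (expensive) test runs happen for new code only, while untouched legacy code is skipped.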
But is there any accepted means of formally measuring a system and ensuring that some level of test quality exists?
Formally? No, this is basically impossible by Rice's Theorem. There is not even a guarantee that if you have 100% test coverage, the program is good (the tests could be flawed).
This is just a natural limitation of Turing completeness. You can't decide these properties while also having full computational power. In order to decide such things, you need a less powerful mode of computation (something not Turing-complete) that can be analyzed more thoroughly and with more guarantees.
That makes sense, thank you. Yes, it's specifically "test quality" I'm looking to measure, as 100% coverage is effectively meaningless if the tests are poor.
Yeah, I'm afraid the only real way to "measure" that is to read through the tests and the code and make a good ol' human value judgement on the state of the code and tests. But it won't give you a number.
There are tools to detail the code coverage of your tests. I've worked with Istanbul in the past, and it's helped to point out parts of the code that could use more attention.
I use coverage tools like nyc/c8, but I can easily get 100% coverage on buggy, exploitable, and unstable code. You can have two projects, both with 100% coverage, and one be a shit show and the other be rock solid - so I was wondering if there's a way to measure quality of tests, or to identify code that really needs extra attention (despite being 100%). Mutation testing has been suggested and that's really interesting, I'm going to give it a go tomorrow and see what it throws up!
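Here is a hypothetical example of how 100% coverage can coexist with buggy code. It is written in Java rather than JavaScript, but the principle is the same with nyc/c8. The single test executes every line of `percent`, so a line-coverage tool reports it as fully covered. Yet a zero `total` throws, and a large `part` silently overflows.

```java
// Hypothetical example: 100% line coverage, still buggy.
final class CoverageSketch {

    // Intended: percentage of part out of total, rounded down.
    static int percent(int part, int total) {
        return part * 100 / total;
    }

    // The only test: it executes every line of percent().
    static boolean onlyTestPasses() {
        return percent(1, 2) == 50;
    }

    public static void main(String[] args) {
        System.out.println("test passes: " + onlyTestPasses());
        // Never exercised by the test: division by zero.
        try {
            percent(1, 0);
        } catch (ArithmeticException e) {
            System.out.println("total = 0 throws: " + e.getMessage());
        }
        // Never exercised by the test: part * 100 overflows int, so this is not 50.
        System.out.println("percent(30_000_000, 60_000_000) = " + percent(30_000_000, 60_000_000));
    }
}
```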
So true lol. Mgmt just announced a directive at my work last week that code must have 95-100% coverage.
Meanwhile they hire contractors from India that write the dumbest, most useless tests possible. I’ve worked with many great Indian devs, but the contractors we use today all seem like a step down in quality. More work for me, I guess.
Mutation testing. Someone else mentioned it as PIT testing, but its actual name is mutation testing (PIT is a mutation testing tool for Java). It accomplishes exactly what you’re looking for here.
I prefer to count and report the total number of tests run as part of each build. We get impressively large numbers, but there is no way to set a specific goal for the exact number; we can always go higher.
Maybe the ratio of money spent on writing code to money spent on testing code?
I'd like to see state space coverage instead of line coverage. That, at least, catches silly "100%" cases.
I don't know of a tool that provides this metric, and I don't even think such a thing could be built for most languages. Still, it's useful to think about when reviewing code.
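One concrete, much weaker relative of state space coverage is path coverage, which counts which combinations of branch outcomes were exercised. In this hypothetical Java sketch, two tests give 100% line and branch coverage of `scaled`, yet only 2 of its 4 paths are taken, and one of the untested paths divides by zero. The `seenPaths` set is a hand-rolled stand-in for what a coverage tool would record.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: full line and branch coverage, incomplete path coverage.
final class PathCoverageSketch {

    // Branch-outcome combinations seen so far (a crude path-coverage tracker).
    static final Set<String> seenPaths = new HashSet<>();

    static int scaled(int x, boolean a, boolean b) {
        seenPaths.add(a + "/" + b);
        int divisor = 1;
        if (a) {
            divisor = 0;
        }
        if (b) {
            divisor += 2;
        }
        return x / divisor;
    }

    public static void main(String[] args) {
        // Two tests: every line runs and each if-statement takes both outcomes.
        System.out.println(scaled(12, true, true));   // divisor 2
        System.out.println(scaled(12, false, false)); // divisor 1
        // ...but only 2 of the 4 possible paths have been exercised.
        System.out.println("paths covered: " + seenPaths.size() + " of 4");
        // The untested path a=true, b=false leaves divisor at 0.
        try {
            scaled(12, true, false);
        } catch (ArithmeticException e) {
            System.out.println("untested path fails: " + e.getMessage());
        }
    }
}
```

Path counts grow exponentially with the number of independent branches (and become unbounded with loops), which is one reason tools rarely report this metric.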
Different applications require different tests, so no measure is going to please everyone. If you're making embedded devices for an airplane, the buyer might ask you to provide a formal proof that the program works. In contrast, web apps tend to simply use end users as testers, since it's cheaper.
This might not be exactly what you're looking for, but there is verifiably correct software. You can use proof assistants or work in limited computational models (i.e. always-terminating, non-Turing-complete).
One example: https://statebox.org/what-is/