211: Stamp out test dependencies with pytest plugins
This episode is about making tests reproducible in isolation. What does that mean? Well, let's say you've got a test suite with a test failure in it, and you try to debug that test. You run it by itself, in isolation, and it does not fail. It passes.
Brian: Is that a good thing? Well, of course it's a good thing. Your test passed. But then you throw it back in the suite, run the whole thing, and that same test fails. What's going on is you've got a dependency between different tests: something is happening in the suite, maybe in a previous test or maybe in a fixture, that is not the same during the suite as it is during the test in isolation. There are a few plugins that can help you get your house in order so that doesn't happen, and we're gonna talk about them in this episode.
Brian: A test suite is a bunch of tests that get run together. We'd like to be able to run our test suite, like in CI or through tox or something, and have that be the same as running each individual test by itself. However, there are things that get in the way and make that not quite work. We start out by writing a single test, debugging it, getting it ready, committing it, adding it to the suite, and then we just add more tests. We keep doing that, developing tests usually one by one and then adding them into the suite.
Brian: Sometimes they're added a bunch at a time, like with parametrized tests. However, usually you make sure they work before you commit them. What happens when there are also test fixtures? Setup and teardown code that runs before and after tests, or before and after a group of tests, like around a module of tests, or maybe around a package, an entire directory, or maybe the entire session has setup and teardown around it. That is one of the things that makes things run a little differently when you're running a test in isolation than when you're running the entire suite.
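To make that concrete, here's a minimal sketch of fixtures at different scopes; the fixture names are made up for illustration. When you run one test alone, the session and module setup runs right before it, but in the full suite other tests may have run in between and touched that shared setup.

    import pytest

    @pytest.fixture(scope="session")
    def server():
        print("start server once per session")
        yield "server"
        print("stop server at the end of the session")

    @pytest.fixture(scope="module")
    def database(server):
        print("set up database once per module")
        yield "db"
        print("tear down database at the end of the module")

    @pytest.fixture()
    def clean_state(database):
        print("reset state before every test")
        yield database

    def test_something(clean_state):
        assert clean_state == "db"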
Brian: So how do we get around that? Well, one of the ways to get around it is to just not use any of those larger-scope fixtures: completely reset the system and start from scratch for every single test. But that might not be feasible, and it might take forever to run your test suite, because we do the grouping precisely because it saves time. So first let's talk about what causes some of the dependence between tests, then we'll talk about ways to check that your tests are independent before we get them into the suite. We're gonna use a few plugins, and then we're gonna talk about how to debug a test dependency failure.
Brian: We get a dependence between test runs, between two different tests or a set of tests, usually because of a few things. Normally, I'm looking at system state. The system under test, the code under test, has some state involved with it, and either a previous test left that system in a state that's different from the initial setup, which affects a following test and causes a failure, or the test that is failing is assuming a state that is more specific than it should. It should possibly assume a broader set of circumstances.
Brian: Maybe there aren't enough test cases checked in the default case. So that's possibly the system state: a previous test is leaving the system in a state that is different from what the failing test expects. Another thing could be data within the test suite itself. There are ways, not necessarily good ones, to pass state between tests, and perhaps something isn't getting set up correctly if you run a test in isolation, or it gets mucked up if you run some other tests beforehand.
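Here's a tiny, hypothetical example of that kind of leaked state: a module-level dictionary standing in for the system under test. The second test passes when run by itself but fails once the first one has run earlier in the suite.

    # test_state_leak.py -- hypothetical order-dependent tests
    app_cache = {}  # module-level state shared by the tests below

    def test_adds_entry():
        # Leaves state behind: nothing clears app_cache afterward.
        app_cache["user"] = "brian"
        assert app_cache["user"] == "brian"

    def test_assumes_empty_cache():
        # Passes in isolation, fails when run after test_adds_entry.
        assert app_cache == {}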
Brian: I normally don't have a lot of tests that pass data between tests, except for things like what temporary directory am I in or what my connections are, and those are all handled through fixtures. So the other thing is fixture setup and teardown. It could be anything, like hardware that you're testing, but let's take the example of a temporary database that I'm setting up at the beginning of my test session, and maybe before and after a class of tests or a module, or at the very least between tests, I'm cleaning up anything that a previous test has set up. So let's say I just assume that the database is set up and the columns are set up, but that there's no data in it.
Brian: It's empty, and a particular test can add data to the database and then run checks on it, but when the next test comes around, it'll clean out any previous data before it runs, or a teardown from the previous test will clean that out. So one cause of test failures can be that your cleanup isn't quite right. There's something that you're not cleaning up between tests that you need to be. Now, you've probably caught that already.
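A rough sketch of that cleanup pattern, using an in-memory SQLite database purely for illustration: the schema is built once per session, and an autouse fixture clears the data around every test.

    import sqlite3
    import pytest

    @pytest.fixture(scope="session")
    def db():
        # Create the schema once for the whole session.
        conn = sqlite3.connect(":memory:")
        conn.execute("CREATE TABLE items (name TEXT)")
        yield conn
        conn.close()

    @pytest.fixture(autouse=True)
    def clean_db(db):
        # Runs around every test: start empty, and clean up afterward too.
        db.execute("DELETE FROM items")
        yield db
        db.execute("DELETE FROM items")

    def test_insert(db):
        db.execute("INSERT INTO items (name) VALUES ('widget')")
        assert db.execute("SELECT COUNT(*) FROM items").fetchone()[0] == 1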
Brian: However, it might not just be a database thing. It might be a configuration file that one test is adding, or configuration state of the database, that doesn't get cleaned up appropriately. So something isn't getting cleaned up between the tests. The other thing, since I often am testing hardware devices, is that it isn't that the system is in the wrong state when a test runs; it's that the right state isn't there for very long. There might be a settling time or hysteresis or something going on, and one test is changing the system and then finishing before the system can settle back into the default state that the following tests expect. This can be dealt with.
Brian: With the systems I'm working with, there are operation-complete commands that you can call, and usually there are other ways to verify that the hardware is done doing what it's doing. I don't wanna put sleeps in the code. I have before. There have been times where there's nothing left to do but throw a couple milliseconds of sleep into my test code. I don't like any sleeps in my test code, but sometimes you kinda have to put those in there. Try not to.
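When you do have to wait, polling for a done condition with a timeout is usually better than a fixed sleep. Here's a minimal sketch; the device fixture and its methods are made-up stand-ins for whatever your system actually provides.

    import time

    def wait_until(condition, timeout=2.0, interval=0.01):
        """Poll condition() until it returns True or the timeout expires."""
        deadline = time.monotonic() + timeout
        while time.monotonic() < deadline:
            if condition():
                return True
            time.sleep(interval)
        return False

    def test_device_settles(device):  # 'device' is an assumed fixture, not a real one
        device.start_operation()
        assert wait_until(device.operation_complete), "device never reported complete"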
Brian: Okay. So those are causes of dependence between tests, but how do we see whether there are any dependencies between our tests? We could brute force it: go through and run every test individually and in the suite and make sure both of those work, but that won't catch things like timing errors. So the other thing to do is to change the order of the testing.
Brian: I'm gonna use a few plugins here. One of the plugins I love is called pytest-randomly. It's a plugin maintained by Adam Johnson, who goes by adamchainz on GitHub, and it will randomize your tests and shuffle them up. All you have to do is pip install pytest-randomly, and once it's installed, by default it randomizes your tests.
Brian: It doesn't completely randomize them. It should run at about the same speed as your suite ran before, and it does this by randomizing the modules first and then the functions within those modules, so that hopefully your fixtures still run the same way. We'll talk about the problem with complete randomization a little later with a different plugin. So that's pytest-randomly. You install it and it just randomizes everything.
Brian: You can turn it off; there are flags for that. One of the other cool things about it, though, is that it prints out the seed. It comes up with a random seed, that seed is used for randomizing the tests, and it prints it out so that you can run the same session again. Let's say you have a test failure in CI, and you're debugging it in isolation and can't reproduce it, so you say, I'm just gonna run the whole suite locally. But locally it's gonna have a different randomized order. You can pass in the seed from CI, because it got printed out in the log, and use that seed to reproduce the exact same order, which is so cool.
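Roughly what that looks like on the command line; the exact wording of the seed line in the test session header may differ between versions:

    pip install pytest-randomly
    pytest                        # shuffled; the header reports something like "Using --randomly-seed=1234"
    pytest --randomly-seed=1234   # reproduce the same order, e.g. with a seed copied from a CI log
    pytest -p no:randomly         # disable the plugin for a single run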
Brian: The other cool thing is that the seed is not only used for the ordering; it's also passed into other plugins that use random features, like factory boy and Faker and model bakery and even NumPy. So if you're using random functions from those in your tests, or in your own code, those random seeds will get set by pytest-randomly. Very cool. Another related plugin, for when you don't really wanna shuffle everything, is a simpler one by the same maintainer called pytest-reverse. Normally pytest runs your directories in alphabetical order, then the test files within each directory in alphabetical order, and then within each file just from top to bottom. That's the pytest default, and pytest-reverse just reverses that.
Brian: It takes the whole list and runs it in reverse order. The cool thing about that is you're gonna have the same tests grouped together. The tests are still gonna be together; they're just gonna be flipped in order. That's pretty cool. It might be sufficient to test your independence, and it might be a good first start, but check out pytest-randomly also.
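Using pytest-reverse, if I remember the flag correctly, looks something like this:

    pip install pytest-reverse
    pytest --reverse   # run the collected tests in reverse order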
Brian: Another related plugin, which I also really like, is called pytest-random-order. It's a different plugin than pytest-randomly. It does not pass the seed into factory boy and NumPy and things like that; it just randomizes the tests. The default, again, makes sense, but there are a couple of things you can do with it.
Brian: First off, it does not randomize everything by default. You have to pass in a flag, --random-order, to turn it on; you can see that in the documentation. So by default the tests run in the normal order, or you can pass in the flag to get a random order.
Brian: You can also pass in a seed. I don't think you can detect the old seed, and I can't remember whether or not it prints out the seed so you can rerun the same order. But if you want to reproduce it, you can. In CI, I would recommend setting a random seed and listing it as part of your CI configuration, then maybe changing it every once in a while, or just use pytest-randomly. Anyway, pytest-random-order is pretty cool. The other cool thing about it is it has a bucket type.
Brian: The buckets are things like shuffling the functions within a module and then shuffling the modules. But you can completely randomize everything and just say, I want the bucket to be global, and what that does is really randomize everything. The warning here is that it might make things run slower, because your fixtures might run more times. Say you've got a module-scoped fixture. The setup runs at the beginning of the module, then a test function runs, and then since the run hops to another module before finishing this one, the teardown runs. It goes to the other module, and when it comes back to another test function in the first file, it has to do the setup again.
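Roughly what the flags look like, if I have them right; by default the shuffle keeps tests grouped by module:

    pip install pytest-random-order
    pytest --random-order                  # shuffle, keeping tests grouped
    pytest --random-order-bucket=global    # shuffle everything, across modules
    pytest --random-order-seed=1234        # reproduce a particular shuffle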
Brian: Anyway, if you've got a lot of work within your setup and teardown, you might not want to use big buckets, but it might be fun to try, because that really does isolate things pretty well. So what does all this do? Hopefully, after randomizing, things should still work in any order, and they should work mostly the same. This will help. It doesn't completely eliminate the possibility that you might have dependence between tests.
Brian: I don't think I can mathematically prove that, but if you shuffle all your tests and the suite still runs just fine, I think you can have fairly high confidence that things are gonna be pretty good. Okay. So we've used plugins to shuffle our tests, essentially. You wanna do that on a regular basis, and probably in CI.
Brian: So maybe you're testing without shuffling locally, but within tox or within your CI, you add a shuffling feature to just break things up a bit, reorder the tests, and make sure that they're independent. That would be great. But let's say you're not there yet. You have an existing suite, there is a dependence, and you've got a failure.
Brian: How do you track that down? Well, we're used to doing binary searches within data structures. I just wanna make sure that people remember that binary searches work great in real life too, and they also work great for finding test failures and dependence between tests. So let's say I've got ten tests and the ninth test fails in the suite, but it doesn't fail when I run it by itself. What do I do?
Brian: Well, I can start by just running the eighth test and then the ninth. I probably would do that, actually. Just the one before, to see if that's the one that's leaving the system in a bad state. I mean, it's right before it, so decent chances. Right?
Brian: But it might not be that. So we wanna narrow down the set of tests that are affecting the ninth test, and we can do that by narrowing the scope of tests that get run before it. We'll want that random seed if the suite is running in random order; if you haven't randomized yet, we don't need to worry about that. We can just use maybe -k, or list tests individually. In some way, run a subset of tests and reduce that subset until we have the minimal set that reproduces the problem. Hopefully, you end up with two tests, where one of them is causing the problem and the other one is the one failing.
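The mechanics of narrowing it down might look something like this; the test names here are hypothetical, and the last line assumes pytest-randomly is installed:

    pytest test_mod.py::test_eight test_mod.py::test_nine   # run just the suspected pair, in order
    pytest -k "eight or nine"                                # or select a subset with a keyword expression
    pytest --randomly-seed=1234 -k "not slow"                # reproduce a CI order while trimming the set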
Brian: And hopefully, by examination, you can look at what those tests are doing and figure out what state is getting left around that is mucking up the second test. I know that's pretty vague and not very detailed as instructions on how to debug it, but all systems are different, really. So what are the key takeaways I want you to take away from this episode? I'd like you to remember that, when you're debugging a test, it's really awesome if a test failure in the suite is reproducible just by running the test by itself. It'll make your life better for the rest of the time you're working on the project. It's so much better that it's worth it to add randomization to your CI system, and possibly locally, to shuffle your tests so that you make sure your tests are independent and pass that way before going on.
Brian: And if you start this early in the project, it'll help you. If you already have an existing project that doesn't handle shuffling well, try the reverse. Reverse is a minor change, and hopefully that's enough to keep things working. These are plugins; you don't have to do much except install them.
Brian: Remember that pytest-reverse and pytest-random-order are plugins where you actually have to add a flag to change the order, but pytest-randomly randomizes things by default, and you have to add a flag to turn that off. So, three great plugins to use to make sure the tests in your test suite are independent. Thanks.