The role of AI in software testing - Anthony Shaw

Anthony:
Got a plan. Let's do this. I am a really good tester and I want to create amazing tests. What should I do?

Brian:
Hello, Anthony.

Anthony:
Hey, Brian.

Brian:
Welcome to Test & Code again. I was just looking it up, and it's been a long time. Unless I got this wrong, I think the last episode was episode 101, February 19, 2020. Awesome to have you back. There have been a lot of changes in the last five years, and AI is one of them. It's one of the things we were going to talk about today, right?

Anthony:
Yeah, it is. Do you want to explain how this ended up happening?

Brian:
No, you explain.

Anthony:
Yeah, so... well, you made a comment on one of the podcasts that I'm overly optimistic about AI, to put it politely. And I said I'd be happy to talk to you about it, and actually show what role AI has in testing, and what mistakes people are making with it, where people are overselling it, and just explore it a bit more. So I thought it'd be a good discussion.

Brian:
And I actually have seen a lot of that sort of stuff promoted, but I haven't played with any of it yet. So that sounds great. How do you want to start?

Anthony:
Yeah, so I think maybe we can start with talking about the number one testing framework. If you remember the Python survey: what's the number one answer to the survey question, "which testing framework do you use"?

Brian:
I'm not sure... was it... I don't...

Anthony:
"I don't use one." Yes: nothing. The number one testing framework for Python is nothing, or "I'll get around to that at some point." I think it's important to understand why that's the answer. And I always felt like with Python, because I do a lot of open source work and read a lot of open source projects and stuff like that, if you compare Python to JavaScript, I feel like 70% of the JavaScript packages I read don't have tests, and 70% is being generous. Whereas more of the Python projects I look at have some kind of testing. Why do you think that is? Why do you think Python has more? It's still not enough.

Brian:
Well, I was thinking of the JavaScript stuff. I'm guessing that people try it out, they just try it in their web browser. Or maybe the project that's using the JavaScript package has tests. I don't know. Testing is hard, and people don't learn how to do it.

Anthony:
Yeah, I think it's the latter. Testing is easy when you know how to do it, and it's something you don't really think about. But I always feel like the hardest bit is writing the first test, like the scaffolding of the test. Only once you know how to write tests properly do you consider, while you're writing the code, how am I going to test this? Because when I look at projects that don't have tests and I ask the developer, okay, can you create some tests, or why haven't you created any tests, they're like, oh, but how do I start, or how do I even test this? And then you look at the code, and it's written in such a way that it would actually be quite cumbersome to set up a basic unit test. Like, let's just test a single happy path: we give it some inputs and it gives us an expected output. Even doing that as the first test is quite difficult, because people never considered having to write tests in the first place; they just wrote the code however was easy for them.
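
The scaffolding-level first test he's describing is just a single happy path. A minimal sketch, with a hypothetical slugify function standing in for the code under test:

    # Hypothetical function under test.
    def slugify(title: str) -> str:
        return "-".join(title.lower().split())

    # The first "happy path" test: a known input in, the expected output out.
    def test_slugify_happy_path():
        assert slugify("Hello World") == "hello-world"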

Brian:
But, like, they must know... I mean, a lot of people have some idea that it's working. So when I talk to people about that, I just say: well, how do you know it's working now? Put that in a test. But I'm usually thinking about API-level tests or system-level tests. And there's a lot of bad information about test writing, too, I think.

Anthony:
That...

Brian:
This isn't the topic of this show, but the thing I really can't stand is people that say: okay, we're going to start testing, and it's going to take you twice as long to write your code now, but there are benefits. I'd like to hit those people. It shouldn't be taking you twice as long. It should take you about as long as it takes you now. You'll just debug less.

Anthony:
So...

Brian:
It's a pet peeve of mine, mostly the unit-test, test-pyramid people. But anyway.

Anthony:
Yeah, the pyramid thing is interesting, but I don't really follow it. I think it always depends on the project, and it depends on when you're writing the software. I feel like you kind of have an idea of which parts of the project need particular attention, like which bits are flaky. If I write, I don't know, 20,000 lines of code, not every line of code is equal, right? I wouldn't say, okay, my goal is to get 90% coverage, although that is sometimes useful; I wouldn't say my goal is to get coverage of every single line equally, because some of it is not potentially going to be brittle anyway. And then there'll be parts of the code, like a function in there that's a little bit complicated, or it's calling things in a certain way, or it's got lots of expectations, or whatever, and then I know that I should be focusing more testing time on that: how does it handle different edge cases, and stuff like that.

Brian:
The other thing is a lot of teaching: we teach people to build cool things by plugging a bunch of stuff together, and we teach testing by how to test a function. But a ton of code isn't functions, it's hooking things together. And I also think that in the education system... I didn't get taught a lot of testing in college, and I still don't think people get taught that much. And I think it's not because we shouldn't; I think it's because the instructors don't know how to. Testing is a thing that we think we need, or we know we need, but we don't have enough tests. Everybody figures they probably ought to have more tests.

Anthony:
And...

Brian:
So I think that's possibly why some of these companies are thinking AI to the rescue: we can have AI write the tests, and then we don't have to. Maybe. I don't know.

Anthony:
Yeah. I was working on some requirements, writing a long spec and giving it to an LLM and saying, can you create this code for me? And at the same time, I wrote a test suite that would test whatever code the AI created, and test all the things that I put in the spec. It was a really simple data reflection class, and I said, you know, there should be a name and an address, and a unique ID, and that field should not be changeable; the address should always have two lines; and the name should never be empty, it should always be a string. And it would create all of that code for me. So I'd give it one big list of instructions, it would create the code for me, and then I'd run the handwritten tests against it at the end, just as a way of comparing these different AI models. And what was interesting is they all did pretty well; I think one of them got a bit confused with one of the requirements. They all created slightly different code, but they all kind of spat out the answer.
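
A rough sketch of what that handwritten, spec-checking suite could look like. The Person class here is a hypothetical stand-in for whatever the model generated from the spec:

    from dataclasses import dataclass, field
    import uuid
    import pytest

    # Hypothetical stand-in for the generated "data reflection" class.
    @dataclass(frozen=True)
    class Person:
        name: str
        address: tuple  # spec: the address always has two lines
        id: str = field(default_factory=lambda: str(uuid.uuid4()))

        def __post_init__(self):
            if not isinstance(self.name, str) or not self.name:
                raise ValueError("name must be a non-empty string")
            if len(self.address) != 2:
                raise ValueError("address must have exactly two lines")

    def test_id_is_unique():
        a = Person("Ada", ("1 Main St", "Springfield"))
        b = Person("Ada", ("1 Main St", "Springfield"))
        assert a.id != b.id

    def test_id_is_not_changeable():
        p = Person("Ada", ("1 Main St", "Springfield"))
        with pytest.raises(AttributeError):  # a frozen dataclass rejects assignment
            p.id = "new-id"

    def test_name_must_not_be_empty():
        with pytest.raises(ValueError):
            Person("", ("1 Main St", "Springfield"))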

Anthony:
But really, what it was producing was very entry-level programming, like my-first-engineering-job-level programming. It wasn't a complicated thing that I'd asked it to do. And I saw a video, I have to share the link with you, because it just made me laugh. There are people now who, instead of even writing the spec for the AI, ask the AI to write the spec for them. Somebody was doing a demo where they said, oh, I wanted to create Angry Birds, but the AI works better when you give it a detailed list of requirements for exactly what you need, and I can't be bothered to type that in. So he said to the AI, can you describe the game Angry Birds in terms of a software spec? And it did it for him. Then he copied and pasted that back into the AI and said, can you create this project for me? And it went and built a mobile app called Angry Birds, which I'm sure breaks some sort of copyright.

Anthony:
And the brilliance of the video is that if you just paused it before the demo, you'd be like, this is amazing, this technology is going to take over the world; as engineers, we're really in trouble. But it's just so funny, because when you actually watch the demo, it's the worst thing I've ever seen. It pops up, and it had a title and a menu and stuff, and a game, and then you click on a level, and the little pigs... I don't know if you've ever played Angry Birds, but you have a catapult and you catapult birds at the pigs, and they blow up.

Brian:
And stuff like that, yeah.

Anthony:
Yeah, but the pigs just kind of fell out of the sky, dead already. The bird's beak was on its chin. He tried to sling the catapult, but the arms of the catapult... the physics were just completely wrong, so it didn't make any sense. And then he eventually managed to fling the bird, and it just crashed, and that was the end of the video. It's like, okay, but...

Brian:
If it gets that far, maybe it's easier to fix it from there. I don't know.

Anthony:
Yeah, it is. And it's like, does that help you? I don't know. It's scaffolding, because if you asked me, how would you make a game for a phone, I wouldn't have any idea how to start, right? Whereas at least that's given me something to go on: it had a menu, and you could press it, and it launched on his phone; I don't even know how you would do that. But it's interesting, because it gets you started. And a lot of the time, when you're working with a technology that you're not super familiar with, or you don't use every day, the hard bit is just getting the first prototype running, and then once you've got something running, you can kind of iterate on it.

Brian:
That actually is the part that scares me a little bit, because the thing I feel like I'm using AI tools for is the stuff I don't know about. I had it write a PowerShell script, and I'm not a PowerShell person. So I described what I needed, and I got a script out that worked. So I was happy. However, the stuff that I know about, like pytest stuff, when I ask AI to do stuff for me, it's totally wrong. Yeah. So my worry is the stuff that I'm happy with: I just don't know enough about it to know that it's totally wrong.

Anthony:
Yeah.

Brian:
Or that it's inefficient or ugly or bad code or whatever.

Anthony:
Yeah, and...

Brian:
So I think we're going to have a lot of code that sort of works, but is bad.

Anthony:
Yep, I generally agree with you. And I think if you don't know how to use the tools effectively, or if you just blindly accept what they're suggesting, then it's not good quality. Similarly, Stack Overflow, I think, is a bit better, because at least if you ask the question on Stack Overflow, there's an accepted answer. That's mostly what I've compared the chat UIs to: Stack Overflow. Because for developers, the old way, and still the current way, is: if you don't know how to do something, you ask the internet, and it generally tells you to go to Stack Overflow. And then you go there, someone else has asked the same question, and there is an answer, and you copy and paste the answer into your solution, and you see if it works, and then if it does, ship it. I feel like that's what most developers have done for a very long time. And only when you get really experienced do you read the answers and go, I wouldn't have done it like that. But then you're not even asking the question, because you already know how to do it.

Brian:
Well, one of the tricks with Stack Overflow, of course, is to not pick the accepted answer, because the person that asked the question is the one that accepted it. Often the better one is the answer with the most upvotes: other people went, yeah, that seems hinky, and they've upvoted some other answer, and that's usually a better one.

Anthony:
Well, the mistake I've made before is copying the code from the question, because I was really tired. Yeah.

Brian:
Because sometimes they're really complete questions, with code examples, and then right at the bottom it says, "but this doesn't work". Yeah, that's funny. So, one of the tips that I got from somebody recently, or at least I heard, maybe I heard it on a podcast, was to ask a question of the AI, and whatever answer it gives you, it's going to give you a lot of terminology, enough for you to be able to understand the documentation then. And they actually used pytest as an example, because pytest heavily uses fixtures, but if you don't know what a fixture is, that word doesn't mean anything to you, so you wouldn't even know to look it up. So, yeah, anyway.
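
For anyone who hasn't met the term: a pytest fixture is a named setup function that tests opt into by listing it as a parameter. A minimal sketch:

    import pytest

    @pytest.fixture
    def numbers():
        # Setup runs for each test that names "numbers" as a parameter.
        return [1, 2, 3]

    def test_sum(numbers):
        assert sum(numbers) == 6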

Anthony:
Yeah, so I thought what we could do is pick one of your projects, and we've got one which is called Cards. Didn't you do an episode series on this years ago?

Brian:
Probably. And then also, it's heavily embedded in the current pytest book.

Anthony:
Oh yeah, I've got both. So what I thought we could do is pick this project. I've got it on the screen, but I'll be descriptive for the podcast. And I'm going to get the AI to write some tests for us. Because, and we'll talk about this, what role does the AI have in testing? I feel like a year ago, when this technology was definitely a lot less mature and more rudimentary, one of the main lines I would hear, and I myself said this as well, is that the AI is good at writing tests.

Anthony:
And I've been using this every day for the last... I don't know what it's been now, like a year or two. It feels like a long time, but it probably wasn't that long. Whenever the early versions came out, anyway, I've been using it every day, and I've slowly been walking back some of those statements and saying, well, it kind of depends. And yes, you can get it to do that, but you kind of need to know what you're doing. And it goes back to your Stack Overflow point, or just generally with technology: if you're asking for help from something, whether that's the internet, or the AI, or the person sitting next to you, you kind of need a level of knowledge to know whether the answer they're giving you is good enough. This is more philosophical than technical, but do you just blindly accept what they're suggesting? And the danger with the AI is that if you don't ask the question...

Anthony:
...correctly, then the answer it gives you could be poor, or wrong, or buggy, or all of the above. And yeah, that's one of the big challenges. Or just old. Yeah, that's another common issue.

Brian:
How are you using AI, usually? Are you just punching questions into an interactive thing, or using Copilot, or...?

Anthony:
Yeah, so I rarely use the chat UIs. There are a few of those; ChatGPT is the most popular one by a really long way. And I've watched developers who've got a browser open on one screen with ChatGPT and their editor open on the other screen, and they're basically just constantly going backwards and forwards, copying stuff between ChatGPT and the editor. Then Copilot, like GitHub Copilot, is built into VS Code, and it's built into Visual Studio and PyCharm, and now the Java one that everyone uses, Eclipse, as well. I feel like pretty much every IDE has one now; it has a chat box somewhere. And then there's Cursor.

Brian:
It's just sort of built into Cursor, right?

Anthony:
Yeah. Cursor is like a whole different thing. Every button in Cursor has got some kind of AI in it. Short of just moving the mouse for you, it pretty much does everything. So when you run tests in Cursor, if one of them fails, it will suggest that it fixes it for you. Which, again, goes back to: if you don't understand this technology enough, that can be quite dangerous, because "fixing it for you" could just be removing the assertion from the test.

Brian:
I was just going to say: hey, it's this nasty assert right here that's causing the problem!

Anthony:
"I made it pass!" Um, or I've seen some pytest extensions, like joke ones, pytest-yolo or something, that just let all the tests pass. Yeah. It generally doesn't do that; it looks at the error messages and then tries to guess, based on experience, what's happening. So, I mostly use it for code completion. So here, I'm in VS Code, and I've got code completions enabled, which means as I type, it will basically finish my sentences for me. So, I don't know, I just type def, which could be anything, and it's assumed I'm trying to write a function called get_card_state. We're looking at your Cards project; there's an API in here. I don't know why it's come to that conclusion.

Brian:
It's oddly not that annoying when you get used to it. You just sort of ignore the gray, unless you're curious.

Anthony:
Yeah. And this is also... so that's kind of like the code completion that we've had forever.

Anthony:
Like, you know, when you call a function, the fact that the editor tells you what parameters it has and suggests them for you, that's a really helpful feature. And when we originally had that feature, there were a lot of people who actually complained about it, like, this is cheating. But it is helpful: I don't want to have to go and find the source code for the thing that I'm calling and look at the parameters. And if you put a docstring in a function and you describe the parameters and what they are, for example, or you put type annotations on them... the whole point, well, the vast majority of the time, we...

Anthony:
...use type annotations in Python now is as documentation. So we've kind of had this ability for a while. Okay, so in this one, we've got a data class called Card, and if I just make a card, if I type demo_card =, it's probably going to predict that I'm going to create a Card. The reason it's doing that is because everything before the cursor is context for what it thinks I'm trying to do. So it should then try and guess the parameters based on what comes before the cursor. Like, you've got a data class here called Card, and it's got a summary, owner, state, and ID. It's looked at those and then kind of predicted that, because I've called it demo_card, I just want demo inputs.

Brian:
It's actually not too bad. It picked "demo" for the summary and "demo user" for the owner.

Anthony:
Yeah, and it's got a number. And then, if I told it what I wanted to do, like, let's say, "also create demo card from dict": if I say that's what I'm going to do, it will create a dictionary with basically the same parameters that I just had before. So again, it all becomes context; you're kind of stacking it one on top of the other, and then the next line will be, okay, now call the from_dict function... the method, sorry. So, if I'd written some code and I wanted to start using tests to validate it, normally the first thing you do is arrange your test data, act on the code, and then assert what your output should be. That's the AAA pattern in testing.

Anthony:
Pretty much every test looks like that, if you boil it down to three components. And so the AI is kind of able to look at the code, the context before what I was writing, and say, okay, you've got a type called Card, it's got these fields, it's got these methods. And the instructions, what you would normally type into ChatGPT, you can just write as a comment. The way you steer these AIs to do what you want is to be overly verbose with comments; if you do that, it will just populate the code for you. I find this helpful, because it's just cumbersome to write this stuff out. This isn't difficult code; it's just constructing an instance of the class with some demo data. I could do that, an entry-level engineer could do that, it's pretty obvious. There's nothing clever about this. It's just convenient.
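
A sketch of the kind of thing the comment-steering produces. The Card fields match what's described on screen, but the exact generated lines are a guess:

    from dataclasses import asdict, dataclass

    # Stand-in for the Cards project's data class, as shown in the editor.
    @dataclass
    class Card:
        summary: str = None
        owner: str = None
        state: str = "todo"
        id: int = None

        @classmethod
        def from_dict(cls, d):
            return cls(**d)

        def to_dict(self):
            return asdict(self)

    # create a demo card   <- the steering comment; the next line is the completion
    demo_card = Card(summary="demo", owner="demo user", state="todo", id=1)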

Brian:
Right. And I actually found the comment thing by accident. And that's one of the cool things: you can just accidentally discover these things. I wrote the start of a function, wrote a comment for what I wanted it to do, and then the suggested code was pretty close. I usually don't have the code be exactly correct, occasionally that happens, but often it's close enough.

Anthony:
Yeah. So I think where these tools are good is where everything that comes before the cursor is correct and relevant. If that is the case, then what it's going to predict is usually pretty good. If, on the other hand, you start with a blank file, and let's make it Python, and I make a comment that says, "download the internet"... what it produces is probably just going to be nonsense. Let's see what it comes up with. Come on. Come on. Def. Download. Okay. I don't even know. Is this going to keep going? Like, if I just keep pressing tab, it's writing a program. I don't know where it's going. So...

Brian:
Wow.

Anthony:
Okay. So if we then go back and go, okay, what have you actually done here? It's downloading some files. I don't know what it's doing.

Brian:
Looks like it made a downloaded-files directory, it's looking through... what? It's joining the path. Huh.

Anthony:
This code may or may not work. I don't know, but...

Brian:
Is it...? Yeah, I don't think this is going to do anything useful.

Anthony:
Yeah, but if you asked it to make a program that downloads the internet... I don't know how it would use context, but the important thing here is that there was nothing before the cursor. I think people misunderstand this: it's not somehow reading and understanding your whole code base. Most of the time it's not doing that, unless you tell it to. It's just looking at what is on the screen, and often at what tabs you've got open in the editor. So if you do a split screen between two classes, it will kind of look at both of them.

Brian:
If I have more tabs open, will it do better?

Anthony:
Yeah, generally.

Brian:
Oh, cool. Also, I didn't know about the cursor part. So even if I want a function in, like, the top part of the file, it might be a better function if I write it at the bottom of the file first and then move it up.

Anthony:
Yes. Annoyingly, yes.

Brian:
Interesting. These are great tips so far.

Anthony:
Yeah. So there are other things it will do. Like, if you make a mistake, then it will kind of propagate that mistake. So if I go "with open", I was demoing this the other day, if I just do "with open", it assumes I want to write to a file called foo.txt, and it will write that for me. If I add an extra parameter on here that's wrong, like bananas=2, then the next time, it will just propagate that error. So, what I was saying about the cursor: I've made another function. I made one called foo, which calls open in a with context manager, with an extra parameter called bananas, which is wrong; there is no such parameter. And then when I write another function called bar, it's like, hey, let's do the same thing with bar, with two bananas. So I'd argue...

Brian:
That bananas should be an argument to open.

Anthony:
Oh, and now it wants baz as well, and it's going to do the same thing, right? So this is more like... if you've ever used Excel, and you select two rows in Excel and then you drag them down, it will just look at the pattern and basically copy and paste the pattern. So, if I just wrote a line that said...

Brian:
Well, copying mistakes is bad, but the whole pattern thing is kind of nice. If I do have to do repetitive stuff, having it do mostly the same thing again is kind of cool.

Anthony:
Yeah, so if I just wanted to write the word banana again and again, it would assume that that's what I'm trying to do, and it'd just keep doing it. So that's kind of where it gets to. Sometimes with these, it will get stuck in a loop; they've improved, I think, over the last six months, but sometimes the things that come before the cursor are actually a repetitive pattern, and it will think that you just want that pattern again and again and again. One of the common ones was import statements. It was like, oh, you've imported five things from that module, therefore you just want to continue importing things from that module forever.

Anthony:
And it's like, no, that's not what I want to do. If we take our example with a Card here: okay, so we've got a demo card, which calls the constructor on the data class; we've given it our parameters, and it wrote that code for us. And then I said, in a comment, so I told it what I wanted, I said: I also want to create a demo card from a dict, because in your class I saw you've got a from_dict class method. And then the other thing I want to do is, I want to check that to_dict, the output of that, matches the dictionary that I gave it when I called it with from_dict. Yeah, that would be a good test. I reckon it's going to predict that I'm going to do that next. Oh, close. Okay, it's predicted that I want to assert... I'm checking some of the statements. If I tell it, "check that to_dict and from_dict are working"... oh, there we go, it got it in the end. I just gave it a comment, and it knew what I wanted to do. So now it's got an assert saying: assert that when I call to_dict on the card that it created, that is the same as that dictionary. However, and this is where we can talk about some of the nuances here: can you spot the mistake, Brian?

Brian:
Well, I mean, other than... well, it's... I don't know.

Anthony:
I mean, I think that test would pass.

Brian:
Well, it's not a test, for one. It's just...

Anthony:
Yeah, yeah, yeah. But it's checking that one, not the one I made from the dictionary.

Brian:
Oh, right.

Anthony:
So, like, instead of doing...

Brian:
But that's hard to spot because it kind of looks okay.

Anthony:
It should be doing that instead. It should be testing the one I made the second time around. So this is the other trick: when you're writing tests, if you write one that tries to do too many things, then the completion can get confused. It depends on the model, and I think I've got this configured with the most advanced model we have at the moment, but it uses everything before the cursor as context. And I'm not writing this in a function, which, again, is not realistic, because you would normally write this in some sort of test function. If I did that, and I described things better, it should give me better answers. But there are definitely times where it can suggest things which are wrong but really hard to spot, and that's what's tricky.

Brian:
Actually, that test would have passed, because the data that it gave to both...

Anthony:
...the cards is the same, yeah.
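
To make the mistake concrete, a sketch, reusing the assumed Card shape from the earlier example. Both asserts pass, because both cards were built from the same data, which is exactly why the misdirected one is hard to spot:

    # also create a demo card from a dict, and check to_dict matches the input
    card_dict = {"summary": "demo", "owner": "demo user", "state": "todo", "id": 1}
    demo_card = Card(summary="demo", owner="demo user", state="todo", id=1)
    card_from_dict = Card.from_dict(card_dict)

    assert demo_card.to_dict() == card_dict       # what the completion wrote: checks the wrong card
    assert card_from_dict.to_dict() == card_dict  # what the round-trip check should assert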

Brian:
Can we get to, like, the part of...

Anthony:
Testing what?

Brian:
What are people saying that it can do for you, for testing?

Anthony:
Yeah. So, the thing that... I've definitely seen this demo a hundred times, and I'm going to pick on your code. I'm not going to pick the data class, because that's too easy. There's another one here: you've got a class called CardsDB that's got an init method, and it's got some methods on it, and I'm going to ask it to generate a test. So, there's a couple of ways you can do that. You can kind of do it inline, so you could be like, can you make tests for this? Or you can say it in the chat. And they do have a thing called /tests now, I think, or like "setup tests", and it creates them for... I've never really used this. "Tests for the CardsDB class." It will look to see what testing frameworks you've got configured. I'm going to tell it... and it asks you where the tests are. This is a bit more advanced, this one: this feature basically looks to see what testing framework you've got, instead of just generating tests in any random test framework. And it will put something together. I don't know what it's doing. Oh, okay. It's installing stuff. I think it's kind of configuring the tests now; it's like "configuring test suite" or something. And when it's finished, it should spit out a file somewhere.

Brian:
Curious how well it'll do.

Anthony:
Yeah. I don't know what it's doing. Sometimes you have to ask it twice. It's like my children: they start doing it, and then they get distracted, and then you're like, did you do the thing I just asked you to do? And then it's like, oh yeah, sorry, I forgot about that. So if I just write it in words...

Brian:
God, Dad, you already asked me!

Anthony:
Yeah: "I'm doing it, I'm just replying to my friends." So, if I say, "create tests for the CardsDB class"... it's saying... "Thank you." What is it telling me to do?

Brian:
Ah, a plan.

Anthony:
It's got a plan. "I've got a plan. Let's do this." Whoa, okay, there you go, it's thinking about it. This is its plan. I think it's finished. The models stream output, so they kind of write as they go, and you have to wait for it to finish its answer. Okay: it has created a set of tests for your class, and we can talk about this. It suggests putting it in the source folder, but it doesn't really matter. So I can just click on this button, and it will just paste that in; I don't even have to copy and paste, it does it for me. I'm on tenterhooks... or tender hooks...

Brian:
I never knew which one that was.

Anthony:
It's not the one you'd think... I always get it wrong as well. It has some weird old British meaning or something; it's like the hooks you used to hang your hat on. Okay. So, let's look at this, and this is where we can nitpick. So, let's nitpick this. It has created some tests. Now, something you always want to check is: did it cheat?

Brian:
Did it look at the existing tests?

Anthony:
Exactly. So did it just go and look at the existing tests and just copy and paste them and go, ta-da, look what I made? I don't think so. Yours would have been in... where would it be? See, no, it doesn't look like your code. Yours is very well structured.

Brian:
It was a couple years ago when I wrote it. It probably isn't the best. We'll see.

Anthony:
Okay. So it has put it together. Let's put the actual class on the right-hand side and the tests on the left. So, we don't need that comment, because it's self-explanatory.

Brian:
It created a fixture. That's cool.

Anthony:
It's done it, it's created a fixture. So I like that. Often it will use the information that it can gather about the project that you're in. So, you're in Python: when I said create tests, it didn't spit out JavaScript tests, so, well done, AI, you got the right programming language. But it will also look at what other things you're using, like, oh, it's a pytest project, or there's pytest in there somewhere, therefore let's make a pytest test module. It's also looked at the implementation, and it tends to do them in order. And I was specific as well: I said create tests for this class; I didn't just say create some tests. Because with AI, the more specific you are with the instructions, the better the output. And so I could have said: create tests using pytest, and use fixtures where appropriate, and create multiple test methods for each...

Anthony:
...thing you want to verify; and also create positive and negative tests; verify that it handles exceptions correctly; verify that it handles types other than the annotated ones. You could give it a long list of things that you want, but I didn't do that; I just said, make some tests. So, before you nitpick the output, it's important to remember: if I told an engineer, make some tests, and you've got until two o'clock, what would they come up with? It depends on the person, and what time it is. If it's 1:45 and I give them 15 minutes, they're probably going to make something pretty basic. If it's nine in the morning and they've got five hours to do it, then they should make a pretty decent test suite.

Brian:
So, I'm taking a look at the first test. The first one was "add a card", and it looks like it created a card, called add_card, and then got an ID back and made sure that the ID is not None. I would say it hasn't actually checked that the card was actually added, just that an ID was returned.

Anthony:
So...

Brian:
Not the best test, but not awful, I guess. You're going to exercise the add_card function, at least.
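
A sketch of roughly what the generated file looked like, plus the check Brian says is missing. The constructor and method names follow the Cards project as discussed on the show, but treat the details as assumptions:

    import pytest
    from cards import Card, CardsDB

    @pytest.fixture
    def cards_db(tmp_path):
        # A fresh database in a pytest-managed temp directory for each test.
        db = CardsDB(tmp_path)
        yield db
        db.close()

    def test_add_card_returns_id(cards_db):
        # What the generated test verified: an ID comes back.
        card_id = cards_db.add_card(Card(summary="do something"))
        assert card_id is not None

    def test_add_card_stores_the_card(cards_db):
        # The missing check: is the card actually in the database?
        card_id = cards_db.add_card(Card(summary="do something"))
        assert cards_db.get_card(card_id).summary == "do something"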

Anthony:
Yeah. And I guess it's got two tests for add_card. So, again, another trick with AI is to give it a number, give it a goal, same as the "make me tests by two o'clock" thing. So in this one, if I actually said... I'll tell it which file I'm looking at... "write six tests". This is a really weird trick with the AIs: if you give it a number, it gives you better outputs. "For the add_card method, on the CardsDB class." If I do that, it should have a think about what it's done, and it's going to suggest more tests, and they will test more scenarios, and it compares: okay, this is what I did last time, and this time I'm going to do this.

Brian:
Okay, so it doesn't create a full new test file; it's going to patch it?

Anthony:
Yeah, this is like an edit feature. This is kind of the difference between copying and pasting stuff out of ChatGPT and just having it in the editor: it will actually do it inline, so you don't have to work out the differences. So here we've got add_card, and it's come up with some different scenarios that it wants to test. If I only say "make a test", it will just make one test, and it will be the simplest, most vanilla test you could think of. If I say make six... hopefully six covers everything. Yeah, I don't know what the magic number is. So, I guess this feature that I'm showing you is it making the tests for you. I genuinely don't think this is a good idea. Okay? And I reckon you're going to agree with me. So, why would you think this is not a good idea?

Brian:
Well, partly, the first test I looked at sort of tested something, but it doesn't really test all of the conditions of the correct answer.

Anthony:
Yeah. So, if you looked at the function, the method, it's got: if no summary, raise an exception. Oh, it actually did get a right test for that. It said, let's verify that if...

Brian:
Yeah, it did verify that. But then there are conditions around whether or not the owner is supplied, and it didn't test for that. It also doesn't really test to make sure that the stuff actually ended up in the database.

Anthony:
I'm curious, actually, to see what happens if I run the tests. I thought... did I do it wrong? Ah, you've got some extra parameters.

Brian:
It looks like pytest-cov isn't installed.

Anthony:
Yeah, I'm curious to see what happens if I run the tests that it made. Like, do they pass? I'm bypassing your entire test infrastructure and just calling them directly, but it might work. No, it needs more stuff. So yeah, it didn't test that. Also, the tests are pretty basic. But what it generated, at a glance, looks fine.

Brian:
And that's the part that I'm worried about: people relying on this to write their tests. They're going to look okay, but not really be the right thing.

Anthony:
Yeah. So I'd argue that this is better than no tests, but that's a pretty low bar, right?

Brian:
I don't know if it's a low bar, actually. I think tests that at least exercise your code are better than nothing. But, like you said, that's a low bar: you're not making sure that it's working correctly, just that it doesn't blow up.

Anthony:
Yeah. So this is, like, scenario, use case one: AI can write your tests for you. And yes, it can kind of do that, but it will give you a very basic test suite, and you're skipping the important part, which is actually thinking about what you want to test. You're skipping the thought process, and you're just letting the AI do that for you, based on other code it's seen in the past.

Brian:
Yeah. Well, so, can you ask it: given this stuff, what sort of stuff should I test?

Anthony:
Exactly. So that's the scenario I call, like, "hands off the keyboard". It even suggests it, actually; it's like, hey, let's generate tests for this. I think if you use that feature, then it gives it a better prompt; they've been fine-tuning this.

Anthony:
Hmm, that doesn't have fixtures. Yeah, it's a similar thing; it's making some tests. But because I only picked one method and said, make tests for this single method, it's created one, two, three... three tests, and then it got bored and stopped. So, if I tell it to make ten, it'll make ten; if I don't tell it how many, it will make, like, two or three, and then it will just get bored. So you generally want to be specific about how many you're expecting. And if you ask it a question... so if I ask Copilot, and I just clear the chat as well, this is another trick that's really important: if it's going down the wrong path, don't keep asking it the same question differently, because it uses its conversation as its context. It will use wrong answers as context and just kind of keep going down a path that you don't want it to go down. If I said: "I want to write some tests for CardsDB's add_card method. I am a really good tester and I want to create amazing tests. What should I do?"

Brian:
So, is that "I'm an amazing tester" sort of thing... is that stuff important?

Anthony:
Annoyingly, yes. So I can now show you another trick. At the moment, the way that they're built, it doesn't know your context. So, "I'm a noob, this is my first Python program, and I want to create a test": that, as a question, is different from someone who has got a lot of experience with Python and testing asking the same question. And so, if you're a beginner and you say, I want to make a test, it shouldn't be like, okay, let's use fixtures and parametrized tests, and let's test all these scenarios, and let's introduce all these new concepts; that would be lost immediately. Whereas... the issue I had with it just generating tests for you is that you just skip the thought process, and it gets you to a green tick in pytest, but you might have missed some important things. Whereas if you ask it the question, and in my question I've kind of given a hint as to what level I'm at (actually, that's not what level I am, I'm not a really good tester), if you give it that as context, then it will give you a better answer. Because if I just say, "I want to write tests", it will give you some textbook computer science blurb: "tests are all about verifying..."

Brian:
I actually kind of like this output. It's not overly verbose. It's good.

Anthony:
Yeah. So it said: let's create one with a valid summary; let's create one with a missing summary, and let's assert that the exception was raised; let's create one with a None owner, so that's the other condition you spotted; and let's create a card with a specific owner.

Anthony:
And it's kind of following the same triple-A pattern. For test scenario four, it's saying: let's create a card with a specific owner, add it to the database, fetch it from the database with the ID, and then assert that the owner of the fetched card matches the specified owner. So I'd argue that tests three and four are actually probably the best ones: they're actually checking that when you add a card to the database, it actually adds it, and when you fetch it back out again, it gives you what you wanted. So, if you scroll, even though I didn't ask it to, it's gone and made those four bullet points into tests, and test scenarios three and four are test methods. It's got one here: make one with no owner, add the card, fetch the card, and check that the owner is empty. So it's kind of saying, in your code, if you don't specify an owner, if the owner is None, then make the owner an empty string; it's actually written a test for that. And then the final one: it's gone, okay, let's create a card, let's add it, let's get it back from the database, and let's make sure that the owner actually matches what we put in at the beginning. It's not bad.
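
Scenarios three and four, as described, plausibly come out something like this, assuming the cards_db fixture from the earlier sketch lives in conftest.py and that the None-owner-becomes-empty-string behavior holds:

    from cards import Card  # the cards_db fixture is assumed to come from conftest.py

    def test_add_card_with_no_owner(cards_db):
        # Scenario three: a None owner should be stored as an empty string.
        card_id = cards_db.add_card(Card(summary="no owner", owner=None))
        assert cards_db.get_card(card_id).owner == ""

    def test_add_card_with_owner(cards_db):
        # Scenario four: the fetched card's owner matches what was put in.
        card_id = cards_db.add_card(Card(summary="owned", owner="brian"))
        assert cards_db.get_card(card_id).owner == "brian"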

Brian:
Yeah, so I like this. So the mode really isn't "write tests for me"; it might write tests better for you if you don't ask it to, if you say: what tests should I write?

Anthony:
Yeah, and you tell it that you're like...

Brian:
I'm Brian Okken.

Anthony:
Yeah. I'm also going to see what happens if I... So, it kind of used the /tests command behind the scenes, and it also used some of the things I'd included in my prompt, which is, like, what level am I at and what are my expectations. And I said I'm a good tester and I want to create amazing tests. You shouldn't personify the AI, but it just makes it easier to explain things sometimes. And when you do, I think it tries that little bit extra harder.

Brian:
I might do this, the prefix of "I'm an experienced developer", more. Because sometimes, when I just want a quick answer to something, I pop in a quick question and I get, like, an encyclopedia entry.

Anthony:
Yeah, yeah, I hate those.

Brian:
It's like the guy at a party who is really into something, and you ask the wrong question, and he's a wind-up doll.

Anthony:
Um, but...

Brian:
Anyway, would you recommend people do this, then? The "what tests should I write?" method?

Anthony:
Yeah, yeah, I think so. I like this. I feel like if you're not writing tests today, or there's some piece of your code that you want to go and create some tests for, then this helps you get started. So I'd call this scaffolding, like test scaffolding, where you've got nothing today. And so, if I just delete this and I paste that in, so we're chucking in the tests that it made, and then we save that, and...

Brian:
You will have to pip install tinydb.

Anthony:
Okay.

Brian:
...if you want to.

Anthony:
Okay.

Brian:
And hopefully they haven't changed their database again.

Anthony:
So yeah, that's kind of the scaffolding use case. And then the one that I use it for a lot more than this is when I'm in the middle of a project. Actually, I'm going to go and pick your test code, because I think this will be better. Okay, so we've got... this is where we do completions. Let's see if we can go run our tests. Oh, Rich. See, this is why I should actually just be installing your requirements, instead of guessing them one at a time.

Brian:
Well, that's one of the reasons why I might rewrite the book: so that I have a project with no dependencies.

Anthony:
Okay. So in this one, you've got... what are you testing? You're testing that if you add a card and then you run the finish method, the state becomes "done". That's what I'm understanding from this, yeah? Are there any scenarios that we missed? Are there any additional tests that you think you should add here?

Brian:
Well, I don't, because I intentionally made it 100% test coverage, and behavior coverage. But yeah.

Anthony:
Okay. So, listen, the other scenario is: as you're writing tests, it predicts the next test for you. Oh...

Brian:
"Finish twice"? Yeah, cool.

Anthony:
So I'm just going to go down this path and see where it leads us. Now, the important thing here is that I haven't done anything other than press Enter. And it thinks I want to make a new test called test_finish_twice. And it knows that I'm probably going to want the cards_db fixture, because that's what the other tests do, right? It's going to write the docstring for me. And... should you be able to finish a card twice? I don't know, Brian.

Brian:
Well, I made the decision that you can, but it's actually tested above that you can.

Anthony:
Okay. Finish... oh, like, if it's already done and you do finish, then it should always stay as done. Yeah. I should have picked somebody else's code, one that wasn't Brian Okken's.

Brian:
Actually, one of the reasons why I did that is because I wanted to talk about how writing tests is a great way to think about your requirements. Yeah, like, this is a question about the application: should you be able to finish something twice? Don't know.

Anthony:
And should it raise an exception if you do? So, let's see what it's done.

Brian:
It's assuming that it should raise an invalid card ID error if I try to finish a card...

Anthony:
...a second time.

Brian:
Yeah. Well, I mean, if that was the decision to not be able to, that'd probably be correct.

Anthony:
Yeah. So it will generally do things like this. It uses the context; it looks at the tests you've already written. And it doesn't write them in a generic way; it writes them in the way that you've been writing them so far. Even with the docstring.

Brian:
That's nice.

Anthony:
Yeah, and it will even copy the fact that you finished your docstrings with a full stop that time. So, I find this super useful when you're building out tests: you've kind of written one, and then you want to test a slight variation, and that would often require you writing a lot of the same test code over and over and over again.

Brian:
My daughter tells me that ending a sentence with a full stop means that I'm angry.

Anthony:
Depends how hard you hit the key. Whereas, if we use the code comment trick, you're basically giving it instructions: what does this test do? The other option is that it will know, based on the name of the test function, what it is I'm trying to do. So if I name it test_finish_twice_is_okay, then it will change what we did last time, and it knows now based on the name of the function. So this is, like, my super lazy guide to testing: I basically just write the name of a pytest test with the thing I want to do, and then, most of the time, it just writes it for me.

Brian:
Might get people to write better test names.

Anthony:
Yes, and that's...

Brian:
...a good thing. That's awesome. I like it.

Anthony:
So, yeah: it's created a card, it's called the finish method twice, checked the ID, and it's checked that the state is "done". So I just told it, based on the name of the test, what it is I wanted to do, and it did it for me. So, yes, a roundabout way of saying it: the two scenarios where I think this is useful are, first, getting it to get you to think about what it is you want to test; if you ask it, and you tell it that you're a super awesome programmer, and you get it to think about the testing, then it will give you a reasonable answer to get you started. And then, once you've done that, if you either just start a new line, it will suggest another test, or, if you write the name of a test with the thing you want to verify, then it will actually implement most of it, or all of it, for you.
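
The name-first test he's describing would come out something like this (same assumptions as the earlier sketches, with the fixture in conftest.py):

    from cards import Card  # the cards_db fixture is assumed to come from conftest.py

    def test_finish_twice_is_okay(cards_db):
        """Finishing a card that is already done keeps its state as done."""
        card_id = cards_db.add_card(Card(summary="ship it"))
        cards_db.finish(card_id)
        cards_db.finish(card_id)  # a second finish should be harmless
        assert cards_db.get_card(card_id).state == "done"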

Brian:
This is cool. Before we wrap it up for today, though, we didn't really do an introduction. Where can people find you, if they want to find anything more out?

Anthony:
Yeah, I'm mostly on Mastodon and Bluesky these days, and we'll leave links. And I also have a blog, and I occasionally write on that, so there's some stuff on my blog. I've also got a book: if you're interested in CPython internals, the compiler, stuff like that, then you can check that out.

Brian:
It's an awesome book. So, cool. We'll talk later.

Anthony:
Cool. Thanks, Brian.

Creators and Guests

Brian Okken
Host
Software Engineer, also on Python Bytes and Python People podcasts

Anthony Shaw
Guest
Python obsessive at Microsoft. PSF Fellow. Creator of VS Code Pets