Optimizing Game Testing with Reinforcement Learning (RL) Model

Let’s face it. No one likes encountering bugs in their gameplay. Okay, if the point of the game is to decimate a horde of insects, that’s one thing, but today, we’re specifically talking about software bugs, not literal ones. With so many games now launching in some form of early access, it is incredibly important for studios to nail down game-breaking bugs as soon as possible. Otherwise, they can find themselves facing a wave of backlash from the players who paid for the title, and in an industry where reviews directly impact profit, that can be a big deal.

This is where software testing comes in. Before a new version or patch is released for your favorite title, it goes through a series of tests to ensure that everything works as intended (at least we gamers would like to hope that’s the case anyway). This usually starts with automated tests that confirm the build has no errors and that the game displays properly under certain conditions, followed by a second round of playtests performed by a team at the studio. Depending on the type of game being produced, these playtests can be very lengthy and prone to human error.
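
To make that first, automated layer concrete, here is a minimal pytest-style sketch of what such checks might look like. The `game_build` fixture, its methods, and the resolution list are hypothetical stand-ins for whatever harness a studio’s engine actually exposes, not a real API.

```python
# Hypothetical pytest-style smoke tests for a freshly built game client.
# "game_build" is an assumed fixture wrapping the engine's headless mode;
# none of these method names come from a real engine API.
import pytest

RESOLUTIONS = [(1280, 720), (1920, 1080), (2560, 1440)]

def test_build_boots_cleanly(game_build):
    # Launch the build headlessly and confirm it reaches the main menu
    # without writing anything to the error log.
    session = game_build.launch(headless=True)
    assert session.reached_main_menu(timeout_s=60)
    assert session.error_log() == []

@pytest.mark.parametrize("width,height", RESOLUTIONS)
def test_renders_at_common_resolutions(game_build, width, height):
    # A single rendered frame at each target resolution is enough to catch
    # the most obvious display regressions before humans ever play the build.
    session = game_build.launch(resolution=(width, height))
    assert session.render_one_frame() is not None
```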

So, what if we could design something to effectively create and run these tests for us? Well, it just so happens that this is a great application for a Reinforcement Learning (RL) model. It’s possible that you’ve seen a video or two of an AI learning how to play and progress through a game as efficiently as possible, particularly if you’re interested in the concept of speedrunning. Most of these AIs struggle at first, but as the hours add up and the model has time to learn from its mistakes, it becomes markedly better at the game. In fact, in many cases, the model becomes more skilled than a human player. Reading the data from this type of model can be an excellent way to detect game-breaking bugs or flaws, since the model will often shy away from things that don’t work or gravitate towards unintended shortcuts.
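
As a rough illustration of how that raw playthrough data might be captured, here is a minimal Python sketch. It assumes the game has been wrapped and registered as a Gymnasium-style environment (the id "StudioPlatformer-v0" is made up), and a random action sampler stands in for a real trained policy.

```python
# Sketch of logging an RL agent's playthroughs so they can be mined for bugs.
# Assumes the game is wrapped and registered as a Gymnasium environment
# ("StudioPlatformer-v0" is invented); a random policy stands in for a trained one.
import json
import gymnasium as gym

env = gym.make("StudioPlatformer-v0")
episodes = []

for episode in range(100):
    obs, info = env.reset()
    trajectory = []
    terminated = truncated = False
    while not (terminated or truncated):
        action = env.action_space.sample()  # placeholder for a learned policy
        obs, reward, terminated, truncated, info = env.step(action)
        trajectory.append({
            "action": int(action),  # assumes a discrete action space
            "reward": float(reward),
            # keep only simple fields so the log stays JSON-serializable
            "info": {k: v for k, v in info.items()
                     if isinstance(v, (int, float, str, bool))},
        })
    episodes.append(trajectory)

# Dump the raw playthrough data for the second analysis layer to consume.
with open("rl_playthroughs.json", "w") as f:
    json.dump(episodes, f)
```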

The main flaw in this approach is that a human is still required to read over all of that data and identify the outliers that need to be studied more closely. So, let’s take things a step further and introduce a second layer of AI. Now, we can use a Generative AI model to read the data produced by the RL model, identify patterns, such as actions or areas the agent consistently exploits or avoids, and provide a condensed summary for the game developer to read over and investigate accordingly.
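
Here is a hedged sketch of what that second layer could look like, using the OpenAI Python client purely as an example of a generative model backend. The model name, the log file from the previous sketch, and the crude per-episode statistics are all illustrative choices, not a prescribed pipeline.

```python
# Sketch of the second layer: a generative model condenses the RL logs into
# a short report. Uses the OpenAI Python client as one possible backend; the
# model name and the simple per-episode statistics are illustrative only.
import json
from openai import OpenAI

client = OpenAI()

with open("rl_playthroughs.json") as f:
    episodes = json.load(f)

# Crude pre-aggregation keeps the prompt small: episode length and total
# reward are often enough to surface shortcuts and dead ends worth a look.
stats = [
    {"episode": i, "steps": len(t), "total_reward": sum(s["reward"] for s in t)}
    for i, t in enumerate(episodes)
]

prompt = (
    "These are per-episode statistics from an RL agent playtesting our game. "
    "Flag episodes that look like unintended shortcuts (high reward in very "
    "few steps) or dead ends (long episodes with little reward), and summarize "
    "any recurring patterns for a developer to investigate:\n"
    + json.dumps(stats)
)

response = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative model choice
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)
```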

This type of testing is most applicable to platformers and puzzle games, as they have the fewest movement and input variables, cutting down on the number of options and scenarios the RL model needs to process. However, the same type of testing could be applied to almost any game in the industry, provided that the studio has the necessary hardware to let the model scale. So, that AI model going after your speedrun world record might feel like a threat today, but it could also play an instrumental part in ensuring you experience smooth gameplay in new releases for years to come.
