Progressing to the next level causes inconsistent command processing

Seuntjie · August 13, 2023, 7:08am

I have a question about advancing to the next level. The engine code basically goes:

For the first 3 commands in the queue
{
	if this bot has enough score to progress
	{
		progress this bot
                return;
	}
	else
	{
		if there is no bot
			return
		else
			do everything else
	}
}

(https://github.com/EntelectChallenge/2023-Cy-Fi/blob/96c851eae4b71d0bf29aab5ae8d857f8638526c8/2023-CyFi/CyFi/CyFiEngine.cs#L199)

There are a magnitude of issues that I’m seeing:
1: the engine checks whether there is a player connected to the command after dequeueing it and after the player object is already used. And if it does find a command without a player, it can crash the game loop, and if it doesn’t, it completely stops processing commands for that tick instead of skipping that command and doing an extra command. Meaning if there is a command without a player, every command after that gets skipped for this tick. This is also related to an additional error I’ll discuss below.

2: If a player has enough points to progress to the next level, it only happens on the next tick and only when that player sends a command. Shouldn’t this check happen after the collectibles for this tick has been collected?

3: If a player advances to the next level, all commands after that players command gets skipped for this tick, and stays queued for the next tick, meaning those players loses essentially 1-3 ticks depending on how their bots respond to missed ticks. (IE I progress and my command is second in the queue, 1 command works fine, second command gets ignored and I progress instead, third command gets skipped, 4th command isn’t processed to begin with.). This also means that if multiple players have enough points to progress on the same tick, only 1 player can progress per tick.

4: if there is an invalid action, invalid player, or a player progresses, the physics are not updated and the game state is not sent to the bots. Meaning the next tick, that states are going to be a mess for any command that was processed the previous tick, since the physics never updated for that tick. (this might explain some other issues people were having where a state transitions from a jump into a fall immediately etc)

as a result of the above, it looks like between when a player collected enough points to progress and when it gets the gamestate that it has progressed is no less than 2 ticks and can be infinitely long.

Additionally, on the runner hub, where a command gets queued:

 if (!engine.CommandQueue.Any(c => c.BotId == command.BotId))
            {
                engine.CommandQueue.Enqueue(command);
            }

This performs only a check to see whether a command for that bot it is queued already, it does not check whether a bot with that id is registered. This means a bot can spam commands with random IDs and they will be queued. In conjunction with the above mentioned issue about checking the player, this means that a bot can essentially cause the engine to skip ticks. While it might not necessarily benefit from it directly since its own commands would also be skipped, it could have an extremely detrimental effect on other bots. Alternatively, if a bot progresses to the next level first, it can essentially spam commands to prevent anyone else from progressing, ensuring a first place for that match.

I will do a pr that should address these issues shortly

Seuntjie · August 13, 2023, 7:40am

PR: Fixes for exploit and tick skipping by Seuntj1e · Pull Request #14 · EntelectChallenge/2023-Cy-Fi · GitHub

Seuntjie · August 13, 2023, 9:55am

Upon investigation, the behaviour gets a little bit weirder even.

since the returns exit before the tick gets incremented, the engine ends up processing the same tick multiple times. Thus a player might be able to cause strange behavior by stacking states (send a valid command, send a fake bot command, send another valid command 150ms later, both valid commands gets executed for the same tick because the first is no longer in the queue).

A player might buy themselves time to calculate their move by sending a few fake commands immediately and causing a tick to be delayed until they are ready to move.

Or a player could ensure their command always gets processed for the tick by spamming fake commands until they send their actual command (since the counter gets reset when a fake command crashes the loop and allows theoretically infinite commands to be processed per tick).

Other than the possible exploits, the fact that a player advancing to the next level causes the loop to exit without finishing the loop causes inconsistent command behaviour and tick timings.

Jenique · August 19, 2023, 11:45am

Hey @Seuntjie, Thanks for pointing this out! You’ve obviously spent a bit of time digging into our code. I’ve looked into your concerns, and I think the issues you’ve pointed out are not as server as suggested. Allow me to explain

We do make sure that the command has an ID attached to the bot.
playerObject = cyFiState.Bots.FirstOrDefault((bot) => bot.Id.Equals(playerAction.BotId));
https://github.com/EntelectChallenge/2023-Cy-Fi/blob/475de1906e28c285fbaaca0696fda7b822c7cffc/2023-CyFi/CyFi/CyFiEngine.cs#L188
before we even make the actual return check (this was probably left over from a previous iteration) https://github.com/EntelectChallenge/2023-Cy-Fi/blob/475de1906e28c285fbaaca0696fda7b822c7cffc/2023-CyFi/CyFi/CyFiEngine.cs#L202.
Due to this your bot absolutely must have a corresponding ID in the payload. We cannot do anything without knowing what ID belongs to the player. The game will break if it is not found. I have already some testing to ensure this but you are welcome to try for yourself and let us know.
While I don’t disagree with you about the level advancement. We are not too concerned about the logic since the player can calculate how many collectables they need advance, and send an extra command quickly afterwards, once they have reached the number of collectables they need.
In the AdvanceToLevel() method we have bit of code that removes all the players queued commands. https://github.com/EntelectChallenge/2023-Cy-Fi/blob/475de1906e28c285fbaaca0696fda7b822c7cffc/2023-CyFi/CyFi/CyFiEngine.cs#L260. They will lose those ticks anyways since the bot will not know how to react to the next map. If their commands stay queued they would potentially run off the platform. To your second point. You pointed out later in your post that the tick only gets incremented at the end of the loop https://github.com/EntelectChallenge/2023-Cy-Fi/blob/475de1906e28c285fbaaca0696fda7b822c7cffc/2023-CyFi/CyFi/CyFiEngine.cs#L234. So the players would advance in the same tick the only risk would be to wait another 150 real world milliseconds but the in game time would be preserved. I agree with you that a continue would have been better here https://github.com/EntelectChallenge/2023-Cy-Fi/blob/475de1906e28c285fbaaca0696fda7b822c7cffc/2023-CyFi/CyFi/CyFiEngine.cs#L198C27-L198C27 but there is no real in game consequence. So I don’t think you need to worry too much : ). State would also be preserved since we only update the game state at the end of the for loop
So you are probably pointing to the 3 return checks in the game loop. For the invalid player action this check was just a formality since a TryDeque() does not let you return a nonnull value, but we do a check in the runnerHub to ensure the value returned exists before it is added to the queue. invalid player is also left over code and player progression as mentioned will not advance the tick until the end of the gameloop method. So no player would be disadvantaged
You pointed out really interesting exploit here! I didn’t even thing about this! However, as I mentioned the game will break if you send an invalid ID so everybody losses if someone were to try to implement this kind of malicious code.

While I agree that we can keep an eye out for these things in the future. We never assume mal-intent form our community. You’re a prime example pointing this out for us to look into. We will be sure we keep our code clean.
However I think with the current implementation we are safe

I think for T2 at least we will keep game loop as is but if there are any bugs you find with it at the moment let us know! If I am wrong and you are able to create an exploit bot with your findings then we’ll definitely try to resolve it.

Seuntjie · August 19, 2023, 1:05pm

Hi @Jenique

I agree that the behaviour from my initial post was not consistent with the tests that I performed later on, but I do want to point out a few things:

Regarding point 1:

You do check that the command has a player attached to it (line 188), then you use that player object to determine if they should advance (line 192), and only after that do you check whether that player object is null or not (line 202), thus you will have a null exception thrown in the loop. Because the loop is running asynchronously, stops processing and that thread is terminated, but the exception is never raised to the main thread, so the application continues executing. (if running the server from VS, your VS will catch the exception but when running without a debugger attached, it will just continue to run).

Using the latest version of the engine, I added this to the top of my command:

connection.On<BotStateDTO>(
                            "ReceiveBotState",
                            (gameStateDto) =>
                            {
                                for (int i = 0; i < 15; i++)
                                {
                                    connection.InvokeAsync("SendPlayerCommand", new BotCommand(Guid.NewGuid(), Domain.Enums.InputCommand.DOWNRIGHT));
                                }

And that resulted in the below output from the server:

So as I mentioned in my reply, a user could do what I did above to buy themselves time to respond to the state, get multiple commands processed or ensure that their commands get executed every tick (they would essentially ensure every bots commands gets executed every tick probably). Alternatively they could indefinitely delay the game and just rack up costs for the team.

Regarding point 3: The real game impact is that extra commands can get executed for that tick. If you have a queue of 4 commands at the start of the tick and the 3rd command is the one that advanced a player, command 1 and 2 gets executed and dequeud, but the physics are not updated. Command 3 progresses the player and exits the loop without updating the physics, but now the queue only has 1 single command from the last player that should not have been executed. If player 1 and 2 runs on a timer instead of responding to game state received, they might submit another command each. Thus, when the game loop runs the second time for that same tick, it dequeues another 3 commands, the first of which is player 4, who gets their command processed, and then player 1 and 2 get their second commands processed as well. Now you’re sitting with stacked states where an incomplete move might get affected with additional actions. Though, I haven’t dug into the statemachine enough to see whether there are movements that could be benefited or hindered by stacking states (I suspect jumping might be affected). Even if the command is essentially ignored because of no conflicting states, bots might fall into subroutines because its state is inconsistent with what it expects. If player 3, that progressed, also queued a new command within those 150 ms, it will get executed for them on the new map without them knowing they are on the new map.

It feels a bit like I’m making a lot of fuss about nothing, but I know that my bot keeps very strict track of what it’s state and position is vs what it’s supposed to be, and having anything that could caused commands to be processed inconsistently (like state stacking or commands being executed seemingly late) could severely hinder performance, and I’m sure other bots will have similar issues.

The only real changes needed to address my concern is that when a player submits a command to ensure that to bot id in the command is a registered bot (as in my pull request) or to get the bot id from the connection rather than the one passed by the bot, and to change the return in the player progression to a continue.

Edit: I know that theoretically with the player progression, there is a maximum of 12 ticks out of the 10 000 that could be affected, but it only takes 1 tick being process inconsistently to cause a bot to fall off of the map with no way back.