Some insight into the mysterious timeouts

This isn’t super important to the upcoming tournament, but I thought I’d point out something I noticed in case the devs who have thought about this mystery in the past are interested.

Last year and this year, people have commented on the fact that the game runner sometimes reports, after a match, that there were timeouts on certain rounds. Last year this raised a “requires attention” flag on the player portal, but since the bots’ moves were accepted despite the timeout being flagged, no one worried much beyond checking that it wasn’t a problem. Everyone got on with their coding.

This evening I was testing something and turned the max-runtime-ms in game-runner-config.json up to a few minutes. I played a game between bots that respond within a few tenths of a second. On some rounds (but not all; no idea why these rounds and not others), both bots would run and output their answers to stdout (viewable in the console).

But the game engine would then hang for the full time specified in max-runtime-ms!

Then, the moves would be accepted and the game would continue as normal.

I suspect this is happening all the time, but people don’t notice: the moves are accepted, the normal timeout is only one second, and with everything being written to the console a one-second delay is easy to miss. Or they assume the delay is in their own bots.

I don’t know why it’s happening and haven’t even looked at the game runner code yet (maybe after I’ve handed in my entry for the round), but I thought it might help the devs if they are interested in tracking down the bug in the game runner that causes this.

To reproduce: set max-timeout-ms to a value high enough that you’ll notice (e.g. 60000), run two fast bots against each other, and watch the long pauses in the console.
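
One way to check whether the long pauses really line up with the configured timeout is to time each round and flag the ones that land close to it. A minimal Python sketch, assuming you have already collected per-round durations in milliseconds yourself (e.g. from timestamps you add to the console output); the function name and the 250 ms tolerance are hypothetical choices, not part of the game runner:

```python
def rounds_hitting_timeout(round_durations_ms, timeout_ms, tolerance_ms=250):
    """Return the indices of rounds whose duration is within tolerance of timeout_ms.

    Hypothetical helper: it only looks at numbers you supply and does not
    read any of the game runner's own logs.
    """
    return [
        i for i, duration in enumerate(round_durations_ms)
        if abs(duration - timeout_ms) <= tolerance_ms
    ]


if __name__ == "__main__":
    # Illustrative durations for two fast bots with a 60000 ms timeout.
    durations = [180, 210, 60040, 150, 60010, 1900]
    print(rounds_hitting_timeout(durations, timeout_ms=60000))  # [2, 4]
```

If the flagged rounds keep matching the timeout after you change it to a different value, the pauses are tied to the timeout itself rather than to how slow the bots happen to be.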


I’ll go out on a limb and make a guess at this:
it might be that the request timeout is set to the value specified as the command timeout.
Then, every now and then, for whatever reason (sun spots, CIA eavesdropping, AI rebellion, neutrinos, bad magic), the EndOfMessage token is not received by the runner (it DID receive the actual command, though).
So it sits there with the command already pulled down, waiting for the timeout to kick in. Eventually it does, the network errors are handled (ignored?), and the received command is processed.
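
A minimal Python sketch of that guess, assuming a reader that collects lines from a bot until it sees a terminator token or a deadline passes. END_OF_MESSAGE, read_command, fake_bot and the command text are all hypothetical names for illustration, not taken from the actual game runner:

```python
import queue
import threading
import time

END_OF_MESSAGE = "END"  # hypothetical terminator token, not the runner's real one


def read_command(lines, timeout_s):
    """Collect lines until END_OF_MESSAGE arrives or timeout_s elapses.

    Whatever was received before the deadline is returned either way, so a
    lost terminator shows up as a pause rather than as a missing move.
    """
    deadline = time.monotonic() + timeout_s
    received = []
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return received, "timed out"
        try:
            line = lines.get(timeout=remaining)
        except queue.Empty:
            return received, "timed out"
        if line == END_OF_MESSAGE:
            return received, "complete"
        received.append(line)


def fake_bot(lines, send_terminator):
    """Simulated bot: answers instantly, optionally 'losing' the terminator."""
    lines.put("ACCELERATE")  # hypothetical command text
    if send_terminator:
        lines.put(END_OF_MESSAGE)


if __name__ == "__main__":
    for send_terminator in (True, False):
        lines = queue.Queue()
        threading.Thread(target=fake_bot, args=(lines, send_terminator)).start()
        start = time.monotonic()
        command, status = read_command(lines, timeout_s=1.0)
        elapsed = time.monotonic() - start
        print(command, status, f"{elapsed:.2f}s")
```

Run as-is, the first read should return almost immediately, while the second should take roughly the full one-second timeout and still return the command, which is the "pause, then the move is accepted" pattern described above.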

I’ve seen this happen with HTTP comms way too many times in the past (usually with mobile clients, though).
Server receives request, processes request, sends reply… but requestor never receives reply.
100% reliable network comms do not exist, unfortunately.

This is, of course, only speculation on my part, based on current observations and past experience.

@Malman, my observation: when running bots that did the absolute minimum processing (e.g. both bots just kept sending the Accelerate command), every now and then the output would hiccup or freeze for a bit and then continue. This behavior got worse when I ran a game while deleting old log files en masse in the background.
So I was thinking the pauses/freezes/hiccups were caused by disk IO waits or bottlenecks?
I tested this theory by moving my log file location to a faster disk, and yes, I experienced almost none of these pauses on the faster disk.


Whahahahahahaha :rofl:

Now go into game-runner-config.json and change max-timeout-ms to 60000 and watch the “disk” pauses mysteriously increase in length to exactly 60 seconds!

Did not work for me. I set max-runtime-ms to 60000 and it made no difference. I also tried request-timeout-ms: 60000.

I then kicked off 10 games at the same time while monitoring my PC’s performance in Task Manager. The CPU was coping, but I had regular pauses whenever the disk (where my logs go) hit 100%.

The disk IO lag sounds very plausible, since the game runner would wait until it had finished writing the current round’s log file for each bot before informing those bots that it’s time for another command.
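
A toy Python sketch of why it matters where that write happens, assuming a hypothetical 0.2 s stall per log write on a busy disk. In the synchronous loop every round waits for its write before the next one starts; the background variant is only a hypothetical contrast (not something the runner is known to do), where rounds enqueue entries and a worker thread absorbs the stalls:

```python
import queue
import threading
import time


def slow_write(entry, log):
    """Stand-in for a log write on a saturated disk (hypothetical 0.2 s stall)."""
    time.sleep(0.2)
    log.append(entry)


def run_rounds_sync(rounds, log):
    """Each round waits for its log write; returns time spent in the round loop."""
    start = time.monotonic()
    for r in range(rounds):
        slow_write(f"round {r}", log)
    return time.monotonic() - start


def run_rounds_background(rounds, log):
    """Rounds only enqueue entries; returns time spent in the round loop."""
    pending = queue.Queue()

    def writer():
        while True:
            entry = pending.get()
            if entry is None:  # sentinel: no more entries
                return
            slow_write(entry, log)

    worker = threading.Thread(target=writer)
    worker.start()
    start = time.monotonic()
    for r in range(rounds):
        pending.put(f"round {r}")
    loop_time = time.monotonic() - start
    pending.put(None)
    worker.join()  # the logs still get written, just off the round loop
    return loop_time


if __name__ == "__main__":
    sync_log, bg_log = [], []
    print(f"sync: {run_rounds_sync(5, sync_log):.2f}s in round loop")
    print(f"background: {run_rounds_background(5, bg_log):.2f}s in round loop")
```

With synchronous writes the round loop's time grows with every disk stall, which is consistent with the pauses tracking disk load rather than the configured timeout.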

Getting to the bottom of this would obviously require much more investigation.
