20MB Bot Submission Limit

Hey guys,
I was wondering if we could have a discussion about the 20MB max bot submission size?

You’ve mentioned before on these forums that you don’t mind TensorFlow being used, and from what I understood you’re not against deep learning techniques. But the 20MB upper limit makes it basically infeasible to use deep neural networks for the AI.
Of course we could shrink our networks so the final serialised model fits within 20MB, but at that point we might as well just upload a bot which picks actions randomly each turn.
Are you looking instead for classical reinforcement learning approaches (i.e., without deep learning)? The game seems too complex (in my opinion) for that to be effective, but I might be wrong.

I understand that Entelect of course has resource constraints to deal with, but I was hoping we could come to some compromise on the submission size - especially since (I’m guessing) the majority of submissions won’t be using more than 20MB in any case. I’m sure you have data from the previous round about how submission sizes have been distributed so far.
I’d like some input from other participants on what a reasonable file size would be, but 150-200MB would be more than enough, in my opinion, for an effective deep learning network.

I hope we can find a solution that makes all of us happy :relaxed:
Marcin


If you are submitting a zip, it should be compressed though?

Not nearly enough. General-purpose compression algorithms don’t perform well on neural network weights.
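As a rough illustration (random float32 data stands in for trained weights here; real weights typically compress only slightly better):

```python
import zlib

import numpy as np

# Trained network weights look like high-entropy noise to a byte-level
# compressor, so zlib barely shrinks them.
weights = np.random.randn(1_000_000).astype(np.float32).tobytes()
compressed = zlib.compress(weights, 9)
print(f"{len(weights)} bytes raw, {len(compressed)} bytes compressed")
```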


Hey @MarcinK,

Apologies for the delayed response; we were having a chat among the team trying to find a solution.

The best we can do currently is up the submission limit to 75MB, for various performance and budgetary reasons.

We have chatted with our resident AI person, and he believes it should be possible to refine a deep learning model down to around 50-75MB. Then again, as I understand it, this largely depends on the algorithms and learning techniques you are using.
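One common trick along those lines, as a rough sketch (the Keras-style `model` object and the file name here are placeholders): storing the serialised weights in half precision roughly halves the file size.

```python
import numpy as np

# Saving: cast float32 weights to float16 before serialising.
# Assumes `model` is a trained Keras-style model.
weights_fp16 = [w.astype(np.float16) for w in model.get_weights()]
np.savez_compressed("weights_fp16.npz", *weights_fp16)

# Loading: rebuild the same architecture, then restore float32 weights.
loaded = np.load("weights_fp16.npz")
model.set_weights([
    loaded["arr_%d" % i].astype(np.float32)
    for i in range(len(loaded.files))
])
```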

Hopefully this helps you and some of the other guys out, and we hope to see your entry in the tournament. :slight_smile:


Thanks a lot, I appreciate the response. I’m really grateful that you’re able to increase submission sizes up to 75MB. And it adds to the challenge to have to stay within the 75MB limit.

So over the weekend I came across another issue with the game engine, but I can’t really see a way around it without a bunch of extra work on your guys’ side. Unfortunately I don’t know Java well enough to try implementing it on my own and send a PR through.

Any Python program (from what I can tell) using a framework like TensorFlow or Theano has a ~1s initialisation time while the libraries are loaded into RAM, and then the first time the model is called takes another ~0.5s. This seems to happen no matter how deep (or how simple) the network is. So a simple network of mine (on an old laptop without any GPU acceleration) has consistently sub-1ms inference times (i.e. loading the current game state and predicting the next move that should be made), but the very first inference always takes about 1.5-1.8s while the model compiles.

With the game engine as is, every round the bot programs get launched, they write an action, and then they close. So basically every round the bot would go through that 1.5-1.8 second initialisation, predict the next move, and then close. I tested it on an older laptop and never went over the 2000ms turn limit, but it’s pretty close 🤷

So a solution I thought of that should work across any language: enable a mode where bots are launched at the start of the match as long-running processes that stay open until the end of the game. Then, as the round changes, the game engine would modify the state.json in each bot’s folder just as it does right now. Bots would watch the state.json file for changes and take a change as the signal that there’s a new round and a new action should be submitted. The game engine could then start the 2000ms turn timeout from the moment it modifies state.json.
For backward compatibility (with bots that don’t need to stay open for the whole match) you could enable it as a preference in the bot.json files, so other people wouldn’t need to change their bots in any way.
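A minimal sketch of what the bot side could look like, assuming the engine keeps rewriting state.json in the bot’s folder each round (the polling interval, the choose_action placeholder, and the stdout hand-off are all my own guesses, not the engine’s actual contract):

```python
import json
import os
import time

STATE_FILE = "state.json"


def choose_action(state):
    # Placeholder: the already-loaded model would run inference here.
    return "do_nothing"


def main():
    last_mtime = 0.0
    while True:
        # Poll the file's modification time; a change signals a new round.
        try:
            mtime = os.path.getmtime(STATE_FILE)
        except OSError:
            time.sleep(0.01)  # file momentarily missing while being rewritten
            continue
        if mtime != last_mtime:
            last_mtime = mtime
            with open(STATE_FILE) as f:
                state = json.load(f)
            # Hand the action back however the engine expects, e.g. stdout.
            print(choose_action(state), flush=True)
        time.sleep(0.01)


if __name__ == "__main__":
    main()
```

The upfront library load and model compile would then happen exactly once, before the first round.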

I do understand, though, if you decide it’s not worth the effort to implement something like this.

Marcin, the hassle with the bot staying resident in memory is that it will then consume CPU which the opponent bot needs for its turn. Part of the challenge over the years has been to use the 2 seconds allocated as efficiently as possible. This is one reason why you may not want your model to grow too large, since it will then take too long to load and compile. People using other approaches also have to be careful with the 2 second limit.

Things have actually gotten much more lenient in the past couple of years. In the first few years, not completing within the time allocated meant an immediate loss; fortunately, for the past couple of years it has only meant a null move.

Of course :relaxed: I 100% understand if the Entelect guys would be hesitant to make a big change like this to the engine.
It’s just that during all that time the bot isn’t actually doing anything other than loading libraries, no matter how complex or simple the actual model is. I skimmed Google but couldn’t find many config options for reducing that time; maybe someone else could help.
For reducing the final serialised model’s size, I found TensorFlow Lite cuts both model size and inference time a good amount, at the cost of a bit of accuracy.
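For anyone curious, the conversion itself is only a few lines. A sketch assuming a trained Keras model and a newer TensorFlow where the tf.lite converter API is available (the file names are made up):

```python
import tensorflow as tf

# Assumes a trained tf.keras model saved earlier; the path is hypothetical.
model = tf.keras.models.load_model("my_bot_model.h5")

converter = tf.lite.TFLiteConverter.from_keras_model(model)
# Post-training quantisation stores weights in 8 bits instead of 32,
# giving roughly a 4x size reduction for a small accuracy cost.
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_model = converter.convert()

with open("my_bot_model.tflite", "wb") as f:
    f.write(tflite_model)
```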

Hey @MarcinK,

So hopefully I can set your mind a little bit at ease here.
We are running what we call a “calibration bot” prior to each match.
This is an empty bot in the language you are using, with your libraries, that basically times the initial startup of the language. We then add those precious milliseconds, or even seconds, to each round as a way to negate this, because we cannot really expect anyone to optimise something like TensorFlow’s startup time :slightly_smiling_face:.


Hey guys, sorry to be bringing this up again. It’s just been bothering me a bit while finishing up my bot for the upcoming round.
I do appreciate you including calibration for loading TensorFlow; that’s a big help. But a big part of the initial runtime for even a super simple network is the very first prediction. I’ve written up a quick script with a trivially simple network to try to illustrate it. I’d like to ask you guys to consider including the delta between the first and second predictions in the calibration time, along with the library import time.

calibrate.py
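(The attachment isn’t reproduced above; a minimal sketch of what such a script might look like, assuming tf.keras and a tiny dense network, is:)

```python
import time

start = time.time()
import tensorflow as tf  # timed on purpose; normally imports go at the top
print("%f s to load TensorFlow" % (time.time() - start))

import numpy as np

# A trivially simple network: the point is framework overhead, not depth.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(8, activation="relu", input_shape=(4,)),
    tf.keras.layers.Dense(2, activation="softmax"),
])

x = np.zeros((1, 4), dtype=np.float32)
for label in ("first", "second", "third"):
    start = time.time()
    model.predict(x)
    print("%f s for %s prediction" % (time.time() - start, label))
```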
These are the results on my computer:

1.862093 s to load TensorFlow

1.195996 s for first prediction
0.000687 s for second prediction
0.000634 s for third prediction

Surely that’s not fair?

Why would extra time be given based on what language or library you use?
Would someone get the extra time if they just add the library and don’t use it?
What about extra time for using similar libs in other languages?

Or, for example, I’ve tried moving my data from a file into my code to avoid time lost reading from a file - can I get extra time for file IO then?

@linjoehan I understand where you are coming from, but I’ll try to address your concerns.

So we are not giving them extra time during the running of their bot. Because we start the timer for your round before your application starts up, we need a way to give everyone the 2 seconds their bot is allotted, regardless of language, which is what the calibration bot intends to accomplish. Please let me know if you still have any questions around this.

@MarcinK Unfortunately we cannot do that; this is starting to go into the realm of optimisation on the user’s part and would very possibly give an unfair advantage to users of TensorFlow. Remember, this challenge will need you to weigh up the pros and cons of using one strategy versus another.

I kinda agree with what linjoehan is saying…

I don’t want to prevent people from competing in certain languages, but I still think it would be wise, in future, to have a different category if you use stuff like TensorFlow.

Because we can’t have a situation where, if you use a certain language, you get certain perks or advantages just because you are using that language.

Unless the person knowingly wants to enter a certain category that isn’t really meant to be competitive.

Saw the post only after I posted mine; I agree 100% with you @GeelKanarie

I understand, thanks for considering it.

Give Microsoft CNTK a try as an alternative to TensorFlow for the NN backend in Python. It is definitely on par with TensorFlow and has faster startup and inference times. Also, check out Intel neon; its training and inference are very fast.

The load time of TensorFlow 1.9 has improved somewhat.

I do as much as possible on a separate thread (loading and parsing the state.json file, plus general initialisation) and import TensorFlow in parallel, so that once TF is loaded you’re ready to get going immediately.
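A rough sketch of that overlap (everything apart from the state.json name is a placeholder):

```python
import json
import threading

state = None


def load_state():
    # Parse the round's state file while TensorFlow imports on the main thread.
    global state
    with open("state.json") as f:
        state = json.load(f)


loader = threading.Thread(target=load_state)
loader.start()

import tensorflow as tf  # the slow import overlaps with the parsing above

loader.join()
# Both TF and the parsed state are now ready: load the model and predict.
```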

Hi guys, just a quick update.

The submission upload limit has now been upped to 75MB.
