It's one thing to build and test a multiplayer game; it's an order of magnitude more challenging to build and test one that supports 10,000 players in a single battle. Testing a game of this magnitude is precisely what we've had to do, and it has taken some creative thinking.
Testing is common sense for any developer, but when you're trying to achieve something that has never been done before, it is even more critical that the system is battle-tested.
As Aether Engine is a distributed simulation engine, it borrows many techniques from high-performance computing, and it's crucial for us to learn how the system behaves and performs under significant load. Specifically, we're looking for the following types of information:
Developing tests for these is straightforward for most transaction-based applications, but running them against a single persistent world, where 10,000 players can all interact with each other, raises some unique environmental considerations that must be factored in.
As you may have seen this week, we have partnered with Microsoft and are using the performance and scale of the Azure cloud to provide the backend compute for our 10,000-player deathmatch.
We're extremely excited to have the backing of such a battle-tested ecosystem, but it's easy to get a false sense of confidence in the player experience if we don't carefully consider the infrastructure that sits between the stress test and the players.
Netflix is probably the best-known advocate of getting bad news early, having developed Chaos Monkey to deliberately disrupt its live systems and see how they respond.
While we didn't use Netflix's suite of tools, we share the bad-news-early mindset. With that in mind, we considered multiple options for achieving both player scale and chaos. Options included:
Then we asked ourselves: what if we created player bots as AWS Lambda functions and spread them across multiple regions?
That would give us a reasonable degree of network uncertainty (public internet traffic rather than datacenter traffic) and geographic diversity, and it would take away the safety and confidence we have in our own Azure setup.
We decided this was the best option for us, as we could scale up as far as we needed and really put Aether Engine through its paces (we have run a test with over 50,000 players, but that's a story for another day).
We decided to run our Lambda functions in 7 AWS regions (3 x EU, 2 x US and 2 x Asia), as this would give us the geographical and bandwidth diversity we've observed in our player sign-ups.
From there, we invoked the Player Bot Lambda function 3 times per region, with each invocation creating 500 players: 7 regions x 3 invocations per region x 500 players per invocation = 10,500 player bots.
At this point, we had a relatively simple model of a player bot and a magic number that took us past 10,000 players, with some randomness in both geographic deployment and network conditions.
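The fan-out arithmetic above can be written down as a small invocation plan. This is a minimal sketch under stated assumptions, not our production tooling: the region labels, the payload keys and the `invoke` callable are all hypothetical, and the plan is built separately from the act of invoking so it can be dry-run locally.

```python
import json
from dataclasses import dataclass
from typing import Callable

# Hypothetical region labels for the 3 x EU, 2 x US, 2 x Asia split.
REGIONS = ["eu-1", "eu-2", "eu-3", "us-1", "us-2", "asia-1", "asia-2"]
INVOCATIONS_PER_REGION = 3
BOTS_PER_INVOCATION = 500


@dataclass(frozen=True)
class Invocation:
    region: str
    index: int
    bots: int

    def payload(self) -> bytes:
        # Hypothetical payload telling one Lambda invocation how many bots to spawn.
        return json.dumps(
            {"bots_per_invocation": self.bots, "shard": f"{self.region}-{self.index}"}
        ).encode()


def build_plan(regions: list[str], per_region: int, bots_each: int) -> list[Invocation]:
    """One entry per Lambda invocation: regions x invocations per region."""
    return [Invocation(r, i, bots_each) for r in regions for i in range(per_region)]


def launch(plan: list[Invocation], invoke: Callable[[str, bytes], None]) -> int:
    """Fire every invocation via `invoke` and return the total number of bots requested."""
    for inv in plan:
        invoke(inv.region, inv.payload())
    return sum(inv.bots for inv in plan)


if __name__ == "__main__":
    plan = build_plan(REGIONS, INVOCATIONS_PER_REGION, BOTS_PER_INVOCATION)
    # Dry run: print each payload instead of calling AWS.
    total = launch(plan, lambda region, payload: print(region, payload.decode()))
    print(len(plan), "invocations,", total, "bots")  # 21 invocations, 10500 bots
```

Against real AWS, the labels would need to be actual region codes, and `invoke` could wrap boto3, for example `boto3.client("lambda", region_name=region).invoke(FunctionName="player-bot", InvocationType="Event", Payload=payload)`, where `"player-bot"` is a hypothetical function name. `InvocationType="Event"` makes each call asynchronous, so all 21 invocations start without waiting on each other.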
Next, we had to work out what Azure setup we needed to deliver an excellent experience to 10,000 players.
When you're doing significant computation, you need big boxes, so we opted for Standard_F64s_v2 VMs (64 vCPUs, 128 GiB memory) from Azure's Fsv2 series.
We provisioned the boxes as follows:
As we explained earlier in this post, our Player Bot behaviour was fairly simplistic in the early days: to begin with, we were purely interested in concurrent connection stability. In the video below, you can see one of our engineers flying around the spatial simulation alongside these 10,000 player bots, all maintaining active connections from multi-region Lambdas. You can also see there's some work needed on the torpedoes!
Since these tests, we've worked considerably on optimisations and added a lot more realism to Player Bot behaviour.
In the video below, you can see the live simulation from Aether Engine as the multi-region Player Bots display more lifelike activity, including a hub of battle activity in the centre.
There are obvious limits to Player Bots, so we're working our way through live play tests with real players. Our first 1,000-player test was last weekend, and we're looking to make the next one, coming soon, even bigger!
We're looking forward to seeing how Aether Engine handles things, and what we can learn to improve the system further.
If you've not already registered for our 10,000-player deathmatch, Aether Wars, we'd love for you to be part of it and help us stretch Aether Engine to its full potential.
Sign up to play here.