Age of Empires and networking

Recently, I’ve been playing a lot of Age of Empires 2: Definitive Edition with friends (it must be a side effect of being stuck in the flat during a pandemic and being - voluntarily - jobless).

After an intense match which included epic battles, I started to wonder about how all this is put together under the hood. I mean, when I select a group of paladins and instruct them to attack a castle, where is this action acknowledged? Is there a central server that approves this, kind of like committing a transaction in a database? And when I start constructing a building at the same time my friend starts to construct something on the same spot, how is it resolved? When you have 100+ villagers gathering food, wood and gold, and an ever-growing, always busy, 300-strong military - how is all that state kept up to date anyway?

I was interested in the consistency guarantees and conflict resolution strategies that such a game engine has to carry out. I have no idea about game development whatsoever, but I have some knowledge of distributed systems and databases, and so I tried to think about how the internals could be done. Obviously my initial thoughts were quite naive and this turned out to be a deeper rabbit hole than I imagined - but in a good way.

Networking in real-time strategy games turned out to be a super interesting and complex topic, so right now I will summarise some things I learned, without the desire to go very deep, and I will most probably come back to the topic and do another post about it.

# TCP or UDP?

I was fairly certain that a real-time strategy game must use UDP as opposed to TCP, due to the constraints of, well, being real-time. I don’t know too much about networking, but I do know that what makes TCP attractive for most scenarios (guaranteeing a reliable connection over unreliable media, guaranteeing ordered packets, providing flow control and congestion avoidance) is simply not a requirement in a game like this, or rather, other requirements - speed, mostly - are much more important. In streaming, near-real-time programs like games and video conferencing, you don’t care that much about a lost packet. Nobody minds a few lost pixels or even frames in a video call, and we certainly don’t want the protocol to resend them. Like many things in life, they had an expiration date: once that time has passed, you can’t do anything with the information, since you have already moved on (how philosophical, yet true).

This great article goes into the details (without being too low-level) of the TCP/UDP comparison for a game.

Yet there are certain actions for which a gamer most probably wants total assurance - frantic clicking aside, it should be enough to click Attack once so that your unit attacks, and you should not have to repeat that action if the UDP packet happens to be lost. So I wonder how exactly that is controlled - there must be some acknowledgement and some kind of delivery guarantee implemented on top of UDP (this article talks about the details of it, and this famous article mentions this as well).
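To make the idea of "a delivery guarantee on top of UDP" a bit more concrete, here is a minimal sketch of the simplest scheme I can think of: number every command, resend it until an acknowledgement comes back, and have the receiver ignore duplicates. Everything in it (`LossyChannel`, `Receiver`, `send_reliably`) is a hypothetical name made up for illustration, the network is simulated rather than using real sockets, and real games almost certainly use something smarter than this stop-and-wait loop.

```python
import random


class LossyChannel:
    """Stands in for UDP: each packet is dropped with probability `loss`."""

    def __init__(self, loss, seed):
        self.loss = loss
        self.rng = random.Random(seed)

    def deliver(self, packet):
        # None means the packet was lost on the way.
        return None if self.rng.random() < self.loss else packet


class Receiver:
    def __init__(self):
        self.seen = set()
        self.executed = []

    def on_packet(self, packet):
        seq, command = packet
        if seq not in self.seen:  # a resent duplicate is executed only once
            self.seen.add(seq)
            self.executed.append(command)
        return seq  # the acknowledgement carries the sequence number


def send_reliably(commands, channel, receiver, max_attempts=50):
    """Resend each command until its ack arrives; returns packets sent."""
    attempts = 0
    for seq, command in enumerate(commands):
        for _ in range(max_attempts):
            attempts += 1
            arrived = channel.deliver((seq, command))
            if arrived is None:
                continue  # command packet lost, try again
            ack = channel.deliver(receiver.on_packet(arrived))
            if ack == seq:
                break  # acknowledged, move on to the next command
        else:
            raise RuntimeError(f"command {seq} was never acknowledged")
    return attempts


commands = ["attack castle", "build house", "gather wood"]
receiver = Receiver()
attempts = send_reliably(commands, LossyChannel(loss=0.3, seed=1), receiver)
```

After this runs, `receiver.executed` holds each command exactly once and in order, even though some packets (and some acks) were dropped along the way. So one click on Attack is enough, at the cost of a few extra packets.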

There is an article about the other side, mentioning that blindly choosing UDP is a premature optimization, not recommended until later in the development process. Here is the article in question. It goes into the nuances of where to use TCP and UDP and gives good examples.

# Client-server or peer-to-peer?

I was wondering about whether this kind of RTS game uses a client-server model or a peer-to-peer topology.

There is an absolutely fantastic article about this on Gamasutra (here) which reveals that for very specific reasons, AoE 2 used a peer-to-peer architecture. This meant that there was no central server through which game data flowed.

Even if it’s peer-to-peer, there might have been a host machine which is elected to manage the game state - I need to look into this aspect.

# A world of concurrent simulations

One of the most fascinating aspects of the article above is that when you play Age of Empires in multiplayer, each player is running a simulation of the game on their own machine. This might be a no-brainer for others but it was a novel idea to me. I understand that FPS games and such must pass the minimal amount of data around (coordinates and actions, mostly) and that all the environment is simulated locally, I get that - but for a strategy game like Age of Empires, it struck me as strange, as there are loads of moving units at any given time.

The style is rather imperative: commands are shared through the network (not game state), everything else is simulated locally. This is in contrast to declaratively passing the game state over the network, which would have been too much data and would have limited the unit count to around 250 total. So clients would synchronize their watches at the start of the game, and then time-stamp each command before sending them over the wire, thus synchronizing the commands and indirectly the game state.

This is called deterministic lockstep and is based on the deterministic nature of the game - meaning that given the same inputs, the game state should be exactly the same in each client’s simulation. Randomness is allowed, but then the clients have to use a random number generator with a pre-agreed seed, so their RNGs produce the exact same pseudo-random output for the same environmental units (e.g. foraging deer). Otherwise, a seemingly small difference can grow into a significant drift over the course of a long game (this was apparently a bug discovered in the early phases of testing the game).
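Here is a toy sketch of that idea, under heavy assumptions: the `Simulation` class, the `move_paladin` command and the wandering deer are all invented for illustration and have nothing to do with the actual engine. Two "clients" that start from the same seed and apply the same commands in the same turns should end up in exactly the same state, without ever exchanging the state itself.

```python
import random


class Simulation:
    """A toy game world: one player-controlled paladin, one wandering deer."""

    def __init__(self, seed):
        # The seed is agreed on before the match, so every client's RNG
        # produces the same sequence.
        self.rng = random.Random(seed)
        self.paladin_x = 0
        self.deer_x = 0

    def step(self, commands):
        # Apply the commands scheduled for this turn...
        for name, amount in commands:
            if name == "move_paladin":
                self.paladin_x += amount
        # ...then advance the "random" environment by one tick.
        self.deer_x += self.rng.choice((-1, 0, 1))

    def state(self):
        return (self.paladin_x, self.deer_x)


def run(seed, turns):
    sim = Simulation(seed)
    for commands in turns:
        sim.step(commands)
    return sim.state()


# Only this list of commands would travel over the network.
turns = [[("move_paladin", 3)], [], [("move_paladin", -1)], []]

client_a = run(seed=42, turns=turns)
client_b = run(seed=42, turns=turns)
print(client_a == client_b)
```

If one client seeded its RNG differently, or applied a command one turn late, the two states could silently diverge, which is exactly the kind of drift described above.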

This might not sound like such a difficult problem to solve, but if, for instance, you think about inaccuracies in floating point operations, you can see why a small difference might lead to a large discrepancy during a long game. Floating point math is deterministic on the same hardware, but across different hardware and compiler settings the results might not be identical, so it is sometimes outright banned from the game’s operations (see here). For this reason, the game state is periodically checked with the use of a checksum, and if clients’ checksums differ, then the game is out of sync, and it must be aborted.
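A small sketch of both points, with the caveat that I don’t know how the real engine represents its state or which checksum it uses. The first part shows that floating point results depend on the order of operations, while the same sum with integers ("fixed point", here in thousandths) does not. The second part is a hypothetical `state_checksum` function that hashes a canonical serialization of the units with CRC32 from Python’s `zlib` module; the unit layout `(x, y, hp)` is an assumption made up for the example.

```python
import zlib

# Floating point addition is not associative: the grouping changes the result.
float_left = (0.1 + 0.2) + 0.3
float_right = 0.1 + (0.2 + 0.3)
print(float_left == float_right)  # False

# The same values as integers (thousandths) add up identically either way.
fixed_left = (100 + 200) + 300
fixed_right = 100 + (200 + 300)
print(fixed_left == fixed_right)  # True


def state_checksum(units):
    """CRC32 over the units, sorted by id so every client hashes the same bytes.

    `units` maps a unit id to an (x, y, hp) tuple of integers.
    """
    payload = ";".join(
        f"{unit_id}:{x},{y},{hp}"
        for unit_id, (x, y, hp) in sorted(units.items())
    )
    return zlib.crc32(payload.encode("ascii"))


client_a = {1: (10, 20, 100), 2: (30, 40, 55)}
client_b = {2: (30, 40, 55), 1: (10, 20, 100)}  # same state, other order
client_c = {1: (10, 20, 101), 2: (30, 40, 55)}  # one hit point off

in_sync = state_checksum(client_a) == state_checksum(client_b)
out_of_sync = state_checksum(client_a) != state_checksum(client_c)
```

Clients only need to exchange the checksum, a single number, rather than the full state, and a mismatch like the one between `client_a` and `client_c` is the signal that the simulations have drifted apart.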

Another interesting note is that time in the game is sliced into communication turns and each command is scheduled for execution 2 turns later (a turn would usually be 200ms so this is still pretty real-time). In addition, there was a dedicated speed control implemented which catered for scenarios where different players had very different hardware - to consistently run the concurrent simulations, the game had to run only as fast as the slowest machine could render it. So the turn time was adjusted continually as the match evolved.
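The scheduling part can be sketched in a few lines. This is only my reading of the mechanism, not the game’s code: `TurnScheduler` and its methods are hypothetical names, the 200 ms and the 2-turn delay come from the paragraph above, and the adaptive speed control is left out entirely.

```python
from collections import defaultdict

TURN_MS = 200      # nominal length of one communication turn
DELAY_TURNS = 2    # a command issued in turn N executes in turn N + 2


class TurnScheduler:
    def __init__(self):
        self.current_turn = 0
        self.pending = defaultdict(list)  # turn number -> [(player, command)]

    def issue(self, player, command):
        """Queue a command for a future turn and return that turn's number."""
        execute_at = self.current_turn + DELAY_TURNS
        self.pending[execute_at].append((player, command))
        return execute_at

    def advance(self):
        """Return the commands due this turn, then move on to the next turn."""
        # Sorting gives every client the same execution order regardless of
        # the order in which the packets happened to arrive.
        due = sorted(self.pending.pop(self.current_turn, []))
        self.current_turn += 1
        return due


scheduler = TurnScheduler()
scheduler.issue("player2", "build castle")
scheduler.issue("player1", "attack castle")

executed_per_turn = [scheduler.advance() for _ in range(3)]
input_lag_ms = DELAY_TURNS * TURN_MS
```

The two turns of delay are what give the command time to reach every other player before anyone has to execute it. With 200 ms turns that works out to roughly 400 ms between the click and the action, which the game can mask with immediate local feedback.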

(A side note on the article: one of its key take-aways was “Metering is king”, which still holds very true in today’s distributed software world.)

A tangentially related topic is dead reckoning - the estimation of a unit’s current position based on data points from the past, like position, heading and speed.

I don’t think it is used in Age of Empires, given the deterministic engine and the logic of the lockstep protocol. There is no need for estimating a unit’s position because the position is determined by the command that moved the unit and the surrounding environment, and any new action affecting that unit is synchronized with all clients in the lockstep mechanism, in every game turn. Also, while dead reckoning is declarative (transmits unit state), deterministic lockstep is imperative (transmits command), as we have seen.

Dead reckoning is more heavily used in faster-paced games like FPSes and racing games, where it’s combined with a rollback networking strategy, which means that after the data is received about the other player’s position, the unit (car, person, etc) might be rolled back from the predicted state to the received state (which explains the occasional “teleporting” which sometimes occurred when we were playing Need for Speed 3 on a dial-up modem in the old days).
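As a rough sketch of what that looks like (with made-up function names and plain 2D coordinates, not taken from any particular game): the client extrapolates the other player’s position from the last known position, heading and speed, and when a fresh update arrives it snaps to the received position. The size of that correction is the "teleport" you see on a bad connection.

```python
import math


def dead_reckon(x, y, heading_rad, speed, dt):
    """Extrapolate a position `dt` seconds ahead along a straight line."""
    return (
        x + speed * math.cos(heading_rad) * dt,
        y + speed * math.sin(heading_rad) * dt,
    )


def reconcile(predicted, received):
    """Roll back to the received position and report how far the unit jumps."""
    error = math.dist(predicted, received)
    return received, error


# Last update: car at the origin, heading along the x axis at 10 units/s.
predicted = dead_reckon(0.0, 0.0, heading_rad=0.0, speed=10.0, dt=2.0)

# Two seconds later the real update arrives: the other driver had turned.
position, jump = reconcile(predicted, received=(17.0, 4.0))
```

The longer the gap between updates (think dial-up), the larger `dt` gets and the further the prediction can stray from reality before it is corrected.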

Concurrent simulations are very interesting from a philosophical perspective as well, especially if you think about all the “glitch in the Matrix”-like experiences people talk about sometimes - think déjà vu, a kind of eventual consistency in real life, or events which we have seen happen with our own eyes but then all evidence shows otherwise later on - a kind of timewarp or roll back of our world (see this and this about timewarp in distributed games).

This is a fascinating topic and I’ve just scratched the surface here - I’ll try to read more of these resources and I might come back with another post about the findings. If you’re interested, dig into the papers below!

# Sources

# Main sources

# Further resources

Written on December 19, 2020

If you notice anything wrong with this post (factual error, rude tone, bad grammar, typo, etc.), and you feel like giving feedback, please do so by contacting me at samubalogh@gmail.com. Thank you!