How to scale a NodeJS stateful application

The problem you state is too generic and it's difficult to give a concrete response. However let me be reckless and give you some general-purpose scaling advices:

  • Remove counters from databases. Instead primary keys that are auto-incremented IDs, try to assign random UUIDs.

  • Change data that must be validated against a central point by data that is self contained. For example, for authentication, instead of having the User Credentials in a DB, use JSON Web Tokens that can be verified by any host.

  • Use techniques such as Consistent Hashing to balance the load without need of load balancers. Of course use hashing functions that distribute well, to avoid/minimize collisions.

The above advices are basically about changing the design to migrate from stateful to stateless in as much as aspects as you can. If you anyway need to provide stateful parts, try to guess which entities will have more chance to share stateful data and allocate them in the same (or nearly server). For example, if there are cities in your game, try to allocate in the same server the users that are in the same city, since they are more willing to interact between them (and share stateful data) than users that are in different cities.

Of course if the city is too big and it's very crowded, you will probably need to partition the city in more servers to avoid overloading the server.


Your question is too broad and a general scaling problem as others have mentioned. It'd have been helpful if you'd stated more clearly what your system requirements are.

If it has to be real-time, then you can choose Redis as your main DB but then you'd need slaves (for replication) and you would not be able to scale automatically as you go*, since Redis doesn't support that. I assume that's not a good option when you're working with games (Sudden spikes are probable)

*there seems to be some managed solutions, you need to check them out

If it can be near real-time, using Apache Kafka can prove to be useful.

There's also a highly scalable DB which has everything you need called CockroachDB (I'm a contributor, yay!) but you need to run tests to see if it meets your latency requirements.

Overall, going with a very powerful server is a bad choice, since there's a ceiling and it'd cost you more to scale vertically.


There's a great benefit in scaling horizontally such an application. I'll try to write down some ideas.

Option 1 (stateful):

When planning stateful applications you need to take care about synchronisation of the state (via PubSub, Network Broadcasting or something else) and be aware that every synchronisation will take time to occur (when not blocking each operation). If this is ok for you, lets go ahead.

Let's say you have 80k operations per second on your whole cluster. That means that every process need to synchronise 80k state changes per second. This will be your bottleneck. Handling 80k changes per second is quiet a big challenge for a Node.js application (because it's single threaded and therefore blocking).

At the end you'll need to provision precisely the maximum amount of changes you want to be able to sync and perform some tests with different programming languages. The overhead of synchronising needs to be added to the general work load of the application. It could be beneficial to use some multithreaded language like C, Java/Scala or Go.

Option 2 (stateful with routing):*

In some cases it's feasible to implement a different kind of scaling. When for example your application can be broken down into areas of a map, you could start with one app replication which holds the full map and when it scales up, it shares the map in a proportional way. You'll need to implement some routing between the application servers, for example to change the state in city A of world B => call server xyz. This could be done automatically but downscaling will be a challenge.

This solution requires more care and knowledge about the application and is not as fault tolerant as option 1 but it could scale endlessly.

Option 3 (stateless):

Move the state to some other application and solve the problem elsewhere (like Redis, Etcd, ...)