Multi-service Approach #16748
It looks interesting to most people; however, it has been proposed multiple times before - some people have even supposedly worked on such a feature, but nothing ever came of it because it takes a lot of effort to overhaul the core to support it. The idea of having dungeons and raids on a separate system, maybe even having different continents on different servers, is a concept that would be great if implemented properly. Letting the dungeon (or instance) server crash without people questing in Outland or Kalimdor suffering for it sounds great, doesn't it? Obviously something like this is required either way, or cross-realm wouldn't be possible.

Having multiple worldservers running just so one can take over if another crashes doesn't bother me, at least not more than splitting the load between different processes for different areas, possibly running on separate systems. I know this kind of thing is good, don't get me wrong.

I would say I'd like to contribute, but I can't for two simple reasons. Go? I don't even know what that is. Docker? I'm sorry to say I'm not touching that. I just wanted to give my two cents on the matter, really. I do hope you get to where you want to go with it, not only for you but for the good of others.
Hi there! I'm the author of the ToCloud9 project, and I would like to discuss with you the multi-service/clustering/sharding/layering/distributed system approach in WoW emulation, specifically focusing on AzerothCore.
Firstly, I'd like to describe the current approach that AzerothCore and most emulators are using.

In the simplified diagram, you can see that after realm selection, all game clients are connected to a single instance of the worldserver, which handles all game logic.
This approach has its advantages and disadvantages. On the one hand, it makes development easier and faster, and it also simplifies the setup and maintenance of the infrastructure for AzerothCore users.
However, adopting a distributed architecture opens up new possibilities.
1. Horizontal Scalability
Nowadays, we have access to incredibly powerful hardware. With high-performance CPUs and optimizations, it is possible to handle between 4,000 and 10,000 simultaneous connections, which is a remarkable achievement. However, wouldn't it be even better if we could handle 20,000 or more connections? I believe that with a distributed architecture, this is indeed possible.
Let's consider an example. Imagine you are a server owner with an average online player count of 4,000. However, you have a highly anticipated event coming up, such as the opening of the Dark Portal, and you estimate a maximum online count of 8,000. Several weeks before the event, you invest in new hardware capable of handling that load and spend time migrating to it. Finally, everything is set for the event. However, the event turns out to be even more popular than expected, attracting 10,000 players who want to join your realm on that day. You enable a queue, and 2,000 players are left disappointed, spending the entire evening waiting. After the event, your average online count returns to around 4,500. Now, you face a decision: should you order cheaper hardware and migrate to it, or should you continue using the expensive hardware you purchased for the event? Although the more cost-effective option would be to switch to cheaper hardware, server owners often choose to stick with the more powerful hardware, anticipating future spontaneous growth in online player count or similar events. This decision may result in higher expenses but is deemed worthwhile.
Now, let's imagine how this situation could be handled with a scalable distributed system. Suppose you are running your server in a cloud provider that offers pay-as-you-go options, billing you on an hourly basis. You have a cluster of, let's say, three average VPS instances. For the event, you add three more instances to your cluster. Using an orchestration tool like Kubernetes, you can easily scale your system components, including the worldserver. As the online count reaches its peak during the event, you add two more instances to handle the load effectively, enabling a smooth gameplay experience for all 10,000 players without the need for a queue. Once the event is over, you scale down your cluster back to three or four instances, allowing you to pay only for the few additional hours during which the extra instances were utilized. This flexible approach ensures that you pay for the actual resources consumed, providing cost efficiency.
2. Availability
With a distributed architecture, availability can be significantly improved. For example, if the worldserver responsible for one set of maps crashes, only the players on those maps are affected; everyone else keeps playing, and the failed instance can be restarted or replaced independently of the rest of the cluster.
3. Crossrealms
I have already seen some requests in the AzerothCore community for cross-realm battlegrounds. If you design a distributed system with multi-realm support, then your system is roughly 80% ready for cross-realm functionality.
4. Composability
This option is more relevant to a microservice architecture. In a microservice architecture, you would divide the logical components into different microservices. Communication between these services requires exposing APIs for most of the services.
To illustrate composability, let's consider the example of an auction house service. You have implemented an auction house service that handles in-game auctions. However, you then decide that it would be beneficial to provide players with the opportunity to interact with the auction house from a website. If you have designed the API for the auction house service with extensibility in mind, you can likely reuse it without making any changes. You would simply need to create a new service, such as a gateway service, to sit in front of the auction house service.
This gateway service would handle requests from the website and communicate with the auction house service using the existing API. It acts as an interface between the website and the auction house service, allowing players to interact with the auction house seamlessly from both the game client and the website.
By designing services with composability in mind and leveraging a microservice architecture, you can achieve reusability and flexibility in implementing new features and integrations.
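To make the composability point a bit more concrete, here is a minimal sketch of such a gateway in Go. Everything in it is an assumption for illustration: the AuctionClient interface, its ListAuctions method, and the JSON shape stand in for whatever API a real auction house service would expose; this is not ToCloud9 code.

```go
// Hypothetical sketch: a tiny HTTP gateway sitting in front of an auction
// house microservice. The AuctionClient interface stands in for a generated
// gRPC client; all names are illustrative only.
package main

import (
	"context"
	"encoding/json"
	"log"
	"net/http"
)

// Auction is a minimal view of an in-game auction exposed to the website.
type Auction struct {
	ID         uint64 `json:"id"`
	ItemID     uint32 `json:"itemId"`
	BuyoutGold uint32 `json:"buyoutGold"`
}

// AuctionClient stands in for the generated gRPC client of the auction house service.
type AuctionClient interface {
	ListAuctions(ctx context.Context, realmID uint32) ([]Auction, error)
}

// gatewayHandler translates website HTTP requests into service calls.
type gatewayHandler struct {
	auctions AuctionClient
}

func (g *gatewayHandler) listAuctions(w http.ResponseWriter, r *http.Request) {
	auctions, err := g.auctions.ListAuctions(r.Context(), 1 /* realm ID */)
	if err != nil {
		http.Error(w, "auction house unavailable", http.StatusBadGateway)
		return
	}
	w.Header().Set("Content-Type", "application/json")
	_ = json.NewEncoder(w).Encode(auctions)
}

// fakeAuctionClient is a stand-in implementation so the sketch runs on its own.
type fakeAuctionClient struct{}

func (fakeAuctionClient) ListAuctions(_ context.Context, _ uint32) ([]Auction, error) {
	return []Auction{{ID: 1, ItemID: 2589, BuyoutGold: 5}}, nil
}

func main() {
	h := &gatewayHandler{auctions: fakeAuctionClient{}}
	http.HandleFunc("/api/auctions", h.listAuctions)
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```

The point is that the website-facing gateway only talks to the auction house service through its API, so the same service keeps serving the game client unchanged.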
Sounds interesting?
If this sounds interesting to you, then I have both good and bad news. The good news is that there are some prototypes/PoCs that attempt to implement it. The bad news is that I'm not aware of any public and complete ones.
But I would like to promote the project I'm currently working on - ToCloud9. My goal is to bring this project to a complete state. The purpose of this post is to seek assistance with its development, but I'll elaborate on that later.
What is this ToCloud9 project about?
The main goal of this project is to make TrinityCore, AzerothCore, and their forks distributed and cloud-native with minimal changes to the core itself. As a result, we can take advantage of the possibilities mentioned at the beginning.
I have chosen to adopt a microservice architecture and have built a set of microservices using the Go language. I opted for Go because it is well-suited for microservices and accelerates development in my specific case. However, since ToCloud9 follows a microservice architecture, each microservice can be written or rewritten in nearly any programming language. The key requirement is that a new microservice must comply with the predefined API protocol and be capable of being containerized.
There is a small demo that demonstrates some capabilities of this project.
ToCloud9 Architecture
In this section, I would like to discuss the architectural pillars of ToCloud9. Let's begin with the approach to distributing the load between worldservers.
To achieve this, we can introduce a server that sits between the game clients (players) and the worldservers. This server needs to be intelligent enough to understand the WoW protocol and to be able to switch players from one worldserver to another. For now, let's refer to this server as the "Proxy" (although I will introduce a better name later). For each game client, this "Proxy" establishes a new connection to one of the worldservers. Here is a diagram illustrating this concept.

Now, let's delve into the details of how this newly introduced server can facilitate a switch from one worldserver to another. The most straightforward approach is to divide the worldservers based on maps. For instance, we can utilize "worldserver1" for Kalimdor and Eastern Kingdoms, while assigning "worldserver2" for the remaining maps. This simplified approach is what ToCloud9 currently employs, although it would be nice to have the capability to divide worldservers based on areas/zones in the future.
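As a minimal sketch of this static split (my own illustration, not ToCloud9 code), the routing decision can be as simple as a lookup table keyed by map ID; 0 and 1 are the Eastern Kingdoms and Kalimdor map IDs, while the addresses are placeholders:

```go
// Illustrative only: a static map→worldserver routing table matching the
// split described above.
package routing

// worldserverByMap assigns the "classic" continents to worldserver1 and
// leaves everything else to a default worldserver.
var worldserverByMap = map[uint32]string{
	0: "worldserver1:8085", // Eastern Kingdoms
	1: "worldserver1:8085", // Kalimdor
}

const defaultWorldserver = "worldserver2:8085"

// WorldserverForMap returns the address of the worldserver responsible for
// the given map ID.
func WorldserverForMap(mapID uint32) string {
	if addr, ok := worldserverByMap[mapID]; ok {
		return addr
	}
	return defaultWorldserver
}
```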

We have decided to distribute maps between worldservers, but we need to determine when and how the switch actually happens. As mentioned before, the new server needs to understand the WoW protocol. It can intercept certain packets and handle them in a special way, since it knows what the player and the worldserver want to send to each other. To trigger the switch from one worldserver to another, the "Proxy" server should intercept and handle the SMsgNewWorld/MsgMoveWorldPortAck opcodes. These opcodes inform the client that the player is being teleported to a new map.
To explain how ToCloud9 handles this opcode, I need to introduce a new component called the servers-registry. When the “Proxy” server needs to decide which worldserver to use for a given map, it sends a request to the servers-registry server. The servers-registry is a gRPC server that is aware of all available worldservers. To be visible to the servers-registry, a worldserver needs to make a gRPC call to the servers-registry (using libsidecar) with a list of maps that it can theoretically handle. The servers-registry then performs healthcheck requests to the worldserver to keep an up-to-date list of healthy servers. With knowledge of the worldservers and the maps they can handle, the servers-registry can dynamically distribute all the maps between them.
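The actual servers-registry API is defined in the ToCloud9 repository with gRPC/protobuf; the Go interface below is only my approximation of the responsibilities described above (registration with a list of supported maps, health tracking, and map assignment), with illustrative names:

```go
// Approximation of the servers-registry responsibilities; the real ToCloud9
// API is defined with gRPC/protobuf, and every name here is illustrative.
package registry

import (
	"context"
	"time"
)

// WorldServerInfo is what a worldserver reports when it registers
// (ToCloud9 does this through libsidecar).
type WorldServerInfo struct {
	Address       string   // host:port the gateway should connect to
	SupportedMaps []uint32 // maps this worldserver can theoretically handle
	RegisteredAt  time.Time
}

// ServersRegistry keeps an up-to-date view of healthy worldservers and
// decides which one owns which map.
type ServersRegistry interface {
	// RegisterWorldServer is called by a worldserver on startup.
	RegisterWorldServer(ctx context.Context, info WorldServerInfo) error

	// WorldServerForMap is what the "Proxy" calls when it intercepts
	// SMsgNewWorld/MsgMoveWorldPortAck and needs a target worldserver.
	WorldServerForMap(ctx context.Context, mapID uint32) (WorldServerInfo, error)

	// HealthyWorldServers returns the servers that passed the latest healthcheck.
	HealthyWorldServers(ctx context.Context) ([]WorldServerInfo, error)
}
```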
Let's summarise this with diagrams.


Up to this point, the diagrams have shown only one "Proxy" server. However, a single instance would be unscalable and a single point of failure. Because of this, ToCloud9 supports scaling the "Proxy" server as well. Now we need to address the question of how to distribute players among the "Proxy" servers.
The answer to this question can be found in the auth and servers-registry servers.
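As a rough sketch of one plausible strategy (an assumption on my part, not necessarily the exact ToCloud9 behaviour), the authserver could ask the servers-registry for the healthy "Proxy" instance with the fewest active connections and return its address to the client after login:

```go
// Rough sketch, not necessarily the exact ToCloud9 flow: after successful
// authentication, pick the least-loaded healthy gateway ("Proxy") for the player.
package authflow

import (
	"context"
	"errors"
)

// Gateway describes one running "Proxy"/gateway instance.
type Gateway struct {
	Address           string
	ActiveConnections int
}

// GatewayRegistry is the slice of the servers-registry API the authserver needs.
type GatewayRegistry interface {
	HealthyGateways(ctx context.Context) ([]Gateway, error)
}

// PickGatewayForLogin chooses the least-loaded healthy gateway; its address is
// then handed to the game client so it connects there instead of to a worldserver.
func PickGatewayForLogin(ctx context.Context, reg GatewayRegistry) (Gateway, error) {
	gws, err := reg.HealthyGateways(ctx)
	if err != nil {
		return Gateway{}, err
	}
	if len(gws) == 0 {
		return Gateway{}, errors.New("no healthy gateways available")
	}
	best := gws[0]
	for _, gw := range gws[1:] {
		if gw.ActiveConnections < best.ActiveConnections {
			best = gw
		}
	}
	return best, nil
}
```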
Let’s describe this in the diagram.

At this point, all the load is distributed. However, it is far from being complete. If you were to log into the game with the described architecture, you would notice that some functionality is broken. For instance, you wouldn't be able to whisper to a player who is on another worldserver. Additionally, your guild tab would only display online players who are on the same worldserver as you. To address this issue, ToCloud9 moves such functionalities from the worldserver to new microservices.
Let's take the guild functionality as an example. There is a separate microservice dedicated to guilds, which exposes a gRPC API for guild operations.
The most important client of this API is our "Proxy" server, which plays a crucial role in handling guild-related packets from the game client. When the "Proxy" server receives a guild-related packet, it reads the packet's content and generates a gRPC call to the guilds microservice. Upon receiving a response from the guilds microservice, the "Proxy" server generates a WoW packet based on this response and sends it back to the game client. Notably, there is no interaction with the worldserver involved in this functionality.
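The real guild service definitions live in the ToCloud9 repository; the sketch below only illustrates the flow just described, with made-up names for the gRPC client, its method, and the opcode constant: the gateway reads a guild roster request, calls the guilds service, and builds the reply packet itself, never touching the worldserver.

```go
// Sketch of the flow described above; the GuildServiceClient interface, its
// method, and the opcode value are made up for illustration.
package gateway

import (
	"context"
	"fmt"
)

// GuildMember is a trimmed-down roster entry.
type GuildMember struct {
	Name   string
	Level  uint8
	Online bool
}

// GuildServiceClient stands in for the generated gRPC client of the guilds microservice.
type GuildServiceClient interface {
	GetRoster(ctx context.Context, guildID uint64) ([]GuildMember, error)
}

// Packet is a minimal abstraction over a WoW packet: an opcode plus payload.
type Packet struct {
	Opcode  uint16
	Payload []byte
}

// HandleGuildRosterRequest is what the gateway would run when the client asks
// for its guild roster. Note that the worldserver is never involved.
func HandleGuildRosterRequest(ctx context.Context, guilds GuildServiceClient, guildID uint64) (Packet, error) {
	roster, err := guilds.GetRoster(ctx, guildID)
	if err != nil {
		return Packet{}, fmt.Errorf("guilds service: %w", err)
	}

	// Serialize the roster into the wire format the client expects.
	// Real code would follow the exact SMSG layout; this is a placeholder.
	payload := make([]byte, 0, len(roster)*16)
	for _, m := range roster {
		payload = append(payload, []byte(m.Name)...)
		payload = append(payload, 0, m.Level, boolToByte(m.Online))
	}

	const smsgGuildRoster = 0x008A // illustrative opcode value
	return Packet{Opcode: smsgGuildRoster, Payload: payload}, nil
}

func boolToByte(b bool) byte {
	if b {
		return 1
	}
	return 0
}
```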
Since we have reached this point, it is necessary to clarify the naming for this "Proxy" server. As mentioned earlier, I believe that a more suitable name for it is "API Gateway". This name better reflects its role and purpose.
Now let's reflect this in the diagrams.

In the diagram above, you will notice the inclusion of a new component called NATS Message Bus/Message Broker/PubSub. This component serves as a message bus and allows for publish-subscribe functionality. Certain microservices have the capability to produce events, such as GuildEventNewMessage, LBEventCharacterLoggedIn, MailEventIncomingMail, and more. Any microservice can subscribe to specific events using NATS and handle them accordingly.
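As a minimal example of how a microservice could consume one of these events with the official Go NATS client (github.com/nats-io/nats.go) - the subject name and payload shape for GuildEventNewMessage are assumptions of mine:

```go
// Minimal consumer of a guild chat event over NATS using github.com/nats-io/nats.go.
// The subject name and payload shape are assumptions for illustration.
package main

import (
	"encoding/json"
	"log"

	"github.com/nats-io/nats.go"
)

// guildNewMessageEvent is an assumed payload for GuildEventNewMessage.
type guildNewMessageEvent struct {
	GuildID    uint64 `json:"guildId"`
	SenderName string `json:"senderName"`
	Message    string `json:"message"`
}

func main() {
	nc, err := nats.Connect(nats.DefaultURL)
	if err != nil {
		log.Fatalf("connect to NATS: %v", err)
	}
	defer nc.Close()

	// Any interested microservice (the gateway, for example) can subscribe and react.
	_, err = nc.Subscribe("guild.event.new-message", func(msg *nats.Msg) {
		var ev guildNewMessageEvent
		if err := json.Unmarshal(msg.Data, &ev); err != nil {
			log.Printf("bad event payload: %v", err)
			return
		}
		log.Printf("guild %d: %s says %q", ev.GuildID, ev.SenderName, ev.Message)
	})
	if err != nil {
		log.Fatalf("subscribe: %v", err)
	}

	select {} // block forever; a real service would manage its own lifecycle
}
```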
Congratulations! Now you know the pillars on which ToCloud9 stands.
Now, let me list all the implemented microservices and outline the remaining work that needs to be done.
And the current (simplified) architecture looks like this:

Things to be done (ordered by my priority):
Epilogue
So what now? Should AzerothCore replace the current architecture?
In my opinion, no. The current architecture works well for approximately 90% of AzerothCore users, especially those who are using it to play with a small group of friends. However, that remaining ~10% of users can be the most important for your project.
Ideally, AzerothCore should provide an option to run itself in a cluster mode. This would allow users with specific requirements, such as high scalability or fault tolerance, to run AzerothCore in a distributed architecture.
How can you help the ToCloud9 project?
So, what are your thoughts? Does this new approach look interesting to you?