Scalability is a measure of how well a system can grow to handle increased demand while maintaining the same level of performance.
The demand may arise from sustained growth in user uptake, or from a sudden traffic spike (for example, a food delivery application is likely to receive more requests during lunch hours).
A highly scalable system should constantly monitor its constituent components, identify any component operating above a "safe" resource limit, and scale that component either horizontally or vertically.
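The monitoring-and-scaling decision above can be sketched as a simple threshold check. This is a minimal illustration, not a real autoscaler API: the utilization readings, the 0.80 "safe" limit, and the one-instance-per-overload policy are all assumptions for the example.

```python
# Hypothetical CPU utilization readings per instance, as a monitoring
# system might report them (the values here are made up).
metrics = {"web-1": 0.92, "web-2": 0.41, "web-3": 0.85}

SAFE_LIMIT = 0.80  # assumed "safe" CPU utilization threshold


def instances_to_add(readings: dict[str, float], limit: float) -> int:
    """Scale out by one instance for each instance above the safe limit."""
    return sum(1 for load in readings.values() if load > limit)


print(instances_to_add(metrics, SAFE_LIMIT))  # 2 instances are above 0.80
```

A production autoscaler would also average readings over a time window and scale back in when load drops, but the core decision is this comparison against a safe limit.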
We can increase scalability in two ways:
- Scaling vertically (scaling up): increasing the resources (for example, CPU, RAM, storage, bandwidth) of the existing servers
- Scaling horizontally (scaling out): adding servers to the existing cluster
Scaling vertically is simple, but there will always be a limit on how much CPU, RAM, and bandwidth, and how many ports and even processes, a single machine can handle. For example, many kernels cap the number of processes they can run:
```
$ cat /proc/sys/kernel/pid_max
32768
```
Scaling horizontally raises those resource ceilings, but comes with challenges of its own: an instance of the service may hold some temporary state that must be synchronized across the other instances.
However, because our API is "stateless" (in the sense that all state lives in our database rather than in memory), scaling horizontally poses less of an issue.