Performance vs. Scalability

After attending Sergey Chernyshev’s (@sergeyche) Scalability vs. Performance presentation at the NY Web Performance Meetup and reading Scalability: it’s the question that drives us by Robert David Graham (@ErrataRob) and Scalability vs. Performance: it isn’t a battle by Theo Schlossnagle (@postwait), I would like to share my understanding. While I agree in general with everything said, I would word it differently. The topic has become loaded, so the accents matter. Robert, for example, states that “performance” and “scalability” are orthogonal problems. Well, no, they are not. They are different but correlated notions, even leaving aside that performance and scalability are somewhat vague terms.

Speaking of web systems, we can roughly separate response time (the main performance metric) into two main components: backend (server-side) time and frontend (network and client-side) time. There are subtleties and grey areas, but I’ll ignore them here. The frontend time, the subject of Web Performance Optimization (WPO), doesn’t relate to scalability as long as it doesn’t involve server processing (again ignoring subtleties).

The proportion of frontend time vs. backend time can vary widely. According to Steve Souders (@souders), the founder of WPO, 80-90% of the end-user response time is spent on the frontend. But even for major web sites, the backend time for requests involving database processing (such as submitting orders or querying order status) may be more noticeable. And there are plenty of corporate web applications where the share of frontend time is rather small. Of course, the starting point of any performance troubleshooting is to find where time is spent, and it makes no sense to optimize parts where time is not spent.

However, there is one important “but”. While frontend time is supposed to be constant (another simplification, but again ignoring subtleties), the backend (server) time depends on load. The heavier the load, the longer the server time may be, and at some point it may skyrocket, making the system practically unusable. So when thinking about what you need to optimize, you need to check where time is spent under maximal load. Unless you don’t care about downtime and user experience, the way to do that is load testing.
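A minimal load-test harness can be sketched in a few lines of Python. The `backend_request` stub, the 10 ms of simulated work, and the request/concurrency numbers are all placeholders here (a real test would issue actual HTTP requests and use production-like load levels), but the structure — fire concurrent requests, collect latencies, look at the distribution — is the same:

```python
import time
import statistics
from concurrent.futures import ThreadPoolExecutor

def backend_request():
    # Stand-in for a real call to the system under test
    # (e.g. an HTTP request); here we simulate ~10 ms of server work.
    time.sleep(0.01)

def load_test(n_requests=50, concurrency=10):
    """Fire n_requests with the given concurrency and collect latencies."""
    latencies = []

    def timed_call():
        start = time.perf_counter()
        backend_request()
        latencies.append(time.perf_counter() - start)

    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        for _ in range(n_requests):
            pool.submit(timed_call)
    # The with-block waits for all submitted requests to finish.
    return latencies

lat = load_test()
print(f"median: {statistics.median(lat) * 1000:.1f} ms, "
      f"max: {max(lat) * 1000:.1f} ms")
```

In a real load test you would ramp `concurrency` up step by step and watch how the latency distribution (not just the average) changes with load.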

And here we get to scalability. The frontend performance indeed doesn’t matter here and is independent of scalability. But the backend performance is directly related to scalability. The relationship, of course, may be non-linear and quite sophisticated – but it does exist.

To illustrate this, let’s consider one simple (but still typical) example. The backend processing takes X ms, the time is mainly spent on CPU, and we don’t have any other bottlenecks. In this case the server response time would mainly be CPU processing time, and every request would take X ms of CPU time (assuming no parallelism here). As soon as we consume most of the available CPU time, server response time skyrockets (a situation that can be modeled using queuing theory). So there is a load at which the system becomes practically unusable; the question is just when we get to that load. We may, of course, run into problems sooner if we have a scalability problem inside the system or run out of some other kind of resource.
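The “skyrocket” shape is easy to see with the simplest queuing model, an M/M/1 queue, where mean response time is R = S / (1 − ρ) for service time S and utilization ρ. Real systems are more complicated, but the curve has the same character (the 20 ms service time below is an arbitrary illustration):

```python
def mm1_response_time(service_ms, utilization):
    """Mean response time of an M/M/1 queue: R = S / (1 - rho)."""
    if utilization >= 1.0:
        return float("inf")  # the queue grows without bound
    return service_ms / (1.0 - utilization)

# Response time vs. utilization for a 20 ms service time.
for rho in (0.5, 0.8, 0.9, 0.95, 0.99):
    r = mm1_response_time(20, rho)
    print(f"utilization {rho:.0%}: response time {r:.0f} ms")
```

At 50% utilization the response time merely doubles, but at 99% it is a hundred times the service time — and past 100% the system never catches up. This is why a server that looks fine at moderate load can become practically unusable with only a modest further increase.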

Generally speaking, you can increase scalability either by optimizing server processing (using fewer resources) or by providing more resources – provided, of course, that your architecture allows using those additional resources. So scalability mostly boils down to the ability to parallelize your processing, and it is often limited by what you can’t fully parallelize, such as a centralized database.
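The limit imposed by the part you can’t parallelize is captured by Amdahl’s law: speedup = 1 / (s + (1 − s) / N) for serial fraction s and N workers. A quick sketch (the 10% serial fraction is an arbitrary illustration, standing in for something like that centralized database):

```python
def amdahl_speedup(serial_fraction, n_workers):
    """Amdahl's law: max speedup when serial_fraction cannot be parallelized."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_workers)

# With 10% of the work serialized, adding servers hits a hard ceiling of 10x.
for n in (2, 10, 100, 1000):
    print(f"{n:>4} servers: {amdahl_speedup(0.1, n):.2f}x speedup")
```

With even a 10% serial portion, a thousand servers get you barely under 10x — which is why removing or sharding the centralized bottleneck often matters more than adding hardware.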

So we have two parts of response time – frontend and backend – which behave differently and may need different approaches and tools to optimize. But the end-user experience is the sum of these two parts, where the backend time is a function of load. You can’t say much about your end-to-end performance and its backend part until you check it under load – and load testing is the safe way to do so.

Historically, performance engineering concentrated on the backend – where the main performance and scalability issues were – and practically ignored the frontend (which indeed was usually pretty straightforward then). Several sub-disciplines formed, including performance analysis, capacity planning, and load testing. Later, when the sophistication of the frontend skyrocketed, a whole new discipline was established by Steve Souders and quickly grew around the Velocity conference and the Web Performance meetups. Unfortunately, it practically dismissed the performance engineering developments of the last 40 years (maybe even more – the Computer Measurement Group (CMG) was founded in 1975). While frontend WPO definitely has its own specifics, I’d still expect to see a holistic approach to performance engineering, taking into account all aspects of performance and scalability end-to-end.
