Archive for April, 2012

Load Testing: Its Present and Future

April 26th, 2012 3 comments

Recent trends such as agile development, DevOps, and Web and Social Media sites somewhat call into question the importance of load testing. Some (not many) openly say that they don’t need load testing; some still pay lip service to it but just never get to it. In the more traditional corporate world we still see performance testing groups, and important systems usually get load tested before deployment.

Let’s first define load testing, since the terminology is rather vague here. I use it here for anything that requires applying multi-user synthetic load – in contrast with single-user performance (which is a subset of performance engineering and may include, for example, profiling or Web Performance Optimization as it is defined now). And I use it here as an umbrella term including all other variations of multi-user testing, such as performance, concurrency, stress, endurance, longevity, scalability, etc. – but you may replace it with any other term if you prefer.
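As a rough illustration of what "applying multi-user synthetic load" means in this sense, here is a minimal Python sketch. Everything in it is an assumption made for the example: `handle_request` is a hypothetical stand-in for the system under test, and the user and iteration counts are arbitrary. Real load testing tools do far more (workload modeling, ramp-up, think time, and so on).

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def handle_request():
    """Hypothetical stand-in for one request to the system under test."""
    time.sleep(0.001)  # pretend the system needs about 1 ms


def virtual_user(iterations, latencies, lock):
    """One synthetic user: issue requests back to back and time each one."""
    for _ in range(iterations):
        start = time.perf_counter()
        handle_request()
        elapsed = time.perf_counter() - start
        with lock:
            latencies.append(elapsed)


def run_load(users, iterations):
    """Run `users` concurrent virtual users and return all measured latencies."""
    latencies = []
    lock = threading.Lock()
    with ThreadPoolExecutor(max_workers=users) as pool:
        futures = [pool.submit(virtual_user, iterations, latencies, lock)
                   for _ in range(users)]
        for future in futures:
            future.result()  # re-raise any error from a virtual user
    return latencies


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of samples."""
    ordered = sorted(samples)
    rank = max(1, -(-len(ordered) * pct // 100))  # ceiling division
    return ordered[int(rank) - 1]


if __name__ == "__main__":
    samples = run_load(users=5, iterations=20)
    print(len(samples), "requests")
    print("median:", percentile(samples, 50), "p95:", percentile(samples, 95))
```

The point of the sketch is only the shape: several users running at the same time against the same target, with response times collected per request rather than for one user in isolation.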

Yes, it looks like some Web and Social Media sites managed to survive without load testing. However, it looks like many such companies match the following profile:
-Business is built around a single Web site, so everybody in the company follows what is going on in production.
-Overall architecture is still clear and relatively simple. Changes (however frequent) are rather minor and evolutionary.
-There is decent instrumentation providing performance information.
-Changes can be rolled back relatively easily.
-Site downtime or a period of slow performance (until the problem is noticed and fixed) is not extremely painful or dangerous to the business.

Load testing is a way to mitigate load- and performance-related risks. There are other approaches and techniques that also alleviate some performance risks:
-Good single-user performance engineering practices (the performance of single-user requests is constantly tracked).
-Good instrumentation/Application Performance Management providing insight into what is going on inside the system.
-[Auto] scalable architecture.
-Continuous integration that allows changes to be deployed and removed quickly.

Still, none of these completely replaces load testing; they rather complement it. They definitely decrease performance risk compared with the situation where nothing is done about performance until the last moment before rolling out the system into production without any instrumentation at all, but they still leave the risk of crashes and performance degradation under multi-user load. And if the cost of such failures is high, you should do load testing (what exactly and how is another large topic – there is much more here than the stereotypical waterfall-like last-moment record-and-replay approach).

There is always a risk of crashing or performance issues under heavy load – and the only way to mitigate it is to actually test it. Even stellar performance in production and a highly scalable architecture don’t guarantee that the system won’t crash under a slightly higher load. Strictly speaking, even load testing doesn’t completely guarantee it (the real-life workload may differ from what you have tested), but it drastically decreases the risk.

Another important value of load testing is making sure that changes don’t degrade multi-user performance. Unfortunately, better single-user performance doesn’t guarantee better multi-user performance. In many cases it improves multi-user performance too, but definitely not always. And the more complex the system, the more likely it is to have exotic multi-user performance issues no one has even thought of. The way to ensure that you don’t have such issues is load testing.
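To make the single-user versus multi-user point concrete, here is a back-of-the-envelope model in Python. It is only a sketch under stated assumptions: the numbers are hypothetical, the workload is closed with no think time, hardware is assumed to be unlimited, and the only shared resource is one global lock that the changed version holds for part of each request.

```python
def throughput_bound(users, service_time, locked_time):
    """Upper bound on requests/second for a closed workload with no think time.

    users        -- number of concurrent users
    service_time -- seconds one request takes when running alone
    locked_time  -- seconds of each request spent holding a global lock
    """
    bound = users / service_time  # every user always has a request in flight
    if locked_time > 0:
        bound = min(bound, 1.0 / locked_time)  # locked sections cannot overlap
    return bound


# Hypothetical numbers, chosen only for illustration.
before = {"service_time": 0.100, "locked_time": 0.0}    # slower, fully parallel
after = {"service_time": 0.060, "locked_time": 0.040}   # faster, but serialized

for users in (1, 10, 50):
    print(users,
          round(throughput_bound(users, **before), 1),
          round(throughput_bound(users, **after), 1))
```

In this model the changed version looks better with one user (60 ms instead of 100 ms per request), but with ten users its throughput is capped at 25 requests per second by the lock, while the original version is not capped at all. A single-user measurement would never show that.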

When you do performance optimization, you need a reproducible way to evaluate the impact of changes on multi-user performance. The impact on multi-user performance probably won’t be proportional to what you see with single-user performance (even if the two are still somewhat correlated). Without multi-user testing the actual effect is difficult to quantify. The same goes for issues that happen only in specific cases and are difficult to troubleshoot and verify in production – load testing can significantly simplify the process.

Summarizing, I don’t see the need for load testing going away. Even in the case of Web and Social Media sites we will probably see load testing coming back as soon as systems become more complex and performance issues start to hurt the business. Maybe there will be less need for “performance testers” than there was in their heyday, due to better instrumentation, APM tools, continuous integration, etc. – but I’d expect more need for performance experts who are able to see the whole picture using all available tools and techniques (although I don’t see it yet).

Performance Dimension of Information Technology

April 16th, 2012 1 comment

There are no standards for titles and skill sets related to the performance dimension of IT. I decided to put together how I understand them (most terms are vague, so it is quite possible that other people understand them differently). Of course, it is a simplification – but the topic is probably too heavily influenced by the history and politics of each particular organization to be clear-cut anyway.

I still think that we can break the whole area into three major categories: design (and development), testing, and production (maybe somewhat matching the ITIL terms Service Design, Service Transition, and Service Operation). The term Performance Engineering may refer to the whole area (or maybe to the design category only – in this case it is sometimes referred to as Software Performance Engineering, SPE).

Performance Design. I use the term ‘Performance Design’ to group all performance-related activities during design and development, although it isn’t normally used this way (probably reflecting that the whole area doesn’t quite exist as a separate discipline). Within the design category we have specific areas of performance engineering knowledge for each specific technology, such as Java performance, .NET performance, etc. One relatively new, but large and popular area is Web Performance Optimization, covering end-user Web performance. And, of course, we have Software Performance Engineering (SPE) trying to establish generic approaches – although SPE progress hasn’t been too impressive since Dr. Connie Smith published ‘Performance Engineering of Software Systems’ in 1990.

It is definitely supposed to be an important part of the skill set of software architects (on a higher level: SPE, etc.) and software developers (maybe on a lower level: how to design a specific component efficiently using the chosen technology – but a good understanding of high-level performance engineering won’t hurt either).

And while many architects and developers have some understanding of performance, often the main stress is on functionality and deadlines, so performance is left to the very end – where sometimes it can indeed be tuned in (usually when the technologies are mature and the team is quite experienced), and sometimes it requires major changes (and late changes are very expensive).

It looks like the idea of having a specific person responsible for performance from the beginning (starting from requirements), working with other architects and developers to build it in, makes sense. The title may be ‘performance architect’ or ‘performance champion’. Such people are rare, though – more often we see a proactive person from a performance engineering or performance testing group trying to ask performance questions early.

Performance Testing. This includes, of course, all other variations and names, such as load, stress, and endurance testing. The matching ITIL term would probably be Service Validation and Testing. It covers all ways to apply synthetic load to the system and analyze the system’s behavior. In the narrow sense, a ‘performance tester’ is responsible for creating and applying such load (test scripting and execution). In a wider sense, it also includes workload characterization (workload modeling), performance analysis, and performance troubleshooting – and such a person is often referred to as a ‘performance engineer’. In some cases they are different people: the performance tester is responsible for applying the load and the performance engineer (maybe performance analyst in this case) is responsible for system analysis and optimization.

I definitely put performance testing in a separate category because of the specific set of skills required: workload generation and, perhaps, techniques to find and fix issues in the system by applying an appropriate workload. But definitely not because “testing should go after development and before production”, as it used to be in the waterfall approach – testing should start as early as possible, mostly overlapping with development, and may continue in production. Monitoring the system using a synthetic workload, for example, I’d also put in this testing category – it is actually testing the production system in parallel with the production workload.

Performance Management, perhaps, may be a good name for the collection of performance-related activities and skills in production (and around).

It is interesting that ITIL places the Capacity Management and Service Level Management processes into Service Design. I see a point here – you definitely need to allocate capacity before deploying the system, and Service Levels should probably come directly from the performance requirements. Still, the real people working in these areas are usually part of operations. Capacity Planners are responsible for allocating resources, although fewer and fewer people have such a title, and these responsibilities get spread among other groups (which, unfortunately, often don’t have the appropriate skills).

Service Level Management would probably be handled by Performance Monitoring (Analysis). The matching ITIL term would probably be Service Measurement. The title ‘Performance Analyst’ was often used in the past, but is not very popular anymore; the title ‘Performance Engineer’ is probably more popular now. And, of course, it may be specialized, like Database Monitoring, System Monitoring, or Application Server Monitoring. These may be done by the respective administrators (DBA, system administrator, etc.).

Application Monitoring is relatively new stuff, usually referred to as Application Performance Monitoring. The idea is to measure application-specific metrics (including business-related metrics, end-user metrics, etc.) in addition to the system-level metrics that used to be measured earlier. The importance of application monitoring is definitely growing. On one side, system-level metrics become less relevant in today’s infrastructure with virtualization, multi-tenancy, cloud, etc. On another side, systems become so complicated that trying to figure out what is going on using low-level metrics becomes a nightmare. On the third side, full monitoring from the business point of view becomes a business requirement – and this is where IT can provide a unique business advantage.
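As a small illustration of application-specific metrics (as opposed to system-level ones such as CPU or memory), here is a minimal Python sketch. All names in it are hypothetical: `place_order` stands in for a business operation, and the in-process `timings` store stands in for whatever a real monitoring product would collect and ship elsewhere.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

# Hypothetical in-process store: metric name -> list of durations in seconds.
timings = defaultdict(list)


@contextmanager
def measured(name):
    """Record how long the wrapped block takes under an application-level name."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[name].append(time.perf_counter() - start)


def place_order(items):
    """Hypothetical business operation, used only to show the instrumentation."""
    with measured("order.place"):
        with measured("order.price_lookup"):
            total = sum(price for _, price in items)
        return total


def summary():
    """Per-operation call count and mean duration in seconds."""
    return {name: (len(values), sum(values) / len(values))
            for name, values in timings.items()}
```

The metric names here are expressed in business terms (placing an order, looking up prices), which is the kind of information low-level system metrics cannot provide directly.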

Probably Application Performance Management (APM) would be the right category encompassing most production-related categories such as Performance Monitoring, Capacity Management, Diagnostics (troubleshooting), and Tuning (and Optimization – although this may somewhat get into the re-design category). We are probably not there yet, and Application Performance Management is a vague vision rather than reality. Gartner, for example, stresses that APM is Application Performance Monitoring, not Management. And I am not sure what the title of the person doing this would be. Management is a favorite word for an area of expertise (as in Performance Management or Capacity Management), but Manager (at least in the US) still means a person who manages other people. So the title, I guess, would be the same ubiquitous ‘performance engineer’.

Performance Troubleshooting or Diagnostics is definitely an important part of Performance Management and is an application of performance engineering to existing performance issues. While it is probably the most typical performance-related activity at many corporations, very few have anything formal around it, and usually all the other performance-related groups get involved. And we need performance engineering skills to investigate and fix performance problems in production.

It looks like in the new generation of Web companies monitoring and capacity planning are often included in ‘Site Reliability’, adding, I guess, some confusion to the already existing mess of terms and notions.

P.S. By the way, the only conference covering almost all topics mentioned above is CMG. The call for papers and workshops is open now.