Load Testing: What Tool to Choose?
Classifying and evaluating load testing tools is not easy, as they include different sets of functionality that often cross the borders of whatever criteria are used. In most cases, any classification is either an oversimplification (which in some cases still may be useful) or a marketing trick to highlight advantages of specific tools. There are many criteria that differentiate load testing tools, and it is probably better to evaluate tools on each criterion separately.
First, there are three main approaches to workload generation and every tool may be evaluated on which of them it supports and how exactly.
Protocol-level recording and the list of supported protocols. Does the tool support protocol-level recording and, if it does, which protocols does it support? With the quick growth of the Internet and the popularity of browser-based clients, most products support HTTP only or a few Web-related protocols. To my knowledge, only HP LoadRunner and Micro Focus SilkPerformer try to keep up with support of all popular protocols. So if you need recording of a special protocol, you will probably end up looking at these two tools (unless you find a special niche tool supporting your specific protocol). That somewhat explains the popularity of LoadRunner at large corporations, where almost all possible protocols are probably in use. The level of support of specific protocols differs significantly too. Some HTTP-based protocols are extremely difficult to correlate if there is no built-in support, so you may look for that kind of specific support. For example, Oracle Application Testing Suite may have better support of Oracle technologies.
UI-level recording. The option has been available for a long time, but it is much more viable now. For example, it was possible to use Mercury/HP WinRunner or QuickTest Professional (QTP) scripts in load tests, but you needed a separate machine for each virtual user (or at least a separate terminal session). That drastically limited the level of load you could achieve. Other known options were, for example, the Citrix and RDP (Remote Desktop Protocol) protocols in LoadRunner, which were always the last resort when nothing else worked, but were notoriously tricky to play back. New UI-level tools for browsers, such as Selenium, extended the possibilities of the UI-level approach by allowing multiple browsers to run per machine (so scalability is limited by the resources available to run browsers). Moreover, we got UI-less browsers, such as HtmlUnit, which require significantly fewer resources than real browsers. There are multiple tools supporting this approach now, such as PushToTest, which directly harnesses Selenium and HtmlUnit for load testing, or the LoadRunner TruClient protocol and SOASTA CloudTest, which use more proprietary solutions to achieve low-overhead playback. Still, questions of supported technologies, scalability, and timing accuracy remain largely undocumented, so the approach requires evaluation in every specific non-trivial case.
Programming. There are cases when you can’t use recording at all (or can, but with more difficulty). In such cases, making API calls from the script may be an option. Other variations of this approach are web services scripting and using unit testing scripts for load testing. And, of course, you may need to add some logic to your recorded script. You program the script in whatever way is available and use the tool to execute scripts, coordinate their execution, and report and analyze results. To do this, the tool should have the ability to add code to (or invoke code from) your script. And, of course, if the tool’s language is different from the language of your API, you would need to figure out a way to connect them. Tools using standard languages such as C (e.g. LoadRunner) or Java (e.g. Oracle Application Testing Suite) may have an advantage here. However, you need to know all the details of the communication between client and server, which is often very challenging.
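To make the programming approach more concrete, here is a minimal sketch in Python of what a tool does around your code: it runs the same scripted transaction from several concurrent virtual users and collects response times. Everything in it is an assumption for illustration: `fake_api_call` is a hypothetical stand-in for a real client API call, and the function names are not taken from any particular tool.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_api_call():
    """Hypothetical stand-in for a real client API call to the system under test."""
    time.sleep(0.01)


def virtual_user(transaction, iterations):
    """Run one virtual user: call the transaction repeatedly and time each call."""
    timings = []
    for _ in range(iterations):
        start = time.perf_counter()
        transaction()
        timings.append(time.perf_counter() - start)
    return timings


def run_load(transaction, users, iterations):
    """Run `users` virtual users concurrently and return all response times in seconds."""
    with ThreadPoolExecutor(max_workers=users) as pool:
        futures = [pool.submit(virtual_user, transaction, iterations)
                   for _ in range(users)]
        return [t for future in futures for t in future.result()]


if __name__ == "__main__":
    results = run_load(fake_api_call, users=5, iterations=10)
    print(f"{len(results)} calls, slowest {max(results):.3f}s")
```

A real tool adds much more on top of this (pacing, ramp-up, distributed load generators, monitoring, and reporting), which is exactly why it is usually better to plug your code into a tool than to build the harness yourself.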
Other important criteria are related to the environment:
Deployment Model. There has been a lot of discussion about different deployment models: lab vs. cloud vs. service. Each model has some advantages and disadvantages. Depending on your goals and the systems to test, you may prefer one deployment model over another. But I still believe that for comprehensive performance testing you really need both lab testing (with reproducible results for performance optimization) and realistic outside testing from around the globe (to check real-life issues that you can’t simulate in the lab). Doing both would be expensive and makes sense when you really care about performance and have a global system, but that is not rare, and if you are not there yet, you may get there eventually. If there is such a chance, it would be better to have a tool which supports different deployment models.
If it is lab or cloud, an important sub-question is what kind of software / hardware / cloud the tool requires. Many tools use low-level system functionality, so there may be unpleasant surprises when the platform of your choice or your corporate browser standard is not supported.
Scaling. When you have a few users to simulate, it usually is not a problem. The more users you need to simulate, the more important it becomes. The tools differ drastically in how many resources they need per simulated user and how well they handle large volumes of information. It may differ significantly even for a specific tool, depending on the protocol used and the specifics of your script. As soon as you get to thousands of users, it may become a major problem. For a very large number of users some automation, like automatic creation of a specified number of load generators across several clouds in SOASTA CloudTest, may be very handy.
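A rough back-of-the-envelope calculation shows why the per-user footprint matters so much. The sketch below estimates how many load generators are needed for a given number of virtual users. All figures in it (memory per user, generator size, usable fraction) are made-up assumptions for illustration, not measurements of any tool, and memory is only one of the resources that may become the limit.

```python
import math


def users_per_generator(generator_mb, mb_per_user, usable_fraction):
    """How many virtual users fit on one load generator, by memory alone."""
    usable_mb = generator_mb * usable_fraction
    return int(usable_mb // mb_per_user)


def generators_needed(total_users, generator_mb, mb_per_user, usable_fraction):
    """Number of load generators required to host `total_users` virtual users."""
    per_generator = users_per_generator(generator_mb, mb_per_user, usable_fraction)
    if per_generator == 0:
        raise ValueError("a single virtual user does not fit on one generator")
    return math.ceil(total_users / per_generator)


if __name__ == "__main__":
    # Hypothetical footprints: 2 MB per protocol-level user,
    # 300 MB per user driving a real browser.
    for label, mb_per_user in [("protocol-level", 2), ("real browser", 300)]:
        count = generators_needed(5000, 8192, mb_per_user, 0.7)
        print(f"{label}: {count} generator(s) for 5000 users")
```

With these assumed numbers, 5,000 protocol-level users fit on 2 generators with 8 GB each, while 5,000 real-browser users need 264. The exact figures will differ for every tool and script, which is why they should be measured rather than taken from a data sheet.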
Two other important sets of functionality are monitoring of the environment and result analysis. While theoretically it is possible to do both using other tools, that significantly degrades productivity and may require building some plumbing infrastructure. So while these two areas may look optional, integrated and powerful monitoring and result analysis are very important. And the more complex the system and tests, the more important they are.
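As a small illustration of what even basic result analysis involves, the sketch below summarizes a list of response times with the mean and nearest-rank percentiles. The sample values are invented, and the choice of the nearest-rank method is an assumption; tools differ in how they compute percentiles.

```python
import math
import statistics


def percentile(samples, pct):
    """Nearest-rank percentile: smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(samples):
    """Return the mean plus the median and 90th percentile of the response times."""
    return {
        "mean": statistics.mean(samples),
        "p50": percentile(samples, 50),
        "p90": percentile(samples, 90),
    }


if __name__ == "__main__":
    # Invented response times in milliseconds, with two slow outliers.
    response_ms = [120, 80, 95, 400, 110, 105, 90, 100, 85, 1500]
    print(summarize(response_ms))
```

For these invented samples the mean is 268.5 ms while the median is 100 ms, so the average alone would misrepresent what a typical user sees. Doing this by hand for every transaction, time interval, and monitored server metric is the kind of plumbing that integrated analysis saves you from.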
Of course, non-technical criteria are important too:
Cost. There are commercial tools (and license costs differ drastically) and free tools. And there are some choices in between: for example, SOASTA has the CloudTest Lite edition, free for up to 100 users. There are many free tools (some, such as JMeter, are mature and well known) and many inexpensive tools, but most of them are very limited in functionality.
Skills. Considering the large number of tools and the relatively small number of people working in the area, there is a real labor market only for the most popular tools. Even for the second-tier tools there are few people around and few positions available. So if you don’t choose one of the market leaders, you can’t count on finding people with experience in that tool. Of course, an experienced performance engineer will learn any tool, but it may take some time until productivity reaches the expected level.
Support. Recording and load generation have a lot of sophistication in the background, and issues may happen in every area. Availability of good support may significantly improve productivity.
This is, of course, not a comprehensive list of criteria, but rather a few starting points. Unfortunately, in most cases you can’t just rank tools on a better-to-worse scale. It may be that a simple tool will work quite well in your case. If your business is built around a single web site, it doesn’t use sophisticated technologies, and the load is not extremely high, almost every tool will work for you. The further you are from this state, the more challenging it is to pick the right tool. And it may even be that you need several tools.
And while you may evaluate tools against the above-mentioned criteria, it is not guaranteed that a specific tool will work with your specific product (unless it uses a well-known and straightforward technology). That actually means that if you have a few systems to test, you need to evaluate the tools you are considering against your systems and see if the tools can handle them. If you have many, choosing a tool supporting multiple load generation options is probably a good idea (and, of course, check it with at least the most important systems).

Alex – Thanks for the blog, a good summary. Under the “Programming” category, we successfully used the “grinder” application (http://grinder.sourceforge.net/) to emulate a Java thick-client application. It uses Jython (Java for Python) and interacts with the client Jar files to provide a top-level client emulation. Most discussions of performance testing only discuss the thin-client web solutions which leaves those with thick client applications in the cold.
Alex,
Thank you for at least mentioning monitoring and results analysis. There is a third component that is all too often overlooked or simply just not there. I think it’s part of the reason that LoadRunner, with all its faults, excessive costs, and learning cliff, is still being used. That is reporting: producing meaningful reports and charts that can be given to management and that don’t require an engineer to explain them. SOASTA, imho, is one of the few aside from HP that has good monitoring, analysis and reporting capability. JMeter, Selenium, and several other free or open source solutions just don’t have even rudimentary report production ability. They might produce something I’d hand another engineer but not a Senior manager or VP.
Great summary Alex!
I’ve been a LoadRunner user / evangelist for 12-13 years and this is an excellent article which compares it nicely with the competition. If you only want to test web/HTTP, you’re spoilt for choice. As soon as you want the more “exotic” protocols you fall back on one or two alternatives.
Greg Moore’s comments about reporting and analysis are spot on as well. People assume that because you can generate load on an application, you’re testing it. Unless you can produce well-informed, timely reports about your tests it often isn’t worth testing.
Thanks for sharing this.
All the best,
Richard
Hi,
This is a really good write up of the different levels of load and performance testing.
We run a cloud-based load testing service that is specifically targeted towards users that want to get up and running quickly, which I guess would fall into the web/HTTP category, although we do get a lot of requests for other protocols.
Thanks,
Martin
Alex – good, well-thought through article.
What are your views on internally generated load vs cloud load generation?
There are concerns about security from the cloud but I have also seen project delays due to acquiring and setting up internal load generator environments for medium to large loads.
Graham
Hi Alex
Well thought and explained list of parameters for choosing a performance testing tool.
Adding to the few alternatives listed by you: SandStorm (http://sandstorm.impetus.com/) is an enterprise-level performance testing tool which supports multiple protocols, is written in Java, and is also available on the cloud.
Prakher
Hi Alex,
Thanks for this great article !
I would also add Agileload ( http://www.agileload.com ) It is a new tool on the web, but it is on the market for some years now. It has a great real-time monitoring, anomalies detection and reporting features. You can customize and reuse different reports according to your audience.
Well we can monitor the main ERP/CRM application… there is also Quick native interface support (C/C++) for API calls … It can be setup on the cloud with no extra costs.
The tool is free for download on our website with free scripting and free testing up to 10 virtual users. No trial. We propose a rental license on a VU and timeframe basis.
My comments may seem a little “commercial”, but maybe it is worth a try…
Graham,
See above some thoughts about cloud vs. local under the ‘Deployment Model’ subtitle. Both security and hardware availability are important factors – although rather organizational.
If you have ongoing performance initiatives, you may use both approaches depending on what is the goal of your test. For example, if you want to see the effect of performance improvement (performance optimization), you may be better off using an isolated lab environment. If you want to do load testing of the whole production environment end-to-end under full load and are not very concerned by small variations, testing from Cloud may be more appropriate.
Actually we have more options to consider: we may deploy the system under test in Cloud even if the production system is deployed locally or we may have Cloud software to test (and may still want to have a lab for its performance optimization – see my Performance Testing and Optimization for the Cloud post).
Prakher & Elsane,
Actually I didn’t mean to list alternatives in this post. I mentioned a few tools just as examples of some interesting approaches / features. Even on my site, where I keep a collection of performance-related links, I listed only a few that I see more often or hear good feedback about (and a few links to lists of load testing tools, although all are far from comprehensive). The number of load testing tools available is very high (the list of open source tools alone has 52 tools in it). Many are just slightly decorated test harnesses. And when you read about them, they sound exactly the same (although, in reality, they are not). Most stress “easy to use” and “low cost”, luring customers looking to cut costs (both on tools and on tester qualification). It doesn’t look like most seriously position themselves against incumbent products. Well, it could well be that some products are hidden gems, but I still haven’t found a way to figure it out.
There are very few that do something special. For example, SOASTA or BlazeMeter come to my mind (I don’t want to say that their products are better than others or endorse them, but at least I understand what they bet on and see some points in their strategy). Again, maybe others do something very interesting too, but I am missing it so far (I am not talking about heavyweights here; they do some interesting stuff too). So I often struggle when asked to add a load testing tool to my list: why this one and not several dozen others? Not having time to keep the full list, I usually add tools that have something special or that I hear about many times from different sources. So I finally added SandStorm (I have seen it mentioned here and there), but it is the first time I have heard about AgileLoad.
It’s an awesome article for getting to know performance testing tools. IBM® Rational® Performance Tester is a performance testing tool that identifies the presence and cause of system performance bottlenecks.
Alex, thank you for post. So, do you think that it is impossible for freeware tools to support all protocols in future? Why?
Another question is that if we use LoadRunner, we are not able to use cloud testing solutions, such as Blazemeter (http://blazemeter.com), Loadimpact, Amazon appliances and so on. In this case we will be limited to lab testing only, and that is not good, I think.
Dzmitry,
I would be surprised if freeware tools started supporting other protocols now, when usage of other, non-Internet-related protocols has significantly decreased. Open-source/freeware load testing tools have not been as successful as open-source/freeware software in other areas. My personal explanation is that the people who seriously work with load testing tools usually are not developers, and developers usually don’t seriously use load testing tools.
Nothing prevents you from using LoadRunner in the cloud and nothing limits you to lab environments. Install an agent wherever you want, and you can test the system from there. Moreover, HP is working to provide cloud solutions (see, for example, "HP Powers Application Performance Testing in the Cloud"). I am not saying that they have completely succeeded there, and they are probably somewhat behind in cloud-related functionality, but you definitely may use LoadRunner in the cloud.
Well presented article. An additional alternative solution is AppLoader, a versatile, protocol independent load testing tool.
What do you think about LoadComplete? Even though it doesn’t support as many protocols as LoadRunner does, do you think it’s a good alternative to LoadRunner?