Posts Tagged ‘Tools’

Two Main Challenges of Performance Modeling and System Sizing

September 10th, 2012 No comments

Reading about performance modeling / simulation and system sizing, you often see two completely opposite views of the subject. Either authors describe in detail how you can model performance using some math, and you may feel that as soon as you comprehend that math you won’t have any problem with modeling. Or authors say that it is black magic and you’d better stay away from it, or do it in a minimal way with simple trending (while you probably won’t see that view in serious books, it can often be seen in Internet discussions).

The truth, as usual, is in the middle. Modeling is very helpful and works well if you use it properly and understand its limitations. And there are two main challenges here that rarely get highlighted, although everybody who wants to approach the topic should understand them clearly.

The first challenge is that modeling works well for known resource limitations. You should know these limitations in advance (and how your system uses that limited resource – which is also a challenge, but a more technical one). For example, if your system is processor-bound and you know how much CPU it takes per transaction, you may build a rather simple and pretty accurate model (using queuing theory or even something simpler – for example, if you stay away from heavy CPU utilization, a linear model may work well with multi-processor systems).
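As a rough illustration of the simple processor-bound case described above, here is a minimal Python sketch. It assumes a single transaction type, a known CPU time per transaction, and textbook open-arrival formulas (the utilization law, a linear capacity estimate, and an M/M/1 approximation for a single processor). The function names and the numbers are hypothetical and not taken from any real system.

```python
def cpu_utilization(tps, cpu_sec_per_txn, n_cpus=1):
    """Utilization law: average utilization per processor (1.0 means 100%)."""
    return tps * cpu_sec_per_txn / n_cpus


def max_throughput(cpu_sec_per_txn, n_cpus=1, target_utilization=0.7):
    """Linear model: transactions/sec that keep the CPUs at the target utilization."""
    return n_cpus * target_utilization / cpu_sec_per_txn


def mm1_response_time(tps, cpu_sec_per_txn):
    """M/M/1 approximation for one processor: R = S / (1 - U)."""
    u = cpu_utilization(tps, cpu_sec_per_txn)
    if u >= 1.0:
        raise ValueError("offered load saturates the processor")
    return cpu_sec_per_txn / (1.0 - u)


if __name__ == "__main__":
    # Hypothetical numbers: 20 ms of CPU per transaction on a 4-processor box.
    print(cpu_utilization(100, 0.020, n_cpus=4))
    print(max_throughput(0.020, n_cpus=4))
    print(mm1_response_time(25, 0.020))
```

A sketch like this only covers the resource that was built into it; it cannot say anything about a bottleneck it does not know about.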

But that model would never tell you when you run out of another resource and run into another kind of bottleneck until you build it into the model. And beyond a few common resources (processor, memory, disk, network) and explicitly introduced throttling, you usually don’t know about bottlenecks until you run into them. This is the primary reason that results of your model (which may be perfect from the mathematical point of view) are not reliable if you model a significantly higher load than you tested / validated: there is a high probability that you will run into another bottleneck that you are not taking into consideration now. However, the model would provide the best possible case (which holds true once you fix all other bottlenecks you didn’t take into consideration at the moment of modeling), which is important information by itself. A model would also be very useful to see if the system behaves up to expectations, or if there are internal issues degrading performance and preventing scalability (which may not be trivial to catch in complex systems).

Another challenge is a lack of performance-related metrics of hardware to use in modeling. You can find detailed hardware specifications, but they won’t tell you how fast your systems would work on this hardware. As far as I understand, the only relatively objective approach (without testing the real system on the real hardware – which is, of course, the best) is to use existing benchmark results to compare performance (keeping in mind that they represent results of this specific benchmark, not your systems). Most serious commercial modeling tools come with a library of hardware configurations and their performance metrics, allowing what-if performance analysis. It looks like keeping such libraries is a pretty time-consuming task and their quality may differ. Such a library is usually a major advantage of commercial modeling tools in comparison with free or inexpensive modeling tools (which may be quite good from the mathematical point of view, but you need to provide all numbers yourself).
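When published benchmark results are the only data available, a common first approximation is to scale a measured CPU time by the ratio of benchmark scores. The Python sketch below assumes that the benchmark score is roughly proportional to how fast your system would run on that hardware, which is exactly the caveat noted above (the benchmark represents itself, not your system). The function name and the scores are made up for illustration.

```python
def scale_cpu_time(measured_cpu_sec, tested_hw_score, target_hw_score):
    """Estimate CPU time per transaction on target hardware.

    Assumes (hypothetically) that a higher benchmark score means
    proportionally faster execution of your workload.
    """
    if tested_hw_score <= 0 or target_hw_score <= 0:
        raise ValueError("benchmark scores must be positive")
    return measured_cpu_sec * tested_hw_score / target_hw_score


if __name__ == "__main__":
    # Hypothetical: 20 ms measured on hardware scoring 30, target scores 45.
    print(scale_cpu_time(0.020, 30.0, 45.0))
```

An estimate obtained this way is only a starting point for what-if analysis and still needs validation on the real hardware.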

IDC made an interesting move here, introducing QPI (Qualified Performance Indicator) as a part of IDC’s Server Decision Suite Metrics (a free 30-day trial is available). It is a kind of independent performance library that may be used for proper performance modeling / sizing (and, as far as I understand, it goes well beyond performance, integrating this information with other IT-related metrics such as price, power, and size – it should be a very interesting optimization task to find the best hardware configuration based on all these metrics).

Load Testing: What Tool to Choose?

May 10th, 2012 13 comments

Classifying and evaluating load testing tools is not easy, as they include different sets of functionality that often cross the borders of whatever criteria are used. In most cases, any classification is either an oversimplification (which in some cases still may be useful) or a marketing trick to highlight advantages of specific tools. There are many criteria by which to differentiate load testing tools, and it is probably better to evaluate tools on each criterion separately.

First, there are three main approaches to workload generation and every tool may be evaluated on which of them it supports and how exactly.

Protocol-level recording and the list of supported protocols. Does the tool support protocol-level recording and, if it does, which protocols does it support? With quick Internet growth and the popularity of browser-based clients, most products support HTTP only or a few Web-related protocols. To my knowledge, only HP LoadRunner and Micro Focus SilkPerformer try to keep up with support of all popular protocols. So if you need recording of a special protocol, you will probably end up looking at these two tools (unless you find a special niche tool supporting your specific protocol). That somewhat explains the popularity of LoadRunner at large corporations, where you probably have almost all possible protocols in use. The level of support for specific protocols differs significantly too. Some HTTP-based protocols are extremely difficult to correlate if there is no built-in support, so you may look for that kind of specific support. For example, Oracle Application Testing Suite may have better support of Oracle technologies.

UI-level recording. The option has been available for a long time, but it is much more viable now. For example, it was possible to use Mercury/HP WinRunner or QuickTest Professional (QTP) scripts in load tests, but you needed a separate machine for each virtual user (or at least a separate terminal session). That drastically limited the level of load you could achieve. Other known options were, for example, the Citrix and RDP (Remote Desktop Protocol) protocols in LoadRunner – which always were the last resort when nothing else was working, but were notoriously tricky to play back. New UI-level tools for browsers, such as Selenium, extended the possibilities of the UI-level approach, allowing you to run multiple browsers per machine (so scalability is limited by the resources available to run browsers). Moreover, we got UI-less browsers, such as HtmlUnit, which require significantly fewer resources than real browsers. There are multiple tools supporting this approach now – such as PushToTest, directly harnessing Selenium and HtmlUnit for load testing, or the LoadRunner TruClient protocol and SOASTA CloudTest, using more proprietary solutions to achieve low-overhead playback. Still, questions of supported technologies, scalability, and timing accuracy remain largely undocumented, so the approach requires evaluation in every specific non-trivial case.

Programming. There are cases when you can’t (or can, but it is more difficult) use recording at all. In such cases using API calls from the script may be an option. Other variations of this approach are web services scripting and using unit testing scripts for load testing. And, of course, you may need to add some logic to your recorded script. You program the script in whatever way you have and use the tool to execute scripts, coordinate their execution, and report and analyze results. To do this, the tool should have the ability to add code to (or invoke code from) your script. And, of course, if the tool’s language is different from the language of your API, you would need to figure out a way to plumb them together. Tools using standard languages such as C (e.g. LoadRunner) or Java (e.g. Oracle Application Testing Suite) may have an advantage here. However, you need to know all the details of the communication between client and server, which is often very challenging.

Other important criteria are related to the environment:

Deployment Model. There were a lot of discussions about different deployment models: lab vs. cloud vs. service. There are some advantages and disadvantages to each model. Depending on your goals and the systems to test, you may prefer one deployment model over another. But I still believe that for comprehensive performance testing you really need both lab testing (with reproducible results for performance optimization) and realistic outside testing from around the globe (to check real-life issues that you can’t simulate in the lab). Doing both would be expensive and makes sense when you really care about performance and have a global system – but that is not rare, and if you are not there yet, you can get there eventually. If there are such chances, it would be better to have a tool which supports different deployment models.

If it is lab or cloud, an important sub-question would be what kind of software / hardware / cloud the tool requires. Many tools use low-level system functionality, so there may be unpleasant surprises when the platform of your choice or your corporate browser standard is not supported.

Scaling. When you have a few users to simulate, it usually is not a problem. The more users you need to simulate, the more important it becomes. The tools differ drastically on how many resources they need per simulated user and how well they may handle large volumes of information. It may differ significantly even for a specific tool, depending on the protocol used and the specifics of your script. As soon as you get to thousands of users, it may become a major problem. For a very large number of users some automation, like automatic creation of a specified number of load generators across several clouds in SOASTA CloudTest, may be very handy.
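To make the per-user resource point concrete, here is a back-of-the-envelope Python sketch for estimating how many load generators a test might need. It assumes memory is the limiting resource on the generator and that the per-user footprint was measured in a small trial run; the function names, the 70% headroom, and the numbers are all hypothetical.

```python
import math


def users_per_generator(generator_mem_mb, mem_per_user_mb, headroom=0.7):
    """Virtual users one generator can host, using only a fraction of its memory."""
    usable_mb = generator_mem_mb * headroom
    return int(usable_mb // mem_per_user_mb)


def generators_needed(total_users, generator_mem_mb, mem_per_user_mb, headroom=0.7):
    """Number of load generators required for the target number of virtual users."""
    per_generator = users_per_generator(generator_mem_mb, mem_per_user_mb, headroom)
    if per_generator < 1:
        raise ValueError("a single virtual user does not fit on the generator")
    return math.ceil(total_users / per_generator)


if __name__ == "__main__":
    # Hypothetical: protocol-level script at 5 MB/user vs. browser-based at 50 MB/user.
    print(generators_needed(5000, 8192, 5))
    print(generators_needed(5000, 8192, 50))
```

The same calculation with a tenfold larger per-user footprint shows why the choice of protocol and script design can change the hardware bill so much.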

Two other important sets of functionality are monitoring of the environment and result analysis. While theoretically it is possible to do both using other tools, that significantly degrades productivity and may require building some plumbing infrastructure. So while these two areas may look optional, integrated and powerful monitoring and result analysis are very important. And the more complex the system and tests, the more important they are.

Of course, non-technical criteria are important too:

Cost. There are commercial tools (and license costs differ drastically) and free tools. And there are some choices in between: for example, SOASTA has the CloudTest Lite edition, free for up to 100 users. There are many free tools (some, such as JMeter, are mature enough and well-known) and many inexpensive tools, but most of them are very limited in functionality.

Skills. Considering the large number of tools and the relatively small number of people working in the area, there is a kind of labor market only for the most popular tools. Even for the second-tier tools there are few people around and few positions available. So if you don’t choose the market leaders, you can’t count on finding people with experience in the tool. Of course, an experienced performance engineer will learn any tool – but it may take some time until productivity gets to the expected level.

Support. Recording and load generation have a lot of sophistication in the background, and issues may happen in every area. Availability of good support may significantly improve productivity.

This is, of course, not a comprehensive list of criteria – rather a few starting points. Unfortunately, in most cases you can’t just rank tools on a better / worse scale. It may be that a simple tool will work quite well in your case. If your business is built around a single web site, it doesn’t use sophisticated technologies, and load is not extremely high – almost every tool will work for you. The further you are from this state, the more challenging it would be to pick the right tool. And it may even be that you need several tools.

And while you may evaluate tools with the above-mentioned criteria, it is not guaranteed that a specific tool will work with your specific product (unless it uses a well-known and straightforward technology). That actually means that if you have a few systems to test, you need to evaluate the tools you are considering against your systems and see if the tools can handle them. If you have many, choosing a tool supporting multiple load generation options is probably a good idea (and, of course, check it with at least the most important systems).

Multiple Dimensions of Response Time

February 27th, 2012 No comments

It looks like everything related to performance has multiple dimensions. Recently reading the excellent posts A non-geeky guide to understanding performance measurement terms by Joshua Bixby and Building a High Performance Website by Phil Stanhope, I realized how many dimensions even a relatively simple term like “response time” has. And, moreover, it looks like we don’t have a reliable way to measure the response time that would matter to the end user (I guess something between “time to display” and “time to interactivity” depending on the site design, if we follow the posts’ terminology). Both authors look at this rather from the front end / Web Performance Optimization (WPO) point of view.

Spending most of my time in performance testing, I’d guess that “response time” comes from load testing / active monitoring tools that are the main source of performance information (the “waterfall” approach of the WPO community is quickly becoming popular – but I am not sure how many monitoring services use it). And in this case, “response time” is what the tool reports. What “response time” means in such a case depends heavily on the tool and its settings – and in many cases, I guess, it won’t be any of the metrics provided in the aforementioned posts (which, I guess, are standard in the WPO community – but they may not be easy to measure with load testing and enterprise monitoring tools). For protocol-based tools it would probably be the time to complete all requests without any client-side activities (with many additional details of browser emulation, like caching, threading, keep-alive, compression, etc.). For GUI-based tools it probably depends on what underlying mechanism the tool uses and how the script is designed. Quite often, if you don’t set any specific checks, the tool may report a success without full downloading and rendering (and when somebody says that a modern sophisticated site loads in 0.169 sec over the Internet, that would be my first guess). Although, if scripted properly, it perhaps may measure the performance metric that matters (when the page would “be almost fully interactive”) by checking that the parts that matter are downloaded and rendered (that probably can’t be done without manual scripting / analysis).
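To show how far apart these “response times” can be for the very same page load, here is a small Python sketch. The timestamp fields and the sample values are purely hypothetical (they are not the output of any particular tool); the point is only that each kind of tool may report a different one of these numbers under the same name.

```python
from dataclasses import dataclass


@dataclass
class PageLoad:
    """Hypothetical timestamps, in milliseconds since the request was sent."""
    first_byte: int
    base_page_done: int
    all_requests_done: int
    displayed: int
    interactive: int


def response_time_views(load):
    """Return the different values a tool might label as 'response time'."""
    return {
        "time to first byte": load.first_byte,
        "base page only (no extra checks)": load.base_page_done,
        "protocol-level (all requests completed)": load.all_requests_done,
        "time to display": load.displayed,
        "time to interactivity": load.interactive,
    }


if __name__ == "__main__":
    sample = PageLoad(first_byte=120, base_page_done=169,
                      all_requests_done=1400, displayed=2100, interactive=3200)
    for name, ms in response_time_views(sample).items():
        print(f"{name}: {ms} ms")
```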

That brings up an interesting question about Application Performance Management (APM): what does End-User Experience Monitoring (EUM), a.k.a. Real-User Monitoring (RUM), measure? EUM/RUM is considered an integral part of APM (and definitely should be), but it may measure pretty different things depending on the approach used to measure it. And as I mentioned above, it probably won’t be the actual end-user experience – but only its approximation by another metric (different for different tools).

The only thing that often saves us from all this complexity is that, as often happens in performance, in many cases it doesn’t matter. All of the metrics are just close enough from the practical point of view. In the good old days of plain HTML the main part was getting the response from the server; the client-side part was fixed and usually small. So not much was said about different kinds of response times in the past. The situation is changing now: the front-end time is becoming significant (see the Performance Golden Rule by Steve Souders, keeping in mind that it is based mainly on front pages) and now it looks like we can’t ignore the differences between response times anymore.

A Few Thoughts about APM (Application Performance Management) and Its Future

February 27th, 2012 No comments
Categories: APM, Performance, Tools

Is the Current Model of Load/Performance Testing Broken?

December 26th, 2011 No comments

Application Performance Management

December 5th, 2011 No comments

When I created my site as a collection of performance-related links and documents in 2004, I grouped links somewhat arbitrarily, just to avoid “analysis paralysis”, hoping to get back soon and polish as needed. It is interesting that I haven’t changed much in the grouping for these seven years (many things definitely changed, and many changes are long overdue, but I wasn’t able to improve much on the main grouping of information). Whatever links I added, they mainly fit one (or a few) of the existing categories. And just now I realized that we have a new information category – Application Performance Management – which doesn’t fit in any existing category. I had a category for APM tools from the beginning – they have been around for a while – but not for generic APM information (something beyond talking about just tool features). And finally I put together a list of great information sources into a new group, Application Performance Management:

Application Performance Engineering Hub

Application Performance, Scalability, and Architecture blog from Dynatrace

The Performance Management section of The Virtualization Practice

APM Digest

Correlsense blog

App Signal blog from AppDynamics

Catchpoint’s Blog

Seriti Consulting Blog (the Web Operations and Management Specialists), by Stephen Thair

Many of them have existed for a while, but it looks like quantity finally turned into quality and we see a new discipline emerging (instead of a marketing term to promote a special kind of tool). It is definitely related to the fact that with new technologies, such as virtualization and cloud computing, traditional resource monitoring is not enough anymore and there is a need to monitor at the application and service levels. Some of the blogs mentioned above are from tool vendors, but they provide great content far beyond discussing the tools.

A new generation of APM products?

October 18th, 2011 1 comment

Bernd Harzog’s post Why is Application Performance Management so Screwed Up? started a lot of discussions on the Internet. The post is a very good list of existing issues you may face when you try to use APM tools. I’d add one more – overhead. At least for the first generation, the claim that you may use APM in production worked only if you did very selective monitoring.

My view of APM is that the first generation of APM tools, so well described by Bernd, was very immature. Not that something was explicitly wrong with APM in general – what was really wrong was the drastic contrast between what the tools actually could do and the marketing promises of tool vendors. The vendors talked more about the APM vision and how the APM tools are supposed to work than about the exact things these tools are able to do, which you figured out, in the best case, after you spent a few days evaluating the product.

If you check the Gartner Magic Quadrant for Application Performance Monitoring or my list of tools, it is clear that the market is very crowded and not well defined. There are no good criteria by which you can compare tools, and different tools may actually do pretty different things, although that may be difficult to understand from reading about them on vendors’ sites.

However, I’d say that now we are getting the second generation of APM tools, which are much closer to the APM promise for some technologies. I don’t want to list names here and separate “first” and “second” generations. I’d guess that some “first” generation tools might advance to the “second” generation if they kept progressing – but, as I said, it is difficult to say without actual evaluation of the tools. Still, I am hearing a lot of stories that people were able to successfully implement APM for system X using tool Y without many problems.

Still, you don’t have a product which will do APM across all platforms and systems if you have a full zoo of different technologies, some of which are older than most of your IT employees (as many large corporations do). And don’t believe anybody who tells you that they can do that. Still, it looks like you can do it now for more systems with fewer problems – and start reaping the benefits of APM. Actually I don’t see any alternative to APM in the long run – although it is a topic for a separate post. But be aware of all the points mentioned in Bernd’s post – and check if the product you are going to use does what you need in the way you want.

P.S. Just before posting, I noticed another Bernd Harzog post where he shares his view of next generation APM products.

Oracle Application Testing Suite 9.3

October 10th, 2011 No comments

Oracle Application Testing Suite 9.3 was released some time ago. It is available for download (subject to the OTN License Agreement). Some new features and updates in this release are described in the press release.

A New Move in the Load Testing Tool Market

July 24th, 2011 12 comments

SOASTA launched CloudTest Lite – a free edition of their performance testing solution. Basically, they give it away free for up to 100 users. A serious move for sure. It should heat up the load testing tool market. It may work indeed – I guess they don’t have many paid customers in that range anyway; it looks like CloudTest’s sweet spot is when you need a very large number of users. I am very interested to see how it will turn out.

Several rosy reviews were posted, for example, CloudTest Lite – A Game Changer in the Performance Tool Market by Scott Barber and SOASTA CloudTest Lite Hands-On by Bernard Golden.

As I already mentioned, it indeed is pretty interesting. However, I’d say that we need to add some skepticism to be more realistic.

First, it is neither the first nor a unique move in load testing tools. I recall a few somewhat similar moves before, which then quietly disappeared. Well, I don’t remember what the limitations were (maybe a little bit more restrictive). And the companies were not the leaders of the market. Moreover, there is a list of 50 open source load testing tools on opensourcetesting.com and some, like JMeter and OpenSTA, are pretty mature. Yes, open source in the load testing area has not been as successful as in other areas. Analysis in particular is weak in most of these tools (if it exists at all).

Second, releasing is just the first step. The challenge for SOASTA would be how they support a large number of non-paying users (although, of course, for a promising start-up the number of customers may be important by itself). The community may be able to help with “how-to” questions, but implementing, let’s say, enhancement requests is up to the SOASTA team. And the number of such requests may be pretty high as people start to use it with different applications.

For example, it looks like we can’t specify transactions during recording in CloudTest for the moment. Well, what am I supposed to do with a script with a few hundred identical requests in it (AJAX type, differing only by incomprehensible HTTP body content)? Track delays in the scripts and try to correlate them with recording steps? Not exactly my understanding of quick and easy.

Scott writes in his review “it is free from now until the sun explodes”. Hmm… I’d rather hear this from the SOASTA team. Well, even if the SOASTA team is completely devoted to this edition, nobody can guarantee that SOASTA won’t be acquired, and who knows what the acquirer would decide to do with the freemium edition…

Yes, Scott does not get excited easily. The last time, as far as I remember, Scott got excited about a load testing tool was when Microsoft released their tool as part of Visual Studio back in 2005. See, for example, the discussions around my old posts VisualStudio 2005 and Load Testing and Scripting Language in Performance Tools. Well, Microsoft didn’t live up to its promises and I haven’t heard about their load testing tools for a while (my understanding is that it is not dead, but doesn’t play any noticeable role). But who knew that Microsoft was losing its grip?

By the way, returning to this old post about scripting languages, I don’t object to the idea of GUI-based load testing tools (I mean when the script is represented by a kind of graphical tree or something like this) if we can extend it with code (as in CloudTest with JavaScript) or switch between tree and script view (as in Oracle Application Testing Suite). My concern was (and is) the case where you can’t extend your recording with code at all – I still believe that it limits the area of application significantly.

Anyway, it looks like we have several interesting developments in the load testing tool market that may be beneficial to the community. CloudTest and its Lite version are definitely on the list. LoadRunner AJAX TruClient may be introducing a new paradigm in load testing (or promoting it, if you follow the e-Valid blog). Oracle Application Testing Suite (formerly Empirix) is practically a new product and is getting traction [at least in the Oracle Universe].

Do we have a revolution in load test tools?

May 9th, 2011 8 comments

It looks like the competition in the load testing tools market is heating up again. And, of course, new players need to differentiate themselves. I started to see statements that zero-scripting tools are the new word here. See, for example, the Agile Thinking: A New Approach to Performance Testing paper by Graham Parsons.

While I completely agree with the first part (compare with my Agile Performance Testing article), I am rather confused by the second part. I don’t see why we can’t use well-established tools like LoadRunner or JMeter (if we speak about free tools) in agile development. When you know the tool, scripting is usually not the most difficult part of your performance project.

The zero-scripting approach (if I understand it correctly) looks to me rather limited to simple web sites (and if you have such sites – you don’t need to do much in LoadRunner either). How do you handle the complex correlation and parametrization often required by today’s rich web clients? And as you start to introduce ways to do it through, for example, a graphical interface – you start to create a proprietary way to do this.
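For readers less familiar with the terms: correlation means capturing a dynamic value (for example, a session token) from a server response and reusing it in later requests, while parametrization means replacing recorded constants with per-user data. A minimal Python sketch of both follows, using only the standard library. The field name `session_id`, the sample response, and the helper names are hypothetical and not tied to any particular tool.

```python
import re

# Hypothetical pattern for a hidden form field carrying a dynamic session token.
SESSION_PATTERN = r'name="session_id"\s+value="([^"]+)"'


def correlate(response_body, pattern=SESSION_PATTERN):
    """Extract a dynamic value from a server response (correlation)."""
    match = re.search(pattern, response_body)
    if match is None:
        raise ValueError("dynamic value not found in response")
    return match.group(1)


def parametrize(request_template, **values):
    """Fill a recorded request body with per-user data (parametrization)."""
    return request_template.format(**values)


if __name__ == "__main__":
    login_page = '<input type="hidden" name="session_id" value="abc123">'
    token = correlate(login_page)
    body = parametrize("user={user}&session_id={sid}", user="user042", sid=token)
    print(body)
```

Real rich-client traffic is rarely this tidy, which is why built-in support for such rules, or the ability to write them in code, matters when choosing a tool.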

There is also a very interesting post by Fred Beringer. I completely agree with the main idea of the post. Performance engineering is much more than just a gate-keeping function. The terminology is somewhat vague here, but I usually use ‘performance engineering’ when I want to stress that it is more than “gate-keeping”. ‘Wisdom’ sounds a little bit too lofty to me and the definitions of knowledge and wisdom may be argued about, but the idea is well formulated and definitely valid.

But I still believe that you may be in the ‘wisdom’ business with whatever tools are available to you. I, of course, don’t know all the features of CloudTest (the materials I read were too marketing-oriented to figure out what the real advantages are). Tools help you a lot, but still it is people who are in the knowledge / wisdom business, not tools. Even the best tool won’t do the work for you – you need to know how to use it, how to interpret its results, etc.

And I still don’t see a drastic difference from “traditional” load testing tools. I guess CloudTest has some advantages. Listening to a webinar I realized one – they can quickly and automatically deploy agents across as many cloud servers as needed. Well, this is a significant advantage when you work in the cloud or want to run a really large-scale test (and an interesting variable here is how many users you can simulate from one server). Probably there are other advantages. But the basic principles still look the same.

Going to how Fred describes the difference:

1. Scale. Maybe CloudTest has some advantages on the high end. It usually is a problem when you want to simulate many thousands of users. But why do “traditional tools limit themselves to testing behind a firewall”? Nothing, to my knowledge, prevents you from installing them outside the firewall. Yes, most performance testing in large corporations is done in a lab inside the firewall – but, I guess, it is because most performance testing they do is for internal applications. And nothing actually prevents you from generating traffic from different geographies. I often did it myself by installing LoadRunner agents in remote locations.

I am not sure that I got Fred’s point in the statement: “Most traditional tools, including Loadrunner, use manual dynamic session or hard-coded rules. You end up with significant repetitive work and long support delays.” At least what I saw about templates during a CloudTest webinar looks pretty similar to what is used in “traditional” tools.

2. Speed. Why “proprietary scripting language”? LoadRunner uses ANSI C for the Web protocol (and other standard languages for other protocols), Oracle Application Testing Suite uses Java, etc. Sure, they use a lot of proprietary functions there, but the languages are standard and you can add your code or use external functions when you need to extend script functionality. Yes, it requires some skills – but this is required only in really complicated cases. In simple cases, you don’t care much about what language is used in the script – just record it and play it back. Anyway, it looks like SOASTA uses JavaScript as a scripting language.

3. Affordability. Well, LoadRunner is not the cheapest product on the market for sure. But whatever licensing model is used by CloudTest, it is probably still a noticeable sum. Meanwhile, we have multiple free products around; some, like JMeter, are mature enough.

To summarize, it could well be that new tools have some advantages over old players. Still, I don’t see a revolution yet. Some progress for sure (especially after a long period of stagnation in this market), but not a revolution. Well, maybe I am just not good at marketing – but I wish to see some webinars / papers for a technical audience explaining the real differences and advantages.