Archive for the ‘Performance Engineering’ Category

User Concurrency

April 10th, 2013 No comments

Performance testing terminology is not well defined, and one of the most ambiguous terms is user concurrency. Re-reading Load Testing: Concurrent Users verses Simultaneous Users by Scott Moore (@loadtester) and the LoadRunner Concurrency video by Mark Tomlinson (@mtomlins) inspired me to post this comment. Here is what I wrote (and still believe) in my old CMG paper about performance requirements in 2007 (the latest version of this paper was presented again at CMG’12):

Concurrency is the number of simultaneous users or threads. It is important too: connected, but inactive users still hold some resources. For example, the requirement may be to support up to 300 active users.

When we speak about the number of users, the terminology is somewhat vague. Usually three metrics are used:

• Total or named users: all registered or potential users. That is a metric of data the system works with. It also indicates the upper potential limit of concurrency.

• Active or concurrent users: users logged in at a specific moment of time. That one is the real measure of concurrency in the sense it is used here.

• Really concurrent: users actually running requests at the same time. While that metric looks appealing and is used quite often, it is almost impossible to measure and rather confusing: the number of “really concurrent” requests depends on the processing time for this request. For example, let’s assume that we got a requirement to support up to 20 “concurrent” users. If one request takes 10 sec, 20 “concurrent” requests mean throughput of 120 requests per minute. But here we get an absurd situation that if we improve processing time from 10 to 1 second and keep the same throughput, we miss our requirement because we have only 2 “concurrent” users. To support 20 “concurrent” users with 1 second response time we really need to increase throughput 10 times to 1,200 requests per minute.
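The arithmetic in the last bullet follows Little's Law from queuing theory: the average number of requests in flight equals throughput multiplied by response time. A minimal Python sketch reproduces the numbers; the function names are illustrative and do not come from the original paper.

```python
def throughput_per_minute(concurrent_requests, response_time_sec):
    """Throughput needed to keep `concurrent_requests` in flight (Little's Law)."""
    return concurrent_requests / response_time_sec * 60


def requests_in_flight(throughput_per_min, response_time_sec):
    """Average number of "really concurrent" requests at a given throughput."""
    return throughput_per_min / 60 * response_time_sec


print(throughput_per_minute(20, 10))  # 120.0 requests per minute
print(requests_in_flight(120, 1))     # 2.0 "really concurrent" users
print(throughput_per_minute(20, 1))   # 1200.0 requests per minute
```

The same throughput yields a different "really concurrent" count as soon as response time changes, which is why that count works poorly as an input requirement.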

It is important to understand what users you are speaking about: the difference between each of these three metrics for some systems may be drastic. Of course, it heavily depends on the nature of the system.

The number of online users (the number of parallel sessions) looks like the best metric for concurrency (complementing throughput and response time requirements).

To summarize my comment, I believe that the number of “really concurrent” users is not an appropriate input metric for performance engineering and performance testing. It perhaps may be an output metric characterizing the system’s load if we find a way to measure it (in a way, it is the number of users in the system, to use queuing theory terminology).

Performance vs. Scalability

April 5th, 2013 No comments

After attending Sergey Chernyshev’s (@sergeyche) Scalability vs. Performance presentation at NY Web Performance Meetup and reading Scalability: it’s the question that drives us by Robert David Graham (@ErrataRob) and Scalability vs. Performance: it isn’t a battle by Theo Schlossnagle (@postwait), I would like to share my understanding. While I agree in general with everything said, I would rather word it differently. The topic became loaded, so accents are important. Robert, for example, states that “performance” and “scalability” are orthogonal problems. Well, no, they are not. They are different but correlated notions, even leaving aside that performance and scalability are somewhat vague terms.

If we speak about web systems now, it looks like we can roughly separate two main components in response time (which is the main performance metric): backend (server-side) time and frontend (network and client-side) time. There are subtleties and grey areas, but I’d ignore them here. The frontend time, the subject of Web Performance Optimization (WPO), doesn’t relate to scalability insofar as it doesn’t involve server processing (again ignoring subtleties).

The proportion of frontend time to backend time may be anything at all. According to Steve Souders (@souders), the founder of WPO, 80-90% of the end-user response time is spent on the frontend. But even for major web sites the backend time for requests involving database processing (such as submitting orders or querying order status) may be more noticeable. And there are plenty of corporate web applications where the share of frontend time is rather small. Of course, the starting point of any performance troubleshooting is to find where time is spent. And there is absolutely no sense in optimizing parts where time is not spent.

However, there is one important “but”. While frontend time is supposed to be constant (another simplification, but again ignoring subtleties), the backend (server) time depends on load. The heavier the load, the larger the server time may be. And at some point it may skyrocket, making the system practically unusable. So when thinking about what you need to optimize, you need to check where time is spent under maximal load. Unless you don’t care about downtime and user experience, the way to do it is load testing.

And here we get to scalability. The frontend performance indeed doesn’t matter here and is independent of scalability. But the backend performance is directly related to scalability. The relationship, of course, may be non-linear and quite sophisticated – but it does exist.

To illustrate it, let’s consider one simple (but still typical) example. The backend processing takes X ms, the time is mainly spent in CPU, and we don’t have any other bottlenecks. In this case the server response time would be mainly CPU processing time – and every request would take X ms of CPU time (if we don’t have parallelism here). As soon as we use most of the available CPU time, server response time would skyrocket (that situation may be modeled using queuing theory). So there is a load at which the system becomes practically unusable – and the question is just when we get to this load. We, of course, may get to problems sooner if we have any scalability problem inside the system or run out of another kind of resource.
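One way to see the "skyrocket" effect is the simplest queuing model, a single-server M/M/1 queue, where mean response time is R = S / (1 − ρ) for service time S and utilization ρ. The M/M/1 model, the 100 ms service time, and the function name are all assumptions made for illustration. A real server with several cores and a different service-time distribution would differ in detail, though the shape of the curve is similar.

```python
def mm1_response_time_ms(service_time_ms, arrival_rate_per_sec):
    """Mean response time of an M/M/1 queue: R = S / (1 - utilization)."""
    utilization = arrival_rate_per_sec * service_time_ms / 1000.0
    if utilization >= 1.0:
        # Offered load meets or exceeds capacity: the queue grows without bound.
        return float("inf")
    return service_time_ms / (1.0 - utilization)


# Assumed X = 100 ms of CPU per request, so a single CPU saturates at 10 requests/sec.
for rate in (1, 5, 8, 9, 9.5, 9.9):
    print(f"{rate:>4} req/s -> {mm1_response_time_ms(100, rate):8.1f} ms")
```

Response time stays close to the service time at low load and then grows without limit as utilization approaches 100%.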

Generally speaking, you can increase scalability by either optimizing server processing (using fewer resources) or providing more resources – provided, of course, that your architecture allows using these additional resources. So scalability mainly boils down to the ability to parallelize your processing (and is often limited by what you can’t fully parallelize, like a centralized database).
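The limit imposed by the part that cannot be parallelized is commonly expressed by Amdahl's law. A small sketch, assuming purely for illustration that 10% of the processing is serialized (for example, on a centralized database); the function name and the numbers are hypothetical.

```python
def amdahl_speedup(serial_fraction, workers):
    """Best-case speedup on `workers` parallel units when `serial_fraction` of the work is serial."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / workers)


for workers in (1, 2, 4, 16, 64, 1024):
    print(f"{workers:>5} workers -> speedup {amdahl_speedup(0.10, workers):.2f}")
# With a 10% serial part the speedup approaches, but never reaches, 1 / 0.10 = 10.
```

Adding resources beyond a certain point buys very little unless the serialized part itself is reduced.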

We do have two parts of response time – frontend and backend – which behave differently and may need different approaches and tools to optimize. But the end-user experience is the sum of these two parts – where the backend time is a function of load. You can’t say much about your end-to-end performance and its backend part until you check it under load – and load testing is the safe way to do so.

Historically, performance engineering concentrated on the backend – where the main performance and scalability issues were – and practically ignored the frontend (which indeed was usually pretty straightforward then). Several sub-disciplines were formed, including performance analysis, capacity planning, and load testing. Later, when the sophistication of the frontend skyrocketed, a whole new discipline was established by Steve Souders and quickly grew around Velocity and Web Performance meetups. Unfortunately, it practically dismissed performance engineering developments of the last 40 years (maybe even more – the Computer Measurement Group (CMG) was founded in 1975). While frontend WPO definitely has its own specifics, I’d still expect to see a holistic approach to performance engineering, taking into account all aspects of performance and scalability end-to-end.

Performance Engineering: Historical View

January 8th, 2013 No comments

It is interesting to look at how the handling of performance changed with time. Probably performance went beyond single-user profiling when mainframes started to support multiprogramming. It was mainly batch loads with sophisticated ways to schedule and ration consumed resources, as well as pretty powerful OS-level instrumentation that made it possible to track down performance issues. The cost of mainframe resources was high, so there were capacity planners and performance analysts to optimize mainframe usage.

Then the paradigm changed to client-server and distributed systems. Available operating systems had almost no instrumentation or workload management capabilities, so load testing became almost the only remedy, in addition to system-level monitoring, for handling multi-user performance. Deploying across multiple machines was more difficult and the cost of rollback was significant, especially for Commercial Off-The-Shelf (COTS) software, which may be deployed by thousands of customers. Load testing became probably the main way to ensure performance of distributed systems, and performance testing groups became the centers of performance-related activities in many organizations.

While cloud looks quite different from mainframes, there are many similarities between them, especially from the performance point of view: availability of computer resources to be allocated, an easy way to evaluate the cost associated with these resources and implement chargeback, isolation of systems inside a larger pool of resources, and easier ways to deploy a system and pull it back if needed without impacting other systems.

However, there are notable differences, and they make managing performance in the cloud more challenging. First of all, there is no instrumentation on the OS level and even resource monitoring becomes less reliable. So all instrumentation should be on the application level. Second, systems are not completely isolated from the performance point of view and they could impact each other. And, of course, we mostly have multi-user interactive workloads, which are difficult to predict and manage. That means that such performance risk mitigation approaches as APM, load testing, and capacity management are very important in the cloud.

It is interesting that while performance is the result of all design and implementation details, the performance engineering area remains very siloed. Those who do capacity planning are usually not involved much in performance testing or software performance engineering. The new and fastest growing group, web performance specialists, remains mainly isolated from other performance-related groups. People and organizations trying to span all performance-related activities together are few and far between.

I don’t see that the need for specific performance-related expertise, such as load testing or capacity planning, is going away. Even in the case of web operations, we would probably see load testing coming back as soon as systems become more complex and performance issues start to hurt business. There perhaps would be less need for “performance testers” than there was in the heyday, due to better instrumentation, APM tools, continuous integration, resource availability, etc. – but I’d expect more need for performance experts who would be able to see the whole picture using all available tools and techniques.

Why do I believe that everybody interested in performance should come to CMG’12?

November 7th, 2012 No comments

CMG’12 is an annual conference organized by the Computer Measurement Group – a volunteer organization of professionals specializing in performance, capacity, and IT service management. This year it is held in Las Vegas, December 2-7, 2012.

Why do I love CMG, spend a lot of my time organizing and promoting it, and come there every year (sometimes on my own)? Well, because I believe that it is the best (and actually the only) conference on performance and capacity, the main topic of my interest for the last fifteen years. There are many conferences on specific topics. For example, the Velocity conference, devoted to web performance, is significantly larger and more popular – but it is still devoted mainly to single-user web performance, leaving all other performance and capacity questions to CMG. Let me share some of my excitement – of course, from my personal point of view (there are plenty of other highlights, but I am mentioning only the ones that are close to my heart).

This year the conference covers all aspects of performance (well, almost all – performance is such a sophisticated subject that there is always much more to learn), from Web Performance Optimization (the conference opens with a keynote by Patrick Meenan, a web performance guru at Google and the creator of WebPagetest) to mainframe performance (and everything in between).

The conference starts with half-day workshops – see the description here. In addition to workshops, there are CMG-T sessions during the whole conference. Each CMG-T class spans 2 or 3 session slots, so it could easily be considered a workshop or a training class. All are led by renowned experts with tons of experience; you would hardly get anybody even remotely close in a typical vendor class (not to mention the unique vendor-neutral perspective you hardly find anywhere else). The CMG-T track runs through the whole conference and every class is a gem:

  • Capacity Planning by Ray Wicks
  • z/OS Basics by Glenn Anderson
  • Java Performance Analysis and Tuning by Peter Johnson
  • Model and Forecasting Basics by Dr. Michael Salsburg
  • Network Performance Management by Manoj Nambiar
  • Windows System Performance Management and Analysis by Jeffry Schwartz
  • Using SAS to Communicate Your Message by MP Welch

CMG’12 has 4 keynote/plenary sessions and almost a hundred regular track sessions running from mid-Monday to mid-Friday. The conference is 5 tracks wide. One track, as I already mentioned, is CMG-T 101-type classes (with 301 depth). The other four tracks are shared between five subject areas: Performance Engineering and Testing, Capacity Planning, Application Performance Management, IT Service Management, and Hot Topics. It is difficult to list all the highlights – there are too many. While I know many great presenters and am fascinated by many topics, commenting on every single one would take too much time and space. Probably you just need to look at the agenda – there are three different views: the preliminary agenda (overview, a day on a page), a list of abstracts in a single PDF document, and the search/scheduler (click on the abstract number to see the abstract).

One track on Wednesday is the Michelson award track. CMG has been presenting the Michelson award since 1974 (if you wonder, Albert Abraham Michelson was known for his technical accomplishments in measuring the speed of light and for his role as teacher and inspirer of others – and measuring is the key to performance). This year we will see many Michelson winners presenting: Dr. Connie Smith, the founder of Software Performance Engineering; Dr. Daniel Menasce, the author of many great books about performance and capacity planning; Adam Grummit, the author of the great Capacity Management book (ITSM Library) and the CMG president; Dr. Pat Artis; Bruce McNutt; and Dr. Michael Salsburg.

I believe that the main advantage of attending CMG is networking with the world’s best experts in almost all areas of performance and capacity. Nowadays you can find all technical information on the Internet, but there is no substitute for face-to-face conferences to learn how to use it and what people’s experiences were, and, of course, to see the whole picture. Especially in performance: performance is the result of every design and implementation detail, and you need to be learning all the time to keep up with coming challenges.

I am presenting there too: Load Testing: See a Bigger Picture on Thursday and Performance Requirements: the Backbone of the Performance Engineering Process on Friday. Nothing compared to other CMG’12 highlights, but I hope to trigger discussions around these two very important topics.

And, of course, it is Las Vegas – and Rio’s rate is $55 per night until November 14th. See you there!

CMG’12 Call for Papers and Workshops – The Best Independent Performance and Capacity Conference

May 18th, 2012 No comments

The Computer Measurement Group (CMG) calls for papers and presentations for CMG’s 38th International Conference to be held in Las Vegas, Nevada, December 3rd through 7th, 2012.

The 2012 CMG conference will cover all areas of systems management, including but not limited to: capacity planning, IT service management, application performance management, performance engineering and testing, as well as the latest developments in the overall field of computer performance evaluation. See the Call for Papers and Call for Workshops for details.

CMG has been the source of unbiased and objective expert information and practical, real-life experiences across all computing platforms in the computer industry for over 30 years. Share your knowledge and experiences: write a paper and submit it for presentation at CMG’12.

Papers are categorized as Introductory, Tutorial, Advanced, or User Experience. I want to especially encourage all of you to consider writing a User Experience paper. Every year, the conference evaluations show a common theme: “More User Experience Papers, please!” You don’t need to be one of the field’s superstars to write one – in fact, they seem to work better from people who are just working in the field, in non-IT companies and government bodies. Just tell us what problem you faced, how you went about figuring out what the cause was, and how you dealt with it. Mentors are available for writing assistance, and may be requested at any point in the writing process, including before the paper is started. Just write mentor@cmg.org and ask.

Please take the time to participate in the CMG’12 program. It will be rewarding for both authors and attendees, and as we all share our knowledge we all become more complete professionals.
Paper submission through the CMG website is now available. For more information go to paper submission and workshop submission.

The deadline for paper submissions is June 8, 2012.

Please send questions to CMG’12 Program Chair, Bill Jouris at cmgpc@cmg.org.

Load Testing: Its Present and Future

April 26th, 2012 3 comments

Recent trends of agile development, DevOps, and Web and Social Media sites somewhat question the importance of load testing. Some (not many) openly say that they don’t need load testing; some still pay lip service to it but just never get to it. In the more traditional corporate world we still see performance testing groups, and important systems usually get load tested before deployment.

Let’s first define load testing, since terminology is rather vague here. I use it here for anything that requires applying multi-user synthetic load – in contrast with single-user performance (which is a subset of performance engineering and may include, for example, profiling or Web Performance Optimization as it is defined now). And I use it here as an umbrella term including all other variations of multi-user testing, such as performance, concurrency, stress, endurance, longevity, scalability, etc. – but you may replace it with any other term if you prefer.

Yes, it looks like some Web and Social Media sites managed to survive without load testing. However, it looks like many such companies match the following profile:
- Business is built around a single Web site, so everybody in the company follows what is going on in production.
- Overall architecture is still clear and relatively simple. Changes (however frequent) are rather minor and evolutionary.
- There is decent instrumentation providing performance information.
- There is a possibility to remove changes relatively easily.
- Site downtime or a period of slow performance (until the problem is noticed and fixed) is not extremely painful or dangerous to the business.

Load testing is a way to mitigate load- and performance-related risks. There are other approaches and techniques that also alleviate some performance risks:
- Good single-user performance engineering practices (the performance of single-user requests is constantly tracked).
- Good instrumentation/Application Performance Management providing insight into what is going on inside the system.
- [Auto] scalable architecture.
- Continuous integration allowing changes to be deployed and removed quickly.

Still, none of these completely replaces load testing; rather, they complement it. They definitely decrease performance risk compared with the situation when nothing was done about performance at all until the last moment before rolling out the system in production without any instrumentation, but they still leave risks of crashing and performance degradation under multi-user load. And if the cost of it is high, you should do load testing (what exactly and how is another large topic – there is much more here than the stereotypical waterfall-like last-moment record-and-replay approach).

There is always a risk of crashing or performance issues under heavy load – and the only way to mitigate it is to actually test it. Even stellar performance in production and a highly scalable architecture don’t guarantee that the system won’t crash with a slightly higher load. Truth be told, even load testing doesn’t completely guarantee it (real-life workload may be different from what you have tested), but it drastically decreases the risk.

Another important value of load testing is making sure that changes don’t degrade multi-user performance. Unfortunately, better single-user performance doesn’t guarantee better multi-user performance. In many cases it improves multi-user performance too, but definitely not always. And the more complex the system, the more probable are exotic multi-user performance issues no one even thought of. And a way to ensure that you don’t have such issues is load testing.

When you do performance optimization, you need a reproducible way to evaluate the impact of changes on multi-user performance. The impact on multi-user performance probably won’t be proportional to what you see with single-user performance (even if it still would be somewhat correlated). Without multi-user testing the actual effect is difficult to quantify. The same with the issues happening only in specific cases that are difficult to troubleshoot and verify in production – using load testing can significantly simplify the process.

Summarizing, I don’t see that the need for load testing is going away. Even in the case of Web and Social Media sites we would probably see load testing coming back as soon as systems become more complex and performance issues start to hurt business. Maybe there would be less need for “performance testers” than there was in the heyday, due to better instrumentation, APM tools, continuous integration, etc. – but I’d expect more need for performance experts who would be able to see the whole picture using all available tools and techniques (although I don’t see it yet).

Performance Dimension of Information Technology

April 16th, 2012 1 comment

There are no standards on titles and skill sets related to the performance dimension of IT. I decided to put together how I understand them (most terms are vague, so it is quite possible that other people understand them differently). Of course, it is a simplification – but the topic is probably too heavily influenced by the history and politics of every particular organization to be clear-cut anyway.

I still think that we can break the whole area into three major categories: design (and development), testing, and production (maybe somewhat matching the ITIL terms Service Design, Service Transition, and Service Operation). The term Performance Engineering may relate to the whole area (or maybe to the design category – in this case it is sometimes referred to as Software Performance Engineering, SPE).

Performance Design. Talking about the design category (I used the ‘Performance Design’ term to group all performance-related activities during design and development, although it isn’t used this way – probably reflecting that the whole area doesn’t quite exist as a separate discipline), we have specific areas of performance engineering knowledge for each specific technology, such as Java performance, .Net performance, etc. One relatively new, but large and popular area is Web Performance Optimization, covering end-user Web performance. And, of course, we have Software Performance Engineering (SPE) trying to establish generic approaches – although SPE progress hasn’t been too impressive since Dr. Connie Smith published ‘Performance Engineering of Software Systems’ in 1990.

It is definitely supposed to be an important part of the skill set of software architects (on a higher level, SPE, etc.) and software developers (maybe on a lower level, how to efficiently design a specific component using the chosen technology – but a good understanding of high-level performance engineering won’t hurt either).

And while many architects and developers have some understanding of performance, often the main stress is on functionality and deadlines, so performance is left to the very end – where it sometimes may indeed be tuned in (usually when technologies are mature and the team is quite experienced), and sometimes requires major changes (and late changes are very expensive).

It looks like the idea to have an explicit person responsible for performance from the beginning (starting from requirements) and working with other architects and developers to build it in makes sense. The title may be ‘performance architect’ or ‘performance champion’. Although such people are rare – rather we could see a proactive person from performance engineering or performance testing groups trying to ask performance questions early.

Performance Testing. Including, of course, all other variations and names, such as load, stress, endurance, etc. testing. The matching ITIL term would probably be Service Validation and Testing. It covers all ways to apply synthetic load to the system and analyze the system’s behavior. In the narrow sense, a ‘performance tester’ is responsible for creating and applying such load (test scripting and execution). In a wider sense, it also includes workload characterization (workload modeling), performance analysis, and performance troubleshooting – and often such a person is referred to as a ‘performance engineer’. In some cases they are different people: the performance tester is responsible for applying the load and the performance engineer (maybe performance analyst in this case) is responsible for system analysis and optimization.

I definitely put performance testing in a separate category due to the specific set of skills required: workload generation. And, perhaps, techniques to find and fix issues in the system by applying an appropriate workload. But definitely not because “testing should go after development before production” as it used to be in the waterfall approach – testing should start as early as possible, mostly overlapping with development, and may continue in production. Monitoring the system using synthetic workload, for example, I’d rather also put in this testing category – it is actually testing the production system in parallel to production workload.

Performance Management, perhaps, may be a good name for the collection of performance-related activities and skills in production (and around).

It is interesting that ITIL places the Capacity Management and Service Level Management processes into Service Design. I see a point here – you definitely need to allocate capacity before deploying the system, and Service Levels should probably come directly from the performance requirements. Still, real people working in these areas are usually part of operations. Capacity Planners are responsible for allocating resources, although fewer and fewer people have such a title and these responsibilities get spread between other groups (which, unfortunately, often don’t have appropriate skills).

Service Level Management would probably be handled by Performance Monitoring (Analysis). The matching ITIL term would probably be Service Measurement. The title ‘Performance Analyst’ was used often in the past, but is not very popular anymore. Probably the title ‘Performance Engineer’ is more popular now. And, of course, it may be specialized, like Database Monitoring, System Monitoring, Application Server Monitoring. These may be done by the respective administrators (DBA, system administrator, etc.).

Application Monitoring is relatively new stuff, usually referred to as Application Performance Monitoring. The idea is to measure application-specific metrics (including business-related metrics, end-user metrics, etc.) in addition to those system-level metrics that used to be measured earlier. The importance of application monitoring is definitely growing. First, system-level metrics become less relevant in today’s infrastructure with virtualization, multi-tenancy, cloud, etc. Second, the system becomes so complicated that trying to figure out what is going on using low-level metrics becomes a nightmare. Third, full monitoring from the business point of view becomes a business requirement – and it is where IT can provide a unique business advantage.

Probably Application Performance Management (APM) would be the right category encompassing most production-related categories such as Performance Monitoring, Capacity Management, Diagnostics (troubleshooting), and Tuning (and Optimization – although this may somewhat get into the re-design category). We are probably not there yet, and Application Performance Management is rather a vague vision than reality. Gartner, for example, stresses that APM is Application Performance Monitoring, not Management. And I am not sure what the title of the person doing this would be. Management is a favorite word for an area of expertise (as in Performance Management or Capacity Management), but Manager (at least in the US) still means a person who manages other people. So the title, I guess, would be the same ubiquitous ‘performance engineer’.

Performance Troubleshooting or Diagnostics is definitely an important part of Performance Management and is an application of performance engineering to existing performance issues. While it is probably the most typical performance-related activity at many corporations, very few have anything formal around it, and usually all other performance-related groups get involved. And we need performance engineering skills to investigate and fix performance problems in production.

It looks like in the new generation of Web companies monitoring and capacity planning are often included in ‘Site Reliability’, adding, I guess, some confusion to the already existing mess of terms and notions.

P.S. By the way, the only conference covering almost all topics mentioned above is CMG. The call for papers and workshops is open now.

Performance Testing and Optimization for the Cloud

March 29th, 2012 2 comments

While many companies promote performance testing in the cloud (or from the cloud), it makes sense only for certain types of performance testing. For example, it should work fine if we want to test how many users the system supports, whether it would crash under a load of X users, how many servers we need to support Y users, etc., but are not too concerned with exact numbers or variability of results (or even want to see some real-life variability).

Even in this case it assumes that we don’t introduce any bottleneck by using the cloud (for example, saturating network bandwidth between load generators and the system under test) and that we leave it to the cloud provider to make sure our test doesn’t impact other cloud tenants (which may not be trivial in the case of PaaS or SaaS).

However, it doesn’t work for performance optimization, when we make a change in the system and want to see how it impacts performance. Testing in a cloud with other tenants intrinsically has some variability of results, since we don’t control other activities in the cloud and in most cases don’t know the exact hardware configuration. For example, if the system scales out by automatic creation of an additional application instance, the new instance may be outside of the network segment where the other servers are. The effects may be even more sophisticated in the case of PaaS and SaaS.

So when we talk about performance optimization, we still need an isolated lab. And, if the target environment for the system is a cloud, it should be an isolated private cloud with all the hardware and software infrastructure of the target cloud. And we need monitoring access to the underlying hardware to see how the system maps to the hardware resources and whether it works as expected (for example, testing scaling out or evaluating impacts to/from other tenants – which probably should be one more kind of performance testing to do). Real-world network emulators should be used to make sure that performance testing is representative of how the system would be used in production – otherwise we are not taking into account such factors as network latency, bandwidth, jitter, etc. This means that we need a way to plug in the network emulation appliance properly.

So if we need optimization for cloud software, we still need a lab – but the lab should be more sophisticated to emulate the cloud environment and real-world network conditions. An ultimate example of such a lab is probably the lab Microsoft created for testing IE.

So, factoring the cloud into performance testing, we have two alternatives: coarse performance testing in/from the cloud with inherent variability (and perhaps some savings on hardware and configuration costs) or granular performance testing and optimization in a sophisticated isolated lab emulating the cloud (thus avoiding variability, probably at higher hardware and configuration costs).

The Main Performance Problem

March 22nd, 2012 3 comments

Dennis Drogseth’s post The Many Dimensions of User Experience Management (UEM) is very indicative of the main problem we have in performance: people think about many small specific performances, but we have just one PERFORMANCE. It depends on many different components and manifests itself in many different ways, but any attempt to decompose it results in silos and in losing some important parts of the whole.

From the post: When we asked “What is your primary driver?” Better application performance and triage came in fifth, with only 13% of the votes. Employee productivity topped the list at 23%, followed by business competitiveness and/or revenue at 20%. Better support for services delivered over the network came in third, and brand protection and customer satisfaction came in fourth.

Well. Ask, for example, business users about JVM performance and it probably won’t make the first hundred issues they care about. Does it prove anything? No. They care a lot about it if they use J2EE systems, but just don’t know about it (except maybe a few of the most curious).

“Employee productivity” heavily depends on application performance. “Business competitiveness and/or revenue” is related to application performance. “Better support for services delivered over the network” – not sure what it means, but performance also comes to mind. “Customer satisfaction” – performance is a pretty major component. And even “brand” may well be impacted by bad performance. Probably business users (and not only business) don’t care much about performance when it is good, but as soon as performance degrades, it immediately jumps to the top of everybody’s priority list.

I, of course, don’t want to say that performance is the main thing in business – if you don’t have any business, you may not be concerned with performance. But as soon as you do, application performance would impact all parts of your business. But you notice it only when it is bad (and usually it will happen soon if you don’t take care).

Then the post says: Similarly, when we wanted to understand which organizations or groups within IT and the business were behind UEM or QoE, the Help Desk/End User Support came in first, Customer Experience Management came in second, and Applications Management and Network Operations were tied at third and fourth place.

And when asked which organization is likely to DRIVE the overall QOE/UEM initiative, the first five groups were: Line of Business, Customer Experience Management, Process Management and Compliance Professional, Help Desk, and Service Management.
Applications Management came in seventh, one percentage point after Infrastructure Management!

Yeah, that exactly proves the point: there is no organization/group responsible for performance today. Not sure what “Application Management” is (I don’t recall seeing such a group – app admins?). And it is not surprising that people don’t pick such a group to drive such an initiative – I guess the perception is that such groups are groups of IT geeks doing something with computers, not caring about business, and starting to do something only when told by the CIO to fix it (which, unfortunately, is often close to the truth).

How does it relate to the concept of Application Performance Management (which is rather a concept for the moment)? It just proves that it doesn’t exist in practice (at least in its ideal form). Usually there is no organization responsible for it (as a holistic concept, in conjunction with business).

What are end-user response times (what EUM monitors)? They are external symptoms of application performance. The only part of application performance end users care about. The tip of the iceberg. If we are saying that we want to manage application performance, would end-user response times be part of it? I have no doubt they would. Otherwise the whole concept doesn’t make sense.

The post states: User Experience Management also has strong business impact, governance, service level and user productivity implications that transcend performance management. Yes, performance has “business impact, governance, service level and user productivity implications”.

So the data provided in the post, in my opinion, proves two things: business cares about performance a lot, but there is no reliable structure in place to care about end-to-end performance.

Actually I am rather confused by the term User Experience Management. I understand what User Experience Monitoring or End User Experience is (which is usually used in the context of measuring response times). But how would you manage it? You may manage your applications/systems, which would improve response times. Unless you are just saying that you want to use the name User Experience Management as an umbrella name covering everything related to performance (including APM, Capacity Planning, etc., etc.) – which may be an option, but it doesn’t look like it is used this way. Or maybe User Experience Management is used as a wider term including usability, UX (User eXperience), etc., which usually relate to UI design? If yes, then it indeed includes important factors not related to performance and only partially overlaps with APM – but then I am not sure why we compare EUM with APM.

Ian Molyneaux’s post The Case for the CPO brought the topic of a person responsible for performance to its extreme. Great idea, but… How far are we from there? Forget the CPO; what about just having a person (or persons) responsible for end-to-end performance and for building up the process assuring such performance? Look at job postings – has anybody seen any position saying that we need a person to drive performance in our organization (and meaning it)? I haven’t. All positions are for a specific silo team or for consulting. So it looks like it will be a while until we see a more holistic approach to performance (whatever name is used for it).

Multiple Dimensions of Response Time

February 27th, 2012 No comments

It looks like everything related to performance has multiple dimensions. Recently reading the excellent posts A non-geeky guide to understanding performance measurement terms by Joshua Bixby and Building a High Performance Website by Phil Stanhope, I realized how many dimensions even a relatively simple term like “response time” has. And, moreover, it looks like we don’t have a reliable way to measure the response time that would matter to the end user (I guess something between “time to display” and “time to interactivity” depending on the site design, if we follow the posts’ terminology). Both authors look at this rather from the front end / Web Performance Optimization (WPO) point of view.

Spending most of my time in performance testing, I’d guess that “response time” comes from load testing / active monitoring tools that are the main source of performance information (the “waterfall” approach of the WPO community is quickly becoming popular – but I am not sure how many monitoring services use it). And in this case, “response time” is what the tool reports. What “response time” means in such a case heavily depends on the tool and its settings – and in many cases, I guess, it won’t be any of the metrics provided in the aforementioned posts (which, I guess, are standard in the WPO community – but they may not be easy to measure with load testing and enterprise monitoring tools). For protocol-based tools it would probably be the time to receive the responses to all requests without any client-side activities (with many additional details of browser emulation, like caching, threading, keep-alive, compression, etc.). For GUI-based tools it probably depends on what underlying mechanism the tool uses and how the script is designed. Quite often, if you don’t set any specific checks, it may report a success without full downloading and rendering (and when somebody says that a modern sophisticated site loads in 0.169 sec over the Internet, that would be my first guess). Although, if scripted properly, it perhaps may measure the performance metric that matters (when the page would “be almost fully interactive”) by checking that the parts that matter are downloaded and rendered (which probably can’t be done without manual scripting / analysis).
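A small Python sketch may make the point concrete. It takes one hypothetical page load and computes three different numbers that could each be reported as “response time”. The `Resource` fields, the `render_delay` parameter and all timings are assumptions made up for illustration; no specific tool is claimed to work this way.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    name: str
    start: float  # seconds from navigation start when the request was sent
    end: float    # seconds from navigation start when the download finished
    needed_for_display: bool  # hypothetical flag: does the visible page need it?


def response_times(resources, render_delay=0.0):
    """Compute three different 'response times' from the same page load.

    Assumes the first entry is the main HTML document and that
    render_delay approximates client-side rendering work.
    """
    html = resources[0]
    return {
        # What a check on the main document alone might report.
        "main_document_only": html.end - html.start,
        # What a protocol-level tool fetching every resource might report.
        "all_downloads": max(r.end for r in resources),
        # A rough stand-in for what the end user is actually waiting for.
        "time_to_display": max(
            r.end for r in resources if r.needed_for_display
        ) + render_delay,
    }


# Made-up timings for one page load.
page = [
    Resource("index.html", 0.0, 0.2, True),
    Resource("style.css", 0.2, 0.5, True),
    Resource("hero.jpg", 0.3, 1.1, True),
    Resource("analytics.js", 0.5, 2.4, False),
]

times = response_times(page, render_delay=0.3)
for metric, seconds in times.items():
    print(f"{metric}: {seconds:.2f} s")
```

Three plausible definitions give three quite different numbers for the same load, so a reported “response time” says little until we know which one the tool used.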

That brings up an interesting question about Application Performance Management (APM): what does End-User Experience Monitoring (EUM), a.k.a. Real-User Monitoring (RUM), measure? EUM/RUM is considered an integral part of APM (and definitely should be), but it may measure pretty different things depending on the approach used to measure it. And as I mentioned above, it probably won’t be the actual end-user experience – but only its approximation by another metric (different for different tools).

The only thing that often saves us from all this complexity is, as often happens in performance, that in many cases it doesn’t matter. All of the metrics are just close enough from the practical point of view. In the good old days of plain HTML the main part was getting the response from the server; the client-side part was fixed and usually small. So not much was said about different kinds of response times in the past. The situation is changing now: the front-end time is becoming significant (see the Performance Golden Rule by Steve Souders, keeping in mind that it is based mainly on front pages) and now it looks like we can’t ignore the differences between response times anymore.