To IT operators and developers, there may be a very noticeable distinction between a system responding slowly and one that is completely offline (“down”). However, to end-users, those two appear the same. One could argue that slow performance is worse than being offline because having something that works halfway is more frustrating than simply being told no. Think about the last time you were in a virtual meeting and the audio/video got choppy, and it stayed that way for an extended period - frustrating, right? Now, imagine how everyone would have responded differently had the service gone offline completely.
An unresponsive system is black and white and thus easily measured – it is either online or offline. Performance, by contrast, is all about responsiveness, and therefore introduces shades of grey into operations.
What exactly does it mean for a system to be “slow”? In the previous article, we discussed how proper observability operations help to surface system metrics by being intentional about what is being measured. Now that you are striving for operational excellence, you have the facilities in place to get the proper measurements to answer this question.
With those same facilities, you will also be able to glean insight into other facets of your workloads. The "facets" I am referring to are the tenets represented within a standard architecture framework supported by the major cloud platform providers. Each of those tenets helps to form the complete picture in determining the proper direction to take on your cloud journey. As discussed previously, the five tenets are: Cost Optimization, Operational Excellence, Performance Efficiency, Reliability, and Security.
In this third article of the series, we will focus on Performance Efficiency. We live in a world where distributed applications with composable architectures are rapidly becoming the norm. My goal is to highlight the nuances of this new normal and how they can affect system performance. While the concepts presented are universal, the specific examples I lay out will be centered around Microsoft's Azure cloud platform, mainly because that aligns with the breadth of my experience and recent assessments.
“Slow is the New Down”
This phrase has really taken hold in recent years. At worst, the responsiveness of your system is as important as its stability; at best, it is more important. End-users have become conditioned to expect things on-demand and with immediacy in this digital world, thanks to Big Tech. According to Esteban Kolsky:
“55% of customers would pay more to guarantee a good experience while 67% cite bad experiences as the reason for churn.”
Those are some significant numbers. Even if your customers are tied up in Enterprise Agreements, or they are internal business users in a different department who cannot easily “leave,” that does not mean the perceived value of the system is not being diminished. Since I imagine you want to create a valuable system and a positive experience, you must consider the efficiency of the system’s performance to avoid getting caught in that perceived-value trap.
A Measure of Delay
Performance considerations take on new meaning when dealing with modern applications that are distributed by nature. While performance has always been a consideration, it can be different for a monolithic workload running on a beefy machine versus a more modular workload that is distributed across processes, networks, and even geographies.
When thinking about the modules of a workload working in concert with one another in those circumstances, the biggest thing that should jump out at you is the introduction of latency. Latency is a measure of delay, the time it takes for data to travel from a source to a destination across a network. When dealing with a monolithic workload, latency is a substantially smaller consideration because everything is ideally running within the same process or across processes on the same machine while sharing the same hardware resources. Making the workload more modular and composable introduces new latency considerations, but it also introduces new scaling opportunities.
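To make that measure of delay concrete, here is a minimal sketch comparing an in-process lookup with one that crosses a simulated network hop. The lookup data and the 50 ms delay are hypothetical stand-ins, not a real service:

```python
import time

def local_lookup(key):
    # In-process call: shares memory with the caller, no network involved.
    return {"user:42": "Ada"}.get(key)

def remote_lookup(key, simulated_network_delay=0.05):
    # Stand-in for a cross-network call; the 50 ms sleep is a hypothetical
    # round-trip delay, not a real endpoint.
    time.sleep(simulated_network_delay)
    return {"user:42": "Ada"}.get(key)

def measure(fn, *args):
    # Time a single call and report the result plus elapsed milliseconds.
    start = time.perf_counter()
    result = fn(*args)
    elapsed_ms = (time.perf_counter() - start) * 1000
    return result, elapsed_ms

result_local, ms_local = measure(local_lookup, "user:42")
result_remote, ms_remote = measure(remote_lookup, "user:42")
print(f"local: {ms_local:.2f} ms, remote: {ms_remote:.2f} ms")
```

The same operation returns the same answer either way; only the delay changes. That gap is the tax a composable architecture pays for its scaling flexibility.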
Scale with a Scalpel
What happens when workloads get so overloaded that end-users start to experience performance “hiccups”? In the days of the monolith, one would just “throw more hardware at it,” meaning purchase a beefier machine and migrate everything, or purchase an additional machine with the same specs and run them both concurrently to share the load. Those were the only two options for scaling a monolith, and either choice was a significant capital expense and operational investment. The composable nature of today’s modern applications allows operations to scale with surgical precision: scaling only the necessary components of the workload as opposed to the entire system.
Considering these factors will move you towards optimal performance efficiency because it is all about properly matching resource capacity to demand. Simply increasing the capacity or the server count may solve the immediate performance issue, but it lacks efficiency because there tends to be wasted capacity. The aim is to analyze the workload, identify its bottlenecks, and then optimize them.
Food for Thought
Here are a few questions to get you thinking about how to handle your current environment or how you can set it up in a desirable way from the start (if you are considering the move):
- How are you ensuring that your workload is elastic and responsive to changes? Performance is an indication of a system’s responsiveness to execute any action within a given time interval. That responsiveness comes from having the appropriate policies in place to scale in and scale out horizontally, based on changing demand. Auto-scaling policies that don't respond fast enough to spikes in demand won't be of much value. To take your understanding of the workload's elasticity to the next level, you should measure the time it takes to scale in/out.
- How are you managing your data to handle scale? Data is the new gold rush, and modern businesses are managing increasingly large volumes of it. The challenge lies in data’s varying characteristics and processing requirements, since it comes from numerous sources, such as the system itself, its users, or external services. You must understand the growth rate of your data so it can be properly stored and indexed for querying, and you should also leverage partitioning and retention policies.
- How are you testing to ensure that your workload can appropriately handle user load? Performance tests should not be ad hoc and randomly executed. First and foremost, make sure you have a defined testing strategy. Part of that strategy should be to run those tests on a regular cadence and to make sure you are testing the appropriate components; constantly load testing the entire system does not carry much value. Another part of your testing strategy should be to use the appropriate tooling on the workload, including, but not limited to, tools that occasionally inject faults so you can surface how the system breaks down under pressure.
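The scale-out/scale-in policy behind the first question can be sketched as a simple threshold rule. The CPU thresholds and instance limits below are hypothetical defaults, not a recommendation; real autoscale engines add cooldowns and multiple metrics on top of this core decision:

```python
def desired_instances(current, avg_cpu_percent,
                      scale_out_at=70, scale_in_at=30,
                      min_instances=2, max_instances=10):
    """Return the instance count a threshold-based autoscale rule would target.

    Hypothetical policy: add an instance when average CPU reaches 70%,
    remove one when it falls to 30%, clamped to the [min, max] range.
    """
    if avg_cpu_percent >= scale_out_at:
        return min(current + 1, max_instances)   # scale out, never past the cap
    if avg_cpu_percent <= scale_in_at:
        return max(current - 1, min_instances)   # scale in, never below the floor
    return current                               # within band: hold steady

print(desired_instances(3, 85))  # busy: scale out to 4
print(desired_instances(3, 10))  # idle: scale in to 2
```

Measuring how long it takes to go from the decision to a healthy new instance is exactly the “time to scale in/out” metric mentioned above.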
These three things are far from exhaustive, but they should inspire you to approach your cloud operations a lot more thoroughly and with more confidence.
Interested in looking like a hero? Here are a few areas you can explore to increase the performance efficiency of your systems:
Configure basic autoscaling rules on compute resources. It is much easier to scale out/in (adjust instances, a.k.a. horizontal scaling) than scaling up/down (adjust capacity, a.k.a. vertical scaling). Autoscaling rules can get complex; therefore, a good place to start is with a simple schedule-based configuration that scales out during specific times of the day, and also scales in during other times based on the usage patterns of the users.
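A schedule-based starting point like that can be sketched as follows. The peak window and instance counts are illustrative assumptions; you would derive yours from your own usage patterns:

```python
from datetime import time

# Hypothetical business-hours schedule: scale out for the 9-to-5 peak,
# scale in overnight when usage drops.
SCHEDULE = [
    (time(9, 0), time(17, 0), 6),   # peak window: 6 instances
]
OFF_PEAK_INSTANCES = 2

def scheduled_instances(now):
    """Return the instance count the schedule calls for at clock time `now`."""
    for start, end, count in SCHEDULE:
        if start <= now < end:
            return count
    return OFF_PEAK_INSTANCES       # outside every window: minimum footprint

print(scheduled_instances(time(10, 30)))  # mid-morning peak -> 6
print(scheduled_instances(time(22, 0)))   # overnight -> 2
```

Once a schedule like this is proven out, layering metric-based rules on top of it handles the demand spikes the calendar cannot predict.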
Leverage geo-redundancy to reduce latency between Azure resources and end-users. The advances in technology and the hype around “the cloud” may have most people thinking that whatever appears on their screen is at their fingertips, but the laws of physics still apply. There will be a noticeable difference in the performance of an application for someone on the West Coast – while it is being served from the East Coast – versus someone already on the East Coast accessing that same application. Consider standing up another instance on the West Coast, as well as configuring a load balancer to distribute traffic based on geography.
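The routing decision a geographic load balancer makes can be sketched as “send each user to the region that answers fastest.” The user labels, region names, and latency figures below are entirely hypothetical:

```python
# Hypothetical per-user latency measurements (ms) to two deployed regions.
MEASURED_LATENCY_MS = {
    "west-coast-user": {"westus": 12, "eastus": 68},
    "east-coast-user": {"westus": 70, "eastus": 11},
}

def route(user):
    """Pick whichever region shows the lowest measured latency for this user."""
    latencies = MEASURED_LATENCY_MS[user]
    return min(latencies, key=latencies.get)

print(route("west-coast-user"))  # nearest region wins
print(route("east-coast-user"))
```

In practice the platform's load balancer performs this measurement and selection for you on every request; the sketch only shows why a second regional instance pays off.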
Cache frequently accessed files and data closer to the end-user. Static files are ripe for a Content Delivery Network (CDN), which will duplicate those assets across many geographical regions and serve each user from whichever copy is closest. Applications can process operations faster if the data they depend on can be accessed immediately instead of having to query a database or read a file.
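The application-side half of that idea can be sketched as a small time-to-live (TTL) cache in front of a slow lookup. The 60-second TTL and the `slow_db_lookup` loader are illustrative assumptions:

```python
import time

class TTLCache:
    """Minimal cache: serve a stored copy until it expires, then reload."""
    def __init__(self, loader, ttl_seconds=60):
        self.loader = loader      # slow source of truth (e.g. a database query)
        self.ttl = ttl_seconds
        self._store = {}          # key -> (value, expiry timestamp)

    def get(self, key):
        entry = self._store.get(key)
        if entry and entry[1] > time.monotonic():
            return entry[0]                       # fresh copy: skip the slow path
        value = self.loader(key)                  # miss or expired: go to source
        self._store[key] = (value, time.monotonic() + self.ttl)
        return value

calls = []
def slow_db_lookup(key):
    calls.append(key)             # track how often we hit the "database"
    return key.upper()

cache = TTLCache(slow_db_lookup, ttl_seconds=60)
print(cache.get("profile"), cache.get("profile"))  # second read comes from cache
print(len(calls))  # the slow lookup ran only once
```

The trade-off to tune is the TTL: longer means fewer slow lookups but staler data, which is the same decision a CDN makes with its edge copies.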
Proper observability will contribute significantly to surfacing performance bottlenecks. Performance efficiency is not something you reach; it is something you continuously monitor and tweak. If you are striving for operational excellence, you should have the proper foundation in place to monitor workloads and surface the bottlenecks.
Are Systems Consistently Available?
Understanding not only how to measure properly but also how to interpret the meaning of those measurements to identify bottlenecks is paramount. Understanding how the latency between distributed components will affect performance, and how to mitigate those conditions is also key.
Some of you may be now wondering, what good is a high-performance engine that has been fine-tuned if the car can’t get out of the starting blocks? It is a legitimate question for anyone struggling to just keep their systems online consistently. Fret not, reliability is the next stop on this journey. In Part 4, we will discuss two key concepts to think about when considering reliability goals: availability and resiliency.
Reflecting on the questions posed and quick wins provided in this article, how much of this have you experienced already? If none of this was new to you – congratulations because you are well on your way to performance efficiency! However, if any of this was new, I challenge you to revisit your approach to performance, start to dig into understanding your bottlenecks, and take steps to mitigate them.