Having worked on optimizing multiple environments and applications over the years, we have found that the main challenge our users face is how to properly prepare a load test for the purpose of tuning their applications. In order to make quality recommendations for system settings that maximize performance, careful preparation and planning of a load test are required. Whether tunable settings are chosen manually or are discovered by an automated system, as with Concertio’s Optimizer Studio, this step is essential. To ensure that the performance test results on which recommendations are based are valid, it is crucial that the test environment and test workload are representative of what would be seen in production, and that the procedures for running the test are repeatable, auditable, and traceable. Relying on eyeball statistics just won’t cut it (see Prof. Emery Berger’s wonderful discussion of eyeball statistics).
Steps for preparing the test:
1. Define the target of the optimization: Prior to performing any kind of optimization, the target metrics to be optimized should be chosen. These must be directly related to the needs of the business and the domain of the application in order to ensure a smooth and successful operation of the system. For instance, a common optimization target for batch jobs is to minimize their wall-clock time. The metrics may also include statistics related to response times, throughputs, or the cost of using combinations of hardware and configuration settings. In doing so, one must ensure that collecting the measurements will not degrade system performance.
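For a batch job whose target metric is wall-clock time, the measurement can be as simple as a timing wrapper. The sketch below is illustrative: `batch_job` is a hypothetical stand-in for a real workload, and the wrapper itself adds negligible overhead, in keeping with the requirement that collecting the measurements not degrade performance.

```python
import time

def measure_wall_clock(job, *args, **kwargs):
    """Run a batch job once and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = job(*args, **kwargs)
    return result, time.perf_counter() - start

# Hypothetical stand-in for a real batch workload.
def batch_job(n):
    return sum(i * i for i in range(n))

result, seconds = measure_wall_clock(batch_job, 100_000)
print(f"wall-clock time: {seconds:.4f}s")
```

In a real test harness, the elapsed time for each run would be logged alongside the tunable settings in effect during that run.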
2. Prepare representative input data: Application test data must be provided that is representative of the production workload, both in terms of semantics (i.e., the operations performed on it that depend on data values) and the size of the data set. For example, a test database should contain a number and mix of record types similar to those in production.
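As a minimal sketch of matching the production record mix, the following generates synthetic records whose type proportions mirror an assumed production distribution. The record types and proportions here are invented for illustration; a real test data set would also match record sizes and value distributions.

```python
import random

# Assumed production proportions of record types (illustrative only).
PRODUCTION_MIX = {"order": 0.6, "refund": 0.1, "customer": 0.3}

def generate_records(n, mix, seed=42):
    """Generate n synthetic record types in production-like proportions."""
    rng = random.Random(seed)  # fixed seed keeps the data set repeatable
    types = list(mix)
    weights = [mix[t] for t in types]
    return rng.choices(types, weights=weights, k=n)

records = generate_records(10_000, PRODUCTION_MIX)
```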
3. Apply a representative load: Make sure that the load generated during the test is representative of what would be seen in production. It should be possible to increase or decrease the load without having to change any tunable environmental settings, and without having to change the test data with which the system has been populated before the test.
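One way to meet this requirement is to make the request rate the load generator's only knob, so the load can be scaled up or down without touching tunables or test data. Below is a minimal open-loop generator sketch; `request_fn` is a placeholder for whatever issues a single request.

```python
import time

def generate_load(request_fn, rate_per_sec, duration_sec):
    """Issue requests at a fixed rate for a fixed duration.

    The rate is the only knob: scaling the load changes neither
    tunable settings nor the test data the system was populated with.
    """
    interval = 1.0 / rate_per_sec
    deadline = time.perf_counter() + duration_sec
    next_send = time.perf_counter()
    sent = 0
    while time.perf_counter() < deadline:
        request_fn()
        sent += 1
        next_send += interval
        pause = next_send - time.perf_counter()
        if pause > 0:
            time.sleep(pause)
    return sent
```

In practice the generator would also record per-request response times, which feed the steady-state statistics discussed later.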
4. Mitigate the variability: During the test, the target environment should be free of activities unrelated to the system under test. The reason for this is that performance measurements could be contaminated by resource usage of unrelated activities in the target environment or the network used to access it. For example, a cron job that wakes up in the middle of a test to back up the disk might interfere with the test.
5. Identify the valid tunable combinations: Ensure that the ranges of combinations are identified to allow for adequate coverage of possible settings as well as for avoidance of undesirable settings. For example, a graphics processing unit (GPU) should be configured so that the total memory allocated among all GPU threads does not exceed its total physical memory. For load balancers, incorrectly skewing the routing of transactions, configuring too few access channels, or using sticky load balancing for very short transactions can all adversely affect performance, whether done separately or in combination. Such known undesirable configurations can be identified prior to starting the optimization process in order to speed it up. For a discussion on load balancers, please see Andre Bondi’s talk.
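Known-invalid combinations can be filtered out of the search space before optimization begins. The sketch below applies the GPU memory constraint from the example; the tunable names, candidate values, and the 8 GiB memory size are assumptions for illustration.

```python
import itertools

GPU_TOTAL_MEMORY_MIB = 8192  # assumed physical GPU memory

def is_valid(threads, mem_per_thread_mib):
    # Reject combinations whose total allocation exceeds physical memory.
    return threads * mem_per_thread_mib <= GPU_TOTAL_MEMORY_MIB

candidates = itertools.product([64, 128, 256], [16, 32, 64])
valid = [(t, m) for t, m in candidates if is_valid(t, m)]
```

Pruning such combinations up front means the optimizer never wastes a test run on a configuration that is known to fail or perform poorly.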
6. Prepare a representative environment: The versions of the compilers, interpreters, operating systems, hardware, and other software in the test environment should correspond to those used in the production system. For instance, a Java virtual machine used in a test should have the same properties as the one intended for production. Using a Java virtual machine older than Java 8 results in poor exploitation of multiple cores, and hence poorer performance of multithreaded systems.
7. Perform sanity checks on the instrumentation: It is important to validate the measurement instrumentation and data sources, as well as perform sanity and consistency checks on them before every test. For instance, on older Windows systems the reported disk utilization would sometimes exceed 100%. That rendered this signal unreliable for the purpose of performance tuning.
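A simple automated check can reject readings that are impossible on their face, such as utilizations above 100%. The sketch below separates plausible utilization samples from readings that should disqualify the signal; the sample values are illustrative.

```python
def sanity_check_utilization(samples):
    """Split utilization samples (%) into plausible and impossible readings."""
    clean, rejected = [], []
    for s in samples:
        (clean if 0.0 <= s <= 100.0 else rejected).append(s)
    return clean, rejected

clean, rejected = sanity_check_utilization([12.5, 80.0, 104.2, -3.0, 55.1])
```

If `rejected` is non-empty, the data source should be treated as unreliable for tuning until the instrumentation is fixed, rather than silently dropping the bad samples.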
Workload generation and the generation of performance data should be done automatically to ensure repeatability of the experiments and to ensure that deviation from the representative workload is small. To be sure that the settings in place are what the tester thought they were, the outputs of the tests should contain printouts of the actual values of the tunables. Setting combinations of tunables with a script just before each test run will help ensure that their values are as intended, while enabling verifiable repeatability of experiments. This is especially important when kernel and other tunable settings might be reset to default values whenever the system reboots.
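The set-then-verify step can be sketched as follows: each tunable is set just before the run, read back, and printed so the test output records the values actually in effect. The setter and getter are injectable; the demo below uses an in-memory dictionary standing in for a real mechanism such as sysctl, and the tunable names are only examples.

```python
def apply_and_verify(tunables, setter, getter):
    """Set each tunable, read it back, and print the actual value."""
    actual = {}
    for name, intended in tunables.items():
        setter(name, intended)
        actual[name] = getter(name)
        status = "OK" if actual[name] == intended else "MISMATCH"
        print(f"{name} = {actual[name]} ({status})")
    return actual

# Demo: an in-memory store stands in for sysctl/procfs (assumption).
store = {}
actual = apply_and_verify(
    {"vm.swappiness": 10, "net.core.somaxconn": 1024},
    setter=store.__setitem__,
    getter=store.__getitem__,
)
```

Because the script runs before every test, tunables that reverted to defaults on reboot are re-applied, and any mismatch is visible in the test output.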
Variability and Measurement
Inferences about average performance and resource usage measurements are stronger when the variability of the measurements due to extraneous factors is minimized. However, some variability is inevitable in real systems. One way to reduce variability is to reduce interference from factors outside the system under test. Then, we can focus on the effects of the tunables and the system behavior on the performance metrics. The following are examples of sources of variance and ways to reduce or avoid the impact of extraneous factors:
1. During the time it takes for system performance to level off after the test load is applied, the values of average resource utilizations and average performance measures such as response times are likely to be lower than when the system has settled down. Therefore, data collected during this ramp-up period should be excluded from statistics about system performance, such as averages, standard deviations, maxima, and minima. Data collected during test preparation, the ramp-down phase, and the post-test cleanup should also be excluded. Including data from these periods is like mixing data from different populations: it prevents sound inferences from being made about system performance and may increase the variance of the observations.
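Excluding ramp-up and ramp-down data can be as simple as trimming the sample series before computing statistics. The sketch below is illustrative; the sample values and trim lengths are invented, and in practice the trim points would come from inspecting when the measurements level off.

```python
from statistics import mean

def steady_state_samples(samples, ramp_up, ramp_down):
    """Drop ramp-up and ramp-down samples so statistics come from
    a single population: the steady-state interval."""
    return samples[ramp_up:len(samples) - ramp_down]

raw = [20, 35, 48, 52, 50, 51, 49, 50, 30, 10]  # response times (ms)
steady = steady_state_samples(raw, ramp_up=3, ramp_down=2)
print(mean(steady))  # mean over the steady interval only
```

Note how the raw mean is dragged down by the ramp periods, while the trimmed mean reflects only the settled behavior of the system.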
2. Variability due to environmental noise should be avoided. Noise can come from the presence of unrelated activity in the virtual or physical machines or from extraneous guest virtual machines running on the same bare metal hosts on which the test is being run. To minimize the risk of noise, one may wish to run the application under test on dedicated bare metal. Load generators should never be run on the same hosts as the system under test, because each might contend for resources needed by the other.
3. To minimize variability due to network contention from activities unrelated to the tests, load generation should be run on hosts that are only a few network hops away from, and physically close to, the target system. The concerns about variability for the application also apply to the load generators.
4. Background activities such as maintenance processes should be disabled during the load test, unless they are an integral part of the application that would be running in production. If they are always running in production, identical tests should be run with them being enabled and disabled.
Identifying time intervals in which the system performance and resource usage have leveled off to more or less constant values after the load has become steady is essential to making inferences about system performance and consequent recommendations about tunable settings. During such intervals, graphs of performance measures vs. time will be approximately horizontal and should not vary by much. The intervals should be long enough to smooth out the effects of random noise on the average measurements. Graphs showing high variability, trends, or oscillations indicate system instability. Trends could be due to memory leaks or an increasing backlog because at least one resource is saturated or not being freed when no longer needed; oscillations could be due to deadlocks, thread safety issues, or other causes. Long test periods at constant load with fixed settings allow more observations of the performance and resource usage measures to be collected, and therefore enable smoothing of the effects of random noise in the test environment. The inevitability of variability in measurements is discussed in detail in this blog post.
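A leveled interval can be detected automatically with a simple heuristic: within a window of samples, the coefficient of variation should be small (low noise) and the least-squares slope should be near zero (no trend). The sketch below is one such heuristic, not a definitive method, and the thresholds are assumptions for illustration.

```python
from statistics import mean, stdev

def is_steady(window, max_cv=0.05, max_rel_slope=0.01):
    """Heuristic steadiness test: low coefficient of variation and a
    least-squares slope near zero, both relative to the window mean."""
    m = mean(window)
    if m == 0:
        return False
    cv = stdev(window) / m
    n = len(window)
    x_mean = (n - 1) / 2
    slope = (sum((x - x_mean) * (y - m) for x, y in enumerate(window))
             / sum((x - x_mean) ** 2 for x in range(n)))
    return cv <= max_cv and abs(slope) / m <= max_rel_slope

print(is_steady([50, 51, 49, 50, 50, 51]))  # level window: True
print(is_steady([50, 55, 60, 65, 70, 75]))  # upward trend: False
```

Sliding such a window over the measurement series identifies candidate steady-state intervals from which performance statistics can safely be computed.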
Optimizer Studio can be used to assess the stability of the load test. This is useful when preparing the load test and attempting to reduce its variability as much as possible. This step serves as an acceptance test before meaningful results can be obtained through optimization (whether performing manual or automatic optimization). After the load test is ready, Optimizer Studio can be used to specify which tunable settings should be optimized. Users can choose from the many supported tunables, or specify their own. After the tunables are selected, Optimizer Studio automates the time-consuming and error-prone iterative tuning process of configure-run-evaluate-repeat, and provides trustable results that were obtained using a scientific optimization process.
For more information on Concertio’s Optimizer Studio, please click here: https://concertio.com/optimizer-studio/.