Successful performance engineering teams support their peers in identifying and solving performance challenges and in meeting their performance goals. The role of performance engineers is multidisciplinary, requiring an intimate understanding of the system, the load, and a multitude of monitoring and load-generation tools. In addition, performance engineers are usually required to communicate their findings to their peers and management concisely, even when the problems and solutions are complex.
As performance engineering roles have evolved over the years, a number of best practices have emerged. These best practices, gathered from successful performance engineering teams, can improve a team's productivity, helping it achieve better performance results and reduce or eliminate bottlenecks quickly and cost-effectively.
1 Identify and quantify your performance goals
Before running performance experiments, identify and quantify the performance goals of the system under test, such as target throughput, latency percentiles, and resource budgets under a specified load. Quantified goals let you manage performance expectations and give you a performance-oriented “Definition of Done” to satisfy before deploying your system.
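As an illustration, quantified goals can be encoded as machine-checkable thresholds. The following minimal Python sketch shows one way to express such a Definition of Done; the metric names and threshold values are illustrative assumptions, not recommendations.

```python
# Illustrative performance "Definition of Done": quantified, machine-checkable
# goals for a hypothetical web service. All names and thresholds are examples.
PERFORMANCE_GOALS = {
    "throughput_rps_min": 5000,    # sustained requests per second
    "latency_p95_ms_max": 200,     # 95th-percentile response time
    "latency_p99_ms_max": 500,     # 99th-percentile response time
    "error_rate_max": 0.001,       # at most 0.1% failed requests
    "cpu_utilization_max": 0.75,   # headroom for traffic spikes
}

def meets_definition_of_done(measured: dict) -> bool:
    """Return True only if every measured metric satisfies its goal."""
    return (
        measured["throughput_rps"] >= PERFORMANCE_GOALS["throughput_rps_min"]
        and measured["latency_p95_ms"] <= PERFORMANCE_GOALS["latency_p95_ms_max"]
        and measured["latency_p99_ms"] <= PERFORMANCE_GOALS["latency_p99_ms_max"]
        and measured["error_rate"] <= PERFORMANCE_GOALS["error_rate_max"]
        and measured["cpu_utilization"] <= PERFORMANCE_GOALS["cpu_utilization_max"]
    )
```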
2 Document your experiments
The environment in which an experiment runs has a big effect on the resulting performance metrics: the hardware, software versions, compiler flags, and other settings all shape the results. Many companies gather this information into folders, one for each test run in a family of experiments, and organize it in numerous Excel spreadsheets. Others use commercial test management systems or a repository instead. Storing this information in a repository enables logging of when information was entered and by whom. It also enables you to go back later and answer questions about how performance data was collected. This traceability helps you justify performance-impacting decisions.
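One lightweight approach is to write a small metadata record per run and commit it to a version-controlled repository. The Python sketch below assumes an illustrative perf-experiments folder layout and example field names; adapt both to your own setup.

```python
import getpass
import json
from datetime import datetime, timezone
from pathlib import Path

def record_experiment(run_id: str, metadata: dict,
                      repo_root: str = "perf-experiments") -> None:
    """Write one experiment's metadata record into a folder per run.

    Committing these files to a version-controlled repository adds
    the "who entered what, and when" traceability automatically.
    """
    run_dir = Path(repo_root) / run_id
    run_dir.mkdir(parents=True, exist_ok=True)
    record = dict(metadata,
                  recorded_by=getpass.getuser(),
                  recorded_at=datetime.now(timezone.utc).isoformat())
    (run_dir / "metadata.json").write_text(json.dumps(record, indent=2))

# Example record; the fields shown are illustrative, not exhaustive.
record_experiment("2024-05-01-baseline", {
    "software_version": "v2.3.1",
    "compiler_flags": "-O3 -march=native",
    "load_profile": "steady 1000 requests/s for 10 minutes",
})
```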
3 Automate data gathering
Documenting your environment for every experiment can be a daunting task, which is why successful performance engineering teams automate a large portion of the performance management environment. Apart from saving documentation time, automation is more accurate than an error-prone manual process. For example, many teams run the Linux command “lshw” and store its output before each experiment, and automatically copy the logs of the runs to the experiment folders in a repository.
Not all data gathering can be automated. Information such as the engineer’s intent and the performance Definition of Done must be entered manually, and it is important to do so to preserve the knowledge. Together, manual and automated recording support traceability and compliance where required, while enabling verification that the measured environment is stable.
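The following Python sketch combines both kinds of capture: it snapshots the hardware inventory with lshw, copies the run logs into the experiment folder, and prompts for the one item that cannot be automated, the engineer’s intent. The folder layout and file names are assumptions for illustration.

```python
import shutil
import subprocess
from pathlib import Path

def snapshot_environment(run_dir: Path, log_files: list[str]) -> None:
    """Capture the hardware inventory, run logs, and engineer's intent."""
    run_dir.mkdir(parents=True, exist_ok=True)
    # Hardware inventory, captured automatically before the experiment.
    # (lshw may need root privileges to report full details.)
    hw = subprocess.run(["lshw", "-json"], capture_output=True,
                        text=True, check=True)
    (run_dir / "hardware.json").write_text(hw.stdout)
    # Copy the run logs into the experiment folder in the repository.
    for log in log_files:
        shutil.copy2(log, run_dir)
    # The engineer's intent cannot be automated; prompt for it explicitly.
    intent = input("Briefly describe the intent of this experiment: ")
    (run_dir / "intent.txt").write_text(intent + "\n")
```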
4 Plot everything
The human brain processes images much faster than text. Performance engineers will derive insights more quickly if they look at graphs when starting their analysis. Plotting more graphs of well-chosen metrics will increase the chances of generating insights. Automation plays a big part in plotting as well, because if graphs are plotted automatically, there is a lower chance that engineers will skip this crucial step.
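As a sketch of what automated plotting can look like, the following Python snippet reads a results CSV and saves one graph per metric column, so no metric goes unplotted. The CSV layout (a timestamp column followed by metric columns) is an assumption for illustration.

```python
import csv
from collections import defaultdict
from pathlib import Path

import matplotlib
matplotlib.use("Agg")  # render to files; no display needed in automation
import matplotlib.pyplot as plt

def plot_all_metrics(csv_path: str, out_dir: str = "plots") -> None:
    """Save one time-series graph per metric column in a results CSV."""
    series = defaultdict(list)
    with open(csv_path) as f:
        for row in csv.DictReader(f):
            for name, value in row.items():
                series[name].append(float(value))
    timestamps = series.pop("timestamp")
    Path(out_dir).mkdir(exist_ok=True)
    for name, values in series.items():
        plt.figure()
        plt.plot(timestamps, values)
        plt.xlabel("time (s)")
        plt.ylabel(name)
        plt.title(name)
        plt.savefig(Path(out_dir) / f"{name}.png")
        plt.close()
```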
5 Preserve your team knowledge
Centralizing the storage of all measurement data, together with an inventory of configuration data and analyses of the performance aspects of a system, is key to ensuring that all members of the team are able to access it to solve problems. You will find that it has the following benefits:
- Documenting performance expectations and goals gives you a performance Definition of Done.
- Documenting and storing the methodology underlying a performance experiment helps ensure that the experiment is (or was) conducted correctly and that its results are interpreted accordingly.
- Doing so reduces the risk of repeating experiments that you mistakenly believe have never been run, while enabling past experiments to be rerun under identical conditions to verify that their results are reproducible or that software changes have had the desired effect.
- Centralization supports the traceability needed to ensure that performance-affecting steps can be revisited to analyze the impacts of system modifications and to meet compliance needs.
6 Embrace peer review
Peer review is a common practice in the software development world, and the most successful performance engineering teams embrace it. In software development, peers help with requirements review, architecture review, design review, code review, and various document reviews. In the performance engineering domain, peers evaluate the methodology behind the experiments, the gathered data, and the insights, and in general look for errors. Peer review is needed to make sound design decisions and to prevent performance concerns from becoming problems.
7 Automate the experiments
Time is scarce and performance engineering teams are busy. As with functional testing, automating performance tests and tuning activities is a must; automation ensures the repeatability of the experiments. Some classes of performance engineering experiments can be completely automated, such as load tests, stress tests, and even configuration tuning. Automating performance tests eases their inclusion in Continuous Integration/Continuous Deployment (CI/CD) pipelines so that they run for every commit or major release. The automated system should raise a flag only when there is a problem requiring the intervention of the DevOps team, the performance engineering team, or both.
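As a minimal sketch of such a CI gate, the script below runs a load test, compares the summary against Definition-of-Done thresholds, and exits nonzero only when a threshold is violated. The runner script run_load_test.sh, the results file format, and the threshold values are all hypothetical.

```python
import json
import subprocess
import sys

# Thresholds from the performance Definition of Done (illustrative values).
MAX_P95_LATENCY_MS = 200
MIN_THROUGHPUT_RPS = 5000

def main() -> None:
    # Hypothetical load-test runner that writes a JSON summary;
    # substitute your own tool or wrapper here.
    subprocess.run(["./run_load_test.sh", "--out", "results.json"], check=True)
    with open("results.json") as f:
        results = json.load(f)

    failures = []
    if results["latency_p95_ms"] > MAX_P95_LATENCY_MS:
        failures.append(f"p95 latency {results['latency_p95_ms']} ms "
                        f"exceeds {MAX_P95_LATENCY_MS} ms")
    if results["throughput_rps"] < MIN_THROUGHPUT_RPS:
        failures.append(f"throughput {results['throughput_rps']} rps "
                        f"below {MIN_THROUGHPUT_RPS} rps")

    if failures:
        # A nonzero exit fails the CI stage, flagging the DevOps and/or
        # performance engineering teams; otherwise the pipeline stays quiet.
        print("\n".join(failures))
        sys.exit(1)

if __name__ == "__main__":
    main()
```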
Using a system to manage experiments
Concertio’s Optimizer Studio and its experiment management system have features that can boost the productivity of performance engineering teams by supporting the automation we have described. Specifically:
- All experimental data, including the measurement data and system inventory, are gathered and plotted automatically for each sampled combination of configuration settings.
- Experiments can be grouped into projects. This makes it easy to search for historical measurements.
- Projects can be shared with team members and other stakeholders, so peer review can happen as the experimental results come in. This allows stakeholders to vet on-the-spot changes to the experimental plan if needed.
- Optimizer Studio uses machine learning to automate modifications of tunables in response to changes in the observed values of performance and resource measurements. This step, called Continuous Optimization, may be added to the CI/CD pipeline. Automating this optimization frees performance engineering teams to focus on more complex issues, such as the root-cause analysis of performance problems.
- In addition, Optimizer Studio uncovers non-obvious system configurations and improves overall system performance, reducing the burden on the teams to improve performance further. It may even identify and propose resolutions of performance bottlenecks without further human intervention.
These practices and Optimizer Studio reinforce each other and help ensure that your performance engineering process rests on a firm foundation. If you adopt them together with the performance testing practices described in this Concertio blog post on performance testing, your performance efforts will be on a very sound footing.
This blog was prepared together with Dr. Andre Bondi from Software Performance and Scalability Consulting, LLC.