One of the ways to improve code performance is to compile with the right flags. After all, compiler engineers have invested years into making compilers better, and in the process, they’ve added many flags that we can use. However, to achieve optimal performance, you need to understand tens if not hundreds of flags that interact with each other, which is not a task for the faint-hearted. In many cases, you also need to understand the architecture of the target system and apply some computer architecture know-how.
What is Compiler Flag Mining
Simply put, compiler flag mining is the process of matching compiler flags with the application and the computing system it runs on to achieve the best performance. Things can get complicated quickly, because some compiler flags are unique to particular hardware implementations. For example, there are flags that specify how to handle floating point operations, but different hardware implementations may respond very differently to them. If you tune for release 5 of a design and release 6 comes along, the same flags may not work as well. Other types of flags are more application dependent: a flag may yield significant improvements in one application but be detrimental to performance in another. Although compiler engineers do a great job of selecting flags that perform well across architectures and applications in their “-Ox” flags, there is no real “one-size-fits-all” approach that achieves the best performance in every situation. If you need more performance than the default heuristic flags provide, compiler flag mining can help you gain an additional 5% to 20%, depending on the application and architecture.
The recipe for mining compiler flags
- If possible, upgrade your compiler. Usually, newer versions of the same compiler perform better.
- Automate your functional tests. Especially if you have recently upgraded your compiler, it is important that you have a way to test whether your binary still works.
- Automate your performance tests. Eventually, you will want to optimize for one or two metrics.
- Use Optimizer Studio to select a large number of flags to explore. You never know which flags might help improve the code speed, and Optimizer Studio can easily handle hundreds of tunables.
- Use Optimizer Studio to automatically choose the best performing flag configuration for your code.
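The recipe above can be sketched as a small search loop. This is a minimal sketch that uses plain random search, not the algorithm Optimizer Studio uses. The names `mine_flags`, `build_and_test`, and `measure_seconds` are hypothetical, and the cost model at the bottom is synthetic so that the example runs without a compiler. In a real setup, the two callables would invoke your compiler, your functional test suite, and your benchmark.

```python
import random
from typing import Callable, Sequence, Tuple

Flags = Tuple[str, ...]


def mine_flags(
    candidate_flags: Sequence[str],
    build_and_test: Callable[[Flags], bool],
    measure_seconds: Callable[[Flags], float],
    trials: int = 50,
    seed: int = 0,
) -> Tuple[Flags, float]:
    """Random search: return the fastest flag set that passes functional tests."""
    rng = random.Random(seed)
    best_flags: Flags = ()  # baseline: no extra flags
    if not build_and_test(best_flags):
        raise RuntimeError("baseline build fails its functional tests")
    best_time = measure_seconds(best_flags)

    for _ in range(trials):
        # Include each candidate flag with probability 1/2.
        config = tuple(f for f in candidate_flags if rng.random() < 0.5)
        if not build_and_test(config):
            continue  # never accept a binary that fails functional tests
        elapsed = measure_seconds(config)
        if elapsed < best_time:
            best_flags, best_time = config, elapsed
    return best_flags, best_time


# Synthetic stand-ins (illustrative numbers only, not measured data).
FAKE_EFFECT = {"-funroll-loops": -0.10, "-fno-inline": 0.20, "-ffast-math": -0.30}


def fake_build_and_test(config: Flags) -> bool:
    # Pretend -ffast-math breaks a numerical test in this hypothetical application.
    return "-ffast-math" not in config


def fake_measure_seconds(config: Flags) -> float:
    # Baseline of 1.0 s plus the made-up effect of each enabled flag.
    return 1.0 + sum(FAKE_EFFECT[f] for f in config)


best_flags, best_time = mine_flags(
    list(FAKE_EFFECT), fake_build_and_test, fake_measure_seconds
)
print(best_flags, best_time)
```

The functional check runs before the timing on purpose: a configuration that is fast but produces wrong results is discarded, which is why the automated functional tests come before the performance tests in the recipe.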
Continuous Optimization of Compiler Flags
Continuous Optimization means the integration of optimization within an automated flow like Continuous Integration (CI) or Continuous Delivery (CD). In the context of compiler flag mining, the natural place to integrate is within the CI process as in this diagram:
In a typical CI flow, you commit your new code, it gets built and tested, and then it proceeds to deployment. When implementing Continuous Optimization, the flow splits prior to deployment. In this case, Optimizer Studio selects certain configurations, recompiles the code, and performs the functional and performance tests. The search for optimal flags ends either after a time limit or when performance cannot be improved further, and the achieved speedup is then compared with a user-defined threshold. If the speedup exceeds this threshold (for example, a 1% improvement), the optimal flags are committed back to the code repository. This way, software engineers can use the new flags in their environment, and these flags will be used in subsequent deployments.
The description above is of a typical incremental continuous optimization flow, but many other variants exist, such as synchronous optimization, in which deployment is delayed until the best performance is reached.
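The commit decision at the end of the incremental flow comes down to a small comparison. The sketch below assumes that "improvement" means the percentage reduction in run time relative to the baseline build; the function names and the 1% default are illustrative, not part of any tool's API.

```python
def speedup_percent(baseline_seconds: float, tuned_seconds: float) -> float:
    """Percentage reduction in run time relative to the baseline build."""
    if baseline_seconds <= 0:
        raise ValueError("baseline_seconds must be positive")
    return 100.0 * (baseline_seconds - tuned_seconds) / baseline_seconds


def should_commit_flags(
    baseline_seconds: float,
    tuned_seconds: float,
    threshold_percent: float = 1.0,  # assumed user-defined threshold
) -> bool:
    """Commit the tuned flags only if the gain is above the threshold."""
    return speedup_percent(baseline_seconds, tuned_seconds) > threshold_percent


# A 10.0 s baseline tuned down to 9.8 s is roughly a 2% gain: commit.
print(should_commit_flags(10.0, 9.8))
# A 10.0 s baseline tuned down to 9.95 s is roughly a 0.5% gain: skip.
print(should_commit_flags(10.0, 9.95))
```

Keeping a threshold avoids churning the repository with flag changes whose gain is within measurement noise.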
Because of the way optimization algorithms work, you may see a large number of compiler flags resulting from the optimization process. Many of these flags might not have a real effect on performance, but it can be time-consuming to test them all. One useful feature of Optimizer Studio in this case is called “Knob Refinement”. Optimizer Studio takes a minimal number of new measurements to assign each knob a score, and then arranges the knobs in descending order of impact. It then becomes easy not only to trim the flags that don’t have an effect automatically, but also to generate insights into which flags were most impactful.
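To make the idea of scoring and trimming flags concrete, here is a minimal sketch that uses leave-one-out ablation: each flag is scored by how much the run time grows when that flag alone is removed. This is a simple stand-in for illustration and is not how Knob Refinement is implemented; the function names, the threshold, and the synthetic timings are all assumptions.

```python
from typing import Callable, List, Sequence, Tuple

Flags = Tuple[str, ...]


def rank_flags_by_impact(
    flags: Sequence[str],
    measure_seconds: Callable[[Flags], float],
) -> List[Tuple[str, float]]:
    """Score each flag by the slowdown observed when it alone is removed."""
    full_time = measure_seconds(tuple(flags))
    scores = []
    for flag in flags:
        without = tuple(f for f in flags if f != flag)
        # Positive score: removing the flag slows the program, so the flag helps.
        scores.append((flag, measure_seconds(without) - full_time))
    # Descending order of impact.
    return sorted(scores, key=lambda item: item[1], reverse=True)


def trim_flags(
    flags: Sequence[str],
    measure_seconds: Callable[[Flags], float],
    min_gain_seconds: float,
) -> List[str]:
    """Keep only the flags whose individual gain reaches the threshold."""
    ranked = rank_flags_by_impact(flags, measure_seconds)
    return [flag for flag, gain in ranked if gain >= min_gain_seconds]


# Synthetic cost model (made-up gains in seconds, not measured data).
FAKE_GAIN = {"-fno-common": 0.0, "-fomit-frame-pointer": 0.02, "-funroll-loops": 0.10}


def fake_measure_seconds(config: Flags) -> float:
    return 1.0 - sum(FAKE_GAIN[f] for f in config)


ranked = rank_flags_by_impact(list(FAKE_GAIN), fake_measure_seconds)
kept = trim_flags(list(FAKE_GAIN), fake_measure_seconds, min_gain_seconds=0.01)
print(ranked)
print(kept)
```

Leave-one-out needs one extra measurement per flag and ignores interactions between flags, so it is only a rough approximation when flags depend strongly on each other.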
Industries that leverage compiler flag mining
Compiler flag mining is a common practice in performance engineering teams across many domains, such as financial institutions, high-frequency traders, semiconductor companies, telecom companies and others. Marvell, for instance, used Optimizer Studio to optimize the CoreMark benchmark and achieved a 10.5% improvement over the “-Ofast” flag. When optimizing manually, tens of hours of effort resulted in only a 7% improvement. This is why an automated approach to compiler flag mining is a no-brainer: it not only saves time, but also achieves greater speedups.