Optimizing NGINX is challenging, let alone coupling it with resource sizing in Kubernetes. The solution? Automated tuning! This blog post is based on a webinar that was co-hosted by Shashi Raina, a Partner Solution Architect at AWS, and Tomer Morad, Co-Founder and CEO of Concertio. It is divided into three parts, and this is the second (see the first part for the background). Here we dive into automatic optimization and show how to optimize an NGINX website that is hosted on Amazon’s EKS, or more specifically, on Fargate. In the final part, we will cover post-optimization steps that generate insights, and discuss integrating Continuous Optimization into the Continuous Delivery process.
Configuring Container Resources in Fargate
When deploying a container in Kubernetes, we must first define the resources we’d like to allocate to it. The allocated CPU and memory can strongly affect both performance and cost. Usually, more CPU and memory yield higher performance but also higher infrastructure costs. At some point, however, increasing those resources brings diminishing returns. When optimizing for performance per dollar, selecting the right resources for each application is not trivial.
Fargate is a serverless product from AWS that lets us deploy containers to Amazon’s managed Kubernetes service while paying only for the allocated resources. There are 50 possible configurations for choosing the CPU and memory resources for a single container in Fargate, as detailed in the Fargate pricing page and in the following table as of the time of writing:
CPU (vCPUs) | Memory Values |
0.25 | 0.5GB, 1GB, and 2GB |
0.5 | Min. 1GB and Max. 4GB, in 1GB increments |
1 | Min. 2GB and Max. 8GB, in 1GB increments |
2 | Min. 4GB and Max. 16GB, in 1GB increments |
4 | Min. 8GB and Max. 30GB, in 1GB increments |
Configuring numerous pods in Fargate can quickly become an unmanageable task. For example, if we need to configure five Fargate pods, there are 50^5 possibilities (roughly 300 million). Thus, it would be impractical to evaluate each possible configuration.
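To make the counting concrete, here is a minimal Python sketch that enumerates the CPU and memory combinations from the table above. The dictionary and function names are ours, chosen for illustration, and the memory ranges are transcribed from the table as of the time of writing.

```python
# Memory options (in GB) for each vCPU size, transcribed from the table above.
FARGATE_MEMORY_GB = {
    0.25: [0.5, 1, 2],
    0.5: list(range(1, 5)),    # 1GB to 4GB in 1GB increments
    1: list(range(2, 9)),      # 2GB to 8GB
    2: list(range(4, 17)),     # 4GB to 16GB
    4: list(range(8, 31)),     # 8GB to 30GB
}


def fargate_configs():
    """Return every (vcpu, memory_gb) pair allowed by the table."""
    return [
        (vcpu, mem)
        for vcpu, mems in FARGATE_MEMORY_GB.items()
        for mem in mems
    ]


per_pod = len(fargate_configs())   # combinations for a single container
five_pods = per_pod ** 5           # five independently sized pods
print(per_pod, five_pods)
```

Five independently sized pods multiply the per-pod count five times over, which is why exhaustive evaluation stops being realistic so quickly.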
Tuning the NGINX Configuration File
NGINX has a configuration file that embeds numerous tunables for us to play with – we’ve found 23 important tunables that are worthwhile to configure. Of the 23 tunables, 5 stood out in our analysis as the most influential on the metric we measured (performance/cost):
- keepalive_requests
- worker_rlimit_nofile
- open_file_cache
- open_file_cache_min_uses
- lingering_time
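As a rough illustration of where these five directives live, the sketch below renders a small nginx.conf fragment from a dictionary of values. The function name `render_nginx_conf`, the dictionary keys, and every value shown are hypothetical placeholders, not the tuned settings from our experiment. The directive names and their contexts (`worker_rlimit_nofile` in the main context, the rest inside `http`) follow the NGINX documentation.

```python
def render_nginx_conf(t):
    """Render a minimal nginx.conf fragment from a dict of tunable values."""
    return "\n".join([
        # Main-context directive: limit on open files per worker process.
        f"worker_rlimit_nofile {t['worker_rlimit_nofile']};",
        "events {}",
        "http {",
        f"    keepalive_requests {t['keepalive_requests']};",
        f"    open_file_cache max={t['open_file_cache_max']} "
        f"inactive={t['open_file_cache_inactive']};",
        f"    open_file_cache_min_uses {t['open_file_cache_min_uses']};",
        f"    lingering_time {t['lingering_time']};",
        "}",
    ])


# Placeholder values for illustration only; not the tuned configuration.
example_tunables = {
    "worker_rlimit_nofile": 65535,
    "keepalive_requests": 1000,
    "open_file_cache_max": 10000,
    "open_file_cache_inactive": "30s",
    "open_file_cache_min_uses": 2,
    "lingering_time": "30s",
}

conf = render_nginx_conf(example_tunables)
print(conf)
```

Generating the file from a dictionary is one simple way to let an external optimizer change the values between runs without editing the configuration by hand.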
Overall, the 23 tunables in NGINX that we found represent 2×10^17 possible configurations. That’s one fifth of a billion billions. Perhaps an expert could prune the parameter space and try only a few combinations, but for most people who are not NGINX experts, the space is unmanageable. The number of ways to configure both the resources of a single container and the NGINX configuration file is the product of the two (50 × 2×10^17), which yields 10^19 possible configurations.
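The arithmetic behind these figures is short enough to check directly. The sketch below uses only the counts quoted in this post; the variable names are ours.

```python
container_configs = 50        # Fargate CPU/memory combinations for one container
nginx_configs = 2 * 10**17    # combinations of the 23 NGINX tunables

# The two choices are independent, so the joint space is their product.
total_configs = container_configs * nginx_configs

# "One fifth of a billion billions": compare against 10^18.
fraction_of_billion_billions = nginx_configs / 10**18

print(total_configs == 10**19, fraction_of_billion_billions)
```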
Our target in this particular example is to maximize the number of requests per second per dollar. We want to find the most efficient configuration that will get us there. In order to achieve this, we’ll use the setup described in the diagram below:
In the setup, we have an NGINX pod running on EKS, another pod that applies the load on NGINX (wrk), and Optimizer Studio, which orchestrates the optimization process.
Optimizer Studio runs outside of EKS. Once invoked, it iteratively runs the following steps:
- Select a configuration of resources and NGINX tunables to explore
- Apply the configuration
- Deploy or update the pods
- Apply the load using wrk
- Gather the results
The load is applied using wrk, which is an HTTP benchmarking tool. After the load is applied, the load-generating pod exports the resulting metric to a Prometheus push gateway. The metric is then read back by Optimizer Studio.
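A minimal sketch of the export step might look like the following. It assumes that the metric is taken from the `Requests/sec` line that wrk prints at the end of a run, and that it is pushed with an HTTP PUT to the Pushgateway's `/metrics/job/<job>` endpoint in the Prometheus text format. The gateway URL, job name, metric name, and sample output are all made up for illustration, and the request is only built here, not sent.

```python
import re
import urllib.request


def parse_requests_per_sec(wrk_output):
    """Extract the value of the 'Requests/sec' summary line from wrk output."""
    match = re.search(r"^Requests/sec:\s*([\d.]+)", wrk_output, re.MULTILINE)
    if match is None:
        raise ValueError("no 'Requests/sec' line found in wrk output")
    return float(match.group(1))


def build_push_request(gateway_url, job, metric, value):
    """Build (but do not send) a PUT request carrying one gauge sample."""
    body = f"# TYPE {metric} gauge\n{metric} {value}\n".encode()
    url = f"{gateway_url.rstrip('/')}/metrics/job/{job}"
    return urllib.request.Request(url, data=body, method="PUT")


# Abridged, made-up wrk output used only to exercise the parser.
sample_output = """\
  370370 requests in 30.00s, 300.00MB read
Requests/sec:  12345.67
Transfer/sec:     10.00MB
"""

rps = parse_requests_per_sec(sample_output)
# Hypothetical gateway address, job name, and metric name.
req = build_push_request(
    "http://pushgateway.example:9091", "wrk", "nginx_requests_per_second", rps
)
print(req.get_method(), req.full_url)
```

Passing `req` to `urllib.request.urlopen` would perform the actual push in a cluster where the gateway is reachable.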
Optimizer Studio uses a black-box optimization algorithm, so it does not need to know anything about the architecture of the setup. All it needs to know is which tunables it can change and how to read the resulting metric from the push gateway.
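To show what "black-box" means in practice, here is a small sketch of the loop using plain random search. This is a stand-in we chose for simplicity, not the algorithm Optimizer Studio uses. The search space is a tiny hypothetical subset of the real one, and `fake_measure` is a toy formula that replaces the deploy, load, and gather steps so that the example runs anywhere.

```python
import random

# Hypothetical, heavily reduced search space for illustration.
SEARCH_SPACE = {
    "resources": [(0.25, 0.5), (0.5, 1), (1, 2), (2, 4)],  # (vCPU, GB)
    "keepalive_requests": [100, 1000, 10000],
    "open_file_cache_min_uses": [1, 2, 4],
}


def random_search(measure, space, iterations, seed=0):
    """Try random configurations and keep the one with the best metric."""
    rng = random.Random(seed)
    best_config, best_score = None, float("-inf")
    for _ in range(iterations):
        # Select a configuration to explore.
        config = {name: rng.choice(values) for name, values in space.items()}
        # In a real setup this call would apply the configuration,
        # deploy the pods, run the load, and read the metric back.
        score = measure(config)
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score


def fake_measure(config):
    """Toy stand-in for a benchmark run; the formula is not real data."""
    vcpu, mem_gb = config["resources"]
    throughput = 1000 * vcpu ** 0.5 + config["keepalive_requests"] ** 0.5
    cost = vcpu + 0.1 * mem_gb
    return throughput / cost


best_config, best_score = random_search(fake_measure, SEARCH_SPACE, 50)
print(best_config, round(best_score, 1))
```

The optimizer only ever sees the configuration it chose and the number that came back, which is all a black-box method requires.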
Why We Need to Optimize the Full Stack in Concert
Below are the optimization results of various attempts to optimize the settings using Optimizer Studio:
Configuration | Resources | nginx.conf | Requests per second | Requests per $ |
Baseline | 1 vCPU, 2GB | Baseline | 11,937 | 2.75B |
Optimized NGINX | 1 vCPU, 2GB | Tuned | 12,036 (+1%) | 2.77B (+1%) |
Optimized Resources | 0.25 vCPU, 0.5GB | Baseline | 8,230 (-31%) | 6.60B (+138%) |
Full Optimization | 0.25 vCPU, 0.5GB | Tuned | 12,681 (+6%) | 10.16B (+272%) |
We first ran the baseline configuration, which is one virtual CPU and 2GB of memory, and got roughly 12,000 requests per second served by this pod, and each dollar gave us 2.75 billion requests.
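For readers who want to see how such a metric can be put together, the sketch below assumes that requests per dollar is simply the steady request rate over one hour divided by the hourly price of the pod. That formula is our assumption for illustration. The price computed at the end is back-solved from the baseline figures above and is not an AWS list price.

```python
SECONDS_PER_HOUR = 3600


def requests_per_dollar(requests_per_second, price_per_hour):
    """Requests served in one hour divided by the hourly price (assumed formula)."""
    return requests_per_second * SECONDS_PER_HOUR / price_per_hour


def implied_price_per_hour(requests_per_second, req_per_dollar):
    """Invert the formula to recover the hourly price implied by a result."""
    return requests_per_second * SECONDS_PER_HOUR / req_per_dollar


# Baseline figures from this post: 11,937 requests/s and 2.75B requests per $.
baseline_price = implied_price_per_hour(11_937, 2.75e9)
print(f"implied baseline price: ${baseline_price:.4f} per hour")
```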
It would be great if we could get more than 2.75 billion requests per dollar. Let’s start with optimizing NGINX. If we keep the resources constant as in the baseline, we can use Optimizer Studio to tune the NGINX configuration file only. The result is an increase of 1% in performance (requests per second) together with an increase of 1% in requests per dollar. Although both metrics improve, the gain is too small to move the needle.
Let’s try a different path: optimizing the resources. Keeping the baseline nginx.conf and letting Optimizer Studio choose the resources produced an allocation smaller than the one we started with: 0.25 vCPU and 0.5GB. While there were 31% fewer requests per second, in terms of requests per dollar we now get 6.6 billion, an increase of 138%! Remember that increasing this metric was the goal of the optimization.
But what happens if you optimize both the resources and the NGINX configuration file together? When we optimized them together, we got an improvement of 6% in requests per second and a whopping 272% improvement in requests per dollar.
The point here is that the resources and the configuration file of NGINX need to be optimized together. If you optimize each one separately and just combine the settings, you will not be able to get the best performance. Only when you consider all the tunables in the whole system together can you take advantage of the interactions between them and achieve the best results. All of the components in the system need to play nicely with each other, in concert.
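The relative changes can be recomputed from the table with a few lines of Python. The sketch below uses the rounded values as published, so the requests-per-dollar changes come out at about +140% and +269% rather than the stated +138% and +272%. We assume the published percentages were derived from unrounded measurements.

```python
# (requests per second, requests per dollar) as published in the table.
RESULTS = {
    "Baseline": (11_937, 2.75e9),
    "Optimized NGINX": (12_036, 2.77e9),
    "Optimized Resources": (8_230, 6.60e9),
    "Full Optimization": (12_681, 10.16e9),
}


def pct_change(new, old):
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100


base_rps, base_rpd = RESULTS["Baseline"]
for name, (rps, rpd) in RESULTS.items():
    print(
        f"{name:20s} "
        f"{pct_change(rps, base_rps):+6.1f}% requests/s  "
        f"{pct_change(rpd, base_rpd):+7.1f}% requests/$"
    )
```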
The bottom line in this example is that automatic optimization with Optimizer Studio improved efficiency (requests per dollar) by 272%, which is equivalent to cutting the cost per request by 73%.
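The two figures are two views of the same ratio: serving 3.72 times as many requests per dollar means each request costs 1/3.72 of what it did before. A short sketch of the conversion, with a function name of our own choosing:

```python
def cost_reduction_pct(efficiency_gain_pct):
    """Convert a gain in requests per dollar into a drop in cost per request."""
    ratio = 1 + efficiency_gain_pct / 100   # e.g. +272% means 3.72x
    return (1 - 1 / ratio) * 100


print(round(cost_reduction_pct(272)))   # 73
```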
In the next and final part of this blog, we will show how to graphically analyze the results and discuss the post-processing step called “Knob Refinement”. In addition, we will show how to implement Continuous Optimization in the Continuous Delivery process.