See how MongoDB can be tuned to achieve a 2.36x speedup in performance
Computerized databases have come a long way since their introduction in the 1960s. Going from kilobytes to petabytes of data in half a century, databases have evolved from navigational to relational and, in recent years, to NoSQL. They’re beautiful, and they’re complicated, which is why good DBAs are never easy to find.
Application performance usually depends heavily on the underlying database, so choosing a suitable database and tailoring its configuration to the specific needs of the application is crucial. But even for the most seasoned experts it’s a struggle, because of the high complexity of databases and applications, the ever-increasing number of tunables, and the many degrees of freedom in the design space.
All too often, database tunables are left at their default values, which aren’t optimized for specialized usage. Optimizing them is time-consuming and requires talent and focus that are not always on hand. Sometimes it’s possible to compensate for lagging performance by spending more on infrastructure (scaling up and out), but often it becomes technically impossible to improve throughput without a major design change in the code.
The question is: can this effort be automated and still yield good results?
The MongoDB Experiment
MongoDB is one of the most popular NoSQL document databases these days. It’s used by Fortune 500 companies and startups alike to deploy, monitor, and manage their data. The diversity of applications that use MongoDB as their data store naturally drives the company to expose many tunables to its users. So we teamed up with our friends at MongoDB to see how we could bring an automated database optimization capability to their users. Our common goal: finding out whether an automated approach can lead to better performance, reached faster, than manual tuning by an expert. It’s AI vs. human experts, and the race was on!
The Nitty-Gritty of the Experiment
As Alexander Komyagin, Senior Consulting Engineer at MongoDB, explains: “MongoDB’s WiredTiger storage engine allows a single instance to process thousands of read and write operations per second. Achieving such high throughput requires sophisticated database cache management policies with different tunables. These MongoDB tunables, however, are often kept at their defaults owing to limited understanding of their nature and difficulties predicting the exact effects. In this experiment, we used a purposefully constructed mixed load that with the default WiredTiger cache settings creates a disk IO bottleneck and causes overall throughput of the system to drop.”
Our setup comprised a server and a client:
- Server (c5.large on AWS):
  - Ubuntu 16.04
  - MongoDB version 3.4.10
  - Optimizer Studio version 1.12
  - A script that prepares the testing environment, applies the tunable (knob) values, invokes the load script on the client machine, collects the results, and feeds them back to Optimizer Studio
- Client (c5.large on AWS):
  - A script that generates load on the server
Optimizer Studio was configured for synchronous sampling mode, which means that it iteratively runs workloads to completion while alternating the configurations between each run. The workload script on the server first restarts the database, applies the new knob values, invokes the load on the client machine, and writes the result of the load to a temporary file. The source code is detailed below:
The source of the load script on the client machine is detailed below:
The load script on the client machine runs two parallel MongoDB scripts, one for updates (writes) and one for queries (reads). The source code of the query script is detailed below:
The source code of the update script is detailed below:
The contents of the Optimizer Studio configuration file follow. They include the definitions of the MongoDB tunables (“knobs”), the target metric, and the global settings of the optimization process:
The default settings of MongoDB are detailed below:
One pesky issue performance engineers need to deal with in real systems is variability. We’ve recently blogged about this. In our experiment, we found that the variance of the target metric (MongoDB query rate) was high, so we increased the number of required measurements from 2 to 7 for the baseline and from 1 to 3 for every other knob configuration. This proved sufficient to get statistically significant results.
Finally, we ran Optimizer Studio. The console output is detailed below:
The optimization took 9 hours and 9 minutes, during which the optimized performance improved as time progressed:
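The repeated-measurement idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Optimizer Studio's implementation: `run_workload`, `measure_config`, and `evaluate` are hypothetical names, and `run_workload` stands in for whatever restarts the database, applies the knobs, drives the load, and returns the measured query rate.

```python
from statistics import mean
from typing import Callable, Dict, List, Tuple

Config = Dict[str, int]


def measure_config(run_workload: Callable[[Config], float],
                   config: Config, samples: int) -> float:
    """Run the workload `samples` times and return the mean of the metric."""
    if samples < 1:
        raise ValueError("samples must be at least 1")
    return mean(run_workload(config) for _ in range(samples))


def evaluate(run_workload: Callable[[Config], float],
             baseline: Config, candidates: List[Config],
             baseline_samples: int = 7,
             candidate_samples: int = 3) -> List[Tuple[str, float]]:
    """Measure the baseline 7 times and each candidate 3 times (the counts
    used in the article), running every workload to completion in turn."""
    results = [("baseline",
                measure_config(run_workload, baseline, baseline_samples))]
    for index, config in enumerate(candidates):
        results.append((f"candidate-{index}",
                        measure_config(run_workload, config, candidate_samples)))
    return results
```

Averaging more runs for the baseline than for the candidates makes sense in a sketch like this because every candidate is compared against the same baseline figure, so noise in that one number would distort every speedup.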
What we found: Optimizer Studio tuned the MongoDB tunables to perform ~136% better on average (a 2.36x speedup) than the baseline settings for a synthetic read+update load. How does that fare against an expert? The settings recommended by the MongoDB performance team for this experiment were syncdelay=120, trigger=50, target=50. The raw performance is summarized in the following table:
| Baseline settings (mean) | Expert recommendations (mean) | Optimizer Studio recommendations (mean) |
|---|---|---|
| 242,200 | 531,534 (2.19x) | 572,000 (2.36x) |
Optimizer Studio improved on the expert recommendations by a further 8%, which is equivalent to an additional 17 percentage points of the baseline performance, all automatically.
Alexander Komyagin, Senior Consulting Engineer at MongoDB, summarizes the experiment: “As this experiment shows, Optimizer Studio can automatically optimize MongoDB tunables and exceed the expert recommendations.”
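For readers who want to check the arithmetic, the ratios quoted above follow directly from the three mean values in the table. The short Python snippet below only recomputes them; the variable names are ours.

```python
# Mean values of the target metric, taken from the table above.
baseline = 242_200
expert = 531_534
optimizer = 572_000

expert_speedup = expert / baseline              # about 2.19x
optimizer_speedup = optimizer / baseline        # about 2.36x, i.e. ~136% better
gain_over_expert = optimizer / expert - 1       # about 0.08, i.e. ~8%
extra_points = (optimizer - expert) / baseline  # about 0.17, i.e. ~17 points of baseline

print(f"expert: {expert_speedup:.2f}x, optimizer: {optimizer_speedup:.2f}x")
print(f"gain over expert: {gain_over_expert:.1%}, "
      f"extra share of baseline: {extra_points:.1%}")
```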
AI and Machine Learning Are Changing Database Optimization
With Concertio’s Optimizer Studio tool, DBAs can easily optimize their specific systems and boost their performance. This approach works particularly well with technologies such as MongoDB that employ open APIs to change parameters on the fly and make real-time configuration adjustments.
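To make "changing parameters on the fly" concrete, here is a minimal Python sketch that builds the documents for MongoDB's `setParameter` admin command. It is an illustration, not the script used in the experiment. The helper name `build_set_parameter_commands` is hypothetical, and the article does not say which WiredTiger options the "trigger" and "target" knobs map to, so the sketch leaves the WiredTiger option names to the caller instead of guessing them.

```python
from typing import Dict, List


def build_set_parameter_commands(
        syncdelay_secs: int,
        wiredtiger_options: Dict[str, int]) -> List[Dict[str, object]]:
    """Return one setParameter command document per change.

    `syncdelay` is a MongoDB server parameter. WiredTiger settings are passed
    as a single "name=value,..." string through wiredTigerEngineRuntimeConfig.
    The option names in `wiredtiger_options` are supplied by the caller; this
    sketch does not assume which ones the experiment tuned.
    """
    commands: List[Dict[str, object]] = [
        {"setParameter": 1, "syncdelay": syncdelay_secs}
    ]
    if wiredtiger_options:
        runtime_config = ",".join(
            f"{name}={value}" for name, value in wiredtiger_options.items())
        commands.append(
            {"setParameter": 1, "wiredTigerEngineRuntimeConfig": runtime_config})
    return commands


# syncdelay=120 is the expert-recommended value quoted in the article.
commands = build_set_parameter_commands(120, {})
# With a live server and the pymongo driver, each document could be sent with
# MongoClient(...).admin.command(document); that step is omitted here so the
# sketch runs without a database.
```

Keeping the command construction separate from the network call makes the knob-to-command mapping easy to inspect and test before anything touches a running instance.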
There are many advantages to incorporating machine learning methods for optimizing databases, among them:
- Greater database performance than can be achieved by manual methods
- Quicker optimization process that helps shorten time to market
- Increased cost-effectiveness of valuable engineering and IT resources
- For companies without a database performance team: reaping the benefits of performance optimization without having to build a new team
- For companies with a database performance team: automating and streamlining the parameter optimization process while focusing on higher level optimizations