Workload Definition

Workload defined - simplest execution

The simplest way to define an Optimizer Studio workload is to specify the location of the workload script in knobs.yaml:

workload:
  kind: sync
  run:
    command: ./workload.sh

With the workload path defined, the quickest route to improvements is to run only the optimization stage:

$ optimizer-ctl init --knobs /path/to/knobs.yaml --stages=optimization,done
$ optimizer-ctl start

The workload script is responsible for reporting its performance through the target metric, usually by storing the value in a file.
The target metric is sampled when the workload script completes.
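
As a minimal sketch of such a script, the example below times a stand-in task and writes the result to a file once the work is done. The file path /tmp/t_metric, the METRIC_FILE variable and the run_workload function are assumptions made for illustration; use whatever file your target metric is configured to read.

```shell
#!/bin/sh
# Hypothetical workload.sh: runs a task, then stores its score in a metric file.
# METRIC_FILE is an assumed location, not a name defined by Optimizer Studio.
METRIC_FILE="${METRIC_FILE:-/tmp/t_metric}"

run_workload() {
    # Stand-in for the real benchmark: time a short, fixed amount of work.
    start=$(date +%s)
    i=0
    while [ "$i" -lt 1000 ]; do i=$((i + 1)); done
    end=$(date +%s)
    # Report elapsed whole seconds as the performance figure.
    echo $((end - start))
}

# Write the result only after the workload has finished, because the
# target metric is sampled when the script completes.
run_workload > "$METRIC_FILE"
```

Writing the file as the last step means the value on disk always belongs to the sample that has just completed.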

Workload declarative definition

For finer control over how workload execution integrates with Optimizer Studio, Concertio provides a declarative workload syntax in the experiment definition file (knobs.yaml).
The workload definition is a structured recipe describing how Optimizer Studio should execute, control and monitor the given workload.
Many of the examples included with the Optimizer Studio package demonstrate this approach.

Workload definition syntax explained

Below is a brief reference of the workload definition keywords:

  • kind: sync|async|accel - the workload invocation mode. Each kind is explained in more detail below.
  • run - the recipe for running the workload on each sample.
  • on_config_change - the preparation section, executed only when the knob configuration changes. The run section is still executed unconditionally. [optional]
  • command - the actual command executed. It can contain either direct shell commands or a call to an external script.
  • timeout - the sample is invalidated if the command does not complete within TIME (in HhMmSs format). When 0 or omitted, the timeout is considered indefinite. [optional]
  • abort_condition - the sample is invalidated if the condition is met. The condition is tested periodically. [optional]
  • stop - the shell sequence used to stop the command. When omitted, kill -SIGKILL ${WORKLOAD_PID} will be used. [optional]
  • attachements - a list of files to be uploaded to the web server as part of the experiment inventory. [optional]
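
To make the HhMmSs notation concrete, the helper below converts values such as 1m30s or 30s into seconds. It assumes H, M and S stand for hours, minutes and seconds. The function name hms_to_seconds is hypothetical, and the sketch only illustrates how such a duration reads, not how Optimizer Studio parses it.

```shell
#!/bin/sh
# hms_to_seconds: convert a duration such as 1h2m3s, 1m30s or 45s to seconds.
# Illustrative helper only; it is not part of Optimizer Studio.
hms_to_seconds() {
    rest=$1
    total=0
    # Peel off the hours, minutes and seconds components in order, if present.
    case $rest in *h*) total=$((total + ${rest%%h*} * 3600)); rest=${rest#*h} ;; esac
    case $rest in *m*) total=$((total + ${rest%%m*} * 60));   rest=${rest#*m} ;; esac
    case $rest in *s*) total=$((total + ${rest%%s*})) ;; esac
    echo "$total"
}

hms_to_seconds 1m30s   # prints 90
hms_to_seconds 30s     # prints 30
```

The sketch does not handle components with leading zeros (for example 08s), which shell arithmetic would treat as octal.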

Sync workload (default)

The sync workload is the most commonly used kind of workload. The workload runs from start to finish, and the target metric is sampled when the workload completes.

workload:
  kind: sync
  on_config_change:
    command: ./prepare.sh
    timeout: 1m30s
    stop: kill -SIGTERM ${WORKLOAD_PID}

  run:
    command: ./workload.sh
    timeout: 30s
    stop: kill -SIGTERM ${WORKLOAD_PID}

  attachements:
  - workload.sh
  - run.sh

Async workload

The async workload is a workload that must be up and running while the target metric is sampled.
This kind is often useful for networking workloads, where the network traffic stops as soon as the workload completes.
In this case, the target metric is sampled TIME after the workload is launched, as set by sample_after: HhMmSs. The workload is stopped once the target metric has been sampled.

workload:
  kind: async
  on_config_change:
    command: ./prepare.sh
    timeout: 1m30s
    stop: kill -SIGTERM ${WORKLOAD_PID}

  run:
    command: iperf3 -c <IP address> -t 0
    sample_after: 3s
    stop: killall iperf3

  attachements:
  - workload.sh
  - run.sh
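
The launch, wait, sample, stop sequence described above can be mimicked by hand, as in the sketch below. It uses a sleep process as a stand-in for a long-running workload such as iperf3, and SAMPLE_AFTER and STATE are hypothetical names. It shows the order of events only and is not how Optimizer Studio is implemented.

```shell
#!/bin/sh
# Sketch of the async lifecycle: launch, wait sample_after, sample, stop.
SAMPLE_AFTER=1                      # seconds; stands in for sample_after

sleep 30 &                          # stand-in for a long-running workload
WORKLOAD_PID=$!

sleep "$SAMPLE_AFTER"               # give the workload time to get going

# Sample while the workload is still running.
if kill -0 "$WORKLOAD_PID" 2>/dev/null; then
    STATE=running
else
    STATE=exited
fi
echo "workload was $STATE at sampling time"

kill "$WORKLOAD_PID"                        # stop the workload after sampling
wait "$WORKLOAD_PID" 2>/dev/null || true    # reap it so no stray process remains
```

If the workload had already exited by sampling time, there would be nothing left to measure, which is why the stop step comes only after the sample.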

Accel workload

The accel workload is a workload run via the accelerate utility. This kind of workload is often executed in a production environment, and Optimizer Studio tries a new knob configuration each time the workload restarts.

workload:
  kind: accel

Aborting a stray workload

A workload, or one of its phases, may fail to complete in time, get stuck, or become invalid in some other way.
When such a state is detected, the workload is aborted and the sample is marked as invalid.

Timeout

A timeout can be defined for the on_config_change phase and, for sync workloads only, for the run phase of the workload definition.

workload:
  kind: sync
  on_config_change:
    command: ./prepare.sh
    timeout: 1m30s
    stop: kill -SIGTERM ${WORKLOAD_PID}

  run:
    command: ./workload.sh
    timeout: 30s
    stop: kill -SIGTERM ${WORKLOAD_PID}

Custom abort condition

A user-defined abort_condition can be set for the on_config_change or run phase of the workload definition. The condition is tested periodically, at the interval given by attempt_every. As long as the condition script outputs 0, the condition is not met and the workload keeps running; any other output causes the workload to be aborted.

workload:
  kind: sync
  on_config_change:
    command: ./prepare.sh
    abort_condition:
      # abort if there is too little free memory left
      condition: "[ $(awk '/MemFree/ {print $2}' /proc/meminfo) -gt 10240 ] && echo 0 || echo 1"
      attempt_every: 3s
    stop: kill -SIGTERM ${WORKLOAD_PID}

  run:
    command: ./workload.sh
    abort_condition:
      # abort if a block device is removed
      condition: "[ -b /dev/loop22 ] && echo 0 || echo 1"
      attempt_every: 1m30s
    stop: kill -SIGTERM ${WORKLOAD_PID}
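
A longer condition may be easier to maintain as a separate script than as an inline one-liner. The sketch below restates the free-memory check in that form. The script name check_mem.sh, the MIN_FREE_KB variable and the mem_abort_flag function are hypothetical, and whether condition accepts a script path in place of an inline command is an assumption to verify against your installation.

```shell
#!/bin/sh
# Hypothetical check_mem.sh: prints 0 while free memory is sufficient, 1 otherwise.
# MIN_FREE_KB is an assumed threshold in kB (10240 kB = 10 MiB).
MIN_FREE_KB="${MIN_FREE_KB:-10240}"

mem_abort_flag() {
    # $1: current free memory in kB, $2: minimum acceptable free memory in kB
    if [ "$1" -gt "$2" ]; then
        echo 0      # condition not met: keep running
    else
        echo 1      # condition met: abort the workload
    fi
}

# MemFree in /proc/meminfo is reported in kB (Linux only).
# If the value cannot be read, fall back to 0 kB, which triggers an abort.
free_kb=$(awk '/MemFree/ {print $2}' /proc/meminfo 2>/dev/null || echo 0)
mem_abort_flag "${free_kb:-0}" "$MIN_FREE_KB"
```

Keeping the comparison in its own function makes it possible to exercise both outcomes without actually exhausting memory.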

Accessing knobs and metrics via {{}} templates

Knob values and some metrics can be accessed via {{}} (Jinja-style) templates, e.g.

domain:
  common:
    knobs:
      A: ...
      B: ...
      C: ...
  ...

workload:
  kind: sync
  run:
    command: |
      echo "{{A}} + {{B}} + {{C}}" | bc -l > /tmp/ans
    timeout: "{{best.duration * 2}}"

Each {{knob-name}} is replaced with the current value of that knob.
To avoid ambiguity, the knob name can also be prefixed with "knob.", e.g. {{knob.A}}.
Each metric name must be prefixed with "baseline." or "best.".
Probably the most popular use case is setting the timeout based on the duration of a previous experiment, e.g. timeout: "{{baseline.duration * 2}}".
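
To show what the workload command looks like after substitution, the sketch below renders the example command with hypothetical knob values A=1, B=2 and C=3. The render_template function is a hand-rolled stand-in built on sed; Optimizer Studio performs the real substitution itself.

```shell
#!/bin/sh
# render_template: replace {{A}}, {{B}} and {{C}} with concrete values.
# Illustration only; values must not contain characters special to sed.
render_template() {
    # $1: template text, $2..$4: hypothetical values for knobs A, B and C
    printf '%s\n' "$1" | sed -e "s/{{A}}/$2/g" -e "s/{{B}}/$3/g" -e "s/{{C}}/$4/g"
}

render_template 'echo "{{A}} + {{B}} + {{C}}" | bc -l > /tmp/ans' 1 2 3
# prints: echo "1 + 2 + 3" | bc -l > /tmp/ans
```

The rendered line is what the shell would then execute for that knob configuration.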

Accessing multiple knobs at once

Oftentimes multiple knobs are required at once, e.g. when the knobs serve as compiler optimization flags:

workload:
  kind: sync
  on_config_change:
    command: |
      gcc {{get_knobs(".*")}} $STREAM.c -o $STREAM
  run:
    command: |
      $STREAM | awk '/Triad/ {print $2}' > /tmp/t_metric

The get_knobs(PATTERN) call is replaced with the values of all knobs whose names match the pattern, separated by spaces.
The full format is get_knobs(PATTERN [, DELIMITER [, FORMAT]]), where

  • PATTERN - regular expression applied to the knob name
  • DELIMITER - the delimiter placed between consecutive knob values [optional]
  • FORMAT - a format string that recognizes the %name and %value keywords, used instead of substituting each knob with its bare value. E.g. to export every knob as an environment variable, one could use: {{get_knobs(".*", "; ", "export %name=%value")}}. [optional]
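
The effect of DELIMITER and FORMAT is easiest to see on concrete values. The sketch below emulates the documented output shape for two hypothetical knobs, opt_level and unroll. The expand_knobs function is an assumption made for illustration: it skips the PATTERN matching step and is not Optimizer Studio code.

```shell
#!/bin/sh
# expand_knobs DELIMITER FORMAT NAME=VALUE...
# Emulates the output shape of get_knobs for an already-selected list of knobs.
# Illustration only; names and values must not contain '|' or '&'.
expand_knobs() {
    delim=$1
    format=$2
    shift 2
    out=""
    for knob in "$@"; do
        name=${knob%%=*}
        value=${knob#*=}
        # Substitute the %name and %value keywords in the format string.
        item=$(printf '%s\n' "$format" | sed -e "s|%name|$name|g" -e "s|%value|$value|g")
        if [ -z "$out" ]; then out=$item; else out="$out$delim$item"; fi
    done
    printf '%s\n' "$out"
}

# Space delimiter and bare values, as with get_knobs(".*")
expand_knobs " " "%value" "opt_level=-O3" "unroll=-funroll-loops"
# prints: -O3 -funroll-loops

# As with get_knobs(".*", "; ", "export %name=%value")
expand_knobs "; " "export %name=%value" "opt_level=-O3" "unroll=-funroll-loops"
# prints: export opt_level=-O3; export unroll=-funroll-loops
```

The first form suits command-line flags, as in the gcc example, while the second yields a sequence of shell statements that can be placed at the start of a command.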