Automated OS upgrades

This tutorial describes how to configure the system upgrade mechanism to trigger OS upgrades at a designated day and time of the week and to orchestrate the upgrade among different hosts. The tutorial uses an Ubuntu/Debian site as an example. Avassa provides OS upgrade agents for a number of distributions. For the latest list, see the readme in https://gitlab.com/avassa-public/os-upgrade.

A process for performing OS upgrades by hand can be found in the Manual OS upgrade document.

In order to get an understanding of the OS upgrade architecture and mechanisms, read our OS upgrade fundamentals.

The overall procedure for using the Avassa OS upgrade functions is:

  • build the OS upgrade application for your system (we provide different upgrade applications for different OSes)
  • deploy the OS upgrade application to corresponding hosts
  • configure OS upgrade windows
  • monitor the automatic OS upgrades

Building the OS upgrade application implemented by Avassa

The procedure below applies to all supported distributions. Where the steps differ (notably the application specification), examples are provided for Debian/Ubuntu, Rocky Linux, and Fedora CoreOS.

Start by checking out the git repository https://gitlab.com/avassa-public/os-upgrade or downloading the files mentioned below as needed.

Create the approle

Create an approle and a corresponding approle policy. An approle is the authentication mechanism used by the applications to be able to use the Avassa APIs locally on the host they are running on.

cat yaml/approle.yaml | supctl create strongbox authentication approles
cat yaml/approle-policy.yaml | supctl create policy policies

Look up the role-id

By default, an approle authentication mechanism requires two secret parts: one built into the image and one provided at runtime. The secret part to build into the image is called role-id and is generated when the approle is created. To look up the role-id, run:

supctl show strongbox authentication approles os-upgrade --fields role-id
Example output
role-id: 4c6eb93b-4c3e-43ff-884c-273ff920ac24

Store the value (without the role-id: prefix) in a local file so that the docker build can read it as a secret:

Store role-id for docker build
echo '<replace me with role-id from above>' > role-id
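The copy-and-paste step can also be scripted. The following is a minimal sketch, assuming the supctl output has the single-line form shown in the example output above; the function name extract_role_id is hypothetical and not part of the os-upgrade repository.

```shell
# Print only the value of the role-id field read from standard input.
# Assumes a line of the form "role-id: <value>", as in the example output.
extract_role_id() {
  sed -n 's/^role-id:[[:space:]]*//p'
}
```

With supctl configured, it could be used as: supctl show strongbox authentication approles os-upgrade --fields role-id | extract_role_id > role-id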

Build and push the image

The role-id is built into a container image using the Dockerfile in the repository above. Replace the VERSION value with the target version to build. latest is fine for development purposes, but locking the application specification to a specific version enables more controlled application upgrades.

The list of published versions can be found at https://gitlab.com/avassa-public/os-upgrade/container_registry.

Build image with role-id
docker build \
-t registry.environment.name.avassa.net/avassa/os-upgrade:<VERSION> \
--secret id=approle,src=role-id \
--build-arg IMAGE=registry.gitlab.com/avassa-public/os-upgrade \
--build-arg VSN=<VERSION> \
.
Push image to Control Tower
docker push registry.environment.name.avassa.net/avassa/os-upgrade:<VERSION>
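Because the build reads the role-id file as a secret, a malformed file only shows up later as an authentication failure at runtime. A small pre-build check can catch the most obvious mistakes. This is a sketch only: the function name check_role_id_file is hypothetical, and the checks cover just the pitfalls mentioned in this tutorial (empty file, leftover role-id: prefix, unreplaced placeholder).

```shell
# Return 0 if the file looks usable as the build secret, non-zero otherwise.
# $1 = path to the secret file (defaults to "role-id" in the current directory).
check_role_id_file() {
  file=${1:-role-id}
  # The file must exist and be non-empty.
  [ -s "$file" ] || { echo "missing or empty: $file" >&2; return 1; }
  # The value must not carry the field name from the supctl output.
  if grep -q '^role-id:' "$file"; then
    echo "strip the 'role-id:' prefix from $file" >&2
    return 1
  fi
  # The placeholder text from the tutorial must have been replaced.
  if grep -q 'replace me' "$file"; then
    echo "placeholder still present in $file" >&2
    return 1
  fi
  return 0
}
```

Running check_role_id_file before docker build makes the build step fail early with a readable message instead of producing an image with an unusable secret.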
note

An alternative to baking the role-id into the image is to edit the approle to set weak-secret-id: true in the approle YAML. This parameter removes the requirement for the secret part built into the image so the container images published by Avassa may be used directly without building your own image.

This is not recommended as the OS Upgrade workers have access to the operating system, so it's important to make sure you're running a validated image.

Building a multi-architecture image

If your fleet mixes architectures — for example x86_64 servers and ARM64 edge devices (Raspberry Pi 4/5, NVIDIA Jetson, AWS Graviton, Ampere) — you need a single image tag whose manifest list resolves to the correct architecture for each host. Avassa publishes the upstream registry.gitlab.com/avassa-public/os-upgrade image as a multi-arch manifest (linux/amd64 and linux/arm64), so you only need to preserve that property when re-tagging it with your approle.

  1. Create a buildx builder that supports multi-platform builds (the default docker driver does not). This is a one-time setup per machine:

    docker buildx create --name multiarch --driver docker-container --bootstrap --use

    Verify it is running and lists both architectures:

    docker buildx ls

    The multiarch builder should appear with linux/amd64 and linux/arm64 under PLATFORMS.

  2. Build and push for both architectures in a single command. Multi-platform builds must use --push (or --output) — they cannot be loaded into the local docker image store:

    Build and push multi-arch image
    docker buildx build \
    --platform linux/amd64,linux/arm64 \
    -t registry.environment.name.avassa.net/avassa/os-upgrade:<VERSION> \
    --secret id=approle,src=role-id \
    --build-arg IMAGE=registry.gitlab.com/avassa-public/os-upgrade \
    --build-arg VSN=<VERSION> \
    --push \
    .
  3. Verify the pushed image is a manifest list with both architectures:

    docker buildx imagetools inspect \
    registry.environment.name.avassa.net/avassa/os-upgrade:<VERSION>

    You should see two Manifests: entries, one for linux/amd64 and one for linux/arm64.
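The verification in step 3 can be automated, for instance in a CI job. The sketch below assumes the inspect output contains one "Platform:" line per manifest, which is how docker buildx imagetools inspect normally prints a manifest list; the function name has_both_platforms is hypothetical.

```shell
# Succeed only if both linux/amd64 and linux/arm64 appear as Platform entries
# in "docker buildx imagetools inspect" output supplied on standard input.
has_both_platforms() {
  report=$(cat)
  printf '%s\n' "$report" | grep -Eq 'Platform:[[:space:]]+linux/amd64' &&
    printf '%s\n' "$report" | grep -Eq 'Platform:[[:space:]]+linux/arm64'
}
```

It could be used as: docker buildx imagetools inspect registry.environment.name.avassa.net/avassa/os-upgrade:<VERSION> | has_both_platforms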

note

If the build fails with Multi-platform build is not supported for the docker driver, the default builder is still selected. Run docker buildx use multiarch and try again.

With a multi-arch image in place, the same application specification deploys to both x86 and ARM hosts — the container runtime on each host pulls the matching architecture automatically. No per-arch site labels or separate application specs are needed.

Create the application specification

By default the application deployment matches the os-type site label to determine the sites where the application must be deployed. Make sure to either label the sites correspondingly or to modify the application deployment to indicate the sites the worker application should be deployed to. The expected label values are:

  • os-type = debian — Debian/Ubuntu hosts
  • os-type = dnf — Rocky Linux (and other DNF-based distributions)
  • os-type = coreos — Fedora CoreOS hosts
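Labelling a site can be done with the same supctl merge pattern this tutorial uses for site profiles. The sketch below only generates the YAML payload. It rests on an assumption that is not confirmed by this document: that site objects accept labels under a top-level labels map. Check the site configuration reference before relying on it. The function name site_label_payload is hypothetical.

```shell
# Print a YAML fragment that sets one site label.
# ASSUMPTION: sites accept labels under a top-level "labels" map.
# $1 = label name, $2 = label value
site_label_payload() {
  printf 'labels:\n  %s: %s\n' "$1" "$2"
}
```

If the assumption holds, the payload could be applied with: site_label_payload os-type debian | supctl merge system sites stockholm-sergel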
Debian/Ubuntu
name: os-upgrade-debian
version: "<VERSION>"
services:
  - name: worker
    mode: one-per-matching-host
    containers:
      - name: main
        image: avassa/os-upgrade:<VERSION>
        cmd: [ "os_upgrade.debian" ]
        approle: os-upgrade
        container-layer-size: 0B
        env:
          API_CA_CERT: ${SYS_API_CA_CERT}
          HOST: ${SYS_HOST}
          APPROLE_SECRET_ID: ${SYS_APPROLE_SECRET_ID}
          LOG_LEVEL: INFO
        mounts:
          - volume-name: systemd-socket
            mount-path: /var/run/dbus/system_bus_socket
          - volume-name: apt-conf-d
            mount-path: /data/apt.conf.d
        user-namespace:
          host: true
        security:
          apparmor:
            disabled: true ## apparmor prevents systemd socket access
    volumes:
      - name: systemd-socket
        system-volume:
          reference: systemd-socket
      - name: apt-conf-d
        config-map:
          items:
            - name: 90avassa
              data-verbatim: |
                // the reboot is handled by the os-upgrade handler
                Unattended-Upgrade::Automatic-Reboot "false";
                APT::Periodic::Update-Package-Lists "always";
                APT::Periodic::Unattended-Upgrade "always";
                Unattended-Upgrade::Remove-Unused-Kernel-Packages "true";
                Unattended-Upgrade::Remove-New-Unused-Dependencies "true";
automating using sed
sed -e 's|%VERSION%|2026.5.0|' \
-e 's|%IMAGE%|avassa/os-upgrade:2026.5.0|' \
yaml/application.debian.yaml.in \
> application.debian.yaml
Rocky Linux

The Rocky Linux worker integrates with dnf-automatic and uses a SELinux exemption rather than the AppArmor one. A custom automatic.conf can be supplied through the config-map below.

name: os-upgrade-dnf
version: "<VERSION>"
services:
  - name: worker
    mode: one-per-matching-host
    containers:
      - name: main
        image: avassa/os-upgrade:<VERSION>
        cmd: [ "os_upgrade.dnf" ]
        approle: os-upgrade
        container-layer-size: 0B
        env:
          API_CA_CERT: ${SYS_API_CA_CERT}
          HOST: ${SYS_HOST}
          APPROLE_SECRET_ID: ${SYS_APPROLE_SECRET_ID}
          LOG_LEVEL: INFO
        mounts:
          - volume-name: systemd-socket
            mount-path: /var/run/dbus/system_bus_socket
          - volume-name: dnf-automatic-conf
            files:
              - name: automatic.conf
                mount-path: /data/automatic.conf
        user-namespace:
          host: true
        security:
          selinux:
            disabled: true ## SELinux prevents systemd socket access
    volumes:
      - name: systemd-socket
        system-volume:
          reference: systemd-socket
      - name: dnf-automatic-conf
        config-map:
          items:
            - name: automatic.conf
              data-verbatim: |
                [commands]
                upgrade_type = security
                [emitters]
                emit_via = stdio
                [base]
                debuglevel = 1
Fedora CoreOS

Fedora CoreOS uses A/B image-based upgrades coordinated through zincati's fleet-lock mechanism, so the worker runs in host networking mode and does not need a systemd socket mount.

name: os-upgrade-coreos
version: "<VERSION>"
services:
  - name: worker
    mode: one-per-matching-host
    network:
      host: true
    containers:
      - name: main
        image: avassa/os-upgrade:<VERSION>
        cmd: [ "os_upgrade.coreos" ]
        approle: os-upgrade
        container-layer-size: 0B
        user-namespace:
          host: true
        env:
          AVASSA_API: https://localhost:4646
          API_CA_CERT: ${SYS_API_CA_CERT}
          HOST: ${SYS_HOST}
          APPROLE_SECRET_ID: ${SYS_APPROLE_SECRET_ID}
          LOG_LEVEL: INFO

See the yaml/ directory of the os-upgrade repository for the matching .in templates and ready-to-use deployment YAMLs.
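The sed substitution shown for the Debian template can be wrapped in a small function so that the same call renders any of the .in templates. This is a sketch under two assumptions: that the other templates use the same %VERSION% and %IMAGE% placeholders as the Debian one, and that the image is tagged avassa/os-upgrade:<VERSION> as in the examples above. The function name render_spec is hypothetical.

```shell
# Render an application specification from a .in template.
# $1 = version, $2 = template path; the result is written to standard output.
# Note: the version must not contain "|" or "&", which are special to sed here.
render_spec() {
  version=$1
  template=$2
  [ -n "$version" ] || { echo "version is required" >&2; return 1; }
  [ -r "$template" ] || { echo "cannot read $template" >&2; return 1; }
  sed -e "s|%VERSION%|$version|" \
      -e "s|%IMAGE%|avassa/os-upgrade:$version|" \
      "$template"
}
```

For example: render_spec 2026.5.0 yaml/application.debian.yaml.in > application.debian.yaml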

Deploy and designate the worker application

With the above application specification you can now deploy the application to the sites where Debian is running. Assuming those sites are labelled with os-type = debian, you can use the deployment shown below:

name: os-upgrade-debian
application: os-upgrade-debian
placement:
  match-site-labels: os-type = debian
note

Only the site provider tenant can run an OS upgrade worker application.

In order for the Avassa system to know that this application handles OS upgrades, the tenant deploying the application designates it as such with the following configuration. Note that the name here refers to the application name, not necessarily the deployment name:

OS Upgrade Configuration
supctl create os-upgrade <<EOF
worker-applications:
- name: os-upgrade-debian
EOF

With this configuration in place the system knows that the OS upgrades should run on any site that has the os-upgrade-debian application deployed to at least one host. Multiple applications can be specified in the os-upgrade object, in which case the OS upgrades run wherever any of these applications is deployed.
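When several worker applications are designated, the payload is the same list with one entry per application. The sketch below generates that payload from a list of application names; the function name worker_apps_payload is hypothetical, and os-upgrade-dnf is the Rocky Linux application name used earlier in this tutorial.

```shell
# Print an os-upgrade configuration naming every application given as argument.
worker_apps_payload() {
  echo 'worker-applications:'
  for app in "$@"; do
    printf '%s\n' "- name: $app"
  done
}
```

It could be used as: worker_apps_payload os-upgrade-debian os-upgrade-dnf | supctl create os-upgrade. If the os-upgrade object already exists, creating it again is likely to be rejected, so a different supctl verb may be needed in that case.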

OS upgrade windows configuration

Now, everything is in place to configure an OS upgrade window that will control when the OS is upgraded:

supctl create system site-profiles <<EOF
name: sweden
os-upgrade-windows:
  - days-of-week: Friday, Saturday
    start-time: 01:00
    timezone: site-local
    duration: 4h
EOF

Assign this profile to relevant sites (note that only one site-profile can be assigned to a site, so if a site already has a site-profile, then the profile itself may need to be updated):

supctl merge system sites stockholm-sergel <<EOF
site-profile: sweden
EOF

This configuration tells the system that the OS upgrades should run on any site that has the os-upgrade-debian application deployed to at least one host, that the upgrades should be initiated each week on Friday and Saturday at 01:00 local time, and that they must not exceed 4 hours. The controller assumes that all hosts running a service instance that belongs to the os-upgrade-debian application will receive the commands as part of the OS upgrade process.

Inspect the OS upgrade status on a specific site

In order to inspect whether the OS upgrade mechanism is configured and running as expected on site site-name use the following command:

supctl show --site site-name os-upgrade

Example output:

worker-applications:
  - name: os-upgrade-debian
  - name: os-upgrade-rhel
os-upgrade-windows:
  - days-of-week: Friday, Saturday
    start-time: 01:00
    timezone: site-local
    duration: 4h
status: idle
next-upgrade-in: 1d4h18s
scheduled-workers:
  - host: host01
    application: os-upgrade-debian
  - host: host02
    application: os-upgrade-debian
  - host: host03
    application: os-upgrade-debian

This tells us that the OS upgrade is currently idle (no upgrade in progress). The next OS upgrade is scheduled to start in 1 day, 4 hours and 18 seconds. If the upgrade were to start now, hosts host01, host02 and host03 would be included in the upgrade, because each of them is running an instance of a service from the os-upgrade-debian worker application. If a host is not mentioned in this list, it is assumed that the OS upgrades for this host are externally managed, so an upgrade would still succeed even if not all hosts on the site are managed.
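For scripting or alerting it can be convenient to turn a value such as next-upgrade-in into seconds. The sketch below assumes the duration is a concatenation of integer amounts with the unit suffixes d, h, m and s, as seen in the example outputs in this tutorial; other units are not handled. The function name duration_to_seconds is hypothetical.

```shell
# Convert a duration such as "1d4h18s" to a number of seconds.
# Assumes only the unit suffixes d, h, m and s occur.
duration_to_seconds() {
  printf '%s\n' "$1" | awk '{
    total = 0; num = ""
    for (i = 1; i <= length($0); i++) {
      c = substr($0, i, 1)
      if (c ~ /[0-9]/)   { num = num c }                    # collect digits
      else if (c == "d") { total += num * 86400; num = "" } # days
      else if (c == "h") { total += num * 3600;  num = "" } # hours
      else if (c == "m") { total += num * 60;    num = "" } # minutes
      else if (c == "s") { total += num;         num = "" } # seconds
    }
    print total
  }'
}
```

For the value above, duration_to_seconds 1d4h18s prints 100818, that is 86400 + 4 × 3600 + 18.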

A different example output shows the upgrade in progress:

worker-applications:
  - name: os-upgrade-debian
  - name: os-upgrade-rhel
os-upgrade-windows:
  - days-of-week: Friday, Saturday
    start-time: 01:00
    timezone: site-local
    duration: 4h
status: in-progress
scheduled-workers:
  - host: host01
    application: os-upgrade-debian
  - host: host02
    application: os-upgrade-debian
  - host: host03
    application: os-upgrade-debian
last-upgrade-info:
  start-time: 2023-04-20T01:00:00Z
  timeout-in: 3h40m17s
  hosts:
    - hostname: host01
      status: scheduled
    - hostname: host02
      status: prepared
    - hostname: host03
      status: upgrading

This example tells us that the upgrade is currently ongoing. It started at the specified start-time and is expected to time out if not completed within the next 3 hours, 40 minutes and 17 seconds (from the time the output was generated). Three hosts are part of the upgrade: host01 has not replied to the prepare command yet; host02 has completed the prepare phase and is awaiting its turn to upgrade; host03 has completed the prepare phase and has been issued the upgrade command, to which it has not yet replied, so the upgrade is ongoing on this host.

When the upgrade is completed (or aborted due to timeout or failure), the last-upgrade-info shows the status of the latest upgrade.
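To wait for an upgrade to finish from a script, the site-level status can be extracted from the command output. The sketch below assumes the real output is indented YAML in which the site-level status key starts at column 0 while per-host status keys are indented. The function name site_upgrade_status is hypothetical.

```shell
# Print the site-level status (for example "idle" or "in-progress") from
# "supctl show --site <site> os-upgrade" output supplied on standard input.
# Assumes per-host status keys are indented and the site-level one is not.
site_upgrade_status() {
  sed -n 's/^status:[[:space:]]*//p' | head -n 1
}
```

A polling loop could then look like: while [ "$(supctl show --site site-name os-upgrade | site_upgrade_status)" = "in-progress" ]; do sleep 60; done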

Inspect the software versions as reported by the worker applications

The worker application may detect the version of the OS or the versions of the packages running on the host. They are reported each time the worker notices a change and are stored by the controller. To inspect the latest versions reported by the workers on a specific site:

supctl show --site site-name os-upgrade hosts

Example output:

- hostname: host01
  timestamp: 2023-04-17T01:12:24Z
  versions:
    linux-image: 5.15.83-ubuntu0
    docker: 20.10.1
- hostname: host02
  timestamp: 2023-04-22T10:14:42Z
  versions:
    linux-image: 5.15.94-ubuntu0
    docker: 20.10.4
- hostname: host03
  timestamp: 2023-04-22T10:16:33Z
  versions:
    linux-image: 5.15.94-ubuntu0
    docker: 20.10.4
note

The versions field is a key-value mapping published by each worker, so the key names (packages or otherwise) and the corresponding reported versions are defined by the worker implementation.
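A quick way to spot hosts that lag behind is to print one line per host with a single version key. The sketch below uses the linux-image key from the example output; since key names depend on the worker implementation, treat that key as an assumption. The function name kernel_by_host is hypothetical.

```shell
# Print "<hostname> <linux-image version>" for each host found in
# "supctl show --site <site> os-upgrade hosts" output on standard input.
kernel_by_host() {
  awk '
    $1 == "-" && $2 == "hostname:" { host = $3 }   # "- hostname: host01"
    $1 == "hostname:"              { host = $2 }   # "hostname: host01"
    $1 == "linux-image:"           { print host, $2 }
  '
}
```

For example: supctl show --site site-name os-upgrade hosts | kernel_by_host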

For example, the kernel version can be seen in the UI before and after an upgrade.

[Figure: UI before upgrade]

[Figure: UI after upgrade]

Troubleshooting

401 Unauthorized when pushing to the Control Tower registry

Run docker login registry.environment.name.avassa.net with credentials that have push access, then retry the push. For multi-arch builds with docker buildx ... --push, the authentication step happens at push time, so the build itself may complete successfully before failing on push.

Multi-platform build is not supported for the docker driver

The currently selected buildx builder uses the default docker driver, which only supports single-platform builds. Switch to a docker-container builder (see Building a multi-architecture image):

docker buildx use multiarch

If no such builder exists yet, create one with:

docker buildx create --name multiarch --driver docker-container --bootstrap --use

First multi-arch build is very slow

When building for an architecture that doesn't match the host CPU, buildx runs the foreign-arch stage under QEMU emulation. The first build pays the emulation cost in full; subsequent builds reuse the layer cache and are much faster. Because the Avassa Dockerfile only re-tags the upstream image and adds the role-id, the per-arch work is small and emulation overhead is limited.

A host is stuck in upgrading status

Inspect the worker logs to see what the worker is doing:

supctl --site site-name logs applications os-upgrade-debian

The upgrade has a hard timeout driven by the OS upgrade window's duration — if the host never replies, the controller aborts the upgrade when that timer expires and the site returns to idle. Common causes:

  • The host rebooted during the upgrade phase (expected for kernel upgrades). The worker should report success on the next start; if it doesn't, check that the worker application is still deployed and running.
  • The worker container crashed. Check supctl show applications os-upgrade-debian for restart counts.
  • For Fedora CoreOS specifically: the worker times out waiting for a fleet-lock request. Verify that /etc/zincati/config.d/51-fleet-update.toml is in place and points at http://127.0.0.1:8000/fleet_lock/.

The OS upgrade never starts at the scheduled time

Check that the site actually has the worker application deployed and that the site has a site-profile with os-upgrade-windows assigned:

supctl show --site site-name os-upgrade

If scheduled-workers is empty, the worker application is not deployed to any host on the site, or the site labels don't match the deployment's match-site-labels expression.
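This check can be scripted by counting the scheduled-workers entries. The sketch below assumes each scheduled worker appears as a "- host:" list item, as in the example outputs in this tutorial; the hostname entries under last-upgrade-info are deliberately not counted. The function name count_scheduled_workers is hypothetical.

```shell
# Count "- host:" entries in "supctl show --site <site> os-upgrade" output
# supplied on standard input. Prints 0 when there are none.
count_scheduled_workers() {
  # grep -c exits non-zero on zero matches; "|| true" keeps the function
  # usable under "set -e" while still printing the count.
  grep -c '^[[:space:]]*-[[:space:]]*host:' || true
}
```

For example: supctl show --site site-name os-upgrade | count_scheduled_workers. A result of 0 means no host on the site is currently running the worker application.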