Automated OS upgrades
This tutorial describes how to configure the system upgrade mechanism to trigger OS upgrades on a designated day and time of the week and to orchestrate the upgrade among different hosts. In this tutorial we use an Ubuntu/Debian site as an example. Avassa provides OS upgrade agents for a number of distributions. For the latest list, see the readme in https://gitlab.com/avassa-public/os-upgrade.
A process for performing OS upgrades by hand can be found in the Manual OS upgrade document.
In order to get an understanding of the OS upgrade architecture and mechanisms, read our OS upgrade fundamentals.
The overall procedure for using the Avassa OS upgrade functions is:
- build the OS upgrade application for your system (we provide different upgrade applications for different OSes)
- deploy the OS upgrade application to corresponding hosts
- configure OS upgrade windows
- monitor the automatic OS upgrades
Building the OS upgrade application implemented by Avassa
In order to build the application for Debian/Ubuntu:
- Check out the git repository https://gitlab.com/avassa-public/os-upgrade or download the files mentioned below as needed.
- Create an approle and a corresponding approle policy. An approle is the authentication mechanism used by the applications to be able to use the Avassa APIs locally on the host they are running on.
cat yaml/approle.yaml | supctl create strongbox authentication approles
cat yaml/approle-policy.yaml | supctl create policy policies -
By default an approle authentication mechanism requires two secret parts: one part is built into the image and one part is provided at runtime. The secret part to build into the image is called role-id and is generated when the approle is created. To look up the role-id, run:
supctl show strongbox authentication approles os-upgrade --fields role-id
Example output:
role-id: 4c6eb93b-4c3e-43ff-884c-273ff920ac24
This value (without the role-id: prefix) may then be used to build a container image using the Dockerfile in the repository above as follows. Make sure to replace the role-id placeholder in the first line with the actual role-id obtained from the previous command.
Also replace the VERSION value with the target version to build. latest is fine for development purposes, but locking the application specification to a specific version enables more controlled application upgrades.
The list of published versions can be found at https://gitlab.com/avassa-public/os-upgrade/container_registry.
Store role-id for docker build:
echo '<replace me with role-id from above>' > role-id
Build image with role-id:
docker build \
  -t registry.environment.name.avassa.net/avassa/os-upgrade:<VERSION> \
  --secret id=approle,src=role-id \
  --build-arg IMAGE=registry.gitlab.com/avassa-public/os-upgrade \
  --build-arg VSN=<VERSION> \
  .
docker push registry.environment.name.avassa.net/avassa/os-upgrade:<VERSION>
Note: An alternative to this step is to edit the approle to set a fixed role id (developing-with-strongbox-edge#fixed-role-id). This parameter removes the requirement for the secret part built into the image, so the container images published by Avassa may be used directly without building your own image.
This is not recommended: because the OS upgrade workers have access to the operating system, it is important to make sure you are running a validated image.
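Returning to the role-id file used for the image build: it must contain only the bare value, without the role-id: prefix. The following is a minimal sketch of stripping that prefix from the text printed by the lookup command. The function name extract_role_id is hypothetical, and the sketch assumes the output has the single-line form shown in the example output; it does not call supctl itself.

```python
def extract_role_id(supctl_output: str) -> str:
    """Return the bare role-id from text such as 'role-id: <value>'."""
    for line in supctl_output.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "role-id":
            return value.strip()
    raise ValueError("no role-id field found in output")


# Sample taken from the example output in this tutorial.
sample = "role-id: 4c6eb93b-4c3e-43ff-884c-273ff920ac24"
print(extract_role_id(sample))
```

The printed value is what would be written to the role-id file before running docker build.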
By default the application deployment matches label os-type with value debian or rhel to determine the sites where the application must be deployed. Make sure to either label the sites correspondingly or to modify the application deployment to indicate the sites the worker application should be deployed to.
Create the application specification:
name: os-upgrade-debian
version: "<VERSION>"
services:
  - name: worker
    mode: one-per-matching-host
    containers:
      - name: main
        image: avassa/os-upgrade:<VERSION>
        cmd: [ "os_upgrade.debian" ]
        approle: os-upgrade
        container-layer-size: 0B
        env:
          API_CA_CERT: ${SYS_API_CA_CERT}
          HOST: ${SYS_HOST}
          APPROLE_SECRET_ID: ${SYS_APPROLE_SECRET_ID}
          LOG_LEVEL: INFO
        mounts:
          - volume-name: systemd-socket
            mount-path: /var/run/dbus/system_bus_socket
          - volume-name: apt-conf-d
            mount-path: /data/apt.conf.d
        user-namespace:
          host: true
        security:
          apparmor:
            disabled: true ## apparmor prevents systemd socket access
    volumes:
      - name: systemd-socket
        system-volume:
          reference: systemd-socket
      - name: apt-conf-d
        config-map:
          items:
            - name: 90avassa
              data-verbatim: |
                // the reboot is handled by the os-upgrade handler
                Unattended-Upgrade::Automatic-Reboot "false";
                APT::Periodic::Update-Package-Lists "always";
                APT::Periodic::Unattended-Upgrade "always";
                Unattended-Upgrade::Remove-Unused-Kernel-Packages "true";
                Unattended-Upgrade::Remove-New-Unused-Dependencies "true";
Automating using sed:
sed -e 's|%VERSION%|2024.4.0|' \
    -e 's|%IMAGE%|avassa/os-upgrade:2024.4.0|' \
    yaml/application.debian.yaml.in \
    > application.debian.yaml
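The sed command repeats the version number in both substitutions, which is easy to get out of sync. As a rough sketch, the same substitution can be done with the image reference derived from a single version value. The function name render_application_spec and the two-line template are hypothetical stand-ins; only the %VERSION% and %IMAGE% placeholders come from the sed command above.

```python
def render_application_spec(template: str, version: str,
                            repository: str = "avassa/os-upgrade") -> str:
    """Fill in %VERSION% and %IMAGE% using one version value for both."""
    image = f"{repository}:{version}"
    return template.replace("%VERSION%", version).replace("%IMAGE%", image)


# Hypothetical stand-in for the contents of yaml/application.debian.yaml.in.
template = 'version: "%VERSION%"\nimage: "%IMAGE%"\n'
print(render_application_spec(template, "2024.4.0"), end="")
```

In practice the template text would be read from the .in file in the repository and the result written to application.debian.yaml.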
Deploy the application to corresponding hosts
With the above application specification in place, you can deploy it to the sites where Debian is running.
Assuming you have labelled those sites with os-type, you can use the deployment shown below:
name: os-upgrade-debian
application: os-upgrade-debian
placement:
  match-site-labels: os-type = debian
OS upgrade configuration
Now, everything is in place to configure a maintenance window that will control when the OS is upgraded. This configuration must be created by a site provider tenant.
supctl create os-upgrade <<EOF
worker-applications:
  - name: os-upgrade-debian
maintenance-windows:
  - days-of-week: Friday, Saturday
    start-time: 01:00
    timezone: site-local
    duration: 4h
EOF
This configuration tells the system that OS upgrades should run on any site that has the os-upgrade-debian application deployed to at least one host, and that the upgrades should be initiated each week on Friday and Saturday at 01:00 local time and must not exceed 4 hours. The controller assumes that all hosts running a service instance that belongs to the os-upgrade-debian application will receive the commands as a part of the OS upgrade process.
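To make the window semantics concrete, here is a small sketch that checks whether a given site-local time falls inside a window defined by days-of-week, start-time and duration. It is only an illustration of the configuration described above, not the controller's implementation; the function name in_maintenance_window and the assumption that a window opens at start-time on each listed day and stays open for the full duration are ours.

```python
from datetime import datetime, timedelta

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def in_maintenance_window(now, days_of_week, start_time, duration):
    """Return True if `now` (site-local time) lies inside a window.

    Assumption: a window opens at `start_time` ("HH:MM") on each day in
    `days_of_week` and stays open for `duration` (a timedelta).
    """
    hour, minute = (int(part) for part in start_time.split(":"))
    # A window may run past midnight, so also check earlier days.
    for days_back in range(duration.days + 2):
        opened = (now - timedelta(days=days_back)).replace(
            hour=hour, minute=minute, second=0, microsecond=0)
        if (DAY_NAMES[opened.weekday()] in days_of_week
                and opened <= now < opened + duration):
            return True
    return False


window = dict(days_of_week={"Friday", "Saturday"},
              start_time="01:00",
              duration=timedelta(hours=4))

print(in_maintenance_window(datetime(2023, 4, 21, 2, 30), **window))  # a Friday
print(in_maintenance_window(datetime(2023, 4, 20, 2, 30), **window))  # a Thursday
```

With the window from the configuration above, the first call prints True and the second prints False.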
Inspect the OS upgrade status on a specific site
In order to inspect whether the OS upgrade mechanism is configured and running as expected on site site-name, use the following command:
supctl show -s site-name os-upgrade
Example output:
worker-applications:
  - name: os-upgrade-debian
  - name: os-upgrade-rhel
maintenance-windows:
  - days-of-week: Friday, Saturday
    start-time: 01:00
    timezone: site-local
    duration: 4h
status: idle
next-upgrade-in: 1d4h18s
scheduled-workers:
  - host: host01
    application: os-upgrade-debian
  - host: host02
    application: os-upgrade-debian
  - host: host03
    application: os-upgrade-debian
This tells us that the OS upgrade is currently idle (no upgrade in progress). The next OS upgrade is scheduled to start in 1 day, 4 hours and 18 seconds. If the upgrade were to start now, hosts host01, host02 and host03 would be included in the upgrade, because each of them is running an instance of a service from the os-upgrade-debian worker application. If a host is not mentioned in this list, it is assumed that the OS upgrades for this host are externally managed, so an upgrade would still succeed even if not all hosts on the site are managed.
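Duration values such as next-upgrade-in are compact strings. The sketch below converts them into a timedelta so they can be added to the current time. It assumes only the unit letters d, h, m and s, which are the ones that appear in this tutorial's examples; the function name parse_duration is hypothetical.

```python
import re
from datetime import timedelta

_UNITS = {"d": "days", "h": "hours", "m": "minutes", "s": "seconds"}


def parse_duration(text: str) -> timedelta:
    """Parse strings like '1d4h18s' or '4h' into a timedelta."""
    parts = re.findall(r"(\d+)([dhms])", text)
    # Reject input containing anything other than number+unit pairs.
    if not parts or "".join(number + unit for number, unit in parts) != text:
        raise ValueError(f"unrecognised duration: {text!r}")
    total = timedelta()
    for number, unit in parts:
        total += timedelta(**{_UNITS[unit]: int(number)})
    return total


print(parse_duration("1d4h18s"))  # 1 day, 4:00:18
```

Adding the result to the time at which the status was read gives the expected start of the next upgrade.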
A different example output shows the upgrade in progress:
worker-applications:
  - name: os-upgrade-debian
  - name: os-upgrade-rhel
maintenance-windows:
  - days-of-week: Friday, Saturday
    start-time: 01:00
    timezone: site-local
    duration: 4h
status: in-progress
scheduled-workers:
  - host: host01
    application: os-upgrade-debian
  - host: host02
    application: os-upgrade-debian
  - host: host03
    application: os-upgrade-debian
last-upgrade-info:
  start-time: 2023-04-20T01:00:00Z
  timeout-in: 3h40m17s
  hosts:
    - hostname: host01
      status: scheduled
    - hostname: host02
      status: prepared
    - hostname: host03
      status: upgrading
This example tells us that the upgrade is currently ongoing. It started at the specified start-time and is expected to time out if not completed within the next 3 hours, 40 minutes and 17 seconds (from the time the output was generated). Three hosts are part of the upgrade: host01 has not replied to the prepare command yet; host02 has completed the prepare phase and is awaiting its turn to upgrade; and host03 has completed the prepare phase and has been issued the upgrade command, to which it has not yet replied, so the upgrade is ongoing on this host.
When the upgrade is completed (or aborted due to timeout or failure), last-upgrade-info shows the status of the latest upgrade.
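When a site has many hosts, it can help to group them by the phase they have reached. The sketch below does this for data shaped like the last-upgrade-info section of the example output. The dictionary is typed in by hand to mirror that example, and the function name hosts_by_status is hypothetical; how the status output is loaded into Python is left open.

```python
def hosts_by_status(last_upgrade_info: dict) -> dict:
    """Group hostnames by their reported upgrade status."""
    grouped = {}
    for host in last_upgrade_info["hosts"]:
        grouped.setdefault(host["status"], []).append(host["hostname"])
    return grouped


# Hand-written copy of the example output in this tutorial.
last_upgrade_info = {
    "start-time": "2023-04-20T01:00:00Z",
    "timeout-in": "3h40m17s",
    "hosts": [
        {"hostname": "host01", "status": "scheduled"},
        {"hostname": "host02", "status": "prepared"},
        {"hostname": "host03", "status": "upgrading"},
    ],
}

for status, hostnames in hosts_by_status(last_upgrade_info).items():
    print(f"{status}: {', '.join(hostnames)}")
```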
Inspect the software versions as reported by the worker applications
The worker application may detect the version of the OS or the versions of the packages running on the host. These versions are reported each time the worker notices a change and are stored by the controller. To inspect the latest versions reported by the workers on a specific site:
supctl show -s site-name os-upgrade hosts
Example output:
- hostname: host01
  timestamp: 2023-04-17T01:12:24Z
  versions:
    linux-image: 5.15.83-ubuntu0
    docker: 20.10.1
- hostname: host02
  timestamp: 2023-04-22T10:14:42Z
  versions:
    linux-image: 5.15.94-ubuntu0
    docker: 20.10.4
- hostname: host03
  timestamp: 2023-04-22T10:16:33Z
  versions:
    linux-image: 5.15.94-ubuntu0
    docker: 20.10.4
The versions field is a key-value mapping published by each worker, so the actual package or other key names and the corresponding reported versions are defined by the worker implementation.
For example, the kernel version can be seen in the UI.
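Because the reported versions are plain key-value pairs, a simple way to spot a host that has fallen behind is to compare each host's value for a key with the value most hosts report. The sketch below does that for data shaped like the example output. The function name version_outliers is hypothetical, the list is a hand-written copy of the example, and "most common value" is our heuristic: it does not compare version numbers, and on a tie the first value seen counts as the common one.

```python
from collections import Counter


def version_outliers(reports: list, key: str) -> list:
    """Return hostnames whose value for `key` differs from the most common one."""
    counts = Counter(report["versions"].get(key) for report in reports)
    common_value, _ = counts.most_common(1)[0]
    return [report["hostname"] for report in reports
            if report["versions"].get(key) != common_value]


# Hand-written copy of the example output in this tutorial.
reports = [
    {"hostname": "host01",
     "versions": {"linux-image": "5.15.83-ubuntu0", "docker": "20.10.1"}},
    {"hostname": "host02",
     "versions": {"linux-image": "5.15.94-ubuntu0", "docker": "20.10.4"}},
    {"hostname": "host03",
     "versions": {"linux-image": "5.15.94-ubuntu0", "docker": "20.10.4"}},
]

print(version_outliers(reports, "linux-image"))  # ['host01']
```

In the example data, host01 last reported on 2023-04-17 and still shows the older kernel and docker versions, so it is the only host flagged.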