Manual OS Upgrade

This document describes the steps to perform when manually upgrading the OS on hosts where Edge Enforcers are running. For documentation on the feature that helps automate this process, see the Automatic OS Upgrades document.

Upgrade the hosts in a site one at a time. On each host, check the preconditions before the OS upgrade and the postconditions after it.

NOTE If docker or containerd is upgraded, the Edge Enforcer will typically be restarted. The restart should be handled automatically by the docker upgrade.

If anything goes wrong during the process, please check the Troubleshooting section.

Site Preconditions

Check the site's host status

supctl show --site sthlm system cluster hosts --fields cluster-hostname,hostname,oper-status
- cluster-hostname: sthlm-001
oper-status: up
hostname: sthlm-3
- cluster-hostname: sthlm-002
oper-status: up
hostname: sthlm-1
- cluster-hostname: sthlm-003
oper-status: up
hostname: sthlm-2

This indicates the site's hosts are in a good state.
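When this check is repeated for every host in a site, it can help to let a script read the output. The following is a minimal sketch, assuming the output has the YAML-like shape shown above with one oper-status line per host. The function name all_hosts_up is hypothetical and not part of supctl.

```shell
#!/bin/sh
# all_hosts_up (hypothetical helper): reads host status text on stdin and
# succeeds only if at least one "oper-status:" line is present and every
# such line ends in "up". Assumes the output format shown above.
all_hosts_up() {
    awk '
        /oper-status:/ { total++; if ($NF != "up") bad++ }
        END { exit ((total > 0 && bad + 0 == 0) ? 0 : 1) }
    '
}

# Usage (requires supctl and access to the site):
#   supctl show --site sthlm system cluster hosts \
#       --fields cluster-hostname,hostname,oper-status | all_hosts_up \
#       && echo "all hosts are up" \
#       || echo "at least one host is not up"
```

The helper treats empty input as a failure, so a supctl error that produces no output is not mistaken for a healthy site.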

Host Preconditions

The Edge Enforcer is up and running

systemctl status supd
● supd.service - Avassa Edge Enforcer
Loaded: loaded (/etc/systemd/system/supd.service; enabled; vendor preset: enabled)
Active: active (running) since Tue 2022-08-23 15:11:53 CEST; 1h 9min ago
Process: 1205504 ExecStartPre=/usr/bin/docker rm supd (code=exited, status=0/SUCCESS)
Main PID: 1205531 (start-supd)
Tasks: 8 (limit: 4677)
Memory: 18.3M
CGroup: /system.slice/supd.service
├─1205531 /bin/sh /usr/sbin/start-supd -D -w -j --
└─1205694 docker wait supd

Aug 23 15:11:53 sthlm-1 systemd[1]: Starting Avassa Edge Enforcer...
Aug 23 15:11:53 sthlm-1 docker[1205504]: supd
Aug 23 15:11:53 sthlm-1 systemd[1]: Started Avassa Edge Enforcer.
Aug 23 15:11:53 sthlm-1 start-supd[1205620]: WARNING: Localhost DNS setting (--dns=127.0.0.1) may fail in containers.
Aug 23 15:11:53 sthlm-1 start-supd[1205620]: 8a6f45c4b02ce6787ccdea951481fa22be30288e37ee34ea8403147279cf2c65

Optional: Drain the host from running service instances

If the host needs to be restarted after the OS upgrade, or if dockerd or containerd is upgraded, running service instances will be stopped. The supd scheduler in the site will not restart these service instances on other hosts unless the host is down for some time. (See Scheduler for details.)

However, it is possible to drain the host from running service instances. In this case, the service instances will be rescheduled to other hosts, if possible.

First, check if any service instances are running on the host:

supctl show --site sthlm system cluster hosts sthlm-003 service-instances --fields name,oper-status

Then drain the host:

supctl do --site sthlm system cluster drain-host sthlm-003

The documentation for the API endpoint describes how to invoke this action through the APIs.

This command will block and return when all service instances on the host are stopped. It is possible to specify a timeout as well:

supctl do --site sthlm system cluster drain-host sthlm-003 --timeout 5m

When a timeout is given, the command may return:

result: timeout

This means that there are still some service instances running on the host, possibly because they use the delayed-shutdown feature, which is used to ensure that the service instance is given time to complete ongoing tasks.
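A script that drains a host can tell the timeout case apart from a completed drain by inspecting the command output. The following is a minimal sketch, assuming that a timed-out drain prints the result: timeout line shown above and that any other output from a successful supctl call means the drain completed. The function name drain_host and its return codes are hypothetical.

```shell
#!/bin/sh
# drain_host SITE CLUSTER_HOSTNAME TIMEOUT (hypothetical wrapper)
# Returns 0 if the drain completed, 1 if it timed out, 2 if supctl failed.
# Assumption: a timeout is reported as a "result: timeout" line.
drain_host() {
    out=$(supctl do --site "$1" system cluster drain-host "$2" --timeout "$3") || return 2
    if printf '%s\n' "$out" | grep -q '^[[:space:]]*result:[[:space:]]*timeout'; then
        return 1
    fi
    return 0
}

# Usage:
#   drain_host sthlm sthlm-003 5m
#   case $? in
#       0) echo "host drained" ;;
#       1) echo "timeout: some service instances are still running" ;;
#       *) echo "supctl failed" ;;
#   esac
```

On a timeout, the operator can decide whether to wait longer or to proceed with the upgrade anyway.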

OS Upgrade

Please perform the OS upgrade according to the documentation for the Linux distribution.

If the host needs to be rebooted, please do that before checking the postconditions.

Host Postconditions

The Edge Enforcer is up and running

systemctl status supd
● supd.service - Avassa Edge Enforcer
Loaded: loaded (/etc/systemd/system/supd.service; enabled; vendor preset: enabled)
Active: active (running) since Tue 2022-08-23 15:11:53 CEST; 1h 9min ago
Process: 1205504 ExecStartPre=/usr/bin/docker rm supd (code=exited, status=0/SUCCESS)
Main PID: 1205531 (start-supd)
Tasks: 8 (limit: 4677)
Memory: 18.3M
CGroup: /system.slice/supd.service
├─1205531 /bin/sh /usr/sbin/start-supd -D -w -j --
└─1205694 docker wait supd

Aug 23 15:11:53 sthlm-1 systemd[1]: Starting Avassa Edge Enforcer...
Aug 23 15:11:53 sthlm-1 docker[1205504]: supd
Aug 23 15:11:53 sthlm-1 systemd[1]: Started Avassa Edge Enforcer.
Aug 23 15:11:53 sthlm-1 start-supd[1205620]: WARNING: Localhost DNS setting (--dns=127.0.0.1) may fail in containers.
Aug 23 15:11:53 sthlm-1 start-supd[1205620]: 8a6f45c4b02ce6787ccdea951481fa22be30288e37ee34ea8403147279cf2c65

Check the Edge Enforcer is healthy

curl -i -k https://localhost:4646/healthz
HTTP/1.1 200 OK
content-length: 31
content-type: application/json
server: Cowboy

{
"oper-status": "running"
}

Any 2xx return code indicates success. More details are in the payload.
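After a reboot, the Edge Enforcer may need some time before the health endpoint answers. The following is a minimal sketch of a polling loop built on the same curl call. The function names, the default of 30 attempts, and the 2 second pause are assumptions for illustration, not documented values.

```shell
#!/bin/sh
# is_success_status CODE: succeeds for any 2xx HTTP status code.
is_success_status() {
    case "$1" in
        2[0-9][0-9]) return 0 ;;
        *) return 1 ;;
    esac
}

# wait_for_healthy [ATTEMPTS] (hypothetical helper): polls the health
# endpoint until it returns a 2xx code or the attempts run out.
wait_for_healthy() {
    attempts=${1:-30}
    while [ "$attempts" -gt 0 ]; do
        # -k skips certificate verification, as in the example above;
        # -w '%{http_code}' prints only the status code.
        code=$(curl -s -k -o /dev/null -w '%{http_code}' https://localhost:4646/healthz) || code=000
        is_success_status "$code" && return 0
        attempts=$((attempts - 1))
        [ "$attempts" -gt 0 ] && sleep 2
    done
    return 1
}

# Usage:
#   wait_for_healthy 30 && echo "Edge Enforcer is healthy"
```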

Site Postconditions

Check the site's host status

supctl show --site sthlm system cluster hosts --fields cluster-hostname,hostname,oper-status
- cluster-hostname: sthlm-001
oper-status: up
hostname: sthlm-3
- cluster-hostname: sthlm-002
oper-status: up
hostname: sthlm-1
- cluster-hostname: sthlm-003
oper-status: up
hostname: sthlm-2

This indicates the site's hosts are in a good state after the OS upgrade.

Rescheduling of service instances

Because the hosts are upgraded and restarted one by one, this procedure affects the distribution of running service instances among the hosts. After the upgrade, the service instances may no longer be well balanced between hosts.

When this is the case, it is possible to invoke the reschedule action to rebalance the service instances among the running hosts within a site.

supctl do --site sthlm system cluster reschedule
rescheduled-service-instances:
- name: telco.alpine.fifth-srv-4
from-host: h05
to-host: h07
- name: telco.alpine.fifth-srv-3
from-host: h04
to-host: h06
- name: telco.alpine.fifth-srv-2
from-host: h03
to-host: h07
- name: telco.alpine.first-srv-1
from-host: h06
to-host: h07
- name: telco.alpine.fifth-srv-1
from-host: h06
to-host: h07

Troubleshooting

To verify that the Edge Enforcer is running, list the running containers:

docker ps

supd should appear in this list of running containers:

CONTAINER ID   IMAGE           COMMAND                  CREATED        STATUS        PORTS     NAMES
63954d024dca   682a7e85b95a    "/docker-entrypoint.…"   12 hours ago   Up 12 hours             supd
6ee4c3e3257f   3e42dd4e79c7    "docker-entrypoint.s…"   7 days ago     Up 7 days               example-corp.ha-redis.redis-1.redis-sentinel
fc668ecd1e4a   3e42dd4e79c7    "docker-entrypoint.s…"   7 days ago     Up 7 days               example-corp.ha-redis.redis-1.redis
946b9b917453   avassa/nsctrl   "/pause"                 7 days ago     Up 7 days               example-corp.ha-redis.redis-1.nsctrl
d380aad66702   dd86a78b4c78    "/bin/sh -c $EXECUTA…"   2 weeks ago    Up 2 weeks              example-corp.popcorn-controller.popcorn-controller-service-1.kettle-popper-manager
15a8e43e1483   avassa/nsctrl   "/pause"                 2 weeks ago    Up 2 weeks              example-corp.popcorn-controller.popcorn-controller-service-1.nsctrl

If supd is not in that list, it means that systemd hasn't been able to start the Edge Enforcer.

Then check the systemd status for information on why the Edge Enforcer doesn't start:

systemctl status supd

and

journalctl -xu supd

If supd is running, the logs can be inspected using docker:

docker logs supd

Please examine the logs for indications on why the site cluster cannot be established.
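The troubleshooting steps above can be combined into one script that picks the relevant output depending on whether the supd container is listed. This is a minimal sketch: the function names are hypothetical, and the 100-line limits are an arbitrary choice to keep the output short.

```shell
#!/bin/sh
# supd_container_listed: reads container names on stdin, one per line,
# and succeeds if one of them is exactly "supd".
supd_container_listed() {
    grep -qxF 'supd'
}

# show_supd_diagnostics (hypothetical helper): prints the container logs
# if supd is running, otherwise the systemd status and journal.
show_supd_diagnostics() {
    if docker ps --format '{{.Names}}' | supd_container_listed; then
        docker logs --tail 100 supd
    else
        systemctl status supd --no-pager
        journalctl -xu supd --no-pager | tail -n 100
    fi
}

# Usage:
#   show_supd_diagnostics
```

Matching the whole line avoids a false positive from other containers whose names merely contain supd.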