Manual OS Upgrade
This document describes the steps to perform when manually upgrading the OS on hosts where Edge Enforcers are running. For documentation on the feature that helps automate this process, please see the Automatic OS Upgrades document.
Upgrade the hosts in a site one at a time. On each host, check the preconditions before the OS upgrade and the postconditions after it.
NOTE If docker or containerd is upgraded, the Edge Enforcer will typically be restarted. The restart should be handled automatically by the docker upgrade.
If anything goes wrong during the process, please check the Troubleshooting section.
Site Preconditions
Check the site's host status
supctl show --site sthlm system cluster hosts --fields cluster-hostname,hostname,oper-status
- cluster-hostname: sthlm-001
  oper-status: up
  hostname: sthlm-3
- cluster-hostname: sthlm-002
  oper-status: up
  hostname: sthlm-1
- cluster-hostname: sthlm-003
  oper-status: up
  hostname: sthlm-2
This indicates the site's hosts are in a good state.
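When the upgrade is scripted, the host status check can be turned into a pass/fail test. The following is a minimal sketch, assuming the output has the shape shown above; the function name all_hosts_up is hypothetical and not part of supctl.

```shell
# Hypothetical helper: reads "supctl show ... hosts" output on stdin and
# succeeds only if at least one oper-status line is present and all are "up".
all_hosts_up() {
    awk '
        { sub(/^[ \t-]+/, "") }   # drop indentation and a leading list dash
        $1 == "oper-status:" { seen++; if ($2 != "up") bad++ }
        END { if (seen > 0 && bad == 0) exit 0; exit 1 }
    '
}

# Example usage (assumes supctl is installed and logged in):
# supctl show --site sthlm system cluster hosts \
#     --fields cluster-hostname,hostname,oper-status | all_hosts_up \
#     || echo "not all hosts are up" >&2
```

The helper treats empty output as a failure, so a failed supctl call is not mistaken for a healthy site.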
Host Preconditions
The Edge Enforcer is up and running
systemctl status supd
● supd.service - Avassa Edge Enforcer
Loaded: loaded (/etc/systemd/system/supd.service; enabled; vendor preset: enabled)
Active: active (running) since Tue 2022-08-23 15:11:53 CEST; 1h 9min ago
Process: 1205504 ExecStartPre=/usr/bin/docker rm supd (code=exited, status=0/SUCCESS)
Main PID: 1205531 (start-supd)
Tasks: 8 (limit: 4677)
Memory: 18.3M
CGroup: /system.slice/supd.service
├─1205531 /bin/sh /usr/sbin/start-supd -D -w -j --
└─1205694 docker wait supd
Aug 23 15:11:53 sthlm-1 systemd[1]: Starting Avassa Edge Enforcer...
Aug 23 15:11:53 sthlm-1 docker[1205504]: supd
Aug 23 15:11:53 sthlm-1 systemd[1]: Started Avassa Edge Enforcer.
Aug 23 15:11:53 sthlm-1 start-supd[1205620]: WARNING: Localhost DNS setting (--dns=127.0.0.1) may fail in containers.
Aug 23 15:11:53 sthlm-1 start-supd[1205620]: 8a6f45c4b02ce6787ccdea951481fa22be30288e37ee34ea8403147279cf2c65
Optional: Drain the host from running service instances
If the host needs to be restarted after the OS upgrade, or if dockerd or containerd is upgraded, running service instances will be stopped.
The supd scheduler in the site will not restart these service instances on other hosts unless the host is down for some time. (See Scheduler for details.)
However, it is possible to drain the host from running service instances. In this case, the service instances will be rescheduled to other hosts, if possible.
First, check if any service instances are running on the host:
supctl show --site sthlm system cluster hosts sthlm-003 service-instances --fields name,oper-status
Then drain the host:
supctl do --site sthlm system cluster drain-host sthlm-003
The API endpoint documentation describes how to invoke this action through the API.
This command blocks until all service instances on the host are stopped. It is possible to specify a timeout as well:
supctl do --site sthlm system cluster drain-host sthlm-003 --timeout 5m
When a timeout is given, the command may return:
result: timeout
This means that some service instances are still running on the host, possibly because they use the delayed-shutdown feature, which gives a service instance time to complete ongoing tasks.
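In a script, the timeout result can be detected before continuing with the OS upgrade. The following is a minimal sketch that only looks for the result: timeout line described above; the function name drain_timed_out is hypothetical, and no assumption is made about what the command prints on success.

```shell
# Hypothetical helper: reads drain-host output on stdin and succeeds
# if it contains a "result: timeout" line.
drain_timed_out() {
    grep -Eq '^[[:space:]]*result:[[:space:]]*timeout[[:space:]]*$'
}

# Example usage (assumes supctl is installed and logged in):
# if supctl do --site sthlm system cluster drain-host sthlm-003 --timeout 5m \
#         | drain_timed_out; then
#     echo "service instances are still running on sthlm-003" >&2
# fi
```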
OS Upgrade
Please perform the OS upgrade according to the documentation for the Linux distribution.
If the host needs to be rebooted, please do that before checking the postconditions.
Host Postconditions
The Edge Enforcer is up and running
systemctl status supd
● supd.service - Avassa Edge Enforcer
Loaded: loaded (/etc/systemd/system/supd.service; enabled; vendor preset: enabled)
Active: active (running) since Tue 2022-08-23 15:11:53 CEST; 1h 9min ago
Process: 1205504 ExecStartPre=/usr/bin/docker rm supd (code=exited, status=0/SUCCESS)
Main PID: 1205531 (start-supd)
Tasks: 8 (limit: 4677)
Memory: 18.3M
CGroup: /system.slice/supd.service
├─1205531 /bin/sh /usr/sbin/start-supd -D -w -j --
└─1205694 docker wait supd
Aug 23 15:11:53 sthlm-1 systemd[1]: Starting Avassa Edge Enforcer...
Aug 23 15:11:53 sthlm-1 docker[1205504]: supd
Aug 23 15:11:53 sthlm-1 systemd[1]: Started Avassa Edge Enforcer.
Aug 23 15:11:53 sthlm-1 start-supd[1205620]: WARNING: Localhost DNS setting (--dns=127.0.0.1) may fail in containers.
Aug 23 15:11:53 sthlm-1 start-supd[1205620]: 8a6f45c4b02ce6787ccdea951481fa22be30288e37ee34ea8403147279cf2c65
Check that the Edge Enforcer is healthy
curl -i -k https://localhost:4646/healthz
HTTP/1.1 200 OK
content-length: 31
content-type: application/json
server: Cowboy
{
  "oper-status": "running"
}
Any 2xx status code indicates success. The payload contains more details.
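The 2xx rule can also be checked in a script. The following is a minimal sketch using curl's --write-out option to capture only the status code; the function names is_2xx and supd_healthy are hypothetical.

```shell
# Hypothetical helper: succeeds if the argument is a three-digit 2xx status code.
is_2xx() {
    case "$1" in
        2[0-9][0-9]) return 0 ;;
        *) return 1 ;;
    esac
}

# Hypothetical helper: queries the health endpoint and checks the status code.
# -s hides progress output, -k skips certificate verification (as in the
# command above), -o discards the body, -w prints only the HTTP status code.
supd_healthy() {
    code=$(curl -s -k -o /dev/null -w '%{http_code}' https://localhost:4646/healthz) || return 1
    is_2xx "$code"
}

# Example usage on the upgraded host:
# supd_healthy || echo "Edge Enforcer is not healthy" >&2
```

Keeping the status code test separate from the curl call makes it possible to reuse it for other endpoints.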
Site Postconditions
Check the site's host status
supctl show --site sthlm system cluster hosts --fields cluster-hostname,hostname,oper-status
- cluster-hostname: sthlm-001
  oper-status: up
  hostname: sthlm-3
- cluster-hostname: sthlm-002
  oper-status: up
  hostname: sthlm-1
- cluster-hostname: sthlm-003
  oper-status: up
  hostname: sthlm-2
This indicates the site's hosts are in a good state after the OS upgrade.
Rescheduling of service instances
Because all hosts are upgraded and restarted one by one, this procedure affects the distribution of running service instances. After the upgrade, they may no longer be well balanced across the hosts.
When this is the case, it is possible to invoke the reschedule action to re-balance the service instances among the running hosts within a site.
supctl do --site sthlm system cluster reschedule
rescheduled-service-instances:
  - name: telco.alpine.fifth-srv-4
    from-host: h05
    to-host: h07
  - name: telco.alpine.fifth-srv-3
    from-host: h04
    to-host: h06
  - name: telco.alpine.fifth-srv-2
    from-host: h03
    to-host: h07
  - name: telco.alpine.first-srv-1
    from-host: h06
    to-host: h07
  - name: telco.alpine.fifth-srv-1
    from-host: h06
    to-host: h07
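To get a quick view of where the service instances ended up, the reschedule output can be summarized per target host. The following is a minimal sketch, assuming the output has the shape shown above; the function name count_moves_per_host is hypothetical.

```shell
# Hypothetical helper: reads reschedule output on stdin and prints one line
# per target host with the number of service instances moved to it.
count_moves_per_host() {
    awk '$1 == "to-host:" { n[$2]++ } END { for (h in n) print h, n[h] }' | sort
}

# Example usage (assumes supctl is installed and logged in):
# supctl do --site sthlm system cluster reschedule | count_moves_per_host
```

For the output shown above, this prints h06 1 and h07 4, one pair per line.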
Troubleshooting
To verify that the Edge Enforcer is running, list the running containers:
docker ps
The supd container should appear in this list:
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
63954d024dca 682a7e85b95a "/docker-entrypoint.…" 12 hours ago Up 12 hours supd
6ee4c3e3257f 3e42dd4e79c7 "docker-entrypoint.s…" 7 days ago Up 7 days example-corp.ha-redis.redis-1.redis-sentinel
fc668ecd1e4a 3e42dd4e79c7 "docker-entrypoint.s…" 7 days ago Up 7 days example-corp.ha-redis.redis-1.redis
946b9b917453 avassa/nsctrl "/pause" 7 days ago Up 7 days example-corp.ha-redis.redis-1.nsctrl
d380aad66702 dd86a78b4c78 "/bin/sh -c $EXECUTA…" 2 weeks ago Up 2 weeks example-corp.popcorn-controller.popcorn-controller-service-1.kettle-popper-manager
15a8e43e1483 avassa/nsctrl "/pause" 2 weeks ago Up 2 weeks example-corp.popcorn-controller.popcorn-controller-service-1.nsctrl
If supd is not in that list, systemd has not been able to start the Edge Enforcer.
Then check the systemd status for information on why the Edge Enforcer does not start:
systemctl status supd
and
journalctl -xu supd
If supd is running, the logs can be inspected using docker:
docker logs supd
Please examine the logs for indications on why the site cluster cannot be established.
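The container check at the start of this section can also be scripted. The following is a minimal sketch that uses the --format option of docker ps to print only container names; the function name has_supd_container is hypothetical.

```shell
# Hypothetical helper: reads container names on stdin, one per line, and
# succeeds if one of them is exactly "supd".
has_supd_container() {
    grep -qx 'supd'
}

# Example usage on the host:
# if docker ps --format '{{.Names}}' | has_supd_container; then
#     docker logs supd
# else
#     systemctl status supd
#     journalctl -xu supd
# fi
```

Matching the whole line avoids false positives from other containers whose names merely contain supd.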