Manual OS Upgrade

This document describes the steps to take when manually upgrading the OS on hosts where Edge Enforcers are running. For the feature that helps automate this process, see the Automatic OS Upgrades document.

Upgrade the hosts in a site one at a time. On each host, check the preconditions before the OS upgrade and the postconditions after it.

NOTE: If docker or containerd is upgraded, the Edge Enforcer will typically be restarted. The restart should be handled automatically by the docker upgrade.

If anything goes wrong during the process, please check the Troubleshooting section.

Site Preconditions

Check the site's host status

supctl show --site sthlm system cluster hosts --fields cluster-hostname,hostname,oper-status

- cluster-hostname: sthlm-001
  oper-status: up
  hostname: sthlm-3
- cluster-hostname: sthlm-002
  oper-status: up
  hostname: sthlm-1
- cluster-hostname: sthlm-003
  oper-status: up
  hostname: sthlm-2

This indicates the site's hosts are in a good state.
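When this check is repeated for every host in a site, it can help to script it. The following is a minimal sketch, not part of the product: the function name all_hosts_up is hypothetical, and it assumes the output keeps the YAML list layout shown above, with one oper-status line per host.

```shell
#!/bin/sh
# Hypothetical helper: reads the host list from stdin and succeeds only if
# at least one host is listed and every "oper-status" value is "up".
all_hosts_up() {
    awk '
        /oper-status:/ { total++; if ($NF == "up") up++ }
        END { exit !(total > 0 && total == up) }
    '
}

# Usage (assumes supctl is installed and logged in):
#   supctl show --site sthlm system cluster hosts \
#       --fields cluster-hostname,hostname,oper-status | all_hosts_up \
#       && echo "all hosts up"
```

The same helper can be reused for the site postcondition check after the upgrade.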

Host Preconditions

The Edge Enforcer is up and running

systemctl status supd

● supd.service - Avassa Edge Enforcer
     Loaded: loaded (/etc/systemd/system/supd.service; enabled; vendor preset: enabled)
     Active: active (running) since Tue 2022-08-23 15:11:53 CEST; 1h 9min ago
    Process: 1205504 ExecStartPre=/usr/bin/docker rm supd (code=exited, status=0/SUCCESS)
   Main PID: 1205531 (start-supd)
      Tasks: 8 (limit: 4677)
     Memory: 18.3M
     CGroup: /system.slice/supd.service
             ├─1205531 /bin/sh /usr/sbin/start-supd -D -w -j --
             └─1205694 docker wait supd

Aug 23 15:11:53 sthlm-1 systemd[1]: Starting Avassa Edge Enforcer...
Aug 23 15:11:53 sthlm-1 docker[1205504]: supd
Aug 23 15:11:53 sthlm-1 systemd[1]: Started Avassa Edge Enforcer.
Aug 23 15:11:53 sthlm-1 start-supd[1205620]: WARNING: Localhost DNS setting (--dns=127.0.0.1) may fail in containers.
Aug 23 15:11:53 sthlm-1 start-supd[1205620]: 8a6f45c4b02ce6787ccdea951481fa22be30288e37ee34ea8403147279cf2c65

Optional: Drain the host from running service instances

If the host needs to be restarted after the OS upgrade, or if dockerd or containerd is upgraded, running service instances will be stopped. The supd scheduler in the site will not restart these service instances on other hosts unless the host is down for some time (see Scheduler for details).

However, it is possible to drain the host from running service instances. In this case, the service instances will be rescheduled to other hosts, if possible.

First, check if any service instances are running on the host:

supctl show --site sthlm system cluster hosts sthlm-003 service-instances --fields name,oper-status

Then drain the host:

supctl do --site sthlm system cluster drain-host sthlm-003

The API endpoint documentation describes how to invoke this action through the APIs.

This command blocks and returns once all service instances on the host have stopped. A timeout can also be specified:

supctl do --site sthlm system cluster drain-host sthlm-003 --timeout 5m

When a timeout is given, the command may return:

result: timeout

This means that some service instances are still running on the host, possibly because they use the delayed-shutdown feature, which gives a service instance time to complete ongoing tasks.
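In a script, the timeout case can be detected by inspecting the command output. The sketch below is an illustration under assumptions: the function name drain_timed_out is hypothetical, and it relies only on the "result: timeout" line shown above. The output on a successful drain is not documented here, so the helper does not try to match it.

```shell
#!/bin/sh
# Hypothetical helper: succeeds (exit 0) if the drain-host output read from
# stdin contains a "result: timeout" line.
drain_timed_out() {
    grep -q '^[[:space:]]*result:[[:space:]]*timeout[[:space:]]*$'
}

# Usage (assumes supctl is installed and logged in):
#   out=$(supctl do --site sthlm system cluster drain-host sthlm-003 --timeout 5m)
#   if printf '%s\n' "$out" | drain_timed_out; then
#       echo "service instances still running on sthlm-003" >&2
#   fi
```

Whether to wait longer or proceed with the upgrade after a timeout is an operational decision that depends on the services involved.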

OS Upgrade

Please perform the OS upgrade according to the documentation for the Linux distribution.

If the host needs to be rebooted, please do that before checking the postconditions.

Host Postconditions

The Edge Enforcer is up and running

systemctl status supd

● supd.service - Avassa Edge Enforcer
     Loaded: loaded (/etc/systemd/system/supd.service; enabled; vendor preset: enabled)
     Active: active (running) since Tue 2022-08-23 15:11:53 CEST; 1h 9min ago
    Process: 1205504 ExecStartPre=/usr/bin/docker rm supd (code=exited, status=0/SUCCESS)
   Main PID: 1205531 (start-supd)
      Tasks: 8 (limit: 4677)
     Memory: 18.3M
     CGroup: /system.slice/supd.service
             ├─1205531 /bin/sh /usr/sbin/start-supd -D -w -j --
             └─1205694 docker wait supd

Aug 23 15:11:53 sthlm-1 systemd[1]: Starting Avassa Edge Enforcer...
Aug 23 15:11:53 sthlm-1 docker[1205504]: supd
Aug 23 15:11:53 sthlm-1 systemd[1]: Started Avassa Edge Enforcer.
Aug 23 15:11:53 sthlm-1 start-supd[1205620]: WARNING: Localhost DNS setting (--dns=127.0.0.1) may fail in containers.
Aug 23 15:11:53 sthlm-1 start-supd[1205620]: 8a6f45c4b02ce6787ccdea951481fa22be30288e37ee34ea8403147279cf2c65

Check the Edge Enforcer is healthy

curl -i -k https://localhost:4646/healthz

HTTP/1.1 200 OK
content-length: 31
content-type: application/json
server: Cowboy

{
  "oper-status": "running"
}

Any 2xx return code indicates success. More details are in the payload.
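After a reboot, the Edge Enforcer may need some time before the health endpoint answers, so a scripted check usually polls. The following is a minimal sketch under assumptions: the function names is_2xx and wait_healthy, the number of attempts and the delay are all illustrative choices, not documented values. It uses the same URL and the -k flag as the curl command above.

```shell
#!/bin/sh
# Succeeds if the given HTTP status code is in the 2xx range.
is_2xx() {
    case "$1" in
        2[0-9][0-9]) return 0 ;;
        *) return 1 ;;
    esac
}

# Hypothetical helper: polls the health endpoint until it returns a 2xx code
# or the attempts run out. The defaults (30 attempts, 10 s apart) are
# illustrative only.
wait_healthy() {
    url=${1:-https://localhost:4646/healthz}
    attempts=${2:-30}
    delay=${3:-10}
    i=0
    while [ "$i" -lt "$attempts" ]; do
        # curl prints 000 as the status code when the connection fails.
        code=$(curl -s -k -o /dev/null -w '%{http_code}' "$url")
        if is_2xx "$code"; then
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    return 1
}

# Usage:
#   wait_healthy || echo "Edge Enforcer not healthy, see Troubleshooting" >&2
```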

Site Postconditions

Check the site's host status

supctl show --site sthlm system cluster hosts --fields cluster-hostname,hostname,oper-status

- cluster-hostname: sthlm-001
  oper-status: up
  hostname: sthlm-3
- cluster-hostname: sthlm-002
  oper-status: up
  hostname: sthlm-1
- cluster-hostname: sthlm-003
  oper-status: up
  hostname: sthlm-2

This indicates the site's hosts are in a good state after the OS upgrade.

Rescheduling of service instances

Since the hosts are upgraded and restarted one by one, this procedure affects the distribution of running service instances among the hosts. After the upgrade, the service instances may no longer be well balanced between hosts.

When this is the case, it is possible to invoke the reschedule action to rebalance the service instances among the running hosts within a site:

supctl do --site sthlm system cluster reschedule

rescheduled-service-instances:
  - name: telco.alpine.fifth-srv-4
    from-host: h05
    to-host: h07
  - name: telco.alpine.fifth-srv-3
    from-host: h04
    to-host: h06
  - name: telco.alpine.fifth-srv-2
    from-host: h03
    to-host: h07
  - name: telco.alpine.first-srv-1
    from-host: h06
    to-host: h07
  - name: telco.alpine.fifth-srv-1
    from-host: h06
    to-host: h07

Troubleshooting

To verify that the Edge Enforcer is running, please check

docker ps

The supd container should appear in this list of running containers:

CONTAINER ID   IMAGE           COMMAND                  CREATED        STATUS        PORTS     NAMES
63954d024dca   682a7e85b95a    "/docker-entrypoint.…"   12 hours ago   Up 12 hours             supd
6ee4c3e3257f   3e42dd4e79c7    "docker-entrypoint.s…"   7 days ago     Up 7 days               example-corp.ha-redis.redis-1.redis-sentinel
fc668ecd1e4a   3e42dd4e79c7    "docker-entrypoint.s…"   7 days ago     Up 7 days               example-corp.ha-redis.redis-1.redis
946b9b917453   avassa/nsctrl   "/pause"                 7 days ago     Up 7 days               example-corp.ha-redis.redis-1.nsctrl
d380aad66702   dd86a78b4c78    "/bin/sh -c $EXECUTA…"   2 weeks ago    Up 2 weeks              example-corp.popcorn-controller.popcorn-controller-service-1.kettle-popper-manager
15a8e43e1483   avassa/nsctrl   "/pause"                 2 weeks ago    Up 2 weeks              example-corp.popcorn-controller.popcorn-controller-service-1.nsctrl

If supd is not in that list, it means that systemd hasn't been able to start the Edge Enforcer.
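This check can also be scripted. The sketch below is an illustration: the function name supd_listed is hypothetical, and it assumes the container is named exactly supd, as in the listing above. The --format option of docker ps is used to print only container names.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if a container named exactly "supd" appears
# in the list of container names read from stdin (one name per line).
supd_listed() {
    grep -qx 'supd'
}

# Usage (assumes the docker CLI is available on the host):
#   docker ps --format '{{.Names}}' | supd_listed \
#       || echo "supd container is not running" >&2
```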

Then check the systemd status for information on why the Edge Enforcer doesn't start:

systemctl status supd

and

journalctl -xu supd

If supd is running, the logs can be inspected using docker:

docker logs supd

Please examine the logs for indications on why the site cluster cannot be established.
