Planning for disconnected sites
Many edge deployments must handle scenarios where sites are disconnected for varying durations, whether planned or unplanned. Avassa is designed to support intermittent connectivity. The Edge Enforcer operates autonomously, with all required artifacts stored and replicated locally at the site. This ensures that control actions can be executed at the edge, independent of the Control Tower.
A key architectural feature is the loose coupling between Control Tower and Edge Enforcer. Communication is handled via a bidirectional pub/sub bus, allowing messages to be buffered both centrally and at the edge during disconnections. Upon reconnection, these messages are synchronized seamlessly.
The platform also supports deployment and upgrade workflows during disconnected periods, including image housekeeping and configuration of pending upgrades.
This guide helps you configure and plan your edge deployment for relevant disconnected scenarios, including:
- Configuring site disconnected behavior
- Managing upgrade sequences
- Certificate management
- User management
- Monitoring disconnected sites
- Local operations at edge sites
- Local unseal at restart
Configuring site disconnected behavior
Edge sites may have different connectivity profiles. Some may have stable connections where any disconnect is an incident, while others may experience frequent, expected disconnects.
You can configure each site’s behavior when disconnected using the when-disconnected property. Supported values:
- treat-as-normal: Temporary disconnects are acceptable; the site is expected to be connected during normal operation. This is the default behavior.
- treat-as-expected: Disconnects are normal and not alerted; the site may be non-operational for periods, but deployments continue.
- treat-as-error: Connectivity is required; any disconnect triggers an alert.
When an application, or a new version of an application, is deployed to a site that is offline, the deployment by default waits for the site to come back online before deploying the application. However, if a site is known to be offline for a longer period of time, this behavior is not ideal. By setting the site's when-disconnected property to treat-as-expected, the deployment continues even if the site is offline.
The following table describes what happens when a site is disconnected. It shows if an alert is generated when a site disconnects, and what happens if a new application version is deployed when the site is disconnected.
when-disconnected | Alert | Application deployment status |
---|---|---|
treat-as-normal | no | Deployment remains in deploying state |
treat-as-expected | no | Deployment continues (except for canary releases) |
treat-as-error | yes | Deployment remains in deploying state |
Consider:
- Should disconnected sites trigger alerts?
- Should disconnected sites block deployments?
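The two questions above map directly onto the table, so the choice can be sketched as a small lookup. This is only an illustration in Python: the `DisconnectPolicy` type and the `choose_when_disconnected` helper are hypothetical names, not part of any Avassa API, and the values are transcribed from the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DisconnectPolicy:
    alert: bool                 # is an alert generated when the site disconnects?
    deployments_continue: bool  # does a deployment proceed while the site is offline?


# Transcribed from the table above. Note the documented exception:
# with treat-as-expected, canary releases do not continue.
POLICIES = {
    "treat-as-normal": DisconnectPolicy(alert=False, deployments_continue=False),
    "treat-as-expected": DisconnectPolicy(alert=False, deployments_continue=True),
    "treat-as-error": DisconnectPolicy(alert=True, deployments_continue=False),
}


def choose_when_disconnected(want_alert: bool, deployments_continue: bool) -> str:
    """Return the when-disconnected value that matches both answers."""
    for name, policy in POLICIES.items():
        if (policy.alert, policy.deployments_continue) == (want_alert, deployments_continue):
            return name
    raise ValueError("no when-disconnected value offers this combination")
```

According to the table, no value both alerts on disconnect and lets deployments continue, so that combination raises an error in this sketch.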
Monitoring disconnected sites
Sites that remain disconnected for extended periods can be overlooked. Use the Control Tower UI to filter and sort sites by connection status and duration. For troubleshooting, inspect connect/disconnect events in the site view.
You can also analyze the Volga topic system:events for site-connected and site-disconnected events.
To list disconnected sites using supctl:
supctl show system sites --where="connection-state/connected='false'" --fields=name
If when-disconnected is set to treat-as-error, alerts will be generated for disconnected sites.
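As a rough illustration of analyzing the site-connected and site-disconnected events, the Python sketch below finds sites that have been disconnected longer than a threshold. It assumes the events have already been exported into dictionaries with the keys "site", "event" and "time" (ISO 8601); that record shape is a hypothetical one chosen for the example, not the actual layout of system:events.

```python
from datetime import datetime, timedelta, timezone


def _parse(ts: str) -> datetime:
    """Parse an ISO 8601 timestamp; naive timestamps are assumed to be UTC."""
    dt = datetime.fromisoformat(ts)
    return dt if dt.tzinfo else dt.replace(tzinfo=timezone.utc)


def disconnected_since(events):
    """Map each currently disconnected site to the time of its latest disconnect."""
    down = {}
    for ev in sorted(events, key=lambda e: _parse(e["time"])):
        if ev["event"] == "site-disconnected":
            down[ev["site"]] = _parse(ev["time"])
        elif ev["event"] == "site-connected":
            # A later connect clears any earlier disconnect for that site.
            down.pop(ev["site"], None)
    return down


def long_outages(events, now: datetime, threshold: timedelta):
    """Return the names of sites disconnected for longer than `threshold`."""
    return sorted(
        site for site, since in disconnected_since(events).items()
        if now - since > threshold
    )
```

A report like this could be run periodically to surface sites that would otherwise be overlooked, in particular those configured with treat-as-expected, which do not generate alerts.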
Configuring upgrade sequences
If your deployment pipeline releases updates regularly (e.g., bi-weekly), sites disconnected for extended periods may accumulate multiple pending upgrades. By default, these upgrades are applied sequentially upon reconnection.
You can optimize this process by allowing the deployment to skip versions. Configure this using the upgrade-from field.
Here's an example of a simple application that supports such upgrades:
name: theater-room-manager
version: "2.2"
services:
  - name: theater-operations
    mode: replicated
    replicas: 3
    containers:
      - name: digital-assets-manager
        image: ...
upgrade-from:
  - version-regexp: "."
    method: per-service
    services:
      - name: theater-operations
        instances-in-parallel: 1
        healthy-time: 30s
See also application upgrades.
Best practice: Ensure your application supports upgrades from any previous version. This makes an upgrade after a longer disconnect faster, since intermediate versions can be skipped.
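To see why skipping versions shortens the catch-up after a long disconnect, here is a simplified Python model of the idea. It is not how the platform computes upgrades: the `upgrade_steps` function and the `(version, patterns)` data shape are hypothetical, and it assumes a version-regexp is matched as an unanchored regular expression search against the installed version.

```python
import re


def upgrade_steps(installed, releases):
    """Return the versions a site would step through to reach the latest release.

    `releases` is an ordered list of (version, upgrade_from_patterns) tuples,
    all newer than `installed`. A release may be jumped to directly when one of
    its patterns matches the current version; otherwise the next release in
    sequence is applied.
    """
    steps = []
    current = installed
    i = 0
    while i < len(releases):
        target = i  # fall back to the next release in sequence
        # Prefer the newest pending release that accepts `current` directly.
        for j in range(len(releases) - 1, i, -1):
            _, patterns = releases[j]
            if any(re.search(p, current) for p in patterns):
                target = j
                break
        current = releases[target][0]
        steps.append(current)
        i = target + 1
    return steps
```

In this model, a site on version 1.9 with pending releases 2.0, 2.1 and 2.2 goes through three upgrades when no release declares a pattern, but only one when 2.2 declares the pattern "." as in the example application above.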
Certificate management
Avassa uses site-local certificates for security. Certificates have a default TTL and are auto-rotated. Sites should not remain disconnected longer than the certificate TTL or the scheduled rotation period.
Set the offline-grace-period in the system settings to match your expected maximum disconnected duration. Certificate defaults are derived from this value.
Monitor certificate expiration using supctl or the Control Tower UI, which provides alerts for certificates nearing expiration.
If a site misses certificate renewal due to extended disconnection, Avassa provides a built-in recovery mechanism.
See certificate management for disconnected scenarios for detailed instructions on these topics.
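When planning, it can help to compare each site's expected maximum outage with the configured grace period. The Python sketch below does only that arithmetic; the site names and durations are hypothetical planning inputs, and nothing here reads or writes the actual offline-grace-period setting.

```python
from datetime import timedelta


def sites_exceeding_grace(expected_outages, offline_grace_period):
    """Return the sites whose expected maximum outage exceeds the grace period.

    `expected_outages` maps a site name to the longest outage you plan for.
    """
    return sorted(
        site for site, outage in expected_outages.items()
        if outage > offline_grace_period
    )


# Hypothetical planning data: a vessel that may be offline for 45 days
# and a store that is rarely offline for more than 2 days.
planned = {
    "ship-01": timedelta(days=45),
    "store-07": timedelta(days=2),
}
at_risk = sites_exceeding_grace(planned, timedelta(days=30))
```

Any site returned by such a check is a candidate for a longer grace period or for the recovery procedure referenced above.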
User management
When operating sites via Control Tower, authentication is handled centrally (OIDC or local users in Strongbox). For local operations during outages, ensure site-local authentication is enabled.
Create Strongbox users and distribute them to the relevant sites:
This allows local users to authenticate and use supctl and APIs even when disconnected.
Edge site local operations
While disconnected, users can perform local operations using supctl and APIs.
Consider deploying a site-local custom web UI for simplified management, such as the Site Admin UI example.
Local configuration changes made during disconnection are flagged. Upon reconnection, the central operations team can review and choose to keep local changes or overwrite with central configuration.
Read more: disconnected site operations
Site local unseal
If sites may restart without connectivity to the Control Tower, you might want to allow local unseal.
Note that this setting must be configured before the first host in a site is connected. After that it cannot be changed.