
Planning for disconnected sites

Many edge deployments must handle scenarios where sites are disconnected for varying durations, whether planned or unplanned. Avassa is designed to support intermittent connectivity. The Edge Enforcer operates autonomously, with all required artifacts stored and replicated locally at the site. This ensures that control actions can be executed at the edge, independent of the Control Tower.

A key architectural feature is the loose coupling between Control Tower and Edge Enforcer. Communication is handled via a bidirectional pub/sub bus, allowing messages to be buffered both centrally and at the edge during disconnections. Upon reconnection, these messages are synchronized seamlessly.

The platform also supports deployment and upgrade workflows during disconnected periods, including image housekeeping and configuration of pending upgrades.

This guide helps you configure and plan your edge deployment for relevant disconnected scenarios, including:

  • Configuring site disconnected behavior
  • Managing upgrade sequences
  • Certificate management
  • User management
  • Monitoring disconnected sites
  • Local operations at edge sites
  • Local unseal at restart

Configuring site disconnected behavior

Edge sites may have different connectivity profiles. Some may have stable connections where any disconnect is an incident, while others may experience frequent, expected disconnects.

You can configure each site’s behavior when disconnected using the when-disconnected property. Supported values:

  • treat-as-normal: Temporary disconnects are acceptable; the site is expected to be connected during normal operation. This is the default behavior.
  • treat-as-expected: Disconnects are normal and not alerted; the site may be non-operational for periods, but deployments continue.
  • treat-as-error: Connectivity is required; any disconnect triggers an alert.

When an application, or a new version of an application, is deployed to a site that is offline, the deployment by default waits for the site to come back online before the application is actually deployed. This is not ideal if a site is known to be offline for a long period. If you set the site's when-disconnected property to treat-as-expected, the deployment continues even while the site is offline.

The following table describes what happens when a site is disconnected. It shows if an alert is generated when a site disconnects, and what happens if a new application version is deployed when the site is disconnected.

when-disconnected    Alert    Application deployment status
treat-as-normal      no       Deployment remains in deploying state
treat-as-expected    no       Deployment continues (except for canary releases)
treat-as-error       yes      Deployment remains in deploying state

Consider:

  • Should disconnected sites trigger alerts?
  • Should disconnected sites block deployments?
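The two questions above map directly onto the table, so the choice can be expressed as a lookup. The following Python sketch is only an illustration of that reasoning: the function name and the tuple layout are invented here and are not part of Avassa.

```python
# Behavior per when-disconnected value, transcribed from the table above.
# Tuple layout (an assumption of this sketch):
#   (alert_on_disconnect, deployment_continues_while_offline)
# Note: per the table, canary releases do not continue even with treat-as-expected.
BEHAVIOR = {
    "treat-as-normal": (False, False),
    "treat-as-expected": (False, True),
    "treat-as-error": (True, False),
}


def pick_when_disconnected(alert_on_disconnect: bool, continue_deployments: bool) -> str:
    """Return the when-disconnected value that matches the two planning answers."""
    for value, behavior in BEHAVIOR.items():
        if behavior == (alert_on_disconnect, continue_deployments):
            return value
    # The table has no row that both alerts and lets deployments continue.
    raise ValueError("no when-disconnected value combines alerting with continued deployments")


if __name__ == "__main__":
    print(pick_when_disconnected(False, True))   # treat-as-expected
    print(pick_when_disconnected(True, False))   # treat-as-error
```

The lookup also makes one gap visible: none of the three values both raises an alert and lets deployments continue, so that combination has to be handled by other means, such as your own monitoring.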

Monitoring disconnected sites

Sites that remain disconnected for extended periods can be overlooked. Use the Control Tower UI to filter and sort sites by connection status and duration. For troubleshooting, inspect connect/disconnect events in the site view.

You can also analyze the Volga topic system:events for site-connected and site-disconnected events.

To list disconnected sites using supctl:

supctl show system sites --where="connection-state/connected='false'" --fields=name

If when-disconnected is set to treat-as-error, alerts will be generated for disconnected sites.
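If you export site-connected and site-disconnected events for your own reporting, a small script can flag sites that have been offline longer than you are comfortable with. The sketch below assumes a hypothetical event shape (dicts with "site", "event" and "time" keys). The real payload on system:events may look different, so treat the field names as placeholders.

```python
from datetime import datetime, timedelta


def long_disconnected(events, now, threshold):
    """Return {site: time_offline} for sites whose most recent event is a
    site-disconnected event at least `threshold` old.

    `events` is a list of dicts with hypothetical keys:
      "site"  - site name
      "event" - "site-connected" or "site-disconnected"
      "time"  - datetime of the event
    """
    latest = {}
    for ev in sorted(events, key=lambda e: e["time"]):
        latest[ev["site"]] = ev  # later events overwrite earlier ones
    return {
        site: now - ev["time"]
        for site, ev in latest.items()
        if ev["event"] == "site-disconnected" and now - ev["time"] >= threshold
    }


if __name__ == "__main__":
    now = datetime(2024, 1, 10)
    sample = [
        {"site": "store-7", "event": "site-disconnected", "time": datetime(2024, 1, 5)},
        {"site": "store-9", "event": "site-disconnected", "time": datetime(2024, 1, 3)},
        {"site": "store-9", "event": "site-connected", "time": datetime(2024, 1, 4)},
    ]
    for site, offline in long_disconnected(sample, now, timedelta(days=2)).items():
        print(site, offline)
```

Only the most recent event per site is considered, so a site that has reconnected since its last disconnect is not reported.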

Configuring upgrade sequences

If your deployment pipeline releases updates regularly (e.g., bi-weekly), sites disconnected for extended periods may accumulate multiple pending upgrades. By default, these upgrades are applied sequentially upon reconnection.

You can optimize this process by allowing the deployment to skip versions. Configure this using the upgrade-from field.

Here's an example of a simple application that supports such upgrades:

name: theater-room-manager
version: "2.2"
services:
  - name: theater-operations
    mode: replicated
    replicas: 3
    containers:
      - name: digital-assets-manager
        image: ...
upgrade-from:
  - version-regexp: "."
    method: per-service
    services:
      - name: theater-operations
        instances-in-parallel: 1
        healthy-time: 30s

See also application upgrades.

Best practice: Ensure your application supports upgrades from any previous version. This makes an upgrade after a longer disconnect faster, since intermediate versions can be skipped.
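To get a feel for what a version-regexp accepts, you can try the pattern against version strings locally. The Python sketch below assumes the pattern is applied as an unanchored regular-expression search. That is an assumption made for illustration, so check the application upgrades documentation for the exact matching rules. The helper name is hypothetical.

```python
import re


def can_upgrade_from(version_regexp: str, deployed_version: str) -> bool:
    """Return True if the deployed version matches the pattern.

    Assumption: unanchored search semantics, which may differ from the platform.
    """
    return re.search(version_regexp, deployed_version) is not None


if __name__ == "__main__":
    # "." matches any single character, so every non-empty version qualifies.
    for version in ["1.8", "2.0", "2.1"]:
        print(version, can_upgrade_from(".", version))

    # A stricter, anchored pattern only accepts upgrades from 2.x versions.
    print(can_upgrade_from(r"^2\.", "1.8"))  # False
    print(can_upgrade_from(r"^2\.", "2.1"))  # True
```

Under that assumption, a pattern of "." lets a site that missed several releases move straight to the latest version, while a stricter pattern forces older sites through intermediate versions first.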

Certificate management

Avassa uses site-local certificates for security. Certificates have a default TTL and are auto-rotated. Sites should not remain disconnected longer than the certificate TTL or the scheduled rotation period.

Set the offline-grace-period in the system settings to match your expected maximum disconnected duration. Certificate defaults are derived from this value.
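When planning this value, it can help to compare it against the longest outage you expect per site. The sketch below is plain arithmetic with hypothetical site names and durations. It does not read or write any Avassa setting, and the one-day safety margin is an arbitrary choice for the example.

```python
from datetime import timedelta


def sites_exceeding_grace(expected_offline, grace_period, margin=timedelta(days=1)):
    """Return the sites whose expected maximum offline time, plus a safety
    margin, does not fit within the planned offline-grace-period."""
    return sorted(
        site
        for site, offline in expected_offline.items()
        if offline + margin > grace_period
    )


if __name__ == "__main__":
    # Hypothetical planning figures, not defaults from the platform.
    planned_grace = timedelta(days=30)
    expected = {
        "ferry-12": timedelta(days=45),  # long seasonal outage
        "store-7": timedelta(days=2),
    }
    print(sites_exceeding_grace(expected, planned_grace))  # ['ferry-12']
```

Any site listed by such a check either needs a longer grace period or a plan for the certificate recovery procedure.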

Monitor certificate expiration using supctl or the Control Tower UI, which provides alerts for certificates nearing expiration.

If a site misses certificate renewal due to extended disconnection, Avassa provides a built-in recovery mechanism.

See certificate management for disconnected scenarios for detailed instructions on these topics.

User management

When operating sites via Control Tower, authentication is handled centrally (OIDC or local users in Strongbox). For local operations during outages, ensure site-local authentication is enabled.

Create Strongbox users and distribute those users to the relevant sites.

This allows local users to authenticate and use supctl and APIs even when disconnected.

Edge site local operations

While disconnected, users can perform local operations using supctl and APIs. Consider deploying a site-local custom web UI for simplified management, such as the Site Admin UI example.

Local configuration changes made during disconnection are flagged. Upon reconnection, the central operations team can review and choose to keep local changes or overwrite with central configuration.

Read more: disconnected site operations

Site local unseal

If you have scenarios where sites will restart without connectivity to the Control Tower, you might want to allow local unseal.

Note that this setting must be configured before the first host in a site is connected. After that it cannot be changed.