Scheduler

This section describes how service instances are scheduled to hosts within a site. See Site Placement for a description of how applications are scheduled to different sites.

On each site, there is an application controller process that reacts to changes in application specifications. Its job is to create and manage service instance objects that match the application specification. When a new application is created, the corresponding service instance objects are created, and when the application is deleted, those service instance objects are deleted as well. A service instance created by the application controller is not yet assigned to any host in the site.
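The controller's job can be pictured as a reconciliation step. The sketch below is a hypothetical illustration, not the product's implementation: it assumes an application specification can be reduced to a mapping from service name to replica count, and that a service instance is identified by a (service, index) pair. The names `ServiceInstance` and `reconcile` are invented for this example. Newly created instances start with no host assigned.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ServiceInstance:
    # Hypothetical model: an instance is identified by service name and index.
    service: str
    index: int
    host: Optional[str] = None  # None until the service scheduler assigns a host


def reconcile(spec: dict[str, int],
              instances: dict[tuple[str, int], ServiceInstance]
              ) -> dict[tuple[str, int], ServiceInstance]:
    """Return the instance set matching `spec` (service name -> replica count).

    Instances that still match are kept as-is, including their host;
    missing ones are created unassigned; surplus ones are dropped.
    """
    wanted = {(svc, i) for svc, replicas in spec.items() for i in range(replicas)}
    result = {key: inst for key, inst in instances.items() if key in wanted}
    for svc, i in wanted - set(result):
        result[(svc, i)] = ServiceInstance(svc, i)  # created without a host
    return result
```

Deleting the application corresponds to reconciling against an empty specification, which removes every instance object.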

On each site, there is also a service scheduler process that is responsible for assigning service instances to hosts. Once a service instance has been assigned to a host, it is not moved unless the host goes down, or the service's configuration has been modified so that the service instance can't run on the host anymore.

Scheduling filters and host selection

The service scheduler uses a set of filters to find the hosts that can run the service instance. If no host passes the filters, the service instance cannot be scheduled.

When a service instance that has already been scheduled to a host is restarted, that host is kept as long as it remains in the filtered set of hosts.

If the filtered set is non-empty and no earlier host choice applies, the service scheduler picks the host that already has the fewest service instances of the same type scheduled to it.
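The three rules above (filter, keep the previous host if it still qualifies, otherwise pick the least loaded host) can be sketched as follows. This is a simplified, hypothetical model: filters are plain functions from a list of host names to a subset of it, load is a precomputed per-host count of same-type instances, and ties go to the first host in list order. `select_host` and `HostFilter` are invented names.

```python
from typing import Callable, Optional

# Hypothetical filter type: takes candidate hosts, returns those that qualify.
HostFilter = Callable[[list[str]], list[str]]


def select_host(hosts: list[str],
                filters: list[HostFilter],
                current_host: Optional[str],
                same_type_count: dict[str, int]) -> Optional[str]:
    """Pick a host for one service instance, or return None if none qualifies.

    `current_host` is the host the instance was previously scheduled to, if any;
    `same_type_count` maps host -> number of same-type instances already there.
    """
    candidates = hosts
    for f in filters:
        candidates = f(candidates)
    if not candidates:
        return None  # the instance cannot be scheduled
    if current_host in candidates:
        return current_host  # keep the earlier choice while it passes the filters
    # Otherwise choose the host with the fewest instances of the same type.
    return min(candidates, key=lambda h: same_type_count.get(h, 0))
```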

The process for selecting a new host for a service instance takes the following aspects into consideration, in priority order:

fewest service instances of the same service type that are not in delayed-shutdown
fewest service instances of the same service type that are in delayed-shutdown
fewest service instances of any other service type that are not in delayed-shutdown
fewest service instances of any other service type that are in delayed-shutdown
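One way to express this priority order is as a tuple sort key, where Python's lexicographic tuple comparison makes each earlier count outweigh all later ones. This is a minimal sketch under assumed names: `Placed`, `rank_key`, and `pick` are hypothetical, and each host is described only by the list of instances already placed on it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Placed:
    # Hypothetical record of one instance already scheduled to a host.
    service_type: str
    delayed_shutdown: bool


def rank_key(placed_on_host: list[Placed], service_type: str) -> tuple[int, int, int, int]:
    """Sort key for a candidate host; smaller tuples are preferred.

    The four counts follow the priority order of the list above.
    """
    same = [p for p in placed_on_host if p.service_type == service_type]
    other = [p for p in placed_on_host if p.service_type != service_type]
    return (
        sum(not p.delayed_shutdown for p in same),
        sum(p.delayed_shutdown for p in same),
        sum(not p.delayed_shutdown for p in other),
        sum(p.delayed_shutdown for p in other),
    )


def pick(hosts: dict[str, list[Placed]], service_type: str) -> str:
    """Return the candidate host with the smallest rank key."""
    return min(hosts, key=lambda h: rank_key(hosts[h], service_type))
```

For example, a host running one active instance of the same type loses to a host running only a same-type instance in delayed-shutdown, regardless of how many other-type instances the second host runs.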

The following filters are used:

Labels

A service can specify that its instances must run on hosts that match a given host label match expression. For example, hosts might be labeled with their hardware capabilities.

This filter returns the hosts that match the given host label match expression.
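The real host label match expression language is not described here, so the sketch below stands in for it with the simplest possible rule: every required key/value pair must be present on the host. `label_filter` and the dictionary-based label model are hypothetical.

```python
def label_filter(hosts: dict[str, dict[str, str]],
                 required: dict[str, str]) -> list[str]:
    """Return hosts whose labels contain every required key/value pair.

    Simplified stand-in for a real match expression: equality on all keys.
    """
    return [name for name, labels in hosts.items()
            if all(labels.get(k) == v for k, v in required.items())]
```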

Volumes

A service can specify that its instances need access to ephemeral or persistent volumes (see Ephemeral and persistent volumes). It specifies a volume label match expression and a requested size.

This filter returns the hosts that have room for the requested volume size, and where the volumes match the given volume label match expression.

If a service instance already has an ephemeral volume on a given host, that host is prioritized over other hosts. This means that once a service instance with an ephemeral volume has been scheduled to a host, it is not moved to another host as long as it can still run on that host.
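The volume filter and the ephemeral-volume preference can be sketched as two steps: keep hosts with a matching volume that has enough free space, then move the host that already holds the instance's ephemeral volume to the front. This is a hypothetical model: `Volume`, `volume_filter`, and `prefer_existing_ephemeral` are invented names, and label matching is simplified to key/value equality.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Volume:
    # Hypothetical per-host volume description.
    labels: dict[str, str]
    free_bytes: int


def volume_filter(hosts: dict[str, list[Volume]],
                  required: dict[str, str],
                  size: int) -> list[str]:
    """Hosts with at least one matching volume that has room for `size` bytes."""
    def fits(v: Volume) -> bool:
        matches = all(v.labels.get(k) == val for k, val in required.items())
        return matches and v.free_bytes >= size
    return [h for h, vols in hosts.items() if any(fits(v) for v in vols)]


def prefer_existing_ephemeral(candidates: list[str],
                              ephemeral_host: Optional[str]) -> list[str]:
    """Move the host holding the instance's ephemeral volume to the front."""
    if ephemeral_host in candidates:
        return [ephemeral_host] + [h for h in candidates if h != ephemeral_host]
    return candidates
```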

Ingress IP

If the service instance requests an ingress IP address, this filter returns the hosts that have an IP address available.
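A minimal sketch of this filter, assuming each host's pool of unallocated ingress IP addresses can be summarized as a count. When no ingress IP is requested, the filter passes every host through. `ingress_ip_filter` is a hypothetical name.

```python
def ingress_ip_filter(hosts: dict[str, int], wants_ingress_ip: bool) -> list[str]:
    """`hosts` maps host name -> number of unallocated ingress IP addresses."""
    if not wants_ingress_ip:
        return list(hosts)  # no ingress IP requested: nothing is filtered out
    return [h for h, free in hosts.items() if free > 0]
```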

Preferred affinity

This filter prioritizes the available hosts according to the affinity and anti-affinity rules given in the service specification.

Note that this is not a hard filter, i.e., a service instance may be scheduled to a host that violates the requested affinity rules.
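Because preferred affinity is soft, it reorders the candidate hosts rather than removing any. The sketch below assumes a simple scoring model that is not taken from the product: a host earns one point per affinity service already running on it and one point per anti-affinity service absent from it. `order_by_affinity` and its parameters are hypothetical.

```python
def order_by_affinity(candidates: list[str],
                      running: dict[str, set[str]],
                      affinity: set[str],
                      anti_affinity: set[str]) -> list[str]:
    """Reorder (never shrink) the candidate list by preferred affinity.

    `running` maps host -> names of services already on it.
    """
    def score(h: str) -> int:
        on_host = running.get(h, set())
        return len(affinity & on_host) + len(anti_affinity - on_host)
    # sorted() is stable even with reverse=True, so ties keep their original order.
    return sorted(candidates, key=score, reverse=True)
```

A host that violates every rule still stays in the list, only at the end, which matches the soft behavior described above.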

Container layer, memory, CPU

These filters check that a host has enough capacity for the requested container layer, memory, and CPU resource limits.
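A minimal sketch of these resource checks, assuming each host's free capacity is known per resource. The `Resources` record, its units (bytes and millicores), and `resource_filter` are hypothetical choices for illustration.

```python
from dataclasses import dataclass


@dataclass
class Resources:
    # Hypothetical units: bytes for storage and memory, millicores for CPU.
    container_layer_bytes: int
    memory_bytes: int
    cpu_millicores: int


def resource_filter(free: dict[str, Resources], req: Resources) -> list[str]:
    """Hosts whose free capacity covers every requested limit."""
    return [h for h, r in free.items()
            if r.container_layer_bytes >= req.container_layer_bytes
            and r.memory_bytes >= req.memory_bytes
            and r.cpu_millicores >= req.cpu_millicores]
```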

Device, GPU

This filter makes sure that the required devices or GPUs exist on the target hosts.
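As a sketch, assuming each host advertises a set of device names and a service lists the names it requires, the check reduces to a subset test. Device naming and `device_filter` are hypothetical.

```python
def device_filter(host_devices: dict[str, set[str]], required: set[str]) -> list[str]:
    """Hosts that expose every required device (for example a GPU)."""
    return [h for h, devs in host_devices.items() if required <= devs]
```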

Number of service instances

This filter attempts to distribute service instances evenly among the target hosts.

Handling host failures

When the service scheduler gets a notification that a host is down, it waits 30 seconds before taking any action. This handles the case where the host, or supd on the host, is merely restarting.

After these 30 seconds, the service scheduler checks whether the host is still down; this check takes up to 5 seconds. If the host is still down, the service scheduler finds all service instances that belong to single-replica services and are scheduled to the host, and re-schedules them.

Otherwise, if the host is up but supd is not running, the service scheduler waits another 30 seconds. If supd is still not running after that, the service scheduler re-schedules the single-replica service instances as described above.

This means that if a service has multiple replicas, the replica that is scheduled to the failed host will not be moved to another host.
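The failure-handling timeline can be sketched as a single function. This is a hypothetical illustration, not the scheduler's code: the probes `host_is_up` and `supd_is_running` and the action `reschedule_single_replica` are injected callables invented for the example, and `sleep` is injectable so the timing can be tested without waiting. Multi-replica instances are never touched here.

```python
import time
from typing import Callable

GRACE_SECONDS = 30  # wait before acting on a "host down" notification


def handle_host_down(host_is_up: Callable[[], bool],
                     supd_is_running: Callable[[], bool],
                     reschedule_single_replica: Callable[[], None],
                     sleep: Callable[[float], None] = time.sleep) -> bool:
    """Return True if single-replica instances on the host were re-scheduled."""
    sleep(GRACE_SECONDS)          # give a restarting host or supd time to return
    if not host_is_up():          # the real check may take up to 5 seconds
        reschedule_single_replica()
        return True
    if not supd_is_running():
        sleep(GRACE_SECONDS)      # host is up but supd is not: wait once more
        if not supd_is_running():
            reschedule_single_replica()
            return True
    return False                  # host and supd recovered; nothing is moved
```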

Handling host additions and removals

If a host is deleted from a site, all service instances that are scheduled to that host are re-scheduled to other hosts, if possible.

If a host is added to a site, existing service instances are not moved.
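Unlike a host failure, deleting a host re-schedules every instance on it, not only single-replica ones. A minimal sketch, assuming placements are a mapping from instance name to host and that `pick` is a hypothetical callback returning a new host, or None when no remaining host qualifies:

```python
from typing import Callable, Optional


def on_host_removed(assignments: dict[str, Optional[str]],
                    removed: str,
                    remaining: list[str],
                    pick: Callable[[str, list[str]], Optional[str]]
                    ) -> dict[str, Optional[str]]:
    """Re-schedule every instance on `removed`; others keep their host."""
    return {inst: (pick(inst, remaining) if host == removed else host)
            for inst, host in assignments.items()}
```

Adding a host needs no counterpart function: existing placements are left exactly as they are, and the new host only becomes a candidate for future scheduling decisions.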
