GPU passthrough
This tutorial shows how to see the available GPUs on a site, grant tenants access to selected GPUs, and express GPU requirements in an application specification. Currently NVIDIA and Intel GPUs are supported.
Prerequisites
There are different ways to mount NVIDIA GPUs into containers depending on the container runtime. Intel GPUs have no special prerequisites and are detected and mounted automatically.
The Edge Enforcer detects the supported methods of running NVIDIA GPUs and performs GPU discovery at start-up. When changing the Docker daemon configuration in this regard, or updating the CDI specification, make sure to restart the Edge Enforcer (after restarting the Docker daemon, if applicable) so that it picks up the latest configuration.
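If both Docker and the Edge Enforcer are managed by systemd, the restart sequence could look like the following sketch; the edge-enforcer unit name is hypothetical and depends on how the Edge Enforcer was installed on the host.
# Restart Docker first so the new daemon.json / CDI configuration takes effect
sudo systemctl restart docker
# Then restart the Edge Enforcer so it re-runs GPU discovery (hypothetical unit name)
sudo systemctl restart edge-enforcer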
NVIDIA CDI specification
This method is available under the Podman container runtime and newer versions of Docker (starting with version 25.0).
The procedure to create a CDI specification for NVIDIA devices on a host is defined in the NVIDIA Container Toolkit User Guide. The container engine (Podman or Docker) must be able to find (e.g. in the /etc/cdi directory) and read the CDI specification, as generated by nvidia-ctk cdi generate.
The Avassa platform will probe the devices labelled nvidia.com/gpu=all and list the available devices with this label.
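As a sketch of the NVIDIA Container Toolkit procedure, the CDI specification could be generated and verified along the following lines; the output path is an example and may differ per installation.
# Generate a CDI specification covering all NVIDIA devices on the host
sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml
# List the device names that are now available through CDI
nvidia-ctk cdi list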
If the NVIDIA Docker runtime is configured, it takes precedence over the CDI specification.
NVIDIA Docker runtime
This method is available only under the Docker container runtime. When running with Podman, refer to the CDI interface described above.
In order for the system to gain access to NVIDIA GPUs on a host, and to be able to pass GPUs through to applications, nvidia-container-toolkit must be installed on the host as described in the NVIDIA Container Toolkit User Guide, and the NVIDIA runtime must be configured in Docker's daemon.json. It should not be set as the default runtime, as the Avassa platform will selectively use this runtime for the containers that require it. The Docker runtime must be called nvidia for the Avassa platform to detect it.
{
  "runtimes": {
    "nvidia": {
      "path": "/usr/bin/nvidia-container-runtime",
      "runtimeArgs": []
    }
  }
}
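On hosts where the NVIDIA Container Toolkit is installed, this entry does not have to be written by hand; as a sketch, the toolkit can generate it (it does not set the NVIDIA runtime as default unless explicitly asked to). Restart the Docker daemon and the Edge Enforcer afterwards.
# Add the "nvidia" runtime entry to /etc/docker/daemon.json
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker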
Showing the available GPUs
As a site provider, to confirm that the expected GPU has been found on all
hosts in a site and to see the GPU parameters as discovered by the Avassa
platform, use the following supctl
command:
supctl show -s stockholm-sergel system cluster hosts --fields hostname,gpus
- hostname: stockholm-sergel-001
  gpus:
    - id: GPU-b75c47d9-5fb4-63e0-a07b-ff2633af741c
      vendor: NVIDIA
      name: Tesla M60
      serial: "0321017046575"
      memory: 7680 MiB
      driver-version: 525.60.13
      compute-mode: Default
      compute-capability: "5.2"
      display-mode: Enabled
      labels: []
    - id: GPU-ee1b2a5c-3cd0-0c4a-a240-d87c22748a35
      vendor: NVIDIA
      name: Tesla M60
      serial: "0321017046575"
      memory: 7680 MiB
      driver-version: 525.60.13
      compute-mode: Default
      compute-capability: "5.2"
      display-mode: Enabled
      labels: []
From this output, we can see that two identical NVIDIA GPUs were discovered on the single host within the stockholm-sergel site.
Creating a GPU label
In the general case the application owner does not have access to the list of all GPUs on the site (unless this role coincides with the site provider). For this reason, an application specification cannot refer directly to a GPU unit; instead, it refers to GPU labels created by the site provider and granted to the application owner.
Here is an example of creating the simplest possible GPU label, which refers to all GPUs within the site. The label is created in the system settings, which means it becomes available on all sites in the system.
supctl create system settings <<EOF
gpu-labels:
  - label: all
    all: true
EOF
An example of a more specific GPU label, valid only on the stockholm-sergel site, selecting any one of the "Tesla"-brand NVIDIA GPUs.
supctl merge system sites stockholm-sergel <<EOF
gpu-labels:
  - label: any-tesla
    max-number-gpus: 1
    gpu-patterns:
      - vendor == "NVIDIA", name == "*Tesla*"
EOF
Note the gpu-patterns expression. It is possible to match on any parameter that appears in the GPU list from the previous step. See the reference documentation for the system settings object for a detailed description of the gpu-patterns expression syntax.
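For instance, a hypothetical label matching only NVIDIA GPUs with at least 8 GiB of memory could be sketched as follows, using the same pattern syntax:
supctl merge system sites stockholm-sergel <<EOF
gpu-labels:
  - label: large-memory
    gpu-patterns:
      - vendor == "NVIDIA", memory >= "8 GiB"
EOF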
Verify that the GPU labels are available on a specific site (stockholm-sergel
in this case):
supctl show -s stockholm-sergel system cluster hosts --fields hostname,gpu-labels
- hostname: stockholm-sergel-001
  gpu-labels:
    - name: all
      matching-gpus:
        - id: GPU-b75c47d9-5fb4-63e0-a07b-ff2633af741c
        - id: GPU-ee1b2a5c-3cd0-0c4a-a240-d87c22748a35
    - name: any-tesla
      max-number-gpus: 1
      matching-gpus:
        - id: GPU-b75c47d9-5fb4-63e0-a07b-ff2633af741c
        - id: GPU-ee1b2a5c-3cd0-0c4a-a240-d87c22748a35
By comparing the GPU IDs in this list to the IDs from the GPU list in the previous step, we can see that both labels are available on the site and that both labels match both GPUs. The difference is the max-number-gpus parameter, which indicates that a container referencing the any-tesla label is assigned only one of the matching-gpus, not both. The other GPU may still be assigned to a different container referencing the same label on the site.
Granting a subtenant access to a GPU label
This step is only required if there is a subtenant that needs to be granted GPU access. If there are no subtenants and the tenant that configured the gpu-labels in the previous step is the one to run applications, then this step may be skipped.
By default, labels created as described in the previous step are only accessible to the site provider. In order for its subtenants to gain access to the GPUs, the site provider needs to create a tenant-specific resource-profile and assign the GPU labels to this profile, in a similar fashion as described in the Device Discovery tutorial.
We use an application owner tenant, acme, as example here.
To create a new resource-profile called t-acme-gpu
:
supctl create resource-profiles <<EOF
name: t-acme-gpu
gpu-labels:
  - name: all
  - name: any-tesla
EOF
Next, assign the resource-profile to the tenant globally. Note that this command replaces any resource-profile already assigned; if this is not desired, the currently assigned resource-profile may instead be updated with the list of gpu-labels, as sketched after the command below.
supctl merge tenants acme <<EOF
resource-profile: t-acme-gpu
EOF
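Alternatively, to keep an already assigned resource-profile and just add the GPU labels to it, the labels could be merged into that profile. A sketch, assuming the existing profile is named t-acme (a hypothetical name) and that supctl merge works against resource-profiles as it does for the other objects above:
supctl merge resource-profiles t-acme <<EOF
gpu-labels:
  - name: all
  - name: any-tesla
EOF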
Note that there may only be one resource-profile per tenant as a global setting; however, it may be refined on a per-site basis.
Assuming that the stockholm-sergel
site is already assigned to tenant acme
:
supctl show tenants acme assigned-sites --fields name
- name: stockholm-sergel
The tenant can now see the list of GPUs visible to them on this site:
supctl show -s stockholm-sergel assigned-sites stockholm-sergel --fields gpus,gpu-labels
gpus:
  - id: GPU-b75c47d9-5fb4-63e0-a07b-ff2633af741c
    vendor: NVIDIA
    name: Tesla M60
    serial: "0321017046575"
    memory: 7680 MiB
    driver-version: 525.60.13
    compute-mode: Default
    compute-capability: "5.2"
    display-mode: Enabled
  - id: GPU-ee1b2a5c-3cd0-0c4a-a240-d87c22748a35
    vendor: NVIDIA
    name: Tesla M60
    serial: "0321017046575"
    memory: 7680 MiB
    driver-version: 525.60.13
    compute-mode: Default
    compute-capability: "5.2"
    display-mode: Enabled
gpu-labels:
  - name: all
    matching-gpus:
      - id: GPU-b75c47d9-5fb4-63e0-a07b-ff2633af741c
      - id: GPU-ee1b2a5c-3cd0-0c4a-a240-d87c22748a35
  - name: any-tesla
    max-number-gpus: 1
    matching-gpus:
      - id: GPU-b75c47d9-5fb4-63e0-a07b-ff2633af741c
      - id: GPU-ee1b2a5c-3cd0-0c4a-a240-d87c22748a35
Creating an application with GPU passthrough
The system is now ready for an application requiring GPU access to be deployed. The following is an example of such an application:
name: sample-gpu-app
version: 0.0.1
services:
  - name: s
    mode: replicated
    replicas: 1
    containers:
      - name: c
        image: nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda10.2
        entrypoint:
          - /bin/bash
        cmd:
          - "-c"
          - sleep infinity
        gpu:
          labels:
            - all
To create this application specification:
cat sample-gpu-app.yaml | supctl create applications
In order to deploy this application on the stockholm-sergel site, we need an application deployment object:
name: sample-gpu-dep
application: sample-gpu-app
application-version: "0.0.1"
placement:
  match-site-labels: system/name=stockholm-sergel
To create this application deployment:
cat sample-gpu-dep.yaml | supctl create application-deployments
Once the application is deployed, we can see that both GPUs available on the host are passed through to the container:
supctl show -s stockholm-sergel \
applications sample-gpu-app service-instances s-1 \
--fields oper-status,containers/[name,gpus]
oper-status: running
containers:
  - name: c
    gpus:
      - id: GPU-b75c47d9-5fb4-63e0-a07b-ff2633af741c
      - id: GPU-ee1b2a5c-3cd0-0c4a-a240-d87c22748a35
This is expected, because the application requested all GPUs that carry the GPU label all, and there were no other limitations, neither on the label nor in the application specification. Had the application referred to the any-tesla label, only one GPU would have been passed through to the application because of the max-number-gpus limit on the label.
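For comparison, a minimal sketch of the container's gpu section referring to the any-tesla label instead:
gpu:
  labels:
    - any-tesla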
In order to verify that the GPUs are visible inside the container, we may use the nvidia-smi binary, which has been mounted into the container through the utility NVIDIA driver capability.
supctl do -s stockholm-sergel applications sample-gpu-app \
service-instances s-1 containers c exec nvidia-smi
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.60.13 Driver Version: 525.60.13 CUDA Version: 12.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 Tesla M60 Off | 00000000:00:1D.0 Off | 352 |
| N/A 26C P0 38W / 150W | 0MiB / 7680MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 1 Tesla M60 Off | 00000000:00:1E.0 Off | 352 |
| N/A 35C P0 38W / 150W | 0MiB / 7680MiB | 49% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
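The GPU UUIDs printed by the driver can also be cross-checked against the id values reported by the Avassa platform, assuming extra arguments are passed through to exec as in this sketch; nvidia-smi -L lists each GPU together with its UUID.
supctl do -s stockholm-sergel applications sample-gpu-app \
  service-instances s-1 containers c exec nvidia-smi -L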
We may also try to run the NVIDIA test application included in this image.
supctl do -s stockholm-sergel applications sample-gpu-app \
service-instances s-1 containers c exec /tmp/vectorAdd
[Vector addition of 50000 elements]
Copy input data from the host memory to the CUDA device
CUDA kernel launch with 196 blocks of 256 threads
Copy output data from the CUDA device to the host memory
Test PASSED
Done
Requesting passthrough of a subset of GPUs matching a GPU label
In certain cases a site provider may grant access to a larger number of GPUs than the application requires. In this case, it is possible to further limit the number of GPUs passed through into the container, or even write an expression to select GPUs matching certain parameters.
A modified example of the application specification mentioned in the previous step:
name: sample-gpu-app
version: 0.0.2
services:
  - name: s
    mode: replicated
    replicas: 1
    containers:
      - name: c
        image: nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda10.2
        entrypoint:
          - /bin/bash
        cmd:
          - "-c"
          - sleep infinity
        gpu:
          labels:
            - all
          gpu-patterns:
            - vendor == "NVIDIA", memory >= "4 GiB", driver-version > "515", display-mode == "Enabled"
          number-gpus: 1
In this example the application is still requesting a GPU matching the GPU label all, but further refines the requested GPUs with the gpu-patterns and number-gpus statements.
The gpu-patterns statement allows the application owner to select a GPU corresponding to a certain set of parameters. In this example the GPU memory must be at least 4 GiB, the NVIDIA driver version must be newer than 515, and the GPU must have a connected display. This does not really narrow down the set of GPUs in this particular example, as both GPUs on this site match the expression, but it makes sense in environments where different GPU models are present. The syntax of the gpu-patterns expression is described in detail in the reference documentation for the application object.
The number-gpus statement indicates the exact number of GPUs to be passed through into the container. If the number of GPUs matching the gpu-labels and gpu-patterns expressions is greater than the desired number, exactly the desired number is selected by the scheduler and passed through to the container. If the number of matching GPUs is lower than the desired number, the application fails to start.
As a result of deploying this application one GPU is passed through to the container:
supctl show -s stockholm-sergel \
applications sample-gpu-app service-instances s-1 \
--fields oper-status,containers/[name,gpus]
oper-status: running
containers:
  - name: c
    gpus:
      - id: GPU-b75c47d9-5fb4-63e0-a07b-ff2633af741c