Use GPUs in Docker containers and Kubernetes/OpenShift

Xin Cheng
7 min read · Feb 18, 2021

Using a GPU on a managed machine learning platform (e.g. Azure Machine Learning, Amazon SageMaker, Google Cloud AI Platform) is easy, as lots of details are abstracted away from you. For example, when you use their machine learning SDK, your machine learning code is usually packaged into a docker container with the supporting machine learning framework and executed on the platform, and usually you don't need to build the container yourself, since the platform does all of this for you. However, if you don't have access to these platforms but still want to leverage the modern container approach in your machine learning development lifecycle, you need to know a bit more about how code inside a container can leverage the GPU installed on the host machine.

NVIDIA GPUs and CUDA are the most popular in this domain, so we will focus on them here. The following article explains the process on Docker very well; read it before continuing. Below I add my own understanding as a complement.

Run on Docker

1. You don't need to install the NVIDIA CUDA driver in the docker image. A GPU is specific hardware, and you usually need to install a driver for it (like a printer driver). You can do this inside the image (it is called the brute-force approach in the above article). However, the challenge is that there are lots of different GPUs; installing a specific GPU driver binds the docker image to the GPU that driver targets, which increases management burden and defeats the purpose of portability. In addition, when I tried to build such an NVIDIA CUDA image myself on a server without a GPU, it simply failed. As the following diagram shows, the CUDA driver is installed on the host OS, while the docker image only contains the CUDA toolkit. This keeps the docker image CUDA-driver agnostic, making it more portable and stable.

2. However, when your container runs, it still needs the CUDA driver to properly access the GPU on the host server. To solve this, NVIDIA provides a container runtime library and NVIDIA Docker: when you use NVIDIA Docker to launch a container image, it automatically configures the container to leverage the NVIDIA GPUs installed on the host OS. The only difference from a plain docker command is adding "--gpus <number of GPUs or all>" (a quick sanity check appears after the nvidia-smi example below).

3. Usually you would use a pytorch or tensorflow image from Docker Hub as the base for your custom image. Make sure the version is compiled against the NVIDIA CUDA version on the host OS; otherwise, code in the container won't recognize the GPUs on the host OS. To check the NVIDIA CUDA version, run nvidia-smi on the host OS and look for "CUDA Version"; then in Docker Hub, look for a tag containing cuda<CUDA version on host OS>.

For example, the host OS nvidia-smi result:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.165.02    Driver Version: 418.165.02    CUDA Version: 10.1   |
|-------------------------------+----------------------+----------------------+

In Docker Hub, look for a tag containing cuda10.1, e.g. 1.6.0-cuda10.1-cudnn7-runtime or 1.6.0-cuda10.1-cudnn7-devel.
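
Putting points 2 and 3 together, here is a minimal sanity check from the host (a sketch; it assumes the NVIDIA Container Toolkit is installed and that the host reports CUDA 10.1 as above):

# Run a matching-tag image and ask pytorch whether it can see the GPU
docker run --rm --gpus all pytorch/pytorch:1.6.0-cuda10.1-cudnn7-runtime \
  python -c "import torch; print(torch.cuda.is_available())"
# Prints True when the driver is exposed to the container and the versions match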

Run on Kubernetes/OpenShift

Kubernetes/OpenShift adds another layer: the scheduling layer. The cluster may contain both GPU nodes and non-GPU nodes; if you want to run GPU code, you want your Pod to be scheduled on a GPU node. So on the cluster side, there is something called a "device plugin" that lets Pods access specialized hardware features such as GPUs. On OpenShift, the NVIDIA GPU Operator further automates the setup.
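
A quick way to verify that the device plugin (or GPU Operator) is advertising GPUs, assuming you have kubectl access (the node name is a placeholder):

# Capacity/Allocatable lines show how many GPUs the device plugin exposes
kubectl describe node <gpu-node-name> | grep nvidia.com/gpu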

How a docker container requests a GPU is pretty simple and consistent across OpenShift, AKS, EKS, and GKE:

resources:
  limits:
    nvidia.com/gpu: 1 # requesting 1 GPU
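
For context, here is a minimal Pod sketch (the names and image tag are illustrative, not from the original; it assumes the device plugin or GPU Operator is installed on the cluster):

apiVersion: v1
kind: Pod
metadata:
  name: gpu-check                 # hypothetical name
spec:
  restartPolicy: Never
  containers:
  - name: cuda-check
    image: pytorch/pytorch:1.6.0-cuda10.1-cudnn7-runtime   # example tag
    command: ["python", "-c", "import torch; print(torch.cuda.is_available())"]
    resources:
      limits:
        nvidia.com/gpu: 1         # the scheduler only places this Pod on a node with a free GPU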

Reference

In case you want to build NVIDIA CUDA, pytorch, or tensorflow docker images yourself, use the following as a start.

NVIDIA CUDA Dockerfile

some version hack

Make sure LD_LIBRARY_PATH includes the cuda and cudnn libraries (otherwise pytorch or tensorflow may not be able to detect the GPU within the container). Also, different versions of tensorflow expect version-specific CUDA file names; the symlink trick is not recommended but is a quick workaround (although I hope the industry can come up with something better than this symlink hack or recompilation).
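
As a hypothetical illustration of both tricks (the paths and versions are assumptions matching the tensorflow 1.15 / CUDA 10.2 mismatch discussed below; adjust to your image):

# Make sure the dynamic loader can find the cuda and cudnn libraries
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH

# Symlink hack: tensorflow-gpu 1.15 looks for libcudart.so.10.0,
# but only the 10.2 toolkit is installed in this scenario
ln -s /usr/local/cuda/lib64/libcudart.so.10.2 /usr/local/cuda/lib64/libcudart.so.10.0

In a Dockerfile, the export becomes an ENV LD_LIBRARY_PATH=... instruction.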

Pytorch Dockerfile

Tensorflow Dockerfile

Also, if you are in a constrained environment and cannot access those public Docker Hub images, you may need to build the image yourself. In this situation, pay attention to the compatibility between the CUDA version and the deep learning framework version. I find pytorch's version compatibility to be better and more explicit.

Tensorflow 1.x seems to be pickier, but 2.x seems to have improved.

For example, on the page above, tensorflow-gpu-1.15.0 is tested against CUDA 10.0. When the CUDA version is 10.2, you have to compile from source (otherwise, it won't find the necessary CUDA files). Here is one good article:

However, when I use tensorflow 2.3.1, it works with CUDA 10.2, although the page says it was only tested on 10.1 (which suggests CUDA 10.1 builds are compatible with 10.2?). If you are starting a new tensorflow project, start with 2.x; it could save you lots of pain.
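
A quick way to see which CUDA version a tensorflow wheel was built against (this API exists in newer 2.x releases; it may be missing in older ones):

python -c "import tensorflow as tf; print(tf.sysconfig.get_build_info()['cuda_version'])"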

Host OS CUDA version and Docker image CUDA version compatibility

CUDA claims backward compatibility, meaning that applications compiled against a particular version of CUDA will continue to work on subsequent (later) driver releases.

Here is a quick test (host driver CUDA version 11.0):

nvidia-smi always shows CUDA 11.0 inside the container (it reports the host driver's CUDA version, not the container's toolkit)

pytorch
pytorch/pytorch:1.8.1-cuda11.1-cudnn8-runtime: host cuda 11.0, container 11.1, torch.cuda.is_available() is True
pytorch/pytorch:1.6.0-cuda10.1-cudnn7-runtime: host cuda 11.0, container 10.1, torch.cuda.is_available() is True

tensorflow
tensorflow/tensorflow:2.4.1-gpu: tf.test.is_gpu_available() returns True, cuda 11.0 in /usr/local
tensorflow/tensorflow:2.3.2-gpu: tf.test.is_gpu_available() returns True, cuda 10.1 in /usr/local
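
The checks above can be reproduced with one-liners like these (a sketch using the tags from the list; assumes the NVIDIA Container Toolkit on the host):

docker run --rm --gpus all pytorch/pytorch:1.8.1-cuda11.1-cudnn8-runtime \
  python -c "import torch; print(torch.cuda.is_available())"

docker run --rm --gpus all tensorflow/tensorflow:2.4.1-gpu \
  python -c "import tensorflow as tf; print(tf.test.is_gpu_available())"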

However, when the CUDA driver is 11.2, torch.cuda.is_available() returns False in a docker image with the 10.2 CUDA toolkit and pytorch 1.6.0 (update: after the LD_LIBRARY_PATH trick to include the cuda and cudnn libraries, it returns True; further update: a simpletransformers model encounters an "Unable to write to file </torch_>" error; pytorch seems to be more shared-memory-hungry now. There is a shared-memory trick on OpenShift, but the Kubernetes side is still open).
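
For the Kubernetes side, a commonly cited workaround (not something this article verified) is to mount an in-memory emptyDir over /dev/shm, sketched below as a Pod spec fragment:

spec:
  containers:
  - name: trainer                 # hypothetical name
    image: pytorch/pytorch:1.6.0-cuda10.1-cudnn7-runtime
    volumeMounts:
    - name: dshm
      mountPath: /dev/shm         # give pytorch DataLoader workers more shared memory
  volumes:
  - name: dshm
    emptyDir:
      medium: Memory              # tmpfs-backed shared memory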

Also, the article below mentions no GPU improvement for tesseract, while easyOCR (which uses a pytorch backend) sees an amazing improvement. In our test, the GPU speeds things up about 7x on average. So there are multiple OCR choices, with tradeoffs depending on the use case.

Best GPU for deep learning

