GPU Software Stack Overview
When you use a GPU on a server, you are not dealing with a single piece of software or hardware. Instead, you are working with an entire software stack built around NVIDIA GPUs. From the lowest-level physical hardware to the highest-level Kubernetes scheduling, it can be roughly divided into 5 layers:
Hardware Layer → Linux Kernel Driver Layer → User-Space Libraries/Tools Layer → Container Runtime Layer → Kubernetes / HAMi Scheduling Layer
Understanding this layered structure is the foundation for troubleshooting GPU issues and understanding how HAMi works.
5-Layer Architecture Overview
The diagram below illustrates the complete layered structure of the GPU software stack:
%% title: GPU Software Stack 5-Layer Architecture
graph TB
subgraph L5["Layer 5: Kubernetes Scheduling"]
K8sScheduler["Kubernetes Scheduler"]
HAMiScheduler["HAMi Scheduler"]
Pod["Pod"]
end
subgraph L4["Layer 4: Container Runtime"]
containerd["containerd"]
NVIDIAToolkit["NVIDIA Container Toolkit"]
Container["Container"]
end
subgraph L3["Layer 3: User-Space Libraries/Tools"]
CUDA["CUDA Runtime / Toolkit"]
NVML["NVML (libnvidia-ml.so)"]
nvidia_smi["nvidia-smi"]
DCGM["DCGM"]
MIG["MIG"]
end
subgraph L2["Layer 2: Kernel Driver"]
nvidia_ko["nvidia.ko (Kernel Module)"]
nvidia_uvm["nvidia-uvm.ko"]
nvidia_modeset["nvidia-modeset.ko"]
end
subgraph L1["Layer 1: Hardware"]
GPU["NVIDIA GPU"]
PCIe["PCIe Bus"]
NVLink["NVLink (Optional)"]
end
%% Inter-layer connections
Pod --> Container
K8sScheduler --> Pod
HAMiScheduler --> Pod
Container --> NVIDIAToolkit
NVIDIAToolkit --> containerd
Container --> CUDA
Container --> NVML
nvidia_smi --> NVML
DCGM --> NVML
CUDA --> nvidia_ko
NVML --> nvidia_ko
nvidia_ko --> nvidia_uvm
nvidia_ko --> nvidia_modeset
nvidia_ko --> GPU
GPU --- PCIe
GPU --- NVLink
Layer Details
Layer 1: Hardware
The physical hardware is the foundation of everything:
- GPU: NVIDIA GPU chips (such as A100, H100, L40, etc.), responsible for parallel computing.
- PCIe Bus: GPUs communicate with the CPU through PCIe slots, serving as the primary data transfer channel.
- NVLink (optional): A high-speed interconnect between multiple GPUs, offering far greater bandwidth than PCIe.
Layer 2: Linux Kernel Driver
The kernel driver is the bridge between user-space programs and the GPU:
- nvidia.ko: The core NVIDIA kernel module that manages GPU hardware resources, video memory allocation, and command submission.
- nvidia-uvm.ko: The Unified Virtual Memory module, enabling transparent sharing of memory address spaces between the CPU and GPU.
- nvidia-modeset.ko: The display mode setting module, used for GPU graphics output management.
The kernel driver exposes /dev/nvidia* device nodes upward, through which user-space programs interact with the GPU.
Layer 3: User-Space Libraries/Tools
This layer contains the tools and libraries that developers and administrators interact with most frequently:
- CUDA: NVIDIA's parallel computing platform and programming model, including the compiler (nvcc), runtime libraries, and development tools. Nearly all GPU applications access the GPU through CUDA.
- NVML (NVIDIA Management Library): The GPU management library (
libnvidia-ml.so), providing APIs for querying GPU status (temperature, video memory, utilization, etc.). Tools such asnvidia-smiand DCGM all depend on it. - nvidia-smi: A command-line management tool for viewing GPU status, processes, video memory usage, and more. It is the CLI front-end for NVML.
- DCGM (Data Center GPU Manager): A data center GPU management tool providing health monitoring, diagnostics, group management, and other capabilities, suitable for large-scale GPU clusters.
- MIG (Multi-Instance GPU): Multi-Instance GPU technology that physically partitions a single A100/H100 into multiple isolated GPU instances, each with its own dedicated video memory and compute cores.
Layer 4: Container Runtime
To make GPUs usable within containers, additional runtime components are required:
- containerd: The container runtime responsible for image management and container lifecycle management. Kubernetes uses containerd by default.
- NVIDIA Container Toolkit (formerly nvidia-docker2): Automatically mounts GPU device nodes, CUDA libraries, and NVIDIA driver libraries into the container at startup. It is the key bridge enabling containers to use GPUs.
- Container: A running application container that gains GPU access through the Toolkit.
Layer 5: Kubernetes / HAMi Scheduling
Managing GPUs in a Kubernetes cluster requires:
- NVIDIA Device Plugin: A Kubernetes device plugin that reports GPU resources on the node to kubelet, enabling Kubernetes to be aware of GPUs and schedule GPU workloads.
- GPU Operator: A Kubernetes Operator provided by NVIDIA that automates the deployment and management of drivers, Container Toolkit, Device Plugin, DCGM, and other components.
- HAMi Device Plugin: HAMi's device plugin, supporting fine-grained partitioning and sharing of GPU memory and compute resources.
- HAMi Scheduler: HAMi's scheduler extension, supporting advanced scheduling policies such as Binpack/Spread, priorities, and targeted GPU card scheduling.
Key Call Chains
Understanding the key call chains within the GPU software stack helps with troubleshooting and understanding the relationships between components.
Complete Dependency Chain
From creation to execution, a GPU-using Pod goes through the following dependency chain:
%% title: Complete GPU Pod Dependency Chain
graph LR
Pod --> containerd
containerd --> NVIDIAToolkit["NVIDIA Container Toolkit"]
NVIDIAToolkit --> CUDA
CUDA --> Driver["User-Space Driver Library"]
Driver --> nvidia_ko["nvidia.ko"]
nvidia_ko --> PCIe
PCIe --> GPU
nvidia-smi Call Chain
The complete path for nvidia-smi to query GPU information:
%% title: nvidia-smi Call Chain
graph LR
SMI["nvidia-smi"] --> NVML
NVML --> lib["libnvidia-ml.so"]
lib --> nvidia_ko["nvidia.ko"]
nvidia_ko --> GPU
Management Tool Call Chain
Multiple management tools all access the GPU through NVML:
%% title: Management Tool Call Chain
graph LR
nvidia_smi["nvidia-smi"] --> NVML
DCGM --> NVML
DevicePlugin["Device Plugin"] --> NVML
NVML --> lib["libnvidia-ml.so"]
lib --> nvidia_ko["nvidia.ko"]
nvidia_ko --> GPU
As you can see, whether it is a command-line tool, a monitoring component, or a Kubernetes device plugin, they all ultimately access the hardware through the NVML -> kernel driver -> GPU path.
Kubernetes GPU Scheduling Chain
In Kubernetes, the flow of GPU resources from hardware to Pod:
%% title: Kubernetes GPU Scheduling Chain
graph LR
GPU --> Driver["Driver"]
Driver --> DevicePlugin["Device Plugin"]
DevicePlugin --> kubelet
kubelet --> K8s["Kubernetes API"]
K8s --> Scheduler["K8s Scheduler"]
Scheduler --> Pod
HAMi Enhanced Scheduling Chain
Building on the native scheduling chain, HAMi replaces the Device Plugin and Scheduler to implement GPU partitioning and sharing:
%% title: HAMi Enhanced Scheduling Chain
graph LR
GPU --> Driver["Driver"]
Driver --> HAMiDP["HAMi Device Plugin"]
HAMiDP --> kubelet
kubelet --> K8s["Kubernetes API"]
K8s --> HAMiScheduler["HAMi Scheduler"]
HAMiScheduler --> Pod
Component Quick Reference
| Component | Summary |
|---|---|
| GPU | NVIDIA GPU hardware, executes parallel computing tasks |
| PCIe | Bus connecting the GPU and CPU, responsible for data transfer |
| nvidia.ko | NVIDIA kernel module, manages GPU hardware resources and exposes device nodes |
| nvidia-smi | Command-line tool for viewing GPU status, video memory, processes, and more |
| NVML | NVIDIA Management Library, a C-language API for GPU management |
| libnvidia-ml.so | NVML shared library implementation; all management tools communicate with the driver through it |
| CUDA | NVIDIA parallel computing platform, the core programming and runtime framework for GPU applications |
| MIG | Multi-Instance GPU, physically partitions a single GPU into multiple isolated instances |
| DCGM | Data Center GPU Manager, providing monitoring, diagnostics, and health checks |
| containerd | Container runtime, manages container images and lifecycle |
| NVIDIA Container Toolkit | Container GPU support, automatically mounts GPU devices and libraries into containers at startup |
| Device Plugin | Kubernetes device plugin, reports node GPU resource information to the cluster |
| GPU Operator | NVIDIA Operator, automates deployment and management of the full GPU software stack |
| HAMi | GPU virtualization middleware, supporting fine-grained partitioning and sharing of memory and compute |
| HAMi Device Plugin | HAMi's device plugin, replaces the native Device Plugin and supports GPU partition reporting |