
k3d-gpu

A Docker-based solution for building a rancher/k3s + nvidia/cuda image that enables a k3d cluster to access your host’s NVIDIA CUDA‑capable GPU(s).


Table of Contents

  1. Features
  2. Prerequisites
  3. Environment Variables
  4. Building & Pushing the Image
  5. k3d Cluster Setup
  6. Host System Configuration
  7. NVIDIA Device Plugin
  8. Testing GPU Access
  9. Contributing
  10. Release History
  11. License

Features

  • Combines K3s and NVIDIA CUDA support in a single container image
  • Pre‑configured with NVIDIA Container Toolkit for containerd
  • Exposes standard K3s entrypoint (/bin/k3s agent)
  • Mounts volumes for kubelet, k3s state, CNI, and logs
  • Tunable via build arguments for K3s and CUDA versions
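
For orientation, here is a minimal sketch of how such an image is typically assembled, following the multi-stage pattern from the upstream k3d CUDA guide; the repository's Dockerfile is the authoritative version and handles details (such as adding NVIDIA's apt repository) that are elided here:

ARG K3S_TAG="v1.34.1-k3s1-amd64"
ARG CUDA_TAG="13.1.1-base-ubuntu24.04"

# Stage 1: pull the K3s distribution
FROM rancher/k3s:$K3S_TAG AS k3s

# Stage 2: start from the CUDA base image
FROM nvidia/cuda:$CUDA_TAG

# Install the NVIDIA Container Toolkit so containerd can launch GPU containers
# (the real build first configures NVIDIA's apt repository for this package)
RUN apt-get update \
  && apt-get install -y nvidia-container-toolkit \
  && rm -rf /var/lib/apt/lists/*

# Overlay the K3s binaries and configuration onto the CUDA image
COPY --from=k3s / /

# State directories that the Features list above describes as mounted volumes
VOLUME /var/lib/kubelet
VOLUME /var/lib/rancher/k3s
VOLUME /var/lib/cni
VOLUME /var/log

ENTRYPOINT ["/bin/k3s"]
CMD ["agent"]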

Prerequisites

  • Docker (20.10+) configured with NVIDIA GPU support (e.g., the NVIDIA Container Toolkit, which provides Docker’s built‑in --gpus flag)
  • k3d (v5.0.0 or later) to manage local K3s clusters
  • A host NVIDIA GPU with up‑to‑date drivers (the CUDA toolkit itself ships inside the container image)
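
Before building anything, you can sanity-check that Docker itself can reach the GPU; this uses the same CUDA tag shown in the build example later in this README:

# Should print the nvidia-smi table for your host GPU(s)
docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi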

Environment Variables

Variable   Default                   Description
--------   -----------------------   -------------------------------------
K3S_TAG    v1.34.1-k3s1-amd64        K3s image tag to use from rancher/k3s
CUDA_TAG   13.1.1-base-ubuntu24.04   CUDA base image tag from nvidia/cuda

You can override these when building:

docker build \
  --build-arg K3S_TAG="v1.28.8-k3s1" \
  --build-arg CUDA_TAG="12.4.1-base-ubuntu22.04" \
  -t cryptoandcoffee/k3d-gpu .

Building & Pushing the Image

Clone this repository and build with the included build.sh or manually:

git clone https://github.com/88plug/k3d-gpu.git
cd k3d-gpu

# Using build.sh
./build.sh

# Or manually
docker build --platform linux/amd64 \
  -t cryptoandcoffee/k3d-gpu .

# Push to Docker Hub (or your registry)
docker push cryptoandcoffee/k3d-gpu

k3d Cluster Setup

Create a k3d cluster that uses the GPU‑enabled image and passes all host GPUs into each node container:

k3d cluster create gpu-cluster \
  --image cryptoandcoffee/k3d-gpu \
  --servers 1 --agents 1 \
  --gpus all \
  --port 6443:6443@loadbalancer

Note: The --gpus all flag exposes every host GPU to the server and agent containers.
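
k3d merges the new cluster’s credentials into your kubeconfig and switches context to it by default, so you can confirm that both nodes registered:

# Expect k3d-gpu-cluster-server-0 and k3d-gpu-cluster-agent-0 in Ready state
kubectl get nodes -o wide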

Host System Configuration

Clusters running many pods can exhaust the host’s default inotify limits, so you may need to raise them on your host system (not in containers):

# Temporarily (until reboot):
sudo sysctl -w fs.inotify.max_user_watches=100000
sudo sysctl -w fs.inotify.max_user_instances=100000

# Permanently (survives reboots):
echo "fs.inotify.max_user_watches=100000" | sudo tee -a /etc/sysctl.conf
echo "fs.inotify.max_user_instances=100000" | sudo tee -a /etc/sysctl.conf
sudo sysctl -p
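
To confirm the new limits are active:

sysctl fs.inotify.max_user_watches fs.inotify.max_user_instances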

NVIDIA Device Plugin

To schedule GPU workloads, install the NVIDIA device plugin DaemonSet in your cluster:

kubectl apply -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v0.18.2/nvidia-device-plugin.yml
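
Once the DaemonSet is running, each GPU node should advertise an allocatable nvidia.com/gpu resource. A quick check (the pod label below assumes the upstream manifest’s default, name=nvidia-device-plugin-ds):

# Plugin pods should be Running on every GPU node
kubectl -n kube-system get pods -l name=nvidia-device-plugin-ds

# Each node should report a non-zero nvidia.com/gpu capacity
kubectl describe nodes | grep -i 'nvidia.com/gpu'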

Testing GPU Access

Once your cluster and plugin are running, verify GPU visibility:

# On the server node:
docker exec -it k3d-gpu-cluster-server-0 nvidia-smi

# In a pod:
kubectl run cuda-test --rm -it --restart=Never \
  --image=nvidia/cuda:12.4.1-base-ubuntu22.04 \
  -- nvidia-smi

Successful nvidia-smi output confirms that your GPU is accessible from within the cluster.
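
The kubectl run test above relies on the runtime making GPUs visible; to also exercise scheduling through the device plugin, request the nvidia.com/gpu resource explicitly. A minimal sketch (the pod name and image tag are illustrative):

cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: gpu-smoke-test
spec:
  restartPolicy: Never
  containers:
  - name: cuda
    image: nvidia/cuda:12.4.1-base-ubuntu22.04
    command: ["nvidia-smi"]
    resources:
      limits:
        nvidia.com/gpu: 1
EOF

# Wait for the pod to finish, then read its output and clean up
kubectl wait --for=jsonpath='{.status.phase}'=Succeeded pod/gpu-smoke-test --timeout=120s
kubectl logs gpu-smoke-test
kubectl delete pod gpu-smoke-test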


Contributing

Contributions, issues, and feature requests are welcome! Please fork the repository and submit a pull request.


Release History

Date   CUDA Tag   K3s Tag
----   --------   -------

License

Apache 2.0 © 2025 Crypto & Coffee Development Team
