
Modernize CUDA build workflow: remove PowerShell, inline matrix, drop conda#43

Open
Copilot wants to merge 2 commits into main from copilot/modernize-llama-build-workflow

Conversation

Contributor

Copilot AI commented Feb 25, 2026

Replaces the PowerShell-based CUDA workflow with a clean bash implementation that uses the nvidia/cuda container directly, eliminating the mamba/conda dependency and the separate matrix-definition job.

Changes

Removed:

  • define_matrix job - matrix now inline in build_wheels
  • PowerShell scripts and shell defaults - all bash now
  • Mamba/conda setup and CUDA toolkit installation
  • Unused releasetag matrix variable
  • Environment variable juggling (CUDAVER, AVXVER, CUDA_PATH, etc.)

Added:

  • nvidia/cuda:13.1.1-devel-ubuntu24.04 container with CUDA pre-installed
  • cuda_config matrix structure with ver, short, arch properties
  • enable-cache: true for uv setup
  • Verification step for cmake, nvcc, nvidia-smi
  • CMAKE_EXE_LINKER_FLAGS for CUDA stub linking

Bash replacements:

# PowerShell version manipulation
(Get-Content llama_cpp/__init__.py) -replace '__version__ = "([^"]*)"', ('__version__ = "' + '$1' + '+cu' + $cudaVersion + '"')

# Bash equivalent (note: -i requires a target file)
sed -i 's/__version__ = "\([^"]*\)"/__version__ = "\1+${{ matrix.cuda_config.short }}"/' llama_cpp/__init__.py
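A quick way to sanity-check the bash replacement is to run it against a throwaway file. The path and version string below are invented for the demo, and the `${{ matrix.cuda_config.short }}` expression is hard-coded to `cu131`:

```shell
# Write a fake __init__.py with a placeholder version (values are made up).
printf '__version__ = "0.3.16"\n' > /tmp/version_demo.py

# Same sed pattern as the workflow, with the matrix expansion hard-coded.
# Requires GNU sed for bare `-i` (as on the ubuntu runner).
sed -i 's/__version__ = "\([^"]*\)"/__version__ = "\1+cu131"/' /tmp/version_demo.py

cat /tmp/version_demo.py
# prints: __version__ = "0.3.16+cu131"
```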

Impact: 26 lines removed (22% reduction), 2 jobs → 1 job, cleaner dependency chain.

All preserved features intact: uv build, +cu131 naming, release upload, attestation, workflow_dispatch trigger.

Original prompt

Objective

Modernize the .github/workflows/llama-build-cuda.yaml workflow file by incorporating cleaner approaches from build-wheels-cuda.yaml while retaining the preferred features like uv build and custom release naming convention.

Requirements

Replace the current workflow file with the following improved version:

name: Build Release (CUDA)

on: workflow_dispatch

permissions:
  contents: write

jobs:
  build_wheels:
    name: Build Wheel CUDA ${{ matrix.cuda_config.ver }} Py ${{ matrix.pyver }}
    runs-on: ubuntu-latest
    container:
      image: nvidia/cuda:${{ matrix.cuda_config.ver }}-devel-ubuntu24.04
    permissions:
      id-token: write
      contents: write
      attestations: write
    strategy:
      fail-fast: false
      matrix:
        cuda_config:
          - ver: 13.1.1
            short: cu131
            arch: 75;80;
        pyver: ["3.13", "3.14"]

    steps:
      - name: Checkout repository
        uses: actions/checkout@08c6903cd8c0fde910a37f88322edcfb5dd907a8 # v5
        with:
          submodules: recursive

      - name: Setup uv and Python
        uses: astral-sh/setup-uv@5a7eac68fb9809dea845d802897dc5c723910fa3 # v7
        with:
          version: "latest"
          enable-cache: true
          python-version: ${{ matrix.pyver }}

      - name: Verify GLIBC version
        run: ldd --version

      - name: Verify installation
        run: |
          python --version
          uv --version
          cmake --version
          nvcc --version
          nvidia-smi || true

      - name: Build Wheel
        working-directory: vendor/llama-cpp-python
        env:
          VERBOSE: 1
          CMAKE_ARGS: >-
            -DGGML_CUDA=on
            -DLLAVA_BUILD=off
            -DCMAKE_CUDA_ARCHITECTURES=${{ matrix.cuda_config.arch }}
            -DGGML_CUDA_FORCE_MMQ=OFF
            -DGGML_AVX2=off
            -DGGML_FMA=off
            -DGGML_F16C=off
            -DLLAMA_BUILD_EXAMPLES=OFF
            -DLLAMA_BUILD_TESTS=OFF
            -DLLAMA_BUILD_SERVER=OFF
            -DCMAKE_EXE_LINKER_FLAGS="-L/usr/local/cuda/lib64/stubs -lcuda"
        run: |
          # Rename the release version with +cu{cuda_ver} build tag
          sed -i 's/__version__ = "\([^"]*\)"/__version__ = "\1+${{ matrix.cuda_config.short }}"/' llama_cpp/__init__.py
          
          # Build wheel using uv
          uv -v build --no-create-gitignore --python cpython@${{ matrix.pyver }} --wheel

      - name: List built wheels
        run: ls -lh vendor/llama-cpp-python/dist/*.whl

      - name: Upload Release Assets
        uses: softprops/action-gh-release@5be0e66d93ac7ed76da52eca8bb058f665c3a5fe # v2
        with:
          files: vendor/llama-cpp-python/dist/*.whl
          tag_name: ${{ github.ref_name }}
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}

      - name: Attest Build Provenance
        uses: actions/attest-build-provenance@977bb373ede98d70efdf65b84cb5f73e068dcc2a # v3
        with:
          subject-path: 'vendor/llama-cpp-python/dist/*.whl'

Key Improvements

  1. Removed PowerShell - All scripts now use clean bash syntax
  2. Uses setup-uv action - Handles Python installation automatically, no manual symlinking needed
  3. Inline matrix definition - No separate define_matrix job required
  4. CUDA container - Direct nvidia/cuda devel image, eliminating mamba/conda dependency
  5. Clean CMAKE_ARGS - Multi-line YAML syntax with >- for better readability
  6. Removed unused releasetag - Simplified matrix to only include necessary variables

Preserved Features

  1. uv build command - uv -v build --no-create-gitignore --python cpython@${{ matrix.pyver }} --wheel
  2. Custom naming convention - +cu131 suffix via sed replacement
  3. Release upload - Using softprops/action-gh-release
  4. Provenance attestation - Supply chain security with actions/attest-build-provenance
  5. Same trigger - workflow_dispatch
  6. Same permissions - contents: write, id-token: write, attestations: write

Testing

After merging this PR, please test the workflow by:

  1. Creating a new tag or using an existing one
  2. Manually triggering the workflow via workflow_dispatch
  3. Verifying that wheels are built with the correct naming convention (e.g., +cu131 suffix)
  4. Confirming that the wheels are uploaded to the release

Code Guidelines

  • Follows Python conventions
  • Optimized for code simplicity
  • Uses bash throughout (no PowerShell)
  • Structured for MLOps workflows

This pull request was created from Copilot chat.



Co-authored-by: chamalgomes <106123134+chamalgomes@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Modernize llama-build-cuda workflow using cleaner approaches" to "Modernize CUDA build workflow: remove PowerShell, inline matrix, drop conda" on Feb 25, 2026
Copilot AI requested a review from chamalgomes February 25, 2026 09:45
@chamalgomes chamalgomes marked this pull request as ready for review February 25, 2026 09:48
Copilot AI review requested due to automatic review settings February 25, 2026 09:48
Contributor

Copilot AI left a comment

Pull request overview

Modernizes the CUDA wheel build GitHub Actions workflow by consolidating the matrix into a single job, switching from PowerShell to bash, and building inside an NVIDIA CUDA container to remove the conda/mamba toolchain setup.

Changes:

  • Removes the separate define_matrix job and inlines the build matrix into build_wheels.
  • Replaces PowerShell scripting with bash (including version tag rewriting via sed).
  • Uses an nvidia/cuda devel container image and adds a CUDA stub linker flag for builds.


cuda_config:
  - ver: 13.1.1
    short: cu131
    arch: 75;80;

Copilot AI Feb 25, 2026

matrix.cuda_config.arch is set to 75;80; with a trailing semicolon. In CMake, semicolon-separated lists with a trailing ; typically introduce an empty list element, which can make CMAKE_CUDA_ARCHITECTURES include an empty architecture and fail configuration. Drop the trailing semicolon (e.g., 75;80) or represent architectures as a YAML list and join them explicitly.

Suggested change
arch: 75;80;
arch: 75;80
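If the workflow should tolerate either form, a defensive one-liner (a sketch, not part of the PR) can trim a trailing semicolon in bash before the value reaches CMake:

```shell
# Hypothetical matrix value with the problematic trailing semicolon.
arch='75;80;'

# ${var%;} removes one trailing ';' if present, leaving a clean CMake list.
arch="${arch%;}"

echo "-DCMAKE_CUDA_ARCHITECTURES=${arch}"
# prints: -DCMAKE_CUDA_ARCHITECTURES=75;80
```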

Comment on lines +10 to +13
name: Build Wheel CUDA ${{ matrix.cuda_config.ver }} Py ${{ matrix.pyver }}
runs-on: ubuntu-latest
container:
  image: nvidia/cuda:${{ matrix.cuda_config.ver }}-devel-ubuntu24.04

Copilot AI Feb 25, 2026

Running the job in an nvidia/cuda:* container means all steps execute inside that image. This workflow relies on several JavaScript/composite actions (actions/checkout, setup-uv, action-gh-release, attest-build-provenance) which require tooling like node (and git for checkout) to be present in the container. The nvidia/cuda images don’t guarantee those dependencies, so the workflow can fail early. Consider switching to a custom CUDA build image that includes node + git (and any build deps like cmake), or avoid a job-level container and instead run the CUDA build via docker run in a run: step.
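One lightweight mitigation, sketched here as a hypothetical extra step rather than anything in the PR, is a fail-fast probe that reports which of the tools the hosted actions rely on are missing from the container:

```shell
# check_tools prints "missing:none" when every named command resolves,
# otherwise lists the absent ones (the tool names passed in are illustrative).
check_tools() {
  local missing=""
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || missing="$missing $tool"
  done
  echo "missing:${missing:-none}"
}

# In the workflow this might be: check_tools node git cmake nvcc
check_tools sh
# prints: missing:none
```

Running this as the first step surfaces a missing `node` or `git` immediately, instead of letting `actions/checkout` fail with a less obvious error later in the job.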
