
Modernize CUDA build workflow: remove PowerShell, inline matrix, drop conda#43

Open
Copilot wants to merge 2 commits into main from copilot/modernize-llama-build-workflow

Conversation

Contributor

Copilot AI commented Feb 25, 2026

Replaces the PowerShell-based CUDA workflow with a clean bash implementation that uses the nvidia/cuda container directly, eliminating the mamba/conda dependency and the separate matrix-definition job.

Changes

Removed:

  • define_matrix job - matrix now inline in build_wheels
  • PowerShell scripts and shell defaults - all bash now
  • Mamba/conda setup and CUDA toolkit installation
  • Unused releasetag matrix variable
  • Environment variable juggling (CUDAVER, AVXVER, CUDA_PATH, etc.)

Added:

  • nvidia/cuda:13.1.1-devel-ubuntu24.04 container with CUDA pre-installed
  • cuda_config matrix structure with ver, short, arch properties
  • enable-cache: true for uv setup
  • Verification step for cmake, nvcc, nvidia-smi
  • CMAKE_EXE_LINKER_FLAGS for CUDA stub linking

Bash replacements:

# PowerShell version manipulation
(Get-Content llama_cpp/__init__.py) -replace '__version__ = "([^"]*)"', ('__version__ = "' + '$1' + '+cu' + $cudaVersion + '"')

# Bash equivalent (note: -i requires a target file)
sed -i 's/__version__ = "\([^"]*\)"/__version__ = "\1+${{ matrix.cuda_config.short }}"/' llama_cpp/__init__.py
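A quick way to sanity-check the bash replacement is to run it against a throwaway file. The path and version string below are invented for the demo, and the `${{ matrix.cuda_config.short }}` expression is hard-coded to `cu131`:

```shell
# Write a fake __init__.py with a placeholder version (values are made up).
printf '__version__ = "0.3.16"\n' > /tmp/version_demo.py

# Same sed pattern as the workflow, with the matrix expansion hard-coded.
# Requires GNU sed for bare `-i` (as on the ubuntu runner).
sed -i 's/__version__ = "\([^"]*\)"/__version__ = "\1+cu131"/' /tmp/version_demo.py

cat /tmp/version_demo.py
# prints: __version__ = "0.3.16+cu131"
```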

Impact: 26 lines removed (22% reduction), 2 jobs → 1 job, cleaner dependency chain.

All preserved features intact: uv build, +cu131 naming, release upload, attestation, workflow_dispatch trigger.

Original prompt

Objective

Modernize the .github/workflows/llama-build-cuda.yaml workflow file by incorporating cleaner approaches from build-wheels-cuda.yaml while retaining the preferred features like uv build and custom release naming convention.

Requirements

Replace the current workflow file with the following improved version:

name: Build Release (CUDA)

on: workflow_dispatch

permissions:
  contents: write

jobs:
  build_wheels:
    name: Build Wheel CUDA ${{ matrix.cuda_config.ver }} Py ${{ matrix.pyver }}
    runs-on: ubuntu-latest
    container:
      image: nvidia/cuda:${{ matrix.cuda_config.ver }}-devel-ubuntu24.04
    permissions:
      id-token: write
      contents: write
      attestations: write
    strategy:
      fail-fast: false
      matrix:
        cuda_config:
          - ver: 13.1.1
            short: cu131
            arch: 75;80;
        pyver: ["3.13", "3.14"]

    steps:
      - name: Checkout repository
        uses: actions/checkout@08c6903cd8c0fde910a37f88322edcfb5dd907a8 # v5
        with:
          submodules: recursive

      - name: Setup uv and Python
        uses: astral-sh/setup-uv@5a7eac68fb9809dea845d802897dc5c723910fa3 # v7
        with:
          version: "latest"
          enable-cache: true
          python-version: ${{ matrix.pyver }}

      - name: Verify GLIBC version
        run: ldd --version

      - name: Verify installation
        run: |
          python --version
          uv --version
          cmake --version
          nvcc --version
          nvidia-smi || true

      - name: Build Wheel
        working-directory: vendor/llama-cpp-python
        env:
          VERBOSE: 1
          CMAKE_ARGS: >-
            -DGGML_CUDA=on
            -DLLAVA_BUILD=off
            -DCMAKE_CUDA_ARCHITECTURES=${{ matrix.cuda_config.arch }}
            -DGGML_CUDA_FORCE_MMQ=OFF
            -DGGML_AVX2=off
            -DGGML_FMA=off
            -DGGML_F16C=off
            -DLLAMA_BUILD_EXAMPLES=OFF
            -DLLAMA_BUILD_TESTS=OFF
            -DLLAMA_BUILD_SERVER=OFF
            -DCMAKE_EXE_LINKER_FLAGS="-L/usr/local/cuda/lib64/stubs -lcuda"
        run: |
          # Rename the release version with +cu{cuda_ver} build tag
          sed -i 's/__version__ = "\([^"]*\)"/__version__ = "\1+${{ matrix.cuda_config.short }}"/' llama_cpp/__init__.py
          
          # Build wheel using uv
          uv -v build --no-create-gitignore --python cpython@${{ matrix.pyver }} --wheel

      - name: List built wheels
        run: ls -lh vendor/llama-cpp-python/dist/*.whl

      - name: Upload Release Assets
        uses: softprops/action-gh-release@5be0e66d93ac7ed76da52eca8bb058f665c3a5fe # v2
        with:
          files: vendor/llama-cpp-python/dist/*.whl
          tag_name: ${{ github.ref_name }}
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}

      - name: Attest Build Provenance
        uses: actions/attest-build-provenance@977bb373ede98d70efdf65b84cb5f73e068dcc2a # v3
        with:
          subject-path: 'vendor/llama-cpp-python/dist/*.whl'

Key Improvements

  1. Removed PowerShell - All scripts now use clean bash syntax
  2. Uses setup-uv action - Handles Python installation automatically, no manual symlinking needed
  3. Inline matrix definition - No separate define_matrix job required
  4. CUDA container - Direct nvidia/cuda devel image, eliminating mamba/conda dependency
  5. Clean CMAKE_ARGS - Multi-line YAML syntax with >- for better readability
  6. Removed unused releasetag - Simplified matrix to only include necessary variables

Preserved Features

  1. uv build command - uv -v build --no-create-gitignore --python cpython@${{ matrix.pyver }} --wheel
  2. Custom naming convention - +cu131 suffix via sed replacement
  3. Release upload - Using softprops/action-gh-release
  4. Provenance attestation - Supply chain security with actions/attest-build-provenance
  5. Same trigger - workflow_dispatch
  6. Same permissions - contents: write, id-token: write, attestations: write

Testing

After merging this PR, please test the workflow by:

  1. Creating a new tag or using an existing one
  2. Manually triggering the workflow via workflow_dispatch
  3. Verifying that wheels are built with the correct naming convention (e.g., +cu131 suffix)
  4. Confirming that the wheels are uploaded to the release

Code Guidelines

  • Follows Python conventions
  • Optimized for code simplicity
  • Uses bash throughout (no PowerShell)
  • Structured for MLOps workflows

This pull request was created from Copilot chat.



Co-authored-by: chamalgomes <106123134+chamalgomes@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Modernize llama-build-cuda workflow using cleaner approaches" to "Modernize CUDA build workflow: remove PowerShell, inline matrix, drop conda" on Feb 25, 2026
Copilot AI requested a review from chamalgomes February 25, 2026 09:45
@chamalgomes chamalgomes marked this pull request as ready for review February 25, 2026 09:48
Copilot AI review requested due to automatic review settings February 25, 2026 09:48
Contributor

Copilot AI left a comment

Pull request overview

Modernizes the CUDA wheel build GitHub Actions workflow by consolidating the matrix into a single job, switching from PowerShell to bash, and building inside an NVIDIA CUDA container to remove the conda/mamba toolchain setup.

Changes:

  • Removes the separate define_matrix job and inlines the build matrix into build_wheels.
  • Replaces PowerShell scripting with bash (including version tag rewriting via sed).
  • Uses an nvidia/cuda devel container image and adds a CUDA stub linker flag for builds.


cuda_config:
  - ver: 13.1.1
    short: cu131
    arch: 75;80;

Copilot AI Feb 25, 2026

matrix.cuda_config.arch is set to 75;80; with a trailing semicolon. In CMake, semicolon-separated lists with a trailing ; typically introduce an empty list element, which can make CMAKE_CUDA_ARCHITECTURES include an empty architecture and fail configuration. Drop the trailing semicolon (e.g., 75;80) or represent architectures as a YAML list and join them explicitly.

Suggested change
arch: 75;80;
arch: 75;80
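If the workflow should tolerate either form, a defensive one-liner (a sketch, not part of the PR) can trim a trailing semicolon in bash before the value reaches CMake:

```shell
# Hypothetical matrix value with the problematic trailing semicolon.
arch='75;80;'

# ${var%;} removes one trailing ';' if present, leaving a clean CMake list.
arch="${arch%;}"

echo "-DCMAKE_CUDA_ARCHITECTURES=${arch}"
# prints: -DCMAKE_CUDA_ARCHITECTURES=75;80
```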

Comment on lines +10 to +13
name: Build Wheel CUDA ${{ matrix.cuda_config.ver }} Py ${{ matrix.pyver }}
runs-on: ubuntu-latest
container:
  image: nvidia/cuda:${{ matrix.cuda_config.ver }}-devel-ubuntu24.04

Copilot AI Feb 25, 2026

Running the job in an nvidia/cuda:* container means all steps execute inside that image. This workflow relies on several JavaScript/composite actions (actions/checkout, setup-uv, action-gh-release, attest-build-provenance) which require tooling like node (and git for checkout) to be present in the container. The nvidia/cuda images don’t guarantee those dependencies, so the workflow can fail early. Consider switching to a custom CUDA build image that includes node + git (and any build deps like cmake), or avoid a job-level container and instead run the CUDA build via docker run in a run: step.
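One lightweight mitigation, sketched here as a hypothetical extra step rather than anything in the PR, is a fail-fast probe that reports which of the tools the hosted actions rely on are missing from the container:

```shell
# check_tools prints "missing:none" when every named command resolves,
# otherwise lists the absent ones (the tool names passed in are illustrative).
check_tools() {
  local missing=""
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || missing="$missing $tool"
  done
  echo "missing:${missing:-none}"
}

# In the workflow this might be: check_tools node git cmake nvcc
check_tools sh
# prints: missing:none
```

Running this as the first step surfaces a missing `node` or `git` immediately, instead of letting `actions/checkout` fail with a less obvious error later in the job.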
