Merged
10 changes: 5 additions & 5 deletions .github/workflows/llama-build-cuda.yaml
@@ -21,8 +21,8 @@ jobs:
cuda_config:
- ver: 13.1.1
short: cu131
-  arch: 75;80
-  pyver: ["3.13", "3.14"]
+  arch: 75
+  pyver: ["3.14"]

defaults:
run:
@@ -69,9 +69,9 @@ jobs:
-DLLAVA_BUILD=off
-DCMAKE_CUDA_ARCHITECTURES=${{ matrix.cuda_config.arch }}
-DGGML_CUDA_FORCE_MMQ=OFF
-  -DGGML_AVX2=off
-  -DGGML_FMA=off
-  -DGGML_F16C=off
+  -DGGML_AVX2=on
+  -DGGML_FMA=on
+  -DGGML_F16C=on
Comment on lines +72 to +74

Copilot AI Feb 27, 2026

Enabling AVX2, FMA, and F16C CPU instructions will improve performance but will break compatibility with older CPUs that don't support these instruction sets. These instructions require:

  • AVX2: Intel Haswell (2013) or AMD Excavator (2015) and newer
  • FMA: Intel Haswell (2013) or AMD Piledriver (2012) and newer
  • F16C: Intel Ivy Bridge (2012) or AMD Bulldozer (2011) and newer

This means the built wheels will fail with "Illegal instruction" errors on older CPUs. Consider:

  1. Documenting these CPU requirements in the release notes or README
  2. Testing that the target deployment environments support these instructions
  3. Potentially providing separate builds for older CPUs if backward compatibility is needed

The trade-off is significant performance gains (especially for matrix operations used in LLM inference) versus broader CPU compatibility.
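Whether a given deployment host actually supports these instructions can be verified before installing the wheel. The sketch below is a hypothetical helper (not part of this PR) that reads the CPU feature flags Linux advertises in /proc/cpuinfo; on hosts without /proc it conservatively reports every flag as unsupported.

```python
# Hypothetical pre-install check (not part of this workflow): verify that the
# host CPU advertises the instruction sets the wheel is now compiled for.
def required_cpu_flags(path="/proc/cpuinfo"):
    """Map each required flag (avx2, fma, f16c) to whether the CPU has it."""
    required = ("avx2", "fma", "f16c")
    try:
        with open(path) as f:
            text = f.read().lower()
    except OSError:
        # No /proc/cpuinfo (e.g. not Linux): report everything unsupported.
        return {flag: False for flag in required}
    flags = set()
    for line in text.splitlines():
        # cpuinfo repeats a "flags : ..." line per logical core.
        if line.startswith("flags"):
            flags.update(line.split(":", 1)[1].split())
    return {flag: flag in flags for flag in required}

if __name__ == "__main__":
    support = required_cpu_flags()
    missing = [f for f, ok in support.items() if not ok]
    if missing:
        print("wheel may crash with SIGILL; missing:", ", ".join(missing))
    else:
        print("CPU supports avx2, fma, f16c")
```

Running a check like this on the target machine before pulling the cu131 wheel turns a hard-to-diagnose "Illegal instruction" crash at import time into an explicit, actionable message.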

-DLLAMA_BUILD_EXAMPLES=OFF
-DLLAMA_BUILD_TESTS=OFF
-DLLAMA_BUILD_SERVER=OFF