feat: expand CPU instructions #46

Merged
chamalgomes merged 2 commits into main from feat/expand-cpu-instructions
Feb 27, 2026
Conversation

@chamalgomes
Owner

No description provided.

Copilot AI review requested due to automatic review settings February 27, 2026 03:29
Copilot AI (Contributor) left a comment

Pull request overview

This PR enables advanced CPU instruction set extensions (AVX2, FMA, and F16C) in the CUDA build workflow for llama-cpp-python. These optimizations improve the performance of matrix operations commonly used in LLM inference; the CPU-side portions of the pipeline benefit even in CUDA builds.

Changes:

  • Enabled AVX2, FMA, and F16C CPU instruction sets in the CUDA build configuration by changing CMake flags from off to on
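For readers who want to reproduce the change locally, a build with the same flags enabled might look like the sketch below. This is an illustrative command, not the workflow file itself; the `GGML_CUDA` flag and the pip-based build path are assumptions based on the standard llama-cpp-python installation procedure, while the three GGML flags are the ones toggled in this PR.

```shell
# Hedged sketch: build llama-cpp-python from source with CUDA plus the
# CPU instruction set extensions this PR enables in the CI workflow.
CMAKE_ARGS="-DGGML_CUDA=on -DGGML_AVX2=on -DGGML_FMA=on -DGGML_F16C=on" \
  pip install --no-cache-dir llama-cpp-python
```

The resulting wheel will only run on CPUs that support all of the enabled instruction sets, which is the compatibility trade-off discussed in the review comment below.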


Comment on lines +72 to +74
-DGGML_AVX2=on
-DGGML_FMA=on
-DGGML_F16C=on

Copilot AI Feb 27, 2026


Enabling AVX2, FMA, and F16C CPU instructions will improve performance but will break compatibility with older CPUs that don't support these instruction sets. These instructions require:

  • AVX2: Intel Haswell (2013) or AMD Excavator (2015) and newer
  • FMA: Same as AVX2
  • F16C: Intel Ivy Bridge (2012) or AMD Bulldozer (2011) and newer

This means the built wheels will fail with "Illegal instruction" errors on older CPUs. Consider:

  1. Documenting these CPU requirements in the release notes or README
  2. Testing that the target deployment environments support these instructions
  3. Potentially providing separate builds for older CPUs if backward compatibility is needed

The trade-off is significant performance gains (especially for matrix operations used in LLM inference) versus broader CPU compatibility.
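One way to act on point 2 above is a small preflight check that fails with a clear error instead of a SIGILL. The sketch below is not part of the PR; it is a hypothetical Linux-only helper that parses `/proc/cpuinfo` (the kernel's lowercase flag names `avx2`, `fma`, and `f16c` correspond to the instruction sets enabled here).

```python
# Hedged sketch: verify that the host CPU advertises the instruction sets
# these wheels are now compiled with, before importing the library.

REQUIRED_FLAGS = {"avx2", "fma", "f16c"}

def missing_cpu_flags(cpuinfo_text: str, required=REQUIRED_FLAGS) -> set:
    """Return the subset of `required` flags absent from a /proc/cpuinfo dump."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        # Each logical CPU repeats a "flags : ..." line listing its features.
        if line.startswith("flags"):
            flags.update(line.split(":", 1)[1].split())
    return set(required) - flags

# Usage with a sample /proc/cpuinfo fragment:
sample = "processor : 0\nflags : fpu sse2 avx avx2 fma f16c\n"
print(missing_cpu_flags(sample))  # set() -> all required flags present
```

In a deployment script one would read the real `/proc/cpuinfo` and abort (or fall back to a baseline wheel) when the returned set is non-empty.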

@chamalgomes chamalgomes merged commit aee75ab into main Feb 27, 2026
@chamalgomes chamalgomes deleted the feat/expand-cpu-instructions branch February 27, 2026 06:10


2 participants