Conversation
Pull request overview
This PR enables advanced CPU instruction set extensions (AVX2, FMA, and F16C) in the CUDA build workflow for llama-cpp-python. These optimizations improve performance for the matrix operations common in LLM inference, since CPU-side work remains relevant even in CUDA builds.
Changes:
- Enabled AVX2, FMA, and F16C CPU instruction sets in the CUDA build configuration by changing the corresponding CMake flags from `off` to `on`
```
-DGGML_AVX2=on
-DGGML_FMA=on
-DGGML_F16C=on
```
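For context, a minimal sketch of how flags like these are typically passed to a from-source llama-cpp-python install via the `CMAKE_ARGS` environment variable (the exact flag set here assumes a CUDA-enabled build; adjust to your workflow):

```shell
# Hypothetical local reproduction of the CI build configuration:
# pass the GGML CMake options through CMAKE_ARGS when building the wheel.
CMAKE_ARGS="-DGGML_CUDA=on -DGGML_AVX2=on -DGGML_FMA=on -DGGML_F16C=on" \
  pip install --no-cache-dir llama-cpp-python
```

This is a build-configuration fragment, not something the PR itself changes; the PR flips the same `-DGGML_*` options inside the CI workflow file.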
Enabling AVX2, FMA, and F16C CPU instructions will improve performance but will break compatibility with older CPUs that don't support these instruction sets. These instructions require:
- AVX2: Intel Haswell (2013) or AMD Excavator (2015) and newer
- FMA: Same as AVX2
- F16C: Intel Ivy Bridge (2012) or AMD Bulldozer (2011) and newer
This means the built wheels will fail with "Illegal instruction" errors on older CPUs. Consider:
- Documenting these CPU requirements in the release notes or README
- Testing that the target deployment environments support these instructions
- Potentially providing separate builds for older CPUs if backward compatibility is needed
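One way to fail gracefully instead of crashing with "Illegal instruction": probe the CPU's advertised feature flags before loading the wheel. A minimal sketch for Linux, reading `/proc/cpuinfo` (the function name is hypothetical, not part of llama-cpp-python):

```python
def required_cpu_features_present(required=("avx2", "fma", "f16c")):
    """Return a dict mapping each required feature to whether the CPU
    advertises it, or None if /proc/cpuinfo is unavailable (non-Linux)."""
    try:
        with open("/proc/cpuinfo") as f:
            cpuinfo = f.read()
    except OSError:
        return None  # not Linux; cannot tell from /proc
    # Feature flags appear on lines like "flags : fpu vme ... avx2 fma f16c ..."
    flags = set()
    for line in cpuinfo.splitlines():
        if line.startswith("flags"):
            flags.update(line.split(":", 1)[1].split())
    return {feat: feat in flags for feat in required}
```

A launcher could call this at startup and print a clear error (or fall back to a generic build) when any required feature is missing, rather than letting the process die on the first AVX2 instruction.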
The trade-off is significant performance gains (especially for matrix operations used in LLM inference) versus broader CPU compatibility.