Extract u4/u8 zero point directly instead of FP bias by wine99 · Pull Request #41 · ravi9/llama.cpp

wine99 · 2026-02-05T03:32:14Z

OpenVINO (OV) dequantizes with a u4/u8 zero-point tensor, whereas GGUF models store FP biases. The current code first extracts an FP bias ov::Tensor from the model and then builds a separate zero-point tensor during model conversion. This PR changes the flow so that the zero-point tensor is created directly during model loading, without the intermediate FP bias tensor.
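For readers unfamiliar with the two conventions, here is a minimal sketch of the bias-to-zero-point relationship, not the code from this PR. It assumes GGUF-style dequantization of the form `w = scale * q + bias` and OV-style dequantization of the form `w = scale * (q - zp)`, which gives `zp = -bias / scale`. The helper names `bias_to_zero_point` and `pack_u4`, and the low-nibble-first packing order, are hypothetical and chosen only for illustration.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: derive an unsigned zero point for one quantization block.
// Assumption: scale * q + bias == scale * (q - zp), hence zp = -bias / scale.
// `bits` is 4 for u4 or 8 for u8; the result is rounded and clamped to that range.
inline uint8_t bias_to_zero_point(float scale, float bias, int bits) {
    const float max_q = static_cast<float>((1 << bits) - 1);
    if (scale == 0.0f) {
        return 0;  // degenerate block: any zero point reproduces the same weights
    }
    const float zp = std::round(-bias / scale);
    return static_cast<uint8_t>(std::min(std::max(zp, 0.0f), max_q));
}

// Hypothetical helper: pack u4 zero points two per byte.
// Assumption: the even-indexed element goes into the low nibble.
inline std::vector<uint8_t> pack_u4(const std::vector<uint8_t>& zps) {
    std::vector<uint8_t> out((zps.size() + 1) / 2, 0);
    for (std::size_t i = 0; i < zps.size(); ++i) {
        const uint8_t v = static_cast<uint8_t>(zps[i] & 0x0F);
        const uint8_t shifted = (i % 2 == 0) ? v : static_cast<uint8_t>(v << 4);
        out[i / 2] = static_cast<uint8_t>(out[i / 2] | shifted);
    }
    return out;
}
```

Computing the zero point per block while the weights are being read avoids materializing a full FP bias tensor only to convert it later, which is the flow change this PR describes.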

Built on top of PR #40.

github-actions bot added the ggml label Feb 5, 2026

wine99 requested review from cavusmustafa and ynimmaga February 5, 2026 03:32

Extract zp directly instead of bias

ccf727e

wine99 force-pushed the extract-zp-instead-of-bias branch from 656c43b to ccf727e February 5, 2026 06:37

wine99 merged commit 907d832 into dev_backend_openvino Feb 5, 2026
1 check passed
