Pull requests: vllm-project/vllm
[CI/Build] Separate out flaky responses API tests
Labels: ci/build, ready, rocm
#32110 opened Jan 11, 2026 by DarkLight1337
[V1][Spec Decode] Fix outdated RejectionSampler docstring
Labels: v1
#32107 opened Jan 11, 2026 by WineChord
[Perf] Optimize Context Parallel by disabling NCCL_GRAPH_MIXING_SUPPORT
Labels: nvidia
#32106 opened Jan 11, 2026 by FENP
Add tensor IPC transfer mechanism for multimodal data
Labels: frontend, multi-modality, v1
#32104 opened Jan 11, 2026 by brandonpelfrey
[ROCm][Bugfix] Fix Mamba batched decode producing incorrect output
Labels: rocm
#32099 opened Jan 10, 2026 by AndreasKaratzas
fix(examples): replace unsafe eval() with safe math evaluator in xLAM tool examples
Labels: documentation
#32098 opened Jan 10, 2026 by deosha
[Bugfix] Fix GLM-4.7 tool parser for tool calls without arguments
#32097 opened Jan 10, 2026 by steinfurt
[cpu][bench] Add Fused MoE Micro Benchmark for CPU Backend
Labels: cpu, performance
#32092 opened Jan 10, 2026 by andikarachman
[Tracing] Support OTEL_TRACES_EXPORTER env var and multiple exporters
Labels: ci/build, v1
#32091 opened Jan 10, 2026 by minimAluminiumalism
Fix offline inference chat response prompt
Labels: documentation, speculative-decoding
#32088 opened Jan 10, 2026 by andyxning
refactor: refactor_repeated_interfaces
Labels: deepseek
#32087 opened Jan 10, 2026 by tom-zju
[Model] Improve multimodal pooling examples
Labels: documentation
[Bugfix] Fix ModelOpt Llama-4 slow loading via tensor contiguity
Labels: llama
#32081 opened Jan 10, 2026 by ishrith-gowda
Add support for compressed-tensors NVFP4 in non-gated MoE layers (#31782)
#32080 opened Jan 10, 2026 by baonudesifeizhai
[EPLB] Replace async handshake flags with TransferPhase state machine
#32078 opened Jan 10, 2026 by Anri-Lombard
[Cleanup] Removed unused KVConnectorModelRunnerMixin methods
Labels: v1, kv-connector, ready
#32077 opened Jan 10, 2026 by njhill
[Attention][4/n] Remove usage of deprecated seq_lens_cpu and num_computed_tokens_cpu CommonAttentionMetadata properties
Labels: speculative-decoding, v1
#32073 opened Jan 10, 2026 by LucasWilkinson (Draft)
Use inference_mode() for torchao weight quantization
Labels: ci/build
#32071 opened Jan 10, 2026 by jerryzh168
[RFC] Improve environment variable declaration and handling (#31249)
Labels: documentation