vLLM: GGUF dequantize kernel int truncation exposes uninitialized GPU memory in multi-tenant serving
Summary Integer truncation of tensor dimensions in vLLM's GGUF dequantize kernels csrc/quantization/gguf/ggufkernel.cu causes partial tensor processing. The output tensor is allocated at full size via torch::empty uninitialized memory, but the dequantize CUDA kernel processes only a truncated...