9 matches found
CVE-2026-44223
vLLM is an inference and serving engine for large language models LLMs. From to before 0.20.0, the extracthiddenstates speculative decoding proposer in vLLM returns a tensor with an incorrect shape after the first decode step, causing a RuntimeError that crashes the EngineCore process. The crash ...
CVE-2026-44223
vLLM contains a vulnerability (CVE-2026-44223) where the extract_hidden_states speculative decoding pathway can crash the EngineCore process if any request uses penalty parameters (repetition_penalty, frequency_penalty, or presence_penalty). The issue arises from an incorrect tensor shape after t...
CVE-2026-44223 vLLM: extract_hidden_states speculative decoding crashes server on any request with penalty parameters
vLLM is an inference and serving engine for large language models LLMs. From to before 0.20.0, the extracthiddenstates speculative decoding proposer in vLLM returns a tensor with an incorrect shape after the first decode step, causing a RuntimeError that crashes the EngineCore process. The crash ...
vLLM 安全漏洞
vLLM is an open-source LLM-based inference and service engine that features high throughput and efficient memory usage. Versions of vLLM prior to 0.20.0 contained a security vulnerability. This vulnerability stemmed from the extracthiddenstates speculative decoding proposal, which returned tensor...
GHSA-83VM-P52W-F9PW vLLM: extract_hidden_states speculative decoding crashes server on any request with penalty parameters
Summary The extracthiddenstates speculative decoding proposer in vLLM returns a tensor with an incorrect shape after the first decode step, causing a RuntimeError that crashes the EngineCore process. The crash is triggered when any request in the batch uses sampling penalty parameters...
vLLM: extract_hidden_states speculative decoding crashes server on any request with penalty parameters
Summary The extracthiddenstates speculative decoding proposer in vLLM returns a tensor with an incorrect shape after the first decode step, causing a RuntimeError that crashes the EngineCore process. The crash is triggered when any request in the batch uses sampling penalty parameters...
Incorrect Type Conversion or Cast
Overview vllm is an A high-throughput and memory-efficient inference and serving engine for LLMs Affected versions of this package are vulnerable to Incorrect Type Conversion or Cast through the extracthiddenstates speculative decoding. An attacker can cause the server to crash and disrupt servic...
PT-2026-38288
Name of the Vulnerable Software and Affected Versions vLLM versions 0.18.0 through 0.19.1 Description The extract hidden states speculative decoding proposer returns a tensor with an incorrect shape after the first decode step, leading to a RuntimeError that crashes the EngineCore process. This...
Side-Channel Attacks Against LLMs
Here are three papers describing different side-channel attacks against LLMs. "Remote Timing Attacks on Efficient Language Model Inference": Abstract: Scaling up language models has significantly increased their capabilities. But larger models are slower models, and so there is now an extensive...