2 matches found
PYSEC-2025-53
vLLM is an inference and serving engine for large language models (LLMs). Prior to version 0.9.0, when a new prompt is processed and the PagedAttention mechanism finds a matching prefix chunk, the prefill process speeds up, which shows up as a lower TTFT (Time to First Token). These timing differences...
Timing Attack
Overview: vllm is a high-throughput and memory-efficient inference and serving engine for LLMs. Affected versions of this package are vulnerable to Timing Attack due to the PagedAttention mechanism. An attacker can observe timing differences to infer details about the processed data by analyzing...
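
The prefix-matching behaviour both entries describe can be illustrated with a toy model. The sketch below is not vLLM code: `ToyPrefixCache`, `BLOCK_SIZE`, and `prefill` are hypothetical names, and the number of tokens that must be computed stands in for TTFT. It only shows why a prompt that shares a cached prefix with an earlier prompt needs less prefill work, which is the difference an observer could measure.

```python
from typing import List, Set, Tuple

BLOCK_SIZE = 4  # hypothetical number of tokens per cache block


class ToyPrefixCache:
    """Toy block-level prefix cache (an illustration, not vLLM's implementation)."""

    def __init__(self) -> None:
        # Each cached block is identified by the full token prefix ending at it.
        self._blocks: Set[Tuple[int, ...]] = set()

    def prefill(self, tokens: List[int]) -> int:
        """Process a prompt; return how many tokens had to be computed."""
        computed = 0
        reusing = True
        full = len(tokens) - len(tokens) % BLOCK_SIZE
        for end in range(BLOCK_SIZE, full + 1, BLOCK_SIZE):
            key = tuple(tokens[:end])
            if reusing and key in self._blocks:
                continue  # cache hit: this block costs no prefill work
            reusing = False  # first miss ends the reusable prefix
            computed += BLOCK_SIZE
            self._blocks.add(key)
        computed += len(tokens) - full  # trailing partial block is not cached
        return computed


cache = ToyPrefixCache()
earlier_prompt = list(range(12))
first_cost = cache.prefill(earlier_prompt)                 # nothing cached yet
shared_cost = cache.prefill(list(range(8)) + [99, 98, 97, 96])  # shares 8 tokens
unrelated_cost = cache.prefill(list(range(50, 62)))        # shares nothing
print(first_cost, shared_cost, unrelated_cost)
```

In this toy model the prompt that shares two blocks with the earlier one computes only 4 tokens, while the unrelated prompt computes all 12. In a real deployment the analogous signal would be a difference in response latency rather than a returned count.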