PYSEC-2025-53
vLLM is an inference and serving engine for large language models LLMs. Prior to version 0.9.0, when a new prompt is processed, if the PageAttention mechanism finds a matching prefix chunk, the prefill process speeds up, which is reflected in the TTFT Time to First Token. These timing differences...