8 matches found
PT-2026-46251
A flaw has been found in LMCache up to 0.4.6. It affects the function hex_hash_to_int16 in the file lmcache/integration/vllm/utils.py of the KV Cache Handler component. Manipulation can lead to use of a weak hash. The attack needs to be launched locally. The attack requires a high lev...
kv-cache-side-channel-poc
KV Cache Side-Channel: Cross-Tenant Timing Oracle Proof of co...
CacheTrap: Injecting Trojans in LLMs without Leaving Any Traces in Inputs or Weights
Adversarial weight perturbation has emerged as a concerning threat to LLMs that either use training privileges or system-level access to inject adversarial corruption in model weights. With the emergence of innovative defensive solutions that place system- and algorithm-level checks and correctio...
GHSA-7XCV-9J6C-2FMC Modular Max Serve has Unsafe Deserialization vulnerability
Unsafe Deserialization vulnerability in Modular Max Serve before 25.6, specifically when the "--experimental-enable-kvcache-agent" feature is used, allowing attackers to execute arbitrary code...
Whose Narrative Is It Anyway? A KV Cache Manipulation Attack
The Key-Value (KV) cache is an important component for efficient inference in autoregressive Large Language Models (LLMs), but its role as a representation of the model's internal state makes it a potential target for integrity attacks. This paper introduces "History Swapping," a novel block-level...
Shadow in the Cache: Unveiling and Mitigating Privacy Risks of KV-Cache in LLM Inference
The Key-Value (KV) cache, which stores intermediate attention computations (Key and Value pairs) to avoid redundant calculations, is a fundamental mechanism for accelerating Large Language Model (LLM) inference. However, this efficiency optimization introduces significant yet underexplored privacy risk...
Selective KV-Cache Sharing to Mitigate Timing Side-Channels in LLM Inference
Global KV-cache sharing has emerged as a key optimization for accelerating large language model (LLM) inference. However, it exposes a new class of timing side-channel attacks, enabling adversaries to infer sensitive user inputs via shared cache entries. Existing defenses, such as per-user isolatio...
CachePrune: Neural-Based Attribution Defense against Indirect Prompt Injection Attacks
Large Language Models (LLMs) are identified as being susceptible to indirect prompt injection attacks, where the model undesirably deviates from user-provided instructions by executing tasks injected in the prompt context. This vulnerability stems from LLMs' inability to distinguish between data and...