12 matches found
EUVD-2026-30303
PyTorch Lightning is a deep learning framework to pretrain and finetune AI models. Versions 2.6.2 and 2.6.2 have introduced functionality consistent with a credential harvesting mechanism...
Reconstruction of Personally Identifiable Information from Supervised Finetuned Models
Supervised Finetuning (SFT) has become one of the primary methods for adapting a large language model (LLM) with extensive pre-trained knowledge to domain-specific, instruction-following tasks. SFT datasets, composed of instruction-response pairs, often include user-provided information that may...
Corrupting LLMs Through Weird Generalizations
Fascinating research: "Weird Generalization and Inductive Backdoors: New Ways to Corrupt LLMs." Abstract: LLMs are useful because they generalize so well. But can you have too much of a good thing? We show that a small amount of finetuning in narrow contexts can dramatically shift behavior outside...
Learning the Wrong Lessons: Syntactic-Domain Spurious Correlations in Language Models
Whitepaper from researchers at MIT, Northeastern University, and Meta. For an LLM to correctly respond to an instruction it must understand both the semantics and the domain (i.e., subject area) of a given task-instruction pair. However, syntax can also convey implicit information. Recent work shows...
Tab-MIA: a Benchmark Dataset for Membership Inference Attacks on Tabular Data in LLMs
Large language models (LLMs) are increasingly trained on tabular data, which, unlike unstructured text, often contains personally identifiable information (PII) in a highly structured and explicit format. As a result, privacy risks arise, since sensitive records can be inadvertently retained by the...
When There Is No Decoder: Removing Watermarks from Stable Diffusion Models in a No-Box Setting
Watermarking has emerged as a promising solution to counter harmful or deceptive AI-generated content by embedding hidden identifiers that trace content origins. However, the robustness of current watermarking techniques is still largely unexplored, raising critical questions about their...
EditLord: Learning Code Transformation Rules for Code Editing
Code editing is a foundational task in software development, where its effectiveness depends on whether it introduces desired code property changes without changing the original code's intended functionality. Existing approaches often formulate code editing as an implicit end-to-end task, omittin...
Finetuning-Activated Backdoors in LLMs
Finetuning openly accessible Large Language Models (LLMs) has become standard practice for achieving task-specific performance improvements. Until now, finetuning has been regarded as a controlled and secure process in which training on benign datasets led to predictable behaviors. In this paper, w...
Set You Straight: Auto-Steering Denoising Trajectories to Sidestep Unwanted Concepts
Ensuring the ethical deployment of text-to-image models requires effective techniques to prevent the generation of harmful or inappropriate content. While concept erasure methods offer a promising solution, existing finetuning-based approaches suffer from notable limitations. Anchor-free methods...
ArtistAuditor: Auditing Artist Style Pirate in Text-To-Image Generation Models
Text-to-image models based on diffusion processes, such as DALL-E, Stable Diffusion, and Midjourney, are capable of transforming texts into detailed images and have widespread applications in art and design. As such, amateur users can easily imitate professional-level paintings by collecting an...
“Emergent Misalignment” in LLMs
Interesting research: "Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs": Abstract: We present a surprising result regarding LLMs and alignment. In our experiment, a model is finetuned to output insecure code without disclosing this to the user. The resulting model act...
CVE-2024-45855
Deserialization of untrusted data can occur in versions 23.10.2.0 and newer of the MindsDB platform, enabling a maliciously uploaded ‘inhouse’ model to run arbitrary code on the server when using ‘finetune’ on it...