Lucene search

K
thnThe Hacker NewsTHN:D378B9C37E1BAB65483DF5E9EF3DFA3C
HistoryMar 13, 2024 - 10:14 a.m.

Researchers Highlight Google's Gemini AI Susceptibility to LLM Threats

2024-03-1310:14:00
The Hacker News
thehackernews.com
19
google's gemini
large language model
security threats
hiddenlayer
gemini advanced
google workspace
llm api
system prompts
injection attacks
synonym attack
crafty jailbreaking
misinformation
security defenses
content restrictions
uncommon tokens
google document
model-stealing attack
black-box models

7.2 High

AI Score

Confidence

Low

Google's Gemini AI

Google’s Gemini large language model (LLM) is susceptible to security threats that could cause it to divulge system prompts, generate harmful content, and carry out indirect injection attacks.

The findings come from HiddenLayer, which said the issues impact consumers using Gemini Advanced with Google Workspace as well as companies using the LLM API.

The first vulnerability involves getting around security guardrails to leak the system prompts (or a system message), which are designed to set conversation-wide instructions to the LLM to help it generate more useful responses, by asking the model to output its “foundational instructions” in a markdown block.

“A system message can be used to inform the LLM about the context,” Microsoft notes in its documentation about LLM prompt engineering.

“The context may be the type of conversation it is engaging in, or the function it is supposed to perform. It helps the LLM generate more appropriate responses.”

Cybersecurity

This is made possible due to the fact that models are susceptible to what’s called a synonym attack to circumvent security defenses and content restrictions.

A second class of vulnerabilities relates to using “crafty jailbreaking” techniques to make the Gemini models generate misinformation surrounding topics like elections as well as output potentially illegal and dangerous information (e.g., hot-wiring a car) using a prompt that asks it to enter into a fictional state.

Also identified by HiddenLayer is a third shortcoming that could cause the LLM to leak information in the system prompt by passing repeated uncommon tokens as input.

“Most LLMs are trained to respond to queries with a clear delineation between the user’s input and the system prompt,” security researcher Kenneth Yeung said in a Tuesday report.

“By creating a line of nonsensical tokens, we can fool the LLM into believing it is time for it to respond and cause it to output a confirmation message, usually including the information in the prompt.”

Another test involves using Gemini Advanced and a specially crafted Google document, with the latter connected to the LLM via the Google Workspace extension.

The instructions in the document could be designed to override the model’s instructions and perform a set of malicious actions that enable an attacker to have full control of a victim’s interactions with the model.

The disclosure comes as a group of academics from Google DeepMind, ETH Zurich, University of Washington, OpenAI, and the McGill University revealed a novel model-stealing attack that makes it possible to extract “precise, nontrivial information from black-box production language models like OpenAI’s ChatGPT or Google’s PaLM-2.”

Cybersecurity

That said, it’s worth noting that these vulnerabilities are not novel and are present in other LLMs across the industry. The findings, if anything, emphasize the need for testing models for prompt attacks, training data extraction, model manipulation, adversarial examples, data poisoning and exfiltration.

“To help protect our users from vulnerabilities, we consistently run red-teaming exercises and train our models to defend against adversarial behaviors like prompt injection, jailbreaking, and more complex attacks,” a Google spokesperson told The Hacker News. “We’ve also built safeguards to prevent harmful or misleading responses, which we are continuously improving.”

The company also said it’s restricting responses to election-based queries out of an abundance of caution. The policy is expected to be enforced against prompts regarding candidates, political parties, election results, voting information, and notable office holders.

Found this article interesting? Follow us on Twitter and LinkedIn to read more exclusive content we post.

7.2 High

AI Score

Confidence

Low