STARE: Step-Wise Temporal Alignment and Red-Teaming Engine for Multi-Modal Toxicity Attack
Red-teaming Vision-Language Models is essential for identifying vulnerabilities where adversarial image-text inputs trigger toxic outputs. Existing approaches treat image generation as a black box, returning only terminal toxicity scores and leaving open the question of when and how toxic semanti...
Ring Cancels Its Partnership with Flock
It's a demonstration of how toxic the surveillance-tech company Flock has become when Amazon's Ring cancels the partnership between the two companies. As Hamilton Nolan advises, remove your Ring doorbell...
TuneShield: Mitigating Toxicity in Conversational AI While Fine-Tuning on Untrusted Data
Recent advances in foundation models, such as LLMs, have revolutionized conversational AI. Chatbots are increasingly being developed by customizing LLMs on specific conversational datasets. However, mitigating toxicity during this customization, especially when dealing with untrusted training dat...
GenBreak: Red Teaming Text-To-Image Generators Using Large Language Models
Text-to-image (T2I) models such as Stable Diffusion have advanced rapidly and are now widely used in content creation. However, these models can be misused to generate harmful content, including nudity or violence, posing significant safety risks. While most platforms employ content moderation...
The Scales of Justitia: a Comprehensive Survey on Safety Evaluation of LLMs
With the rapid advancement of artificial intelligence technology, Large Language Models (LLMs) have demonstrated remarkable potential in the field of Natural Language Processing (NLP), including areas such as content generation, human-computer interaction, machine translation, and code generation,...
Chain-Of-Lure: a Synthetic Narrative-Driven Approach to Compromise Large Language Models
In the era of rapid generative AI development, interactions between humans and large language models face significant misuse risks. Previous research has primarily focused on black-box scenarios using human-guided prompts and white-box scenarios leveraging gradient-based LLM generation methods,...
Battling the Emotional Toxicity Within Games: How to Digitally Thrive
...
New Blog Moderation Policy
There has been a lot of toxicity in the comments section of this blog. Recently, we're having to delete more and more comments. Not just spam and off-topic comments, but also sniping and personal attacks. It's gotten so bad that I need to do something. My options are limited because I'm just one...
Games Don't Do Enough to Combat Toxicity at Launch
Riot Games has cutting-edge moderation tools at its disposal. Few of them are present in Valorant, which launched this week...