Vulnerabilities of Large Language Models to Adversarial Attacks

Published in ACL Tutorial, 2024

This tutorial presents the first comprehensive taxonomy of adversarial attacks on large language models (LLMs), including jailbreaks, prompt injection, and emerging multimodal threats. We organize the attack landscape, clarify threat models, and discuss practical evaluation considerations for real-world systems.

The goal is to provide researchers and practitioners with a structured understanding of how LLMs fail under adversarial pressure, what makes these attacks effective, and where current defenses fall short. The tutorial also outlines open challenges and research directions for building more secure and reliable LLM-based applications.

Share on

Bluesky Facebook LinkedIn Mastodon X (formerly Twitter)

Pedram Zaree

Share on