Survey of Vulnerabilities in Large Language Models Revealed by Adversarial Attacks

Published in Arxived, 2024

This survey provides a systematic overview of vulnerabilities in large language models (LLMs) revealed through adversarial attacks. We analyze a broad range of attack classes—including jailbreaks, prompt injection, privacy and membership inference, and multimodal exploits—and highlight the underlying mechanisms that make LLMs susceptible to these threats.

By unifying findings across prior work, this paper clarifies common failure modes, attacker assumptions, and evaluation practices, and identifies open challenges for building secure, reliable, and trustworthy LLM systems. The survey aims to serve as a reference for both researchers and practitioners working on LLM safety and security.