Deep Dive into LLM Security: Prompt Injection vs. Jailbreak
Hello everyone, this is the 4th post of the "Deep Dive into LLM Security" journey. In this article we will see the difference between Prompt Injection and Jailbreak! πβοΈ
The techniques described below are intended solely for educational purposes and ethical penetration testing. They should never be used for malicious activities or unauthorized access.
As described in the previous article, Prompt Injection and Jailbreak are two different ways to attack LLMs and generate harmful content. They are slightly different, but the reason is not clear to everyone and it is difficult to grasp all differences by simply looking on the Internet. This is the reason why I decided to dedicate an entire article on this matter.
The term Prompt Injection was conied by Simon Willison, it contains the word "Injection" because the untrusted user prompt is concatenated to the trusted prompt (i.e., the initial set of instructions given to the chatbot) to manipulate how the LLM behaves. For those who are familiar with web application vulnerabilities, it is a concept similar to SQL injection, where untrusted SQL code is concatenated to the trusted SQL code written by developers.
As an example, please consider a chatbot instructed, through a simple system prompt, to answer only user questions related to History. As shown in the diagram below, it is possible to manipulate its behaviour by simply sending a malicious prompt.
On the other hand, Jailbreak refers to all techniques (e.g., DAN) with the aim to subvert safety filters built into the LLMs themselves. Once jailbroken, the LLM does not obey to any rules, neither those imposed by the system prompt nor those imposed by its creator to avoid the generation of harmful content. Moreover, jailbreaks are usually longer than prompts for Prompt Injection and do not contain any reference to the system prompt.
For example, below there are two different prompts: a simple Prompt Injection payload and DAN 7.0 (one of the most famous jailbreaks).
In brief:
- Prompt Injection manipulates the application behaviour, while Jailbreak bypasses the safety filters built into LLMs.
- The prompts for Prompt Injection are usually short and dependent from the system prompt, while Jailbreaks are more verbose and based on complex scenarios.
If you want to go deeper, please click here to read an interesting article by Simon Willson about this topic!
Some interesting resources:
What is your opinion on the matter? Please comment below and letβs build a supportive and informative community together! π€