Deep Dive into LLM Security: Leading Security Frameworks

Giuseppe Toscano

21 May 2024 • 3 min read

Image generated by Microsoft Copilot

Hello everyone, this is the 2nd post of the "Deep Dive into LLM Security" journey. In this article we will see the two most famous security frameworks for LLMs! 🛡️😈

While LLM-specific security frameworks are still emerging, several broader AI and cybersecurity frameworks can be applied to LLMs. Here are the two most prominent frameworks and guidelines relevant to LLM security:

OWASP TOP 10 for LLMs
MITRE ATLAS

OWASP TOP 10 for LLMs

The Open Worldwide Application Security Project (OWASP, formerly the Open Web Application Security Project) is a nonprofit foundation focused on improving software security. It provides free, open-source resources, tools, and documentation to help developers, security experts, and companies build and maintain secure systems.

Version 1.1 of its Top 10 for LLMs lists the following security issues:

Prompt Injection (LLM01): the behaviour of the application is manipulated through specially crafted inputs, which let attackers trigger unintended actions (e.g., generating malware).
Insecure Output Handling (LLM02): the output of the model is passed to downstream components without proper validation or sanitization. This could be exploited to perform attacks such as cross-site scripting (XSS) and remote code execution (RCE).
Training Data Poisoning (LLM03): injection of malicious or misleading data into the training dataset, causing the model to learn incorrect or harmful behaviors.
Model Denial of Service (LLM04): overwhelming the LLM with a high volume of prompts or resource-heavy requests to render it unavailable or significantly degrade its performance.
Supply Chain Vulnerabilities (LLM05): weaknesses introduced by third-party components, libraries, or data sources that contain security flaws.
Sensitive Information Disclosure (LLM06): leaking of confidential or sensitive information that was part of the training data or derived from user interactions.
Insecure Plugin Design (LLM07): use of plugins that were not developed following security best practices.
Excessive Agency (LLM08): allowing LLMs to perform potentially damaging actions without human oversight.
Overreliance (LLM09): users or systems depending too heavily on LLMs for decision-making without adequate oversight or validation.
Model Theft (LLM10): unauthorized access to the model, either by stealing the model files or by reconstructing the model through repeated queries.
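
To make two of these items more concrete, here is a minimal Python sketch of possible guards for Insecure Output Handling (LLM02) and Model Denial of Service (LLM04). It is only an illustration, not OWASP's official guidance: the function names and the 4000-character limit are assumptions made for this example, and a real application would need more than this (rate limiting, context-aware encoding, and so on).

```python
import html

# Hypothetical limit chosen for this example; tune it per application.
MAX_PROMPT_CHARS = 4000


def check_prompt_size(prompt: str, limit: int = MAX_PROMPT_CHARS) -> str:
    """Reject oversized prompts before they reach the model (relates to LLM04)."""
    if len(prompt) > limit:
        raise ValueError(f"prompt too long: {len(prompt)} > {limit} characters")
    return prompt


def render_model_output(raw_output: str) -> str:
    """Treat model output as untrusted and escape it before
    embedding it in an HTML page (relates to LLM02)."""
    return "<p>" + html.escape(raw_output) + "</p>"


if __name__ == "__main__":
    # A model response containing markup is rendered as inert text.
    print(render_model_output('<script>alert("xss")</script>'))
```

The idea is simply that everything going into the model and everything coming out of it is handled as untrusted data, exactly like user input in a classic web application.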

All the previous weaknesses were mapped by OWASP into the following high-level diagram:

OWASP Top 10 for Large Language Models (LLMs) — Source: OWASP

For each security issue, OWASP provides some possible attack scenarios, guidance on how to prevent it, and useful reference links.

For additional details, please click here!

MITRE ATLAS

MITRE ATLAS (Adversarial Threat Landscape for Artificial-Intelligence Systems) is a knowledge base of tactics and techniques that attackers could use to exploit both traditional AI and generative AI systems. Compared to the OWASP Top 10 for LLMs, it is more focused on how to attack an AI-based system, and it is an extremely interesting resource for experts in offensive security.

The techniques are grouped according to the following tactics:

Reconnaissance: extraction of information about the system to plan the attack.
Resource Development: preparation of all resources necessary to support the attack.
Initial Access: gaining access to the system.
ML Model Access: gaining access to the ML model.
Execution: execution of malicious code on a local or remote system.
Persistence: keeping access to the system across interruptions (e.g., restarts) that could cut off the attacker's access.
Privilege Escalation: escalation of privileges to obtain the highest privileges possible.
Defense Evasion: evading defensive components, such as firewalls, through ad-hoc payloads.
Credential Access: stealing user credentials.
Discovery: trying to gain knowledge about the environment (e.g., network structure).
Collection: collection of information for future attacks.
ML Attack Staging: preparing the attack against the ML model using knowledge of the target, for example by training proxy models or crafting adversarial data.
Exfiltration: stealing data from the vulnerable system.
Impact: disrupting availability or compromising integrity of the system.

Below you can find the entire MITRE ATLAS matrix containing all tactics and attack techniques. If you are interested in a specific technique, you can go to their website and click on its block. There you will find additional details, such as a short summary, some case studies, a list of subtechniques, and possible mitigations.

While they have different focuses, OWASP Top 10 for LLMs and MITRE ATLAS can be used complementarily to build a robust security strategy for LLMs that addresses both specific vulnerabilities and broader adversarial threats.
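
One simple way to use the two frameworks together is to tag each security finding with both an OWASP identifier and a MITRE ATLAS tactic, so the same list can be read from a defensive or an offensive angle. The sketch below is only an illustration: the `Finding` class, the `group_by` helper, and the sample findings are hypothetical, and the pairings between OWASP entries and ATLAS tactics are this example's own judgment, not an official mapping from either project.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """A security finding tagged with both frameworks (hypothetical structure)."""
    title: str
    owasp_id: str       # e.g. "LLM10" (Model Theft)
    atlas_tactic: str   # e.g. "Exfiltration"


def group_by(findings, key):
    """Group finding titles by one attribute, e.g. 'owasp_id' or 'atlas_tactic'."""
    grouped = defaultdict(list)
    for finding in findings:
        grouped[getattr(finding, key)].append(finding.title)
    return dict(grouped)


# Illustrative findings; the tags are assumptions made for this example.
FINDINGS = [
    Finding("Model reconstructed through repeated API queries", "LLM10", "Exfiltration"),
    Finding("Inference endpoint exhausted by resource-heavy prompts", "LLM04", "Impact"),
    Finding("Model files copied from a storage bucket", "LLM10", "Exfiltration"),
]

if __name__ == "__main__":
    print(group_by(FINDINGS, "owasp_id"))      # view by vulnerability class
    print(group_by(FINDINGS, "atlas_tactic"))  # view by attacker tactic
```

Grouping by `owasp_id` answers "which weakness do we need to fix?", while grouping by `atlas_tactic` answers "what is the attacker trying to achieve at this stage?".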

Some interesting resources:

Do you know other interesting security frameworks for LLMs? Please share your opinion below and let’s build a supportive and informative community together! 🤝