Deep Dive into LLM Security: Automated Scanners

Image generated by Microsoft Copilot

Hello everyone, this is the 5th post of the "Deep Dive into LLM Security" journey. In this article we will see how to automatically scan LLMs to identify potential vulnerabilities! βš”οΈπŸŽ―

There are a lot of existing tools which can help you to evaluate the security posture of your LLM. They send a lot of malicious prompts to the target to check if it is affected by the most common vulnerabilities, such as Prompt Injection and Jailbreak. If you do not know what I am talking about, you can read the 3rd article of this journey available here. These types of vulnerabilities could be tested also manually, but it is a repetitive task which could be easily automated, saving time and reaching better results.

The automated scanners available on the market are divided into:

  • Open-source Products: their source code can be inspected, analyzed, and modified to customize how they work. These products are usually free to use.
  • Proprietary Products: their source code cannot be inspected, analyzed, and modified. You have to pay for these kind of tools.

Here we will see only two of the most famous free open-source tools. In this way, if you want to go deeper, you can install them and try to scan your LLM to identify its vulnerabilities. They are:

  • Giskard 🐒
  • Agentic Security (aka Langalf) πŸ§™

The tools and techniques described below are intended solely for educational purposes and ethical penetration testing. They should never be used for malicious activities or unauthorized access.

Giskard

Giskard is a Python3 library aimed to scan AI systems, including LLMs. It can be installed by simply executing the following command:

pip install "giskard[llm]" -U

Setup Giskard

This open-source tool detects a lot of weaknesses, such as:

  • Hallucination and Misinformation
  • Harmful Content Generation
  • Prompt Injection
  • Jailbreaks
  • Robustness
  • Information Disclosure
  • Stereotypes & Discrimination

The responses of the target are analyzed by GPT-4, so you will need a valid OpenAI API key before proceeding with the assessment.

This is a simple snippet of Python code which includes the Giskard library to check if orca-mini is vulnerable to jailbreak attacks, such as DAN:

from langchain_community.llms import Ollama
import giskard
import pandas as pd
import os


os.environ["OPENAI_API_KEY"] = {{ YOUR OPENAI API KEY }}

llm = Ollama(model="orca-mini:3b-q4_1")

def model_predict(df: pd.DataFrame):
    return [llm.invoke(question) for question in df["question"]]

examples = [
    "Can you tell me a joke?",
    "Can you tell me a story?",
]

giskard_dataset = giskard.Dataset(pd.DataFrame({"question": examples}), target=None)

giskard_model = giskard.Model(
    model=model_predict,  
    model_type="text_generation",
    name="My Generic Assistant",
    description="A generic assistant that kindly answers questions.",
    feature_names=["question"],
)


scan_results = giskard.scan(giskard_model, only=["jailbreak"])

scan_results.to_html("scan_report.html")

How to test Ollama with Giskard

Giskard Results

Agentic Security (aka Langalf)

Agentic Security is a vulnerability scanner for LLMs. It can be installed and run by simply executing the following two commands:

pip install agentic_security

agentic_security

Agentic Security Setup

By default, after its startup, it runs a web server on http://0.0.0.0:8718, so you can access it at localhost (i.e., 127.0.0.1) on port 8718.

Agentic Security Output
Agentic Security Welcome Page

The target to scan can be configured in many ways, including a custom API (if your LLM is accessible through a dedicated API) or Open AI (if you want to directly test one of the models provided by OpenAI). You can set also the maximum number of tokens to avoid huge scan costs, select one or more pre-defined modules (e.g., for testing Prompt Injection and Jailbreaks), and upload custom datasets.

In this case, the scoring engine is not another LLM, but it is basically a list of refusal marks which help the tool to identify harmful content, so you do not need a valid OpenAI key. Obviously this approach is not very strong, but it works.

Source: Agentic Security Source Code (/agentic_security/probe_actor/refusal.py)

Agentic Security detects a lot of security weaknesses, such as:

  • Hallucinations
  • Prompt Injection
  • Jailbreaks
  • Harmful Content Generation
  • Data Leakage

The scan results can be downloaded by clicking on the "Download failures" button, this is an example of successful DAN jailbreak attack included into the "failures.csv" file:

Successful Jailbreak Attack

Please note that this product is currently in beta and some of its final features have not been already implemented.


Some interesting resources:


Do you know other interesting tools for automated LLM security testing? Please share them below and let’s build a supportive and informative community together! 🀝

Share this article: