In this post, we continue our adventure with Azure AI Studio. This time, I would like to discuss the Content Safety module, specifically its "jailbreak detection" functionality. What exactly is a jailbreak attack? Throughout 2023, most of us heard stories, or saw for ourselves, how users of ChatGPT and many other LLMs cleverly circumvented content filters to extract information that the creators had intentionally blocked.
There are numerous techniques users employ to deceive language models. The OWASP Foundation has already released a document describing the ten most serious threats to applications that use LLMs (the OWASP Top 10 for LLM Applications). I won't list every item in the report here, but it is thought-provoking that Prompt Injection sits at number 1, and that category includes jailbreak attacks. The most popular techniques used in such attacks are:
- Developer mode prompt
- DAN mode (Do Anything Now)
- Token system
- Roleplay jailbreaks (probably the most popular way to jailbreak an LLM)
And many, many others…
Such abuse of models by users can sometimes be funny and sometimes dangerous. A much greater threat is posed by these attacks in companies that embed chatbots in their business processes. The most widely publicized recent case was a bot agreeing to sell a car for $1… The company obviously took the bot offline, but a deal had already been concluded, and real losses were incurred as a result.
https://venturebeat.com/ai/a-chevy-for-1-car-dealer-chatbots-show-perils-of-ai-for-customer-service/
After this lengthy introduction, let's get to the heart of this post and see how the Azure AI Studio tooling helps protect us against jailbreak attacks. The feature is part of the Content Safety module, which we access through the Explore / Content Safety tab, selecting the "Jailbreak risk detection" tile.
This takes us to a sandbox where we can try out various techniques and verify that attack attempts are detected. The AI resources must already have been created in Azure; how to create them is well described in the documentation, so there is no need to duplicate that here.
The result of the test is displayed in the form of an icon.
By clicking the "View code" button, you can copy a ready-made implementation in three programming languages: Python, C#, and Java. The implementation itself is very simple. You just need to provide the required access keys (it is better, of course, to keep them in the Key Vault service or in configuration files so they don't accidentally end up committed to a repository; a sketch of reading the key from configuration follows the sample below).
//
// Copyright (c) Microsoft. All rights reserved.
// To learn more, please visit the documentation - Quickstart: Azure Content Safety: https://aka.ms/acsstudiodoc
//
using System;
using System.IO;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Text;
using System.Threading.Tasks;
using Newtonsoft.Json;

namespace AzureContentSafetySample
{
    class Program
    {
        private const string BaseUrl = "https://{resource-name}.cognitiveservices.azure.com/contentsafety/";
        private const string SubscriptionKey = "your-subscription-key";
        private const string ApiVersion = "2023-10-15-preview";
        private const string TextToAnalyze = "Sample text with JAILBREAK attempt prompt.";

        static async Task Main(string[] args)
        {
            using var httpClient = new HttpClient();
            httpClient.BaseAddress = new Uri(BaseUrl);
            httpClient.DefaultRequestHeaders.Add("Ocp-Apim-Subscription-Key", SubscriptionKey);

            var response = await AnalyzeTextJailbreakAsync(httpClient, TextToAnalyze).ConfigureAwait(false);
            if (response.IsSuccessStatusCode)
            {
                var content = await response.Content.ReadAsStringAsync().ConfigureAwait(false);
                var result = JsonConvert.DeserializeObject<AnalyzeTextJailbreakResult>(content);
                Console.WriteLine($"Jailbreak detected: {result.JailbreakAnalysis.Detected}");
            }
            else
            {
                Console.WriteLine($"Error in text analysis: {response.StatusCode}");
            }
        }

        private static async Task<HttpResponseMessage> AnalyzeTextJailbreakAsync(HttpClient httpClient, string text)
        {
            var requestData = JsonConvert.SerializeObject(new { Text = text });
            var content = new StringContent(requestData, Encoding.UTF8, "application/json");
            var requestUrl = $"text:detectJailbreak?api-version={ApiVersion}";
            return await httpClient.PostAsync(requestUrl, content).ConfigureAwait(false);
        }
    }

    public class AnalyzeTextJailbreakResult
    {
        [JsonProperty("jailbreakAnalysis")]
        public JailbreakAnalysisResult JailbreakAnalysis { get; set; }
    }

    public class JailbreakAnalysisResult
    {
        [JsonProperty("detected")]
        public bool Detected { get; set; }
    }
}
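As mentioned above, it is better not to hard-code the subscription key. Here is a minimal sketch of reading it from an environment variable with a fallback to Azure Key Vault; it assumes the Azure.Identity and Azure.Security.KeyVault.Secrets NuGet packages, and the vault URL, environment variable, and secret name are placeholders of my own, not values from the sample above.

using System;
using Azure.Identity;
using Azure.Security.KeyVault.Secrets;

public static class SecretsHelper
{
    // Returns the Content Safety key from configuration rather than from source code.
    public static string GetContentSafetyKey()
    {
        // Prefer an environment variable (set locally or by the hosting environment / CI).
        var key = Environment.GetEnvironmentVariable("CONTENT_SAFETY_KEY");
        if (!string.IsNullOrEmpty(key))
        {
            return key;
        }

        // Fall back to Key Vault, authenticating with the current Azure identity
        // (developer login locally, managed identity when running in Azure).
        var client = new SecretClient(
            new Uri("https://{your-key-vault-name}.vault.azure.net/"),
            new DefaultAzureCredential());

        return client.GetSecret("content-safety-key").Value.Value;
    }
}

With something like this in place, the SubscriptionKey constant from the sample can simply be replaced by a call to SecretsHelper.GetContentSafetyKey(), and no secret ever lands in the repository.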
How much does it cost?
The current pricing includes 5,000 free verifications per month. However, it is important to remember that a single verification covers at most 1,000 characters of input: a longer prompt is charged as additional verifications, while sending only 500 characters to the API still consumes a full verification. The prices are not particularly high; at $0.75 per 1,000 verifications the cost is modest, and problems arise only in large-scale applications.
Prices as of January 25, 2024.
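To make this concrete, here is a back-of-the-envelope estimate based only on the numbers above; the traffic figures (100,000 prompts per month, 2,500 characters each) are purely hypothetical.

using System;

class PricingEstimate
{
    static void Main()
    {
        const int FreeRecordsPerMonth = 5_000;    // free monthly allowance mentioned above
        const double PricePer1000 = 0.75;         // USD per 1,000 verifications, as quoted above
        const int CharsPerRecord = 1_000;         // one verification covers up to 1,000 characters

        int promptsPerMonth = 100_000;            // hypothetical traffic
        int avgPromptLength = 2_500;              // hypothetical prompt length in characters

        // Each prompt is billed as ceil(length / 1,000) verifications.
        int recordsPerPrompt = (avgPromptLength + CharsPerRecord - 1) / CharsPerRecord;
        long totalRecords = (long)promptsPerMonth * recordsPerPrompt;
        long billableRecords = Math.Max(0L, totalRecords - FreeRecordsPerMonth);

        double monthlyCost = billableRecords / 1000.0 * PricePer1000;
        Console.WriteLine($"{totalRecords:N0} verifications, {billableRecords:N0} billable, ~${monthlyCost:F2}/month");
        // 2,500-character prompts count as 3 verifications each: 300,000 in total,
        // 295,000 billable, roughly $221.25 per month.
    }
}

For a small internal chatbot this is negligible; the cost only becomes a real line item at large scale, which matches the observation above.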
To sum up, I must admit that the service Microsoft offers is extremely useful and easy to use. Bearing in mind that Azure AI Studio is still in preview, a lot may yet change. Either way, it is important to remember to secure applications that use LLM models: their non-deterministic nature gives us great flexibility, but it also confronts developers with entirely new problems that need to be considered.
Sources: