Improper output handling in LLMs
Even your LLMs need to sanitize data!
~15 mins estimated | AI/ML
What is improper output handling in LLMs?
Improper output handling is one of the entries in the OWASP (Open Worldwide Application Security Project) Top 10 for Large Language Model (LLM) Applications, a list of the most critical security risks associated with using large language models in applications.
Improper output handling, also called insecure output handling, occurs when the outputs generated by a large language model are not properly validated, sanitized, or otherwise handled before the application uses them or passes them on to other components. This can lead to various security risks, such as injection attacks, data leaks, or the propagation of harmful content.
For example, let’s say you have a virtual assistant at home, like a smart speaker that uses a powerful AI to understand and respond to your requests. You ask it to generate an email for your boss, and it writes something based on the information it has. If you send that email without reviewing it, the message could have errors, include inappropriate content, or even contain sensitive information that shouldn’t be shared. This lack of review and validation is a form of insecure output handling.
About this lesson
In this lesson, you will learn about improper output handling in LLMs and the risks it creates. We’ll look at each of those risks and walk through an example of the vulnerability in an application. Finally, we’ll fix things up and examine how insecure output handling can be mitigated.
The chatbot above uses an LLM to generate responses to user queries. The backend of this application processes user inputs, sends them to the LLM, and then directly uses the LLM's outputs in the web application without proper validation or sanitization.
We have seen this pattern before in other lessons, in vulnerabilities such as SQL injection and improper input validation. But what exactly is happening? Let’s take a look at the code:
The LLM output (bot_output) is directly embedded into the HTML response. This means any content generated by the LLM is immediately rendered in the user's browser without any checks. This brings us to the lack of sanitization.
The user_input and bot_output are not sanitized. This means any malicious content, such as HTML tags or JavaScript code, will be treated as part of the web page. Because the content is inserted directly into the HTML, it opens up the possibility of HTML injection and cross-site scripting (XSS). If user input or LLM output contains <script> tags or other harmful HTML elements, the browser will render and execute them.
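To make the pattern concrete, here is a minimal sketch of what such a handler might look like. It is a hypothetical illustration, not the chatbot's actual code: the names fake_llm and render_chat_unsafe are made up, the LLM call is replaced by a stand-in that simply echoes the prompt, and the page is assumed to be built as a plain HTML string.

```python
# Hypothetical sketch of the vulnerable pattern (not the lesson's real code).


def fake_llm(user_input: str) -> str:
    """Stand-in for a real LLM call (assumption).

    It echoes the prompt back, which is enough to show how
    attacker-controlled text can end up in the model's output.
    """
    return f"You said: {user_input}"


def render_chat_unsafe(user_input: str) -> str:
    """Build the chat HTML by pasting raw strings into the markup."""
    bot_output = fake_llm(user_input)
    # VULNERABLE: neither value is escaped before being placed in the HTML.
    return (
        '<div class="chat">'
        f'<p class="user">{user_input}</p>'
        f'<p class="bot">{bot_output}</p>'
        "</div>"
    )


payload = "<script>alert('xss')</script>"
page = render_chat_unsafe(payload)

# The script tag survives intact, so a browser rendering this page would run it.
print("<script>" in page)  # prints True
```

Note that the payload appears twice in the page: once through user_input and once through bot_output. Escaping only the user's message would still leave the model's reply exploitable.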
What is the impact of insecure output handling?
If you let your imagination run, you can probably picture many different scenarios in which insecure output handling could cause problems. Without sanitizing the output, the LLM could generate harmful or inappropriate content that will be displayed to the user as part of the chatbot’s response.
Expanding on that, there is the risk of data leaks! This one is a little harder to fight against. Sensitive information could be inadvertently included in the LLM’s output and displayed to the user. Sanitizing data won’t help here. If your LLM is being manipulated via prompt injection, then you’ll need to look at different mitigation techniques.
We looked under the hood at the code and explained which lines are vulnerable, but why is this happening in the first place?
User input can be problematic, no matter what the application. LLMs process and generate text based on user inputs, which are inherently unpredictable. Ensuring that every possible output scenario is safe requires robust sanitization mechanisms, which are often overlooked.
Developers may also mistakenly trust the output from an LLM, assuming it to be safe. However, LLMs generate text based on patterns in data and can produce unexpected or harmful content if not properly controlled. Just like we can’t predict user input, we also can’t predict the exact LLM output.
Let’s update the code above to include some sanitization to help mitigate the risk.
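One way such an update could look is sketched below, assuming the response is assembled as an HTML string from user_input and bot_output. The names fake_llm and render_chat_safe are hypothetical, and the LLM call is a stand-in that echoes the prompt. The only real library call is html.escape from the Python standard library, which replaces &, <, > and (by default) quote characters with HTML entities.

```python
import html


def fake_llm(user_input: str) -> str:
    """Stand-in for a real LLM call (assumption); echoes the prompt back."""
    return f"You said: {user_input}"


def render_chat_safe(user_input: str) -> str:
    """Build the chat HTML, escaping both values at the point of output."""
    bot_output = fake_llm(user_input)
    # Treat the model's reply as untrusted, exactly like the user's message.
    safe_user = html.escape(user_input)
    safe_bot = html.escape(bot_output)
    return (
        '<div class="chat">'
        f'<p class="user">{safe_user}</p>'
        f'<p class="bot">{safe_bot}</p>'
        "</div>"
    )


payload = "<script>alert('xss')</script>"
page = render_chat_safe(payload)

# The tag is now inert text: the browser displays it instead of executing it.
print("<script>" in page)         # prints False
print("&lt;script&gt;" in page)   # prints True
```

Escaping is applied when the page is built rather than when the text is received, so the same stored message can still be encoded differently for other contexts. In a real application, a template engine with auto-escaping enabled would typically do this step for you. As noted earlier, this only addresses injection into the page; it does not stop the model from revealing sensitive information.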
Keep learning
Learn more about improper output handling and other LLM vulnerabilities.
- OWASP website for their new LLM top 10 list
- A dedicated OWASP page to improper output handling
- A Snyk publication on the OWASP LLM top 10 list