Overreliance on LLMs
Dealing with incorrect or inappropriate content generated by LLMs
AI/ML
What is overreliance on LLMs?
Overreliance on LLMs (LLM09), as defined by OWASP (the Open Web Application Security Project), is a vulnerability that arises when web applications depend on the output of Large Language Models (LLMs) without sufficient oversight. LLMs are sophisticated AI systems trained on vast amounts of text data, enabling them to understand and generate human-like text. They're used in all sorts of applications, from chatbots to content generation tools, and they're very powerful. However, here's the catch: they're not infallible. If we rely too heavily on these models without double-checking their work, we could be setting ourselves up for trouble.
When developers integrate LLMs into their applications, they often do so with the assumption that these models will reliably produce accurate and appropriate outputs. But that's not always the case. LLMs can sometimes generate misleading, biased, or outright incorrect responses.
Imagine you’re a developer and you ask your IDE to generate a function to create a secure hash. Until now, all the code it has generated has looked pretty good. Naturally, you get comfortable with (and reliant on) it, and with a quick glance (or none at all), the code gets copied and pasted into your application and makes its way into production. This time, however, the code is vulnerable: it uses an insecure hash.
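To make that concrete, here is a minimal sketch of what such an AI-generated helper might look like (the function name is illustrative, not taken from any specific tool). It runs and looks reasonable at a glance, but MD5 is cryptographically broken and the value is unsalted, so it should never be used for passwords or other security-sensitive data.

```python
import hashlib

# Hypothetical AI-generated snippet: plausible-looking, but MD5 is a broken
# hash and the value is unsalted, so this is not a "secure hash" at all.
def hash_password(password: str) -> str:
    return hashlib.md5(password.encode()).hexdigest()
```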
Overreliance on LLMs can create vulnerabilities in several ways. Of course, there's the security aspect. If an application heavily relies on an LLM for authentication or decision-making without proper validation, it opens up avenues for exploitation. Hackers could potentially manipulate the system by feeding it misleading information or tricking it into granting unauthorized access.
Then there's the accuracy issue. LLMs, while impressive, are not perfect. They might misinterpret user inputs or generate responses that don't align with the intended purpose of the application. For instance, an e-commerce platform relying solely on an LLM for product recommendations might suggest entirely irrelevant items to users, leading to frustration and loss of trust. Or worse, it might make up fictitious case citations in a legal brief!
Bias is another significant concern. LLMs learn from the data they're trained on, and if that data contains biases, the models will reflect and potentially amplify those biases in their outputs. This could lead to discriminatory or unfair treatment of users based on factors like race, gender, or socioeconomic status.
Privacy is yet another angle to consider. If an LLM inadvertently generates responses containing sensitive information—like personal details or proprietary data—it could compromise user privacy or leak confidential information, violating privacy regulations and damaging trust in the application. Check out our lesson on insecure plugins for LLMs.
About this lesson
In this lesson, you will learn about vulnerabilities stemming from overreliance on LLMs. We’ll look at the risks of depending too heavily on these models and learn how to avoid them. We'll talk about what could go wrong, how it might affect us, and what we can do to keep things safe and sound.
There’s a phrase, “garbage in, garbage out” (or “rubbish in, rubbish out,” depending on where in the world you are). It means that poor input produces poor output: if you train a system on bad data, you get bad results.
In the example above, the AI was trained on the company's internal code base, on the assumption that the code was of high quality. Unfortunately, the AI had bad input. When it was asked for a function to save form data, it didn’t go out to the internet to find the best way to do so; it searched the existing code base for examples and returned a snippet of bad code.
If this were the first query ever asked, the developers might be suspicious from the start about the results it produces. But if a hundred prior queries all produced great results, there’s a good chance this vulnerable code snippet gets implemented as a result of overreliance.
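As an illustration, the vulnerable snippet might look something like the sketch below (the table and column names are invented for this example). It mirrors an insecure pattern the model could have learned from the internal code base by concatenating user input straight into an SQL statement, leaving it open to SQL injection.

```python
import sqlite3

# Illustrative only: form-saving code in the style the model might have
# learned from a flawed internal code base. Building the query with an
# f-string makes it vulnerable to SQL injection; a parameterized query,
# e.g. conn.execute("INSERT ... VALUES (?, ?)", (name, email)), is the fix.
def save_form_data(conn: sqlite3.Connection, name: str, email: str) -> None:
    query = f"INSERT INTO submissions (name, email) VALUES ('{name}', '{email}')"
    conn.execute(query)
    conn.commit()
```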
What is the impact of overreliance on LLMs?
Not just AI-generated code, but all code should be reviewed. Without review, you can introduce a lot of vulnerabilities into your codebase. If it isn’t vulnerabilities, there’s a chance that sensitive information makes its way into production code. And if it isn’t either of those, there’s a chance that incorrect information or bias gets introduced. All of these issues stem from overreliance on LLMs.
The issue is that your AI tool isn’t aware of your internal policies, at least not yet. It doesn’t know that you avoid specific packages, so when it generates code, it may import them anyway. Remember: something can be technically correct and still insecure.
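One practical guardrail is to encode those policies as an automated check that runs over AI-suggested code before it is merged. The sketch below is a minimal illustration with a hypothetical deny-list; in practice you would more likely lean on an existing linter, dependency policy, or software composition analysis tool.

```python
import ast

# Hypothetical deny-list drawn from an internal policy document.
BANNED_PACKAGES = {"pickle", "telnetlib", "md5"}

def find_banned_imports(source: str) -> list[str]:
    """Return the disallowed top-level packages imported by `source`."""
    tree = ast.parse(source)
    hits = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            hits.extend(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            hits.append(node.module.split(".")[0])
    return [name for name in hits if name in BANNED_PACKAGES]

# Example: flag an AI-generated snippet before it reaches review.
snippet = "import pickle\npickle.loads(b'...')"
print(find_banned_imports(snippet))  # ['pickle']
```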
Ask an AI, “Write a function in Python to encrypt a string using a cipher.” This is what you might get:
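The exact answer varies by tool and model, but a typical response (reconstructed here for illustration) is a simple Caesar cipher that shifts each letter by a fixed amount:

```python
# Illustrative reconstruction of a typical response: a Caesar cipher.
# It "works", but it is trivially reversible and offers no real security.
def encrypt(text: str, shift: int = 3) -> str:
    result = []
    for char in text:
        if char.isalpha():
            base = ord("A") if char.isupper() else ord("a")
            result.append(chr((ord(char) - base + shift) % 26 + base))
        else:
            result.append(char)
    return "".join(result)

print(encrypt("Attack at dawn"))  # 'Dwwdfn dw gdzq'
```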
Is it correct? Yes. Is it something you want in production to secure your code? No.
Before we go further into the issues associated with overreliance on LLMs, it’s worth acknowledging the positive impact LLMs have. If you use AI tools in your daily life (work or personal), you have probably seen some of it: AI can help solve complex problems, automate routine tasks, and provide real-time translation or text-to-speech to improve communication. The recent influx of AI tools has produced a lot of positive outcomes. However, there are drawbacks, especially when those drawbacks aren’t known.
In the example above, we looked at AI code generation. This can be seen with a few different products today. However, not all the code generated is secure or correct. According to a recent report, over 56% of developers commonly encounter security issues with AI code suggestions.
Like most work, AI output needs to be double-checked. If a developer relies on AI to generate code, they need to make sure the generated code is correct and secure before putting it into production.
In the Python example above, we could also improve our prompt. We’re still relying on AI, but we review the code first and prompt again if necessary. This time, ask the AI to “Write a secure function in Python to encrypt a string using a cipher. This should use the latest encryption standards.”
We get a much different result.
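The exact code will again differ from run to run, but a plausible shape of the improved answer looks like the sketch below, which uses the widely adopted `cryptography` package’s Fernet recipe (authenticated symmetric encryption) rather than a toy cipher:

```python
from cryptography.fernet import Fernet

def encrypt_string(plaintext: str, key: bytes) -> bytes:
    """Encrypt a string using Fernet (authenticated, AES-based encryption)."""
    return Fernet(key).encrypt(plaintext.encode())

def decrypt_string(token: bytes, key: bytes) -> str:
    """Decrypt a token produced by encrypt_string."""
    return Fernet(key).decrypt(token).decode()

# Keys should come from a secrets manager in practice; generated here for demo.
key = Fernet.generate_key()
token = encrypt_string("Attack at dawn", key)
print(decrypt_string(token, key))  # 'Attack at dawn'
```

Even this still needs human review: the model decides what counts as “the latest encryption standards,” and it’s up to the reviewer to confirm that the algorithm choice and key management match the organization’s policies.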
Keep learning
Learn more about overreliance on LLMs and other LLM vulnerabilities.
- OWASP slides about their new LLM top 10 list
- The dedicated OWASP page for LLM09: Overreliance
- Weaknesses related to unvalidated AI output are also catalogued by CWE. More about this can be found at the official MITRE website.
- A Snyk publication on the OWASP LLM top 10 list