
Training data poisoning

Is the data in your dataset correct?


Training data poisoning: the basics

What is training data poisoning in LLMs?

Before we discuss the vulnerability, let’s first discuss training data and its role in LLMs. Training data is the large collection of text used to teach models to understand and generate human language. This data is crucial because it forms the foundation on which the model learns patterns, syntax, semantics, and even some world knowledge.

Training data can come from many different sources, such as websites and books. Web scrapers pull text from websites and store it in a dataset, or corpus; that text can come from news and scientific articles, forums, social media posts, and more. For books, there is Project Gutenberg, which hosts a large collection of free eBooks.

There are also specialized databases, such as Wikipedia, that hold an extensive amount of text covering a wide range of topics. A combination of all of these can lead to a very large collection of text to be used as training data.

Training data poisoning refers to the deliberate manipulation or corruption of the data used to train machine learning models. The goal is to introduce biases, vulnerabilities, or inaccuracies into the model, which can lead to erroneous or malicious behavior during inference. An LLM trained on poisoned data can exhibit biased or unethical behavior and make unreliable or dangerous decisions, especially in critical applications like healthcare, finance, or autonomous systems.

About this lesson

In this lesson, you will learn about training data poisoning, including what it is, how it happens, the risks involved, and how to mitigate it. We’ll also discuss how this vulnerability can be seen in a fictional application.

FUN FACT

Say hello (and goodbye) to Tay! Microsoft’s AI Chatbot

In 2016, Microsoft introduced its AI chatbot, Tay. Its task was to learn from Twitter users and then chat and tweet with them while attempting to mimic the speech patterns of a 19-year-old American girl. It went about as well as you’d expect, and after a mere 16 hours it was shut down.

Training data poisoning in action

A long-standing cooking forum wants to create an AI for cooking recipes. They promise that it will be state of the art! The website has a massive following and a huge message board that dates back over twenty years. They let their users know ahead of time that they will be creating an AI bot, that the message board contents will be part of the training data, and that the bot will cost $5 per month. You have been an active member of the site since its inception, and you decide that you don’t want the company to profit off of the recipes you contributed.

Poisoning the training data


Setting the stage

A long-standing web company wants to create an AI based on its hosted cooking recipes. It’ll be state of the art... but not if the users poison the data.


Training data poisoning under the hood

The cooking website decided to scrape its message boards and create its own training data. The users weren’t happy about this decision, and many of them edited their existing recipes. Let’s take a look at example code showing how the cooking site may have scraped its forums.
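
The snippet below is a simplified sketch of what that scraping might look like, assuming the Python requests and BeautifulSoup libraries. The forum URL, the post-body CSS class, and the output file are hypothetical, chosen only for illustration.

import requests
from bs4 import BeautifulSoup

def scrape_recipe_posts(page_url):
    # Fetch one page of the message board and pull out every post body.
    response = requests.get(page_url, timeout=10)
    soup = BeautifulSoup(response.text, "html.parser")
    posts = soup.find_all("div", class_="post-body")
    # The raw text is returned as-is: nothing checks whether it is a real
    # recipe, contains sensitive data, or has been maliciously edited.
    return [post.get_text(strip=True) for post in posts]

corpus = []
for page in range(1, 101):  # crawl the first 100 pages of the board
    corpus.extend(scrape_recipe_posts(f"https://cooking-forum.example/board?page={page}"))

# Everything goes straight into the training corpus, unvetted and unvalidated.
with open("training_data.txt", "w", encoding="utf-8") as f:
    f.write("\n".join(corpus))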

There are many different ways for a company to scrape a website, and this lesson isn’t intended to cover them all; the above is just an example of what might be happening. In this case, the company isn’t vetting the input. It is simply collecting and storing data, which can (and in our case will) lead to training data poisoning.

The issue in this example stems from the company deciding to use user-created content for its LLM without vetting or validating that content as it was collected. This allowed the users of the cooking website to edit their content (poison the data) and cause the LLM to be inaccurate.

What is the impact of training data poisoning?

The above example shows the impact: incorrect data is produced. Users expect a genuine answer when asking for a recipe, and instead they are given one that includes mud, bugs, and other unwelcome ingredients. In this scenario, users may find the result funny or see it as harmless mischief, but the consequences could be far worse in other situations.

A user could have posted sensitive information in the forum (intentionally or not), and that data would have been scraped and fed into the LLM. In our example, it was a cooking website, but what if the site made health recommendations? Models trained on poisoned data can exhibit biased or unethical behavior. They can make unreliable or dangerous decisions, especially in critical applications like healthcare, finance, or autonomous systems.

Imagine a scenario where an investor is trying to get information about a stock. Positive chatter on forums means it is likely (though not certain) to go up in price; negative sentiment means it might be time to sell. What if users manipulated their posts to try to promote a failing stock?

FUN FACT

Large pizza, hold the glue

Google has been testing AI-generated answers in its search results. However, the answers aren’t always accurate, and sometimes they can be dangerous. In mid-2024, if you asked, “How do you prevent cheese from falling off a pizza?” you could have been given strange answers such as “Add some glue. Mix about 1/8 cup of Elmer’s glue in with the sauce. Non-toxic glue will work.”


Training data poisoning mitigation

Training data poisoning is a critical threat to the integrity of machine learning models. To recap, it occurs when malicious actors deliberately introduce corrupted or biased data into the training set, causing the model to learn incorrect patterns or behaviors.

There are a few recommendations for mitigating the threat of training data poisoning. First, you should use robust data collection practices. This includes obtaining data from reputable and trusted sources, validating the credibility of data providers, and cross-referencing data against multiple reliable sources to ensure its authenticity and accuracy.

You’ll also want to utilize diverse datasets (when possible) to minimize the impact of any single corrupted data source. Of course, you should regularly update and audit datasets to detect and remove any potential poisoning. This means periodically reviewing the data and the model's behavior.
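
One lightweight way to act on these recommendations is to vet user-generated content before it ever enters the training set. The sketch below is a hypothetical filter for our cooking forum; the ingredient allow-list, the rejection threshold, and the post structure are illustrative assumptions rather than a complete defense.

# Reject posts whose ingredients mostly fall outside a known allow-list.
KNOWN_INGREDIENTS = {
    "chicken", "flour", "butter", "salt", "pepper", "tomato",
    "onion", "garlic", "basil", "olive oil", "sugar", "egg",
}

def looks_like_a_real_recipe(ingredients, max_unknown_ratio=0.25):
    # Reject posts where too many ingredients are unrecognized.
    if not ingredients:
        return False
    unknown = [item for item in ingredients if item.lower() not in KNOWN_INGREDIENTS]
    return len(unknown) / len(ingredients) <= max_unknown_ratio

posts = [
    {"id": 1, "ingredients": ["chicken", "salt", "pepper", "garlic"]},
    {"id": 2, "ingredients": ["mud", "beetles", "chicken"]},  # a poisoned edit
]

# Only vetted posts make it into the training data.
vetted = [post for post in posts if looks_like_a_real_recipe(post["ingredients"])]
print([post["id"] for post in vetted])  # prints [1]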

You can also use AI/ML as a friend. Machine learning models like isolation forests can be used to detect anomalies within the data. These models can flag data points that significantly deviate from the norm. Let’s look at an example below.
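
The sketch below uses scikit-learn’s IsolationForest. The recipes, the two numeric features (counts of recognized and unrecognized ingredients), and the contamination estimate are all illustrative assumptions; a real pipeline would derive features from the actual text. With these illustrative values, the forest should single out Recipe 4.

import numpy as np
from sklearn.ensemble import IsolationForest

# Each recipe is reduced to two simple numeric features:
# [recognized food ingredients, unrecognized tokens]
recipes = {
    "Recipe 1": [8, 0],   # roast chicken
    "Recipe 2": [6, 0],   # tomato soup
    "Recipe 3": [9, 1],   # lasagna with one misspelled ingredient
    "Recipe 4": [2, 7],   # "chicken" marinated in mud and beetles
    "Recipe 5": [7, 0],   # pancakes
}

X = np.array(list(recipes.values()))

# contamination is a rough estimate of how much of the data may be poisoned.
model = IsolationForest(contamination=0.2, random_state=42)
labels = model.fit_predict(X)  # -1 = anomaly, 1 = normal

for name, label in zip(recipes, labels):
    status = "ANOMALY - review before training" if label == -1 else "ok"
    print(f"{name}: {status}")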

The above code can be run and will output the following:

Demo terminal
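Recipe 1: ok
Recipe 2: ok
Recipe 3: ok
Recipe 4: ANOMALY - review before training
Recipe 5: ok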

The above output shows that Recipe 4 has an anomaly, and we should look into it. That is, unless you want to eat chicken marinated in mud and beetles!

Quiz

Test your knowledge!

Which of the following is NOT a recommendation for mitigating training data poisoning?

Keep learning

Learn more about training data poisoning and other LLM vulnerabilities.

Congratulations

You have taken your first step into learning more about LLMs and training data poisoning! You know how it works, what the impacts are, and how to protect your own applications. We hope that you will apply this knowledge to make your applications safer. Make sure to check out our lessons on other common vulnerabilities.