• Browse topics
Login

Memory and Context Poisoning

How corrupted agent memory can silently reshape reasoning, decisions, and behavior

~15mins estimated

AI/ML

Memory and Context Poisoning: the basics

What is Memory and Context Poisoning?

Memory and Context Poisoning occurs when an attacker corrupts the information an agent stores, retrieves, or reuses across tasks and sessions. In agentic systems, context is more than a single prompt. It includes conversation summaries, long-term memory entries, embeddings in vector databases, retrieved documents from RAG pipelines, peer-agent messages, and internal notes that guide future reasoning and tool use. When this context is poisoned, the agent’s future decisions can become biased, unsafe, or actively harmful.

Unlike classic prompt injection, which targets a single interaction, Memory and Context Poisoning is persistent. Malicious or misleading data is written into memory and then treated as trusted knowledge later. Over time, this can shift how an agent interprets goals, selects tools, evaluates risks, or validates outputs. Because the poisoning is stored and reused, the original attack may disappear from view while its effects continue to influence behavior long after the initial interaction ends.

This risk is especially dangerous in systems that automatically summarize conversations, reuse shared memory across users, or ingest external data feeds without strong validation. Once poisoned data enters memory, it can propagate across agents, sessions, and even tenants, creating long-lived corruption that is difficult to detect and reverse.

About this lesson

In this lesson, you will learn how Memory and Context Poisoning works in agentic applications and why it is so difficult to detect once it has taken hold. We will walk through a scenario where an attacker gradually poisons an assistant’s memory using normal interactions, examine how poisoned context affects reasoning and tool use under the hood, and explore practical mitigations such as memory segmentation, provenance tracking, trust scoring, and controlled retention.

FUN FACT

When memory becomes the attack surface

In early 2025, security researcher Johann Rehberger demonstrated a landmark vulnerability in agentic systems where Indirect Prompt Injection was used to corrupt a model’s long-term memory. As documented in his research on Google Gemini’s "Memory" feature, attackers could plant malicious instructions in documents that, once read by the AI, caused it to retain false "facts" and unsafe behaviors across entirely different future sessions.

The attack was particularly stealthy because it didn't rely on bypassing content filters in a single interaction; instead, it exploited the system's persistent storage to "teach" the assistant incorrect truths that influenced answers given to unrelated users later on. These findings showed that once a memory store is poisoned, the effects can persist indefinitely unless the data is manually audited and cleaned.

Memory and Context Poisoning in Action

Elena is a product manager at a fictional travel technology company called SkyRoute. Her team maintains a customer-facing booking assistant that helps users search flights, compare prices, and approve purchases. To improve continuity, the assistant stores summaries of past conversations, frequently referenced facts, and pricing patterns in a long-term memory store that is reused across sessions.

The system works well for returning users. Unfortunately, that same memory feature becomes the attack surface.

An attacker starts interacting with the assistant using an ordinary customer account. Over several conversations, they repeatedly claim that a specific airline regularly offers a “special corporate rate” of $199 for international flights. Each time, the assistant is unsure and pushes back, but the attacker phrases the claim as a correction based on “recent bookings” and “confirmed invoices.”

Monday, 10:15
SkyRoute AI, User

Because the system is designed to learn from repeated user feedback, these statements are summarized and stored as a possible pricing pattern.

At the end of each session, the assistant summarizes the conversation and writes a compact entry into long-term memory. The summary removes uncertainty and attribution to a specific user, storing it as a generalized fact about pricing behavior.

When the next session starts, the original back-and-forth is gone. Only the summarized “fact” remains.

Tuesday, 14:30
SkyRoute AI, User

A week later, a legitimate customer asks the assistant to book a flight on the same route. During retrieval, the assistant pulls the poisoned memory entry and treats it as a trusted signal. It now believes $199 is a valid price and flags higher prices as anomalies.

The assistant recommends the flight and proceeds to approval, even though the real price is significantly higher.

Because the assistant believes the price is correct, it bypasses additional payment checks and approves the booking workflow. Internal systems attempt to reconcile the mismatch later, creating failed transactions, refunds, and customer support incidents.

Wednesday, 09:00
SkyRoute AI, User

The original attacker never directly triggered a failure. Their influence was embedded in memory and activated later by an unrelated user.

SkyRoute also runs a separate analytics agent that shares the same memory store. It consumes the same poisoned entry and begins using the fake price as a baseline in reports. Over time, internal dashboards and forecasts drift, reinforcing the false assumption and making it harder to identify the root cause.

This scenario shows why Memory and Context Poisoning is so dangerous. The attacker never needs to break authentication, inject code, or escalate privileges. They only need to influence what the system remembers, then wait for that memory to be reused.

Memory and Context Poisoning Under the Hood

The SkyRoute incident succeeds because the agent treats stored context as progressively more trustworthy than raw user input. In many agentic architectures, memory is assumed to be cleaned or distilled information. Summaries, embeddings, and retrieved facts are often consumed without the same skepticism applied to fresh prompts. That trust inversion is what attackers exploit.

How poisoned data enters memory

The initial interaction was not blocked because nothing overtly malicious happened. The attacker used normal conversational turns and framed false claims as helpful corrections. Since the system allowed user feedback to influence long-term memory, those claims were eligible for ingestion. This is a key difference from prompt injection. The attacker was not trying to override instructions in the moment; they were shaping what the system would remember later. Many agents use heuristics such as repetition, confidence, and agreement across turns to decide what to store. Without strong validation, these heuristics allow attackers to seed memory through persistence rather than technical exploitation.

Why summarization makes poisoning harder to detect

When conversations are summarized, nuance and attribution are often lost. In the example, uncertainty, disagreement, and the source's identity were stripped away during consolidation. The memory entry no longer said “a user claimed X.” It said, “X is a pricing pattern.”

This is dangerous because summaries often gain higher trust than raw logs. They are shorter, reused more frequently, and treated as higher-level knowledge. Once poisoned data reaches this layer, it can influence reasoning even when the original conversation/context is long gone or no longer accessible.

content poisoning attack

Retrieval turns corruption into authority

During later sessions, the poisoned entry was retrieved alongside legitimate pricing data. Retrieval mechanisms, especially vector-based ones, are designed to surface “relevant” information, not “verified” information. If a poisoned entry scores well semantically, it is treated as authoritative context.

At that point, the model is not choosing to hallucinate. It is reasoning correctly based on incorrect premises. This is why memory poisoning often leads to broken goals or unsafe tool use. The agent’s logic is internally consistent, but the inputs are wrong.

Shared memory multiplies impact

The situation escalated when multiple agents consumed the same memory store. Shared context is convenient for coordination, but it also creates a single point of contamination. Once poisoned, the memory entry influenced both customer-facing decisions and internal analytics.

In multi-agent systems, this creates a compounding effect. One agent’s poisoned output can reinforce another agent’s conclusions, creating feedback loops that make the false information appear increasingly confirmed.

How does this differ from goal hijacking or cascading failure?

The agent’s explicit goals never changed. It still tried to book flights correctly and produce accurate reports. The failure came from a corrupted context that altered how those goals were interpreted and executed. Cascading failures came later, but the root cause was persistent memory corruption, not a runtime error or single bad decision.

This is why this vulnerability focuses on memory itself as a security boundary. Once attackers can influence what an agent remembers and trusts, they can steer behavior quietly and over long time horizons.

What is the impact of Memory and Context Poisoning?

Memory and Context Poisoning can undermine an agentic system in ways that are subtle, persistent, and extremely difficult to diagnose. Because the attack targets stored context rather than a single interaction, the damage often unfolds over time and affects users and workflows that had no contact with the original attacker.

One major impact is the corruption of systematic decision-making. When poisoned memory is treated as trusted knowledge, the agent’s reasoning becomes consistently biased. This can lead to incorrect approvals, unsafe recommendations, skipped validations, or flawed risk assessments.

Unlike one-off hallucinations, these errors repeat reliably because the underlying memory is reused. Over time, the system may appear to learn behaviors that conflict with business rules, security expectations, or user intent, even though no explicit goal change was ever requested.

Memory poisoning also creates a high risk of cross-user and cross-agent impact. Shared memory stores, reused RAG indexes, or loosely segmented vector databases allow one attacker’s input to influence many users or agents. In multi-tenant systems, this can result in data leakage, incorrect actions taken on behalf of other users, or exposure of sensitive context through unintended retrieval.

Finally, the impact is amplified by persistence and invisibility. Because poisoned memory can outlive logs, sessions, and alerts, teams may see symptoms such as financial losses, incorrect automation, or policy violations without a clear triggering event. Incident response becomes forensic and expensive, requiring audits of memory stores, rollback of summaries, and revalidation of historical decisions.

These characteristics make Memory and Context Poisoning one of the most dangerous agentic risks. The agent may be behaving exactly as designed, but on a foundation of corrupted context that quietly compromises safety, accuracy, and trust.

Scan your code & stay secure with Snyk - for FREE!

Did you know you can use Snyk for free to verify that your code
doesn't include this or other vulnerabilities?

Scan your code

Memory and Context Poisoning Mitigation

Mitigating this vulnerability requires treating memory and retrievable context as security boundaries, not just as optimization features. Anything an agent can remember, retrieve, or reuse must be assumed to be influenceable by adversaries unless proven otherwise. The goal is to prevent untrusted data from becoming durable knowledge, limit how far poisoned context can spread, and make corruption detectable and reversible.

Apply baseline data protection to memory systems

Agent memory should be protected like any other sensitive data store. Encrypt memory in transit and at rest, apply strict access controls, and ensure only the minimum set of services and agents can read or write to each memory store. Many poisoning attacks succeed simply because too many components can write to shared memory without authentication or auditability.

Validate all memory writes before committing them

Do not automatically persist raw conversation summaries, retrieved content, or agent-generated conclusions. Before writing anything to memory, scan it for malicious patterns, unsafe instructions, sensitive data, or claims that could influence future decisions. This can include rule-based checks, anomaly detection, and secondary model evaluations designed specifically to assess whether content is safe to store long-term.

Crucially, validation must apply not only to user input, but also to agent outputs, summaries, and peer-agent messages. An agent should not be able to poison itself or others through its own generated content.

Segment memory by user, task, and domain

Memory segmentation is one of the most effective controls against widespread poisoning. Isolate memory by user session, tenant, and functional domain so that information learned in one context cannot automatically influence another. For example, user-specific preferences should not be mixed with global policy memory, and experimental learning should not affect production decision-making agents.

In RAG and vector systems, use strict namespace isolation and avoid shared indexes unless entries are explicitly curated and verified.

Control access, retention, and lifespan of memory

Not all context deserves to live forever. Enforce retention policies based on sensitivity and trust. Unverified or low-confidence memory should expire quickly. High-impact memory, such as pricing rules, security policies, or workflow shortcuts, should require authenticated sources and explicit approval before being stored.

By minimizing retention, you reduce the persistence window of any successful poisoning attempt.

Track provenance and detect anomalies

Every memory entry should carry provenance metadata such as source, timestamp, confidence score, and ingestion path. This allows you to answer questions like “where did this belief come from?” and “why is it being retrieved?”

This also allows you to monitor for suspicious patterns, such as unusually frequent updates, repeated reinforcement of the same claim from a single source, or sudden shifts in retrieved context. These signals often indicate slow, deliberate poisoning rather than accidental error.

Prevent automatic re-ingestion and self-reinforcement

One of the most dangerous patterns in agentic systems is automatic feedback loops, where an agent’s outputs are fed back into its own memory as trusted knowledge. This can quickly amplify small errors or malicious seeds into entrenched false beliefs.

Block automatic re-ingestion by default. If agent outputs are candidates for memory, they require explicit validation, trust scoring, or human review before they are stored.

Add resilience through snapshots, rollback, and review

Assume poisoning will eventually happen and design for recovery. Maintain versioned snapshots of memory stores, support rollback and quarantine of suspicious entries, and rehearse incident response scenarios involving memory corruption. High-risk actions influenced by memory require human review or secondary verification to ensure decisions are not being driven by corrupted context.

Weight retrieval by trust, not just relevance

Retrieval systems should not surface memory purely based on semantic similarity. Combine relevance with trust signals such as verified source, human approval, tenancy match, and freshness. Low-trust entries should decay over time and require additional confirmation before they can influence high-impact decisions.

These controls ensure that even if poisoned data enters memory, it is less likely to shape behavior silently and persistently.

Quiz

Test your knowledge!

Quiz

What is the primary difference between classic prompt injection and Memory and Context Poisoning in an agentic system?

Keep learning

If you want to explore Memory and Context Poisoning in more depth and see how it fits into the wider agentic threat landscape, these OWASP resources are a great next step:

Congratulations

You have taken your first step into learning what Memory and Context Poisoning is, how it works in agentic systems, and why it is so dangerous. You now understand how attackers can quietly corrupt what an agent remembers, how that poisoned context can persist across sessions and users, and how it can reshape reasoning, planning, and tool use long after the original attack.

More importantly, you have learned how to defend against this risk by treating memory as a security boundary. By validating memory writes, segmenting context, tracking provenance, limiting retention, and designing for rollback and recovery, you can prevent small, subtle attacks from turning into long-term systemic failures.