Cross-site scripting (or XSS) is a code vulnerability that occurs when an attacker “injects” a malicious script into an otherwise trusted website. The injected script gets downloaded and executed by the end user’s browser when the user interacts with the compromised website. Since the script came from a trusted website, it cannot be distinguished from a legitimate script.
In this lesson we will demonstrate how an XSS attack can play out in a chat application. Next, we will dive deeper and explain the various forms of XSS. Finally, we will study vulnerable code and learn how to fix it.
But before we jump into the lesson, have you ever heard of a self-retweeting tweet?
In 2014, an Austrian teenager @firoxl was experimenting with his feed on Twitter, trying to make it display the Unicode ‘heart’ character. By doing so, he inadvertently discovered that Twitter’s feed was vulnerable to an XSS attack! @firoxl immediately reported the issue to Twitter, but it was too late. His discovery was already making rounds on social media.
Less than two hours after @firoxl’s discovery, a German IT student @derGeruhn published a tweet that exploited XSS to ... retweet itself. Thus, the self-retweeting tweet was released into the world. It retweeted itself hundreds of thousands of times and affected thousands of Twitter accounts, including @NYTimes and @BBCBreaking. To end its reign, Twitter had to take their whole feed offline.
To achieve its nefarious purposes, the script exploits an XSS vulnerability. Not sure how it works? Read on!
A self-retweeting tweet
A company called startup.io decided to deploy an internal chat application for their employees. However, instead of using Slack, Discord or similar, the company chose to create its own chat service.
You are an engineer working for startup.io, and you’ve just learnt about the self-retweeting tweet that plagued Twitter a few years ago. You are curious to see if you could exploit your company’s chat web application in a similar way. You inform your in-house security team and your manager about your intentions, and then you get to work.
To exploit the application, you will be using a conversation with Emily, a fellow startup.io engineer. First, let’s be polite and inform Emily what we will be doing by sending her the following message:
Hey! I will be stealing your cookies. Is that ok?
Does our chat example seem unrealistic to you? Well, a similar scenario happened in the real world! In 2018, a Twitch streamer dwangoAC attempted to use an alpha version of a Twitch chat wrapper software that was vulnerable to XSS. When his audience discovered the vulnerability, the stream quickly turned from a live video gaming event into a hackathon contest. See the recording of that Twitch session on YouTube.
This isolation is called the “same-origin policy”, and it is enforced by the browser. In a nutshell, XSS is a vulnerability that breaks the same-origin policy. And that’s what we did when we compromised the chat application. To understand what exactly happened, let’s take a look at the server code responsible for storing and displaying a chat message.
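As a rough illustration of the kind of code involved, here is a minimal PHP sketch of a vulnerable store-and-render path. The function names storeMessage and renderMessageHTML, the array shape, and the payload are hypothetical and are not taken from the startup.io code base. The only point is that user-supplied text is concatenated into the HTML response unchanged.

```php
<?php
declare(strict_types=1);

// Hypothetical in-memory "database" of chat messages (an assumption for this sketch).
$messages = [];

// Stores a message exactly as received: no validation, no escaping.
function storeMessage(array &$messages, string $senderEmail, string $text): void
{
    $messages[] = ['senderEmail' => $senderEmail, 'text' => $text];
}

// VULNERABLE: user-controlled values are concatenated straight into the HTML.
function renderMessageHTML(array $message): string
{
    return '<div class="message">'
        . '<span class="sender">' . $message['senderEmail'] . '</span>'
        . '<p>' . $message['text'] . '</p>'
        . '</div>';
}

// The sender submits markup instead of plain text.
$payload = '<img src="x" onerror="alert(document.cookie)">';
storeMessage($messages, 'you@startup.io', $payload);

// The payload is sent to the recipient's browser as live HTML.
$html = renderMessageHTML($messages[0]);
echo $html, PHP_EOL;
```

Because the browser receives the img tag as part of the page, its onerror handler runs with the same privileges as the chat application's own scripts.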
An XSS attack illustration which shows a hacker sending a malicious script to a website
There is much more to say about XSS and its different types. This lesson is only an introduction to XSS; it barely scratches the surface. We will cover reflected XSS and DOM-based XSS in much more detail in future lessons.
In addition, XSS is likely the most common web vulnerability. Do not take it lightly. Read on to learn how to mitigate XSS in your application.
In 2005, a security researcher Samy Kamkar created an XSS worm named ‘Samy’. The worm was unleashed on the social networking site MySpace and affected over one million users within the first 20 hours of its lifetime, making it the fastest spreading virus of all time.
XSS is extremely popular for a reason: we programmers very often inject user-supplied data into the responses we send back to users. The first step to mitigate XSS is to find all places in your code where this pattern occurs. Input data might be coming from a database or directly from a user request. Any data which might have originated from a user at any point in the past is suspect.
This is a daunting task and requires you to review your code carefully. Luckily, security scanners such as Snyk Code can automate most of the work for you.
Having identified all the places where XSS might be happening, it’s time to get your hands dirty and code your way out of danger. The first and the most important XSS mitigation step is to escape your HTML output. To do that, you should HTML-encode all dangerous characters in the user-controlled data before injecting that data into your HTML output.
For example, when HTML-encoded, the character < becomes &lt;, and the character & becomes &amp;, etc. This way, the browser will safely handle the HTML-encoded characters, i.e. it will not assume they are part of the HTML structure of your page.
Remember to encode all dangerous characters. Don’t assume only a subset of characters needs to be escaped for your specific use case. Bad guys are very creative and will always find ways to bypass your assumptions.
When writing PHP you can use the built-in htmlspecialchars function to escape HTML characters.
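A minimal sketch of output escaping with htmlspecialchars follows. The wrapper name escapeHTML and the sample input are assumptions made for illustration. The flags are passed explicitly because the default flags differ between PHP versions.

```php
<?php
declare(strict_types=1);

// Hypothetical helper: escape a user-controlled value for use as HTML element content.
function escapeHTML(string $value): string
{
    // ENT_QUOTES escapes both double and single quotes;
    // ENT_SUBSTITUTE replaces invalid byte sequences instead of returning an empty string.
    return htmlspecialchars($value, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}

$userInput = '<script>alert("xss")</script> & more';
$escaped = escapeHTML($userInput);

echo '<p>' . $escaped . '</p>', PHP_EOL;
// <p>&lt;script&gt;alert(&quot;xss&quot;)&lt;/script&gt; &amp; more</p>
```

The browser renders the escaped value as visible text rather than interpreting it as a script element.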
XSS mitigation where a hacker tries to inject a malicious script but the script's content is escaped
Be as strict as possible with the data you receive from your users. Before including user-controlled data in an HTTP response or writing it to a database, validate that it is in the format you expect. Never rely on blocklisting: the bad guys will always find ways to bypass it!
For instance, in our chat application, we expect the messageId to be a valid UUID and the senderEmail to be a valid email. Note that in the example we changed generateSenderHTML. This demonstrates two layers of defence to prevent XSS with the senderEmail parameter: we both validate it before saving it to a database and later escape it when injecting it into HTML.
We can use Webmozart Assert, which has validation functions for many common data types.
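The sketch below shows the same idea using only PHP built-ins, so that it runs without installing anything. It is a swapped-in stand-in, not the lesson's own code: with Webmozart Assert, Assert::uuid() and Assert::email() would take the place of the two hypothetical helpers isValidUuid and isValidEmail. The validateMessage function and the sample values are also assumptions.

```php
<?php
declare(strict_types=1);

// Hypothetical validators built on PHP built-ins; a validation library
// such as Webmozart Assert can replace them.
function isValidUuid(string $value): bool
{
    // Canonical 8-4-4-4-12 hexadecimal form, matched against the whole string.
    return preg_match(
        '/\A[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}\z/i',
        $value
    ) === 1;
}

function isValidEmail(string $value): bool
{
    return filter_var($value, FILTER_VALIDATE_EMAIL) !== false;
}

// Rejects a message before it is stored if either field has an unexpected format.
function validateMessage(string $messageId, string $senderEmail): void
{
    if (!isValidUuid($messageId)) {
        throw new InvalidArgumentException('messageId must be a UUID');
    }
    if (!isValidEmail($senderEmail)) {
        throw new InvalidArgumentException('senderEmail must be an email address');
    }
}

// A well-formed message passes silently.
validateMessage('123e4567-e89b-12d3-a456-426614174000', 'emily@startup.io');

// A script payload in place of the email is rejected.
try {
    validateMessage('123e4567-e89b-12d3-a456-426614174000', '<script>alert(1)</script>');
} catch (InvalidArgumentException $e) {
    echo 'Rejected: ', $e->getMessage(), PHP_EOL;
}
```

Validation narrows what can reach the database, but it does not replace escaping: a syntactically valid value should still be escaped when it is written into HTML.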
It is mandatory to perform type validation of user input before writing it to a database. However, it is also strongly recommended to validate data after reading it from the database. This can save us when the database gets compromised, and the malicious data gets injected through means other than the vulnerable API we secured in the previous paragraph. To validate data read from a database, you can use the validation techniques we presented above. Alternatively, we recommend using trusted database libraries that perform type validation out of the box, for example, ORM libraries.
The above mitigation is effective against situations where user input is used as the content of an HTML element (e.g. <div> user_input </div> or <p> user_input </p> etc.). However, there are certain locations where you should never put user-controlled input. These locations include:
There are some exceptions to the above rules, but explaining them goes beyond the scope of this lesson. If you do need to place user-controlled input inside any of the listed locations, please follow the OWASP Prevention Cheat Sheet for more detailed advice.
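To illustrate why the output context matters, assume a user-controlled value ends up in an href attribute. This scenario, the variable names, and the isSafeHttpUrl helper are assumptions made for this sketch, and the scheme check is only one possible extra safeguard, not a complete defence.

```php
<?php
declare(strict_types=1);

// User-controlled value intended to be a profile link (hypothetical field).
$userUrl = 'javascript:alert(document.cookie)';

// HTML-escaping finds nothing to escape: the payload has no <, >, &, or quotes.
$escaped = htmlspecialchars($userUrl, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');

// The resulting link would still run the script when clicked.
echo '<a href="' . $escaped . '">profile</a>', PHP_EOL;

// One possible extra check for this context: allow only http(s) URLs.
function isSafeHttpUrl(string $url): bool
{
    $scheme = parse_url($url, PHP_URL_SCHEME);
    return is_string($scheme) && in_array(strtolower($scheme), ['http', 'https'], true);
}

var_dump(isSafeHttpUrl($userUrl));                      // bool(false)
var_dump(isSafeHttpUrl('https://startup.io/u/emily'));  // bool(true)
```

Escaping that is correct for element content can leave other contexts exposed, which is why each context needs its own rules.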
You’ve taken your first step into understanding XSS and preventing it from affecting your code! We hope you will apply your new knowledge wisely and make your applications safer.