What is Personally Identifiable Information (PII)?

PII: the basics

What is PII?

Personally Identifiable Information (PII) is any data that can be used to uniquely identify an individual, either on its own or when combined with other data. PII includes details that can be directly linked to a person, such as their full name or social security number, as well as information that can indirectly identify them, like an IP address or a combination of demographic details. The concept matters because organizations today collect vast amounts of user data. The distinction between direct and indirect identifiers matters to developers in particular, as both types of information can expose an individual's identity if handled improperly.

PII comes in many forms, ranging from the obvious to the less intuitive. Direct identifiers are data points that can immediately identify an individual, like:

Address
Biometrics (fingerprints, retina, etc.)
Credit/debit numbers
Email (John.doe@email.com)
Name
Passport number
Personal identification number (PIN)
Social security number

These are often considered highly sensitive because of their unique nature. Indirect identifiers include data like:

Date of birth
Gender
Phone numbers
Race
Zip/Postal codes

On their own, these pieces of information may not reveal an individual’s identity, but when combined with other data, they can be used to pinpoint someone. For example, knowing a person’s birthdate and postal code may not be enough to identify them in a large city, but in a small community, it could significantly narrow down the possibilities.
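The small-community effect described above can be sketched by counting how many records share the same combination of indirect identifiers. This is a minimal illustration, not a formal re-identification test: the records and the field names (`birth_date`, `postal_code`) are made up for the example.

```python
from collections import Counter

# Hypothetical records: no names, only indirect identifiers.
records = [
    {"birth_date": "1990-04-12", "postal_code": "10001"},
    {"birth_date": "1990-04-12", "postal_code": "10001"},
    {"birth_date": "1985-11-03", "postal_code": "10001"},
    {"birth_date": "1990-04-12", "postal_code": "59301"},
]


def group_sizes(rows, keys):
    """Count how many rows share each combination of the given fields."""
    return Counter(tuple(row[k] for k in keys) for row in rows)


def unique_rows(rows, keys):
    """Return the rows whose combination of fields appears exactly once."""
    sizes = group_sizes(rows, keys)
    return [row for row in rows if sizes[tuple(row[k] for k in keys)] == 1]


# Rows that are unique on the combination can be singled out without a name.
singled_out = unique_rows(records, ["birth_date", "postal_code"])
print(len(singled_out), "of", len(records), "records are unique on birth date + postal code")
```

Any record that is the only one in its group is effectively identified by those fields alone, which is why indirect identifiers deserve the same care as direct ones.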

About this lesson

In this lesson, you’ll understand the concept of Personally Identifiable Information (PII), its importance in software development, and best practices for handling and protecting it.

Importance of PII

Managing Personally Identifiable Information (PII) is a big deal! Mishandling PII can have serious consequences, not just for the individuals affected but also for the companies involved. Data breaches are a common threat, and many of them involve PII exposure. When PII is compromised, individuals can suffer from identity theft, financial loss, or even damage to their reputation. On the company side, this can lead to legal penalties, massive fines, and a loss of public trust. For example, under GDPR, companies can be fined up to €20 million or 4% of their global annual turnover, whichever is higher, for the most serious violations, and in the U.S., non-compliance with the CCPA can lead to civil penalties assessed per violation. Protecting PII isn’t just about avoiding penalties; it's also about staying compliant with privacy laws and protecting the company’s reputation.

As a developer, you’re on the front line of this battle. By following secure coding practices, keeping up with the latest data privacy regulations, and ensuring systems are built to handle PII responsibly, you help keep companies compliant and secure. But it’s not just about compliance; it’s also about trust.

Users today care a lot about how their data is handled, and they’re paying attention. Companies that show they take PII protection seriously are more likely to gain and keep customer loyalty. On the other hand, a single data breach can cause serious damage to a company's reputation, sometimes permanently. Just look at companies like Equifax or Yahoo, whose massive breaches left a lasting mark on their public image. By designing secure applications and using privacy-first principles, developers play a huge role in safeguarding that trust.

You have two important jobs: ensuring your applications meet legal standards for PII protection and building systems that earn and keep user trust. This means collecting only the data you need, storing it securely, and using techniques like encryption and data masking to keep it safe. By embracing "privacy by design," where privacy is built into the very foundation of an app, developers can help ensure both compliance and trustworthiness in the systems they create.

PII in action

In software development, Personally Identifiable Information (PII) can be scattered across different parts of an application’s architecture, from databases to APIs.

Databases are where PII is most commonly stored: user profiles, transaction histories, and payment details all live there. This can include anything from usernames and addresses to more sensitive information like social security numbers or credit card data. Take an e-commerce platform, for example. A database might store not just a user’s name and email but also billing info, making it a prime spot for sensitive PII. It's crucial to ensure that this information is stored securely and accessed only when absolutely necessary.

But databases aren’t the only place PII can hide. PII can also show up in system logs. Logging is a vital part of software development, helping with debugging, monitoring, and tracking how your application behaves. However, logs can accidentally capture and store sensitive PII if not set up properly. Imagine a logging system that records user IDs, emails, or passwords (hopefully not!) while you troubleshoot login issues. To avoid this, developers should configure their logging systems to mask or exclude sensitive PII and reduce the risk of accidental exposure.
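One way to mask PII in logs is a filter attached to the log handler. The sketch below uses Python's standard `logging` module; the regular expressions, the key names (`password`, `passwd`, `token`), and the logger name `app.login` are assumptions for illustration, and pattern-based redaction should be treated as a safety net rather than a guarantee.

```python
import logging
import re

# Simple illustrative patterns; real traffic will need broader coverage.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# Matches key=value pairs such as "password=hunter2" (key names are assumptions).
SECRET_RE = re.compile(r"(?i)\b(password|passwd|token)=\S+")


def redact(text: str) -> str:
    """Replace secrets and email addresses with placeholders."""
    text = SECRET_RE.sub(lambda m: f"{m.group(1)}=[REDACTED]", text)
    return EMAIL_RE.sub("[EMAIL]", text)


class RedactPIIFilter(logging.Filter):
    """Rewrite each record's final message before the handler emits it."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.msg = redact(record.getMessage())  # message with args applied
        record.args = None  # args are already merged into msg
        return True


logger = logging.getLogger("app.login")  # hypothetical logger name
handler = logging.StreamHandler()
handler.addFilter(RedactPIIFilter())
logger.addHandler(handler)
logger.setLevel(logging.INFO)

# The email and password values never reach the handler's output.
logger.info("Login failed for %s password=%s", "john.doe@email.com", "hunter2")
```

Redaction is the fallback; the more reliable fix is not to pass passwords or other sensitive values to the logger in the first place.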

APIs are another area where PII is often handled, as they enable communication between different services and frequently exchange user data. Whether it’s a mobile app talking to a backend server or a third-party service requesting user information, APIs can expose PII if they’re not properly secured with encryption and access controls. By ensuring that APIs are locked down and using strong security measures, developers can prevent unauthorized access to sensitive user data.

Let's look at this in action. You run a simple command to check the logs:

$ cat /var/log/app/login.log

You notice the problem immediately. The logs are capturing sensitive PII like email addresses and passwords. This is a serious issue. If someone gains access to these logs, they could steal user credentials and potentially cause a lot of damage.

Next, you decide to dig deeper into the API requests, thinking this might reveal more PII. You run another command to check the API logs:

$ cat /var/log/app/api_requests.log

Now things are getting worse! Not only are usernames and emails exposed, but so is partial credit card information. Even though the card numbers are masked, attackers can combine the remaining digits with the exposed usernames and emails to commit fraud or target users with phishing attacks.

At this point, you realize that the exposure of PII in logs and APIs is a ticking time bomb. If attackers get access to these logs, they can:

Steal credentials and log into user accounts.
Exploit credit card details for financial gain.
Launch phishing attacks using the exposed email addresses.
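Rather than spotting these leaks by eye, a small script can flag suspicious log lines. The following is a rough sketch under stated assumptions: the patterns are deliberately simple, the sample lines are invented for the example (they are not the contents of the log files above), and a real scanner would need broader coverage.

```python
import re

# Patterns are deliberately simple and will miss some formats (assumption:
# secrets appear as key=value pairs and card numbers as 13-16 digit runs).
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "password": re.compile(r"(?i)\bpassword=\S+"),
    "card_number": re.compile(r"\b(?:\d[ -]?){12,15}\d\b"),
}


def find_pii(line: str) -> list[str]:
    """Return the names of the PII patterns that match a single log line."""
    return [name for name, pattern in PATTERNS.items() if pattern.search(line)]


def scan(lines):
    """Yield (line_number, matched_pattern_names) for lines that contain PII."""
    for number, line in enumerate(lines, start=1):
        hits = find_pii(line)
        if hits:
            yield number, hits


# Invented sample lines, used only to exercise the scanner.
sample = [
    "login attempt user=john.doe@email.com password=hunter2",
    "GET /health 200",
    "payment card=4111 1111 1111 1111 status=ok",
]

for number, hits in scan(sample):
    print(f"line {number}: {', '.join(hits)}")
```

A check like this could run against log samples in a test pipeline, so that a change which starts logging PII is caught before it reaches production.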

PII mitigation

One of the foundational principles in managing Personally Identifiable Information (PII) is data minimization. This concept revolves around collecting only the essential data needed to fulfill the purpose of a particular feature or function within an application. By limiting the amount of PII collected, developers reduce the overall risk in case of a data breach or misuse. For instance, if an application requires only a user’s email to create an account, there is no need to collect their full address, date of birth, or phone number. Storing excess data increases the risk of exposure and goes against privacy regulations like the General Data Protection Regulation (GDPR), which emphasizes the importance of collecting the least amount of personal data necessary. Developers should also regularly evaluate their data collection processes and ensure that no unnecessary data is being gathered or retained.

In practice, data minimization extends to every stage of development, from the initial design of features to how data is handled in the backend. Developers should be mindful of what data is captured during user interactions and carefully assess whether each piece of PII is truly necessary for the application’s functionality. Additionally, they should implement strict data validation rules to ensure that users are only allowed to submit the required information, thus preventing the inadvertent collection of extra data. By adopting a minimalistic approach to data collection, developers can significantly reduce the attack surface, protecting both the user and the company from potential harm.
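Strict validation of this kind can be sketched as an allowlist check on incoming data. The field names and the `validate_signup` helper below are hypothetical, chosen to match the email-only account example above.

```python
# Hypothetical sign-up form: only an email is required.
REQUIRED_FIELDS = {"email"}
OPTIONAL_FIELDS = {"display_name"}


class ValidationError(ValueError):
    """Raised when a payload has missing or unexpected fields."""


def validate_signup(payload: dict) -> dict:
    """Accept only allowlisted fields and reject everything else."""
    allowed = REQUIRED_FIELDS | OPTIONAL_FIELDS
    extra = set(payload) - allowed
    if extra:
        raise ValidationError(f"unexpected fields: {sorted(extra)}")
    missing = REQUIRED_FIELDS - set(payload)
    if missing:
        raise ValidationError(f"missing fields: {sorted(missing)}")
    return {key: payload[key] for key in allowed if key in payload}
```

Rejecting unexpected fields, rather than silently dropping them, makes it visible when a client starts sending data the application never asked for.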

Masking and anonymization

Another key practice is data masking and anonymization, which can help protect sensitive PII when stored or used in non-production environments such as testing or development. Data masking involves obscuring PII so that it remains unreadable to unauthorized users or systems. This is especially useful when sensitive data needs to be displayed in user interfaces or transmitted across systems. For example, displaying only the last four digits of a credit card number or partially masking a social security number can allow necessary functionality without exposing the entire sensitive dataset.
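As a minimal sketch of the masking described above, the helpers below keep only the last four digits of a card number or social security number. The function names and output formats are illustrative choices, not a standard.

```python
def mask_card_number(card_number: str) -> str:
    """Keep the last four digits and replace the rest with asterisks."""
    digits = [c for c in card_number if c.isdigit()]
    if len(digits) < 4:
        raise ValueError("card number is too short to mask")
    return "*" * (len(digits) - 4) + "".join(digits[-4:])


def mask_ssn(ssn: str) -> str:
    """Show only the last four digits of a 9-digit social security number."""
    digits = [c for c in ssn if c.isdigit()]
    if len(digits) != 9:
        raise ValueError("expected a 9-digit social security number")
    return "***-**-" + "".join(digits[-4:])


print(mask_card_number("4111 1111 1111 1111"))
print(mask_ssn("123-45-6789"))
```

Masking only changes what is displayed or copied elsewhere. If the original value is still stored, it needs its own protection.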

Anonymization, on the other hand, refers to removing personally identifiable characteristics from data altogether, making it impossible to trace it back to an individual. This can be achieved by stripping away direct identifiers like names or social security numbers and generalizing data points such as age or location to avoid re-identification. By anonymizing data, developers can still analyze user behavior or trends without compromising privacy. However, it's important to note that true anonymization can be difficult to achieve, as combining different pieces of anonymized data can sometimes inadvertently reveal an individual. Developers need to stay aware of this risk and regularly review anonymization techniques to ensure they are robust and effective.
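The stripping and generalizing steps can be sketched as follows. The field names, the ten-year age bands, and the three-character postal prefix are assumptions for the example. As the paragraph above warns, this kind of generalization lowers re-identification risk but does not by itself guarantee anonymity.

```python
DIRECT_IDENTIFIERS = {"name", "email", "ssn"}  # hypothetical field names


def age_band(age: int, width: int = 10) -> str:
    """Generalize an exact age into a band such as '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def generalize(record: dict) -> dict:
    """Drop direct identifiers and coarsen age and postal code."""
    out = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    if "age" in out:
        out["age"] = age_band(out["age"])
    if "postal_code" in out:
        # Keep only the first three characters of the postal code.
        out["postal_code"] = out["postal_code"][:3] + "**"
    return out


user = {"name": "John Doe", "email": "john.doe@email.com", "age": 34,
        "postal_code": "10001", "plan": "premium"}
print(generalize(user))
```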

Encryption

Encrypting data ensures that even if it is intercepted, stolen, or accessed by unauthorized parties, the information will be unreadable without the appropriate decryption keys. In modern software development, encryption should be applied to PII both at rest (when stored in databases, files, or backups) and in transit (when data is transmitted across networks, APIs, or between client and server). Developers should rely on strong, well-vetted standards to safeguard PII, such as the AES (Advanced Encryption Standard) algorithm for data at rest and the TLS (Transport Layer Security) protocol for data in transit.

Encryption should not be limited to databases and network communication, though. Developers should also consider encrypting sensitive data within logs, cache systems, and wherever PII might be temporarily stored. However, encryption alone is not enough. Developers need to implement robust key management practices to ensure that encryption keys are securely stored and rotated regularly. Additionally, secure access control policies should be enforced, so only authorized personnel or systems can access decryption keys. Encryption adds an extra layer of security and is often required by regulatory frameworks, making it a critical component of any data protection strategy.
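For data in transit, one concrete step is to make sure client connections verify certificates and refuse outdated protocol versions. The sketch below uses Python's standard `ssl` module; the helper name `make_client_context` and the choice of TLS 1.2 as the minimum version are assumptions, so adjust them to your own policy. Encryption at rest is not shown here because it depends on the storage layer and the key management system in use.

```python
import ssl


def make_client_context() -> ssl.SSLContext:
    """Build a TLS client context with verification and a version floor."""
    # create_default_context() enables certificate and hostname verification.
    context = ssl.create_default_context()
    # Refuse protocol versions older than TLS 1.2 (policy choice, an assumption).
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context


context = make_client_context()
print(context.minimum_version, context.verify_mode)
```

The resulting context can be passed to standard library clients, for example as the `context` argument of `urllib.request.urlopen`.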

Access control

Implementing access control is essential to limiting who can access PII within a system. The principle of least privilege should guide access control policies, ensuring that users and systems are granted only the minimum level of access necessary to perform their duties. For example, a customer support representative may need access to a user’s name and email but should not be able to see sensitive financial information like credit card numbers. Similarly, certain PII should only be accessible to specific internal systems or administrators, and developers should ensure that access is tightly restricted.

Role-based access control (RBAC) is one approach to managing this, where different users are assigned roles with specific permissions, ensuring that only authorized personnel can interact with sensitive data. Developers can further enhance access control by implementing multi-factor authentication (MFA) for accessing sensitive areas of an application or database. MFA adds an extra layer of security, requiring users to provide multiple forms of identification (e.g., a password and a one-time code) to access PII. This reduces the risk of unauthorized access, even if a password is compromised. By controlling and limiting who can access PII, developers can significantly reduce the chances of internal data leaks and unauthorized exposures.
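A role-based check can be as simple as mapping each role to the fields it may see. The roles and field names below are hypothetical and mirror the customer support example above. A real system would normally enforce this in the data access layer, not only at display time.

```python
# Hypothetical role-to-field mapping following the least-privilege principle.
ROLE_FIELDS = {
    "support": {"name", "email"},
    "billing": {"name", "email", "card_last4"},
}


def view_for_role(record: dict, role: str) -> dict:
    """Return only the fields the given role is permitted to see."""
    allowed = ROLE_FIELDS.get(role, set())  # unknown roles see nothing
    return {k: v for k, v in record.items() if k in allowed}


customer = {"name": "John Doe", "email": "john.doe@email.com",
            "card_last4": "1111", "ssn": "123-45-6789"}
print(view_for_role(customer, "support"))
```

Defaulting unknown roles to an empty set means a misconfigured or misspelled role fails closed instead of exposing data.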

Data retention and disposal

Finally, managing the data retention and disposal processes is crucial to handling PII. Developers should design systems with clear data retention policies, ensuring that PII is only kept for as long as necessary to meet the intended purpose. For example, user data that is no longer needed for active accounts or completed transactions should be securely deleted. Retaining outdated or irrelevant PII increases the risk of exposure, especially in the case of a data breach. Regulations like GDPR also give users a right to erasure (the "right to be forgotten"), meaning developers must build in the capability to delete user data upon request.

When it comes to data disposal, it’s not enough to simply delete files from the database or storage. Secure deletion practices should be implemented to ensure that PII cannot be recovered after it is removed. This could include methods like overwriting sensitive files with random data or using cryptographic erasure for encrypted data. Developers should also be mindful of where PII might be stored, such as in backups or logs, and ensure that these are purged as part of a regular maintenance routine. Implementing strong data retention and disposal practices reduces the amount of data at risk and helps organizations stay compliant with privacy laws.
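A retention policy becomes enforceable once it is expressed in code that runs on a schedule. The sketch below only shows the selection logic: the one-year retention period and the `last_active` field are assumptions, and the secure deletion from storage, backups, and logs described above still has to happen in the systems that hold the data.

```python
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(days=365)  # hypothetical policy: keep data for one year


def is_expired(last_active: datetime, now: datetime,
               retention: timedelta = RETENTION) -> bool:
    """True when a record is older than the retention period."""
    return now - last_active > retention


def purge_expired(records: list[dict], now: datetime) -> list[dict]:
    """Return only the records that are still within the retention period."""
    return [r for r in records if not is_expired(r["last_active"], now)]


now = datetime(2025, 1, 1, tzinfo=timezone.utc)
users = [
    {"id": 1, "last_active": datetime(2024, 6, 1, tzinfo=timezone.utc)},
    {"id": 2, "last_active": datetime(2022, 3, 15, tzinfo=timezone.utc)},
]
print([u["id"] for u in purge_expired(users, now)])
```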

With these best practices (data minimization, data masking and anonymization, encryption, access control, and secure retention and disposal), developers can handle PII responsibly, maintain user trust, and comply with global data protection regulations. Together, these practices create a robust framework for safeguarding sensitive information in modern software systems.

Quiz

Test your knowledge!

Why is it important for developers to properly manage and protect Personally Identifiable Information (PII) in their applications?

Keep learning

Learn more about PII and other examples of the dangers of exposing it here:

A Snyk Learn lesson on sending information in cleartext https://learn.snyk.io/lesson/cleartext-sensitive-information-in-cookie/
Learn more about data loss prevention for developers here https://snyk.io/blog/data-loss-prevention-for-developers/

Congratulations

You have taken your first step into learning more about PII, what it is, and why we need to protect it. We hope that you will apply this knowledge to make your applications safer. Make sure to check out our lessons on other common vulnerabilities.
