
Generation of predictable numbers or identifiers

How intruders can use weaknesses in random number generation to launch more successful attacks.

~20mins estimated

Python

Predictable numbers or identifiers: the basics

What is predictable number generation?

Computers are built based on logic and predictability given a particular input, so generating a truly random number, key, or identifier goes against their nature. They don’t contain the entropy required for randomness. True chaos is found in the realm of our own reality – in processes like quantum phenomena and thermal noise.

Our typical computer will use something called Deterministic Random Bit Generators (DRBGs), or pseudorandom number generators, to produce a sequence of ‘random’ values. The whole process starts with a seed (which is confidential) and then uses a deterministic algorithm to conjure up the rest of the sequence, mimicking randomness as thoroughly as is possible with a typical modern computer.

Although the output of a random number generator like the ones described should never be predictable, attackers are sometimes able to glean information about future values from the previous sequence of values or by finding out the original seed. While this often does not allow them to predict the next value exactly, it can dramatically shrink the number of possibilities and so make something like a brute force attack far easier. Let’s get into this in more detail in the sections below!

About this lesson

In this lesson, we will explore why truly random number generation is vital to keep an organization’s accounts and information secure. Learn why it is so difficult to produce random numbers, and what can happen when they become predictable.

FUN FACT

Lava lamps??

Randomness is vital when we think about secure encryption methods (e.g. producing an encryption key). Luckily, an unexpected source is here to save the day. The flow of a lava lamp is effectively unpredictable, and the lava never takes on the exact same shape twice. Because of this, Cloudflare uses a wall of 100 lava lamps as a source of entropy for its encryption (in particular for SSL/TLS), basing key generation on the state of the lava formations at a particular point in time! See this article for more information.

Importance and background of predictable numbers or identifiers

To dive in a little deeper, randomness has a specific meaning in the context of cryptography. Statistical randomness means that each individual outcome is unpredictable, even though the long-run frequencies of outcomes may be predictable. In cryptography, however, we need values that remain unpredictable even in the long term, with no patterns to observe. But why is this such a challenge for current computers?

Again, the machines we use today run on logic, and programs are built on if-then statements. The same input will result in the same output, which makes sense in the context of how they are used. If you input 2 into a program that is meant to find the square, you should always be getting 4, not random values unrelated to the input.

The problem, however, is that true randomness, or as close as we can get to it, is necessary for encryption. Encryption algorithms are only as good as the keys they use, so jumbled data only protects sensitive information when it is difficult to decrypt. Considering the amount of data that is encrypted these days, including PII, PHI, credit card information, etc., using vulnerabilities in randomness to crack encryption poses a serious threat both to the people whose information is exposed and to the reputation of the organization tasked with protecting it.

Predictable numbers in action

So what’s the deal with pseudorandom number generation and DRBGs?

This is how modern computers are best able to produce random values. By taking in an unpredictable input, the computer generates unpredictable outputs. However, there are some problems that arise, especially when the input starts to become predictable. Even when the input is adequate, given the same seed, pseudorandom number generation and DRBGs will produce the exact same output, which means if an attacker can guess the seed in any way all of the encryption (or whatever else the randomness is being used for) is at risk.
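The determinism described above is easy to observe with Python's own random module. The following is a minimal sketch; the helper name first_values and the seed values are made up for illustration.

```python
import random


def first_values(seed, count=5):
    """Return the first `count` outputs of a PRNG initialised with `seed`."""
    rng = random.Random(seed)  # private generator, leaves the global one untouched
    return [rng.randint(1, 1000) for _ in range(count)]


run_a = first_values(1234)
run_b = first_values(1234)  # same seed as run_a
run_c = first_values(1235)  # neighbouring seed

print(run_a == run_b)  # identical seeds always replay the same sequence
print(run_a == run_c)  # a different seed almost certainly gives a different one
```

Anyone who learns or guesses the seed can replay the whole sequence in exactly this way.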

It is also extremely difficult to know whether the result will continue to be random even over an extended period of time, meaning new sources of randomness must be introduced over time. See the diagram below for a breakdown of the steps in this process.

[Diagram: pseudorandom number generation]

Example: Rand and his gambling!

A classic example of insufficient randomness is having numbers generated based only on a timestamp-dependent seed. Suppose a small tech startup has launched an online-gambling app that offers scratch cards. To enter, a player will pay $1 per card, and can potentially win anything from $1 to $1,000 if their scratch card contains a sequence of randomly generated winning numbers.


Now meet Rand, a security consultant who happens to use this app often at work. He wins a few times, starts to notice a pattern in the “random” numbers based on when he purchases the cards, and decides to look further into it.

It turns out that the developers behind the app used Python's random module seeded with the current timestamp to generate lottery numbers. Unfortunately for them, this means Rand can predict future lottery numbers by synchronizing his clock with the server and running the same PRNG locally. He writes a script that predicts the next winning number by seeding his local PRNG with the current timestamp, then purchases cards only when a jackpot number is about to be generated.


At the end of the day, though, Rand is a security consultant and decides to notify the app creators of the vulnerability. The development team fixes the issue by switching to Python's secrets module, which provides cryptographically secure random numbers.

Let’s take a closer look at what this software actually is (both vulnerable and then subsequently fixed)…

Predictable numbers under the hood

First, this is what the insecure code initially produced by the app's developers might have looked like:
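The app's actual listing is not reproduced here, so the following is only a sketch consistent with the description in this lesson. The class name ScratchCard, the 1 to 1000 number range, and the "winning number versus card number" comparison are assumptions; only the generate(self) method and the timestamp seeding come from the text.

```python
import random
import time


class ScratchCard:
    """Hypothetical scratch-card generator seeded from the clock (insecure)."""

    def generate(self):
        # Insecure: the seed is just the current Unix time in whole seconds,
        # so anyone who knows roughly when a card was bought can guess it.
        random.seed(int(time.time()))
        winning = random.randint(1, 1000)  # assumed range for the winning number
        user = random.randint(1, 1000)     # assumed range for the card's number
        return winning, user


winning, user = ScratchCard().generate()
print("Jackpot!" if winning == user else "Better luck next time.")
```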

As you can see, the developers used the random package combined with the current time to create the seed and generate the winning number. The developers use pseudorandom number generation, which in itself is not inherently insecure, unless the seed is compromised or predictable. The insecurity lies in the first line under the generate(self) method:

random.seed(int(time.time()))

Here, we are generating the seed, which serves as the starting point or initial value used by a pseudorandom number generator (PRNG). All PRNG algorithms are deterministic, meaning that if you start with the same seed, you will always get the exact same sequence of "random" numbers. The problem here is that an attacker (like Rand!) who knows roughly the time the lottery was run could try guessing the exact second (the seed). If he hits the right seed, he can run the exact same code and predict the winning number before it is drawn. This makes the lottery completely vulnerable to prediction.

In terms of user interface, we can model it using a terminal output. First, type python_main.py into the terminal, then type scratch card to see if you win anything.


Now, let's take a look at Rand's code that perfectly exploits this seed vulnerability.

Rand's code duplicates the developer's lottery generation logic in his predict_lottery(timestamp) function.

Then, Rand's code brute-forces the seed by starting with the current time and adding 1 second each iteration (future_offset), checking for the next hour. In each loop, Rand passes the potential future time (which serves as the seed) to predict_lottery(). The code immediately checks the resulting prediction: if winning == user:.

Since the numbers are generated sequentially from the same seed, Rand is looking for a seed where the first generated number (winning) happens to equal the second generated number (user). Such a seed is rare, but a card generated at that exact second is a guaranteed jackpot. Once a jackpot time is found, Rand simply has to wait: the code uses a countdown loop to match the system time to the exact second (jackpot_time) at which the jackpot was predicted to occur.
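The scan described above can be sketched as follows. This is an illustrative reconstruction rather than Rand's actual script: predict_lottery, future_offset, winning, user and jackpot_time are names taken from the text, while find_jackpot, the 1 to 1000 range, and the one-hour horizon parameter are assumptions. The countdown loop is left out so the sketch finishes immediately.

```python
import random
import time


def predict_lottery(timestamp):
    """Replay the app's assumed logic for one candidate seed (whole seconds)."""
    rng = random.Random(timestamp)  # same stream as random.seed(timestamp)
    winning = rng.randint(1, 1000)
    user = rng.randint(1, 1000)
    return winning, user


def find_jackpot(start, horizon=3600):
    """Return the first second in [start, start + horizon) that wins, else None."""
    for future_offset in range(horizon):
        candidate = start + future_offset
        winning, user = predict_lottery(candidate)
        if winning == user:
            return candidate
    return None


now = int(time.time())
jackpot_time = find_jackpot(now)
if jackpot_time is None:
    print("No jackpot predicted in the next hour.")
else:
    print(f"Jackpot predicted in {jackpot_time - now} seconds.")
```

With a 1 in 1000 chance per second under these assumptions, many one-hour windows will contain a jackpot second, but not all of them.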

In terms of running Rand's code, the user interface might look like this. Type in Scan for jackpot timing to see if there is a winning number in the next hour.


Unfortunately, no jackpot is predicted within the next hour, so Rand will have to check again another time!

Predictable numbers and identifiers mitigation

Again, the flaw in the app developers' logic is that they are not using a sufficiently chaotic process to create the seed for the random number generation. The pseudorandom number generation (PRNG) method relies on strong entropy for the input in order to produce genuinely unpredictable outputs.

Cryptographically secure pseudorandom number generation (CSPRNG)

To ensure that your PRNG (or DRBG) is cryptographically secure, follow these two main pieces of guidance:

  1. The random number generator must pass certain statistical randomness tests to prove unpredictability before it is employed programmatically.
  2. An attacker must not be able to predict the outputs of the CSPRNG even if they have partial access to the program.

Some general guidance from NIST also suggests that the entropy input you use should be greater than the security strength of the instantiation. In other words, the quality of your random data should exceed the level of security you are aiming for.

"Security strength" is measured in bits and represents the computational effort required to break the system: a strength of n bits means an attacker needs on the order of 2^n operations, so the strength is the base-2 logarithm of that operation count. A 128-bit security strength therefore means an attacker would need to try about 2^128 operations to break it. Entropy is also measured in bits; for example, a fair coin has 1 bit of entropy per flip. To ensure a 128-bit security strength remains secure, the entropy should exceed 128 bits. Using more entropy than the security strength offers a cushion, which is useful if the entropy has been overestimated in any way.

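To put rough numbers on this, the sketch below computes the entropy of a uniform choice among N outcomes as log2(N). The one-hour window is an assumption chosen to match the lottery example, and entropy_bits is a hypothetical helper name.

```python
import math


def entropy_bits(equally_likely_outcomes):
    """Bits of entropy in a uniform choice among the given number of outcomes."""
    return math.log2(equally_likely_outcomes)


coin_flip = entropy_bits(2)          # a fair coin: two outcomes
timestamp_seed = entropy_bits(3600)  # seed known to fall within a one-hour window
key_128 = entropy_bits(2 ** 128)     # a uniformly random 128-bit value

print(f"coin flip: {coin_flip:.1f} bits")
print(f"timestamp seed (1-hour window): {timestamp_seed:.1f} bits")
print(f"128-bit value: {key_128:.1f} bits")
```

A timestamp seed in a known hour carries under 12 bits of entropy, which is why a few thousand guesses are enough to recover it.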
Here is what a cryptographically secure pseudorandom number generator might look like within the lottery app:
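The fixed listing is not reproduced here, so this is a minimal sketch under the same assumptions as before (hypothetical class name SecureScratchCard, numbers from 1 to 1000). The only parts taken from the lesson are the use of the secrets module and its randbelow() function.

```python
import secrets


class SecureScratchCard:
    """Hypothetical scratch-card generator backed by the OS CSPRNG."""

    def generate(self):
        # No seeding anywhere: secrets draws from the operating system's
        # cryptographically secure random source.
        winning = secrets.randbelow(1000) + 1  # uniform integer in 1..1000
        user = secrets.randbelow(1000) + 1
        return winning, user


winning, user = SecureScratchCard().generate()
print("Jackpot!" if winning == user else "Better luck next time.")
```

secrets.randbelow(n) returns an integer in the range 0 to n - 1, hence the + 1 to shift the result into 1 to 1000.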

The security hinges on the secrets module and its randbelow() function. The random module, by contrast, was originally designed for statistical modeling, not security.

What is different here is that the secrets module does not generate randomness itself - instead it uses the most secure source of randomness available to Python: os.urandom().

The os.urandom() function draws raw data directly from the operating system's (OS) entropy pool. The OS collects this entropy from unpredictable physical sources like:

  1. Timing between disk I/O operations.
  2. Mouse movements and keyboard input timings.
  3. Hardware interrupt timings or thermal noise.

This input is much closer to 'true' randomness than a simple timestamp. There is also no explicit seeding in this version of the code; the number generation is instead built on the OS's continuously refreshed entropy pool.

Please see the NIST guidance concerning random bit generation for more information.

*Note: within our data set, this vulnerability (CWE-343) came up most often in the context of HTTP Parameter Pollution (CVE-2025-7783). That vulnerability occurs when an attacker exploits the way a server handles multiple HTTP parameters with the same name, and often originates from vulnerable versions of the form-data package that create predictable boundary strings. Similar to our example, using a CSPRNG to generate the boundary string helps prevent this vulnerability before it starts.
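As a rough Python illustration of that last point (the form-data package itself is a JavaScript library, and this is not its code), a multipart boundary can be built from CSPRNG output. The function name make_boundary and the "----FormBoundary" prefix are made up for this sketch.

```python
import secrets


def make_boundary():
    """Return a multipart boundary containing 128 bits of CSPRNG output."""
    # token_hex(16) yields 16 random bytes rendered as 32 hex characters.
    return "----FormBoundary" + secrets.token_hex(16)


print(make_boundary())
```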

Quiz

Which of the following options is the best source of entropy for a CSPRNG?

Keep learning

Now you know lots about randomness and its importance in fields like cryptography. Continue your journey with some of our other relevant lessons, such as:

Congratulations

Thank you for completing this lesson on generating predictable numbers and identifiers! Now you are equipped to use cryptographically secure randomness in your code and to prevent attackers from exploiting weak spots in random number generation!