What is randomness? This isn’t a trick question. In everyday usage we might have some idea that randomness describes something that is unpredictable or unexpected – that things which are random lack any determinable pattern and are hard or impossible to predict with any certainty. Within mathematics, a numerical sequence is said to be statistically random if it contains no recognizable patterns or regularities. Examples might include the digits of numbers such as π (“pi”), which certainly appear to be randomly distributed – but in fact mathematicians are not certain that the digits are random, because they cannot absolutely prove it.
It turns out that determining whether something is random is actually incredibly hard to do. We intuitively understand that patterns that appear to be random are “non-deterministic” in that we cannot predict future outputs from the outputs we already have (knowing the first 700 digits of pi does not allow us to predict the 701st), as well as apparently chaotic in that there is no repeatability or structure, and any digit is as likely as any other to be next in the sequence.
Frustratingly, a sequence can be truly random even if it contains apparent structure. The only requirement for randomness is that “in the long run” (essentially over an infinite period) the sequence itself has a random distribution, even if certain sub-sequences would not look random at all.
Attempting to provide assurance that an algorithm or method for generating randomness genuinely does so is surprisingly complex. In the absence of mathematical proof, the best alternatives are tests that have been invented over time to provide reasonable certainty that a generated sequence is valid by analysing a sufficiently large sample of its output. Test suites such as the well-known “Diehard” set apply dozens of different tests, including “Monkey tests” that compare sequences to the output that might be expected based on the “infinite monkey” theorem. Because of these shades of uncertainty, many theorists now believe that randomness is more of a “spectrum” than a binary state – Benoît Mandelbrot proposed a classification of “seven states of randomness,” and some theorists more recently simply believe that there is an infinite hierarchy of grades of randomness.
If agreeing what randomness is seems complicated, we haven’t even got to the difficult bit yet, which is trying to create randomness – or “entropy” as it’s commonly termed in computing. We might initially be a little skeptical of this and think of any number of simple ways of generating randomness, but it is surprisingly difficult to do. It’s well known among street performers and magicians, for example, that if you ask people to pick a number between 1 and 10, far more people choose “7” than any other number, and that you can influence the outcome even more distinctly simply by “leading” the person with verbal, visual or physical suggestions.
The same general problem is true of any system intended to generate random numbers: that, essentially, if you know or can influence the inputs, environment, and current state of a system, then you can predict with absolute certainty in advance any output that it claims is “random.” Classic examples of methods for generating apparently random outcomes include rolling dice, coin flipping, or picking a card from a shuffled deck. However, in all these instances, any difficulty in predicting the output comes down to the observer’s difficulty in tracking the current state of the system at sufficient granularity to model it and forecast the outcome. In mechanical physics, if all properties of a coin flip were known – such as air density, wind speed, the weight of the coin, the angle of the arm, the force applied to the coin and so on – then the outcome could be predicted with absolute certainty. Zero randomness.
Prior to quantum physics, it was thought that all physical systems had a determinate state which determined all of their measurable properties, and hence that by measuring those properties you could determine the state with certainty. In other words, perfect knowledge of the initial conditions would render outcomes perfectly predictable. Any apparent ‘randomness’ is an illusion and is subjective rather than objective – it is “in the mind of the beholder” rather than inherent in the system, and stems purely from ignorance of the details of the system on the part of the observer. Many scientists believe that even the subatomic particle behaviour observed in quantum physics isn’t really random but rather just as predetermined as everything else, and that it appears random only because humans currently lack the understanding and techniques to measure and model it.
This may seem as though it is all a little abstract, but it has very real implications for whether different sources of randomness can truly be relied upon, which can be highly problematic in serious applications where guarantees of randomness are critical, such as network traffic encryption, as we shall see. For organisations such as the military, failing to seed cryptographic functions with true randomness can literally cost lives.
Random number generation is the process by which a sequence of numbers or symbols that cannot be reasonably predicted better than by random chance is generated. The systems that produce the random numbers are normally termed random number generators (RNGs). True random number generators (also known as hardware random number generators) will typically use some form of dedicated hardware that sources or “harvests” entropy from some attribute of the surrounding physical environment: specifically, attributes that are highly volatile and constantly changing and can be recorded with almost infinite granularity given the right equipment.
Entropy sources in these hardware RNGs will often relate to some unpredictable environmental condition (such as background radiation) that can be recorded with great accuracy and the least significant bit collected to produce an apparently random number. The element that converts the physical or environmental phenomenon being measured into an electrical signal is known as a transducer, and it is normally paired with a secondary stage of amplification that increases the amplitude of the random variations in the least significant bits of the signal.
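The least-significant-bit technique described above can be sketched in a few lines of Python. This is purely illustrative, not a real driver: the sample values stand in for hypothetical digitised transducer readings, and only the lowest bit of each (the part dominated by noise) is kept and packed into a byte.

```python
# Illustrative sketch: harvesting entropy by keeping only the least
# significant bit of each noisy sensor reading. The sample values are
# hypothetical stand-ins for digitised transducer output.

def lsb_bits(samples):
    """Extract the least significant bit from each integer sample."""
    return [s & 1 for s in samples]

def bits_to_byte(bits):
    """Pack eight bits (most significant first) into one byte."""
    value = 0
    for b in bits[:8]:
        value = (value << 1) | b
    return value

samples = [1023, 998, 1001, 1017, 1004, 1011, 1002, 1019]
bits = lsb_bits(samples)   # keeps only the "noisy" low-order bits
print(bits_to_byte(bits))  # one harvested byte of (apparent) entropy
```

A real hardware RNG would of course also apply conditioning ("whitening") to remove any residual bias in the raw bits before using them.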
Alternatives to environmental phenomena include randomness collected from hardware sources that produce their own “noise,” such as the minor variations (least significant bits) in computer fan noise or HDD vibrations. The Linux kernel, for example, generates entropy from keyboard timings, mouse movements, and IDE timings, and makes the random character data available to other operating system processes.
As we saw above, none of these phenomena are truly random, but rely instead on the difficulty an observer faces in observing or interfering with the phenomenon directly – and side channel attacks are certainly possible in theory. Due to the difficulty of harvesting entropy from these sources there is also a class of RNGs known as pseudorandom number generators (PRNGs) that generate numbers that are pre-determined from an initial seed, and rely on external parties being unable to learn the internal state of the PRNG and thereby predict its next output.
PRNGs produce sequences of numbers that are chaotic in the sense that they are statistically random, with a random distribution. However, the numbers are produced in “rounds,” with each round generated from the previous one. The process is therefore “kick-started” by providing the algorithm with an initial “seed” value. The trouble is that seed values are often hard-coded and re-used on application initialization, or else derived from the current time on the system clock. In either case, it is possible for an attacker to learn the value of the seed, after which they can predict with certainty every single “random” value produced, simply by executing the algorithm themselves with the same seed value and getting the same output.
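This seed-replay attack is easy to demonstrate with any deterministic PRNG. The sketch below uses Python’s `random` module (a Mersenne Twister, not suitable for security purposes) with an arbitrary example seed; an attacker who learns the seed reproduces the victim’s entire “random” sequence exactly.

```python
import random

# Two PRNG instances seeded with the same value produce identical
# "random" sequences. This is precisely the weakness when seeds are
# guessable (hard-coded, or derived from the system clock).
victim = random.Random(1234)    # 1234 is an arbitrary example seed
attacker = random.Random(1234)  # attacker who has learned the seed

victim_tokens = [victim.randint(0, 999999) for _ in range(5)]
attacker_guess = [attacker.randint(0, 999999) for _ in range(5)]

print(victim_tokens == attacker_guess)  # True: every value predicted
```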
There are also hybrid systems that incorporate both hardware and pseudorandom techniques, such as the “Lavarand” system developed by SGI, which works by taking photos of the patterns made by the floating blobs of coloured wax suspended in lava lamps and then using these to seed (provide an input to) a pseudorandom number generator.
How important it is that a source of entropy provides true randomness that is not able to be either predicted or influenced depends on the specific purpose to which it is being put. Applications where randomness does not have to be particularly rigorous might include usage in generating a “message of the day” on a forum or electronic noticeboard or when logging on to a computer: the only real requirement is that there is a reasonable chance of the user seeing a different message each day, and the stakes are pretty low if messages repeat more often than expected for example.
Other uses where randomness is useful, but the degree of randomness can be pretty low and still remain acceptable for its purpose, might include its use as a seed for generating apparently unique environments within dynamically generated maps, environments or “dungeons” in computer games. The randomness is used purely to save explicit level design and introduce novelty, but again it doesn’t have to be particularly rigorous or truly non-deterministic in order to perform its function.
A “middle ground” where randomness doesn’t have to be mathematically rigorous, but a higher degree of entropy is certainly beneficial might include situations where the randomness impacts the efficiency of the mechanism it supports, such as in sorting algorithms. The “Quicksort” algorithm familiar to most computer science undergraduates for example picks a pivot randomly from an array and then divides the original array into two subarrays: a subarray that consists of numbers smaller than the pivot and the other with larger numbers. A weakly random input would affect the algorithm’s performance, but not its overall function or utility.
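A minimal sketch of randomised Quicksort makes the point: the random pivot choice below only affects how evenly the array is split (and therefore performance), never whether the result is correctly sorted.

```python
import random

def quicksort(arr):
    """Randomised Quicksort: pick a random pivot, partition, recurse.
    A predictable or biased pivot choice only hurts performance
    (worst case O(n^2) on adversarial input), never correctness."""
    if len(arr) <= 1:
        return arr
    pivot = random.choice(arr)
    smaller = [x for x in arr if x < pivot]   # subarray below the pivot
    equal = [x for x in arr if x == pivot]
    larger = [x for x in arr if x > pivot]    # subarray above the pivot
    return quicksort(smaller) + equal + quicksort(larger)

print(quicksort([9, 3, 7, 1, 8, 2]))  # [1, 2, 3, 7, 8, 9]
```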
Areas which demand more rigorous guarantees or grades of randomness might be those used within general business functions where accuracy of outcome can impact spending decisions. Examples might include areas such as use within statistical sample selection or the so-called “Monte Carlo simulations” that are sometimes used in areas such as risk modelling and forecasting, as well as in generating a suitable spread of test cases for “fuzzing” and testing of code.
The most critical applications, where higher assurance is required that entropy sources are truly chaotic and non-deterministic, include use cases such as the data perturbation techniques used to mask sensitive data; uses within cybersecurity algorithms (as we shall look at shortly); and the generation of numbers within lotteries or other gambling where money is directly on the line.
Perhaps the most obvious and widely-known use of entropy within cybersecurity is its use in cryptography where it is used to seed algorithms used in hash functions, encryption, and other cryptographic functions.
However, entropy is also widely used in other areas, most notably in generating unique IDs. These are critical for authentication in particular and are used to generate both “session IDs” as well as the “tokens” that are used in password reset links or in Cross-Site Request Forgery (CSRF) tokens. In all these instances, if the token can be predicted, then the security mechanism can be broken, and a malicious attacker will be able to identify themselves as someone who they are not. These functions therefore require strong sources of entropy.
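In Python, for example, such tokens should be generated with the standard-library `secrets` module, which draws from the operating system’s cryptographically secure RNG, rather than the predictable `random` module:

```python
import secrets

# secrets draws from the OS CSPRNG (os.urandom under the hood), making
# it suitable for security-sensitive values, unlike the `random`
# module's seeded Mersenne Twister.
session_id = secrets.token_hex(16)      # 16 random bytes -> 32 hex chars
csrf_token = secrets.token_urlsafe(32)  # URL-safe token for links/forms

print(session_id)
print(csrf_token)
```

The token lengths here (128 and 256 bits of entropy respectively) are illustrative choices; the essential point is that each value is unpredictable even to someone who has observed all previous tokens.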
In general, cybersecurity functions demand rigorous standards of randomness, typically offered only by hardware random number generators. In Linux and other Unix-like operating systems, two sources of entropy are offered via /dev/random and /dev/urandom, special files that serve as pseudorandom number generators. They collect entropy via environmental noise gathered from device drivers and other sources, but the quality of the entropy can degrade if there is an insufficient supply from the environment – for example if the device is diskless or headless (no mouse/monitor/keyboard) and lacks other sources by which to replenish the entropy pool. The Linux kernel also provides support for several hardware random number generators, presented at /dev/hwrng, which can provide higher-quality entropy where required to support strong cryptographic operations, or where entropy sources are absent or cannot replenish the entropy pool sufficiently quickly.
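From application code, the usual portable way to read from this kernel entropy pool is `os.urandom` (shown here in Python), which on Linux is backed by the same source as /dev/urandom:

```python
import os

# os.urandom returns bytes from the operating system's CSPRNG; on Linux
# this is backed by the kernel entropy pool behind /dev/urandom.
key_material = os.urandom(32)  # 32 bytes = 256 bits of key material

print(len(key_material))  # 32
```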
The use of entropy that is only pseudo-random and is subject to cryptanalysis can undermine the strength of guarantees provided by cryptographic functions used in applications such as user authentication and authorization, such as a session ID or in generating the seed for a cryptographic key. In such circumstances, an attacker may be able to predict the ID or cryptographic key and gain access to restricted functionality. This cryptanalysis is possible because weak entropy can leave patterns or clusters of values that are more likely to occur than others. Although some of these require complex techniques to tease out and are largely of theoretical concern, the failures can sometimes be substantial and readily apparent.
One visual image that highlights this is the so-called “ECB penguin” that many people in the cybersecurity field will be familiar with. It is a picture of “Tux,” the Linux penguin mascot, that has been encrypted with a block cipher in ECB mode and still clearly shows the outline of the original. Although this particular instance is a flaw in cipher selection rather than entropy quality, the outcome and principle are the same: weak encryption can sometimes make decryption easier than might be imagined due to patterns in the output.
In addition to weak sources of entropy, randomness can also be undermined by other factors. One relatively well-known issue is that of “entropy pool size”: if a pseudo-random number generator is using a limited entropy source which runs out, then the RNG could produce predictable random numbers, potentially undermining the encryption method or other cryptographic function that is relying on it.
Outside of entropy generation, a frequent problem in randomness is the problem of keyspace size. Even if a truly non-deterministic source of entropy is used, it isn’t much use if the keyspace (set of possible values) is too small. As an example, taken to the extremes for the purposes of illustration, if a set of session IDs is [1,2,3] (i.e. the number “one”, or “two”, or “three”, and no other options) then it doesn’t matter how strong the source of entropy used to generate the session ID for a given user, it is trivially brute-forceable by an attacker because the keyspace is simply too small. This is an extreme example but is fairly commonly seen, in cases such as where hardware devices issued by a manufacturer all have per-device unique passwords set from the factory, but from a weak set that is too small and can easily be brute-forced.
It is only possible to scratch the surface of such a complex topic in a short blog post, but hopefully we’ve managed to touch on all the main points and offered an insight into how and where randomness is important and how it can be best assured where the guarantees of non-determinism are highest.
AppCheck can help you provide assurance across your entire organisation’s security footprint, by detecting vulnerabilities and enabling organisations to remediate them before attackers are able to exploit them. AppCheck performs comprehensive checks for a massive range of web application and infrastructure vulnerabilities – including missing security patches, exposed network services and default or insecure authentication in place in infrastructure devices.
External vulnerability scanning secures the perimeter of your network from external threats, such as cyber criminals seeking to exploit or disrupt your internet-facing infrastructure. Our state-of-the-art external vulnerability scanner can assist in strengthening your external networks, which are most prone to attack due to their ease of access.
The AppCheck Vulnerability Analysis Engine provides detailed rationale behind each finding including a custom narrative to explain the detection methodology, verbose technical detail, and proof of concept evidence through safe exploitation.
AppCheck is a software security vendor based in the UK, offering a leading security scanning platform that automates the discovery of security flaws within organisations’ websites, applications, networks, and cloud infrastructure. AppCheck are authorized by the Common Vulnerabilities and Exposures (CVE) Program as a CVE Numbering Authority (CNA).
As always, if you require any more information on this topic or want to see what unexpected vulnerabilities AppCheck can pick up in your website and applications then please contact us: email@example.com
No software to download or install.
Contact us or call us 0113 887 8380