Predicting Math.random() numbers?

Indeed, Math.random() is not cryptographically secure.


Definition of Math.random()

The definition of Math.random() in the ES6 specification left a lot of freedom about the implementation of the function in JavaScript engines:

Returns a Number value with positive sign, greater than or equal to 0 but less than 1, chosen randomly or pseudo randomly with approximately uniform distribution over that range, using an implementation-dependent algorithm or strategy. This function takes no arguments.

Each Math.random function created for distinct code Realms must produce a distinct sequence of values from successive calls.

So let's have a look at how the most popular JavaScript engines implemented it.


  • SpiderMonkey, used by Firefox and many programs, implemented an algorithm named Xorshift128+ (link to Mozilla's repository).
  • V8, used by Chrome and Node.js, also implemented the Xorshift128+ algorithm (called in the RandomNumberGenerator function)
  • Webkit, used by Safari, also implemented the Xorshift128+ algorithm.
  • Chakra, the JavaScript engine powering Microsoft Edge, also implemented the Xorshift128+ algorithm.

Xorshift128+ is one of the XorShift random number generators, which are among the fastest non-cryptographically-secure random number generators.

I don't know if there's any attack on any of the implementations listed above, though. But those implementations are very recent, and other implementations (and vulnerabilities) existed in the past, and may still exist if your browser / server haven't been updated.

Update: douggard's answer explains how someone can recover the state XorShift128+ and predict Math.random() values.


V8's MWC1616 algorithm

On November 2015, Mike Malone explained in a blog post that V8's implementation of the MWC1616 algorithm was somehow broken: you can see some linear patterns on this test or on this one if you're using a V8-based browser. The V8 team handled it and released a fix in Chromium 49 (on January 15th, 2016) and Chrome 49 (on March 8th, 2016).

This paper pulished in 2009 explained how to determine the state of the PRNG of V8's based on the previous outputs of Math.random() (the MWC1616 version).

Here's a Python script which implements it (even if the outputs are not consecutive).

This has been exploited in a real world attack on CSGOJackbot, a betting site built with Node.js. The attacker was fair enough to just make fun of this vulnerability.


Lack of compartmentalization

Before ES6, the Math.random() definition didn't specify that distinct pages had to produce distinct sequences of values.

This allowed an attacker to generate some random numbers, determine the state of the PNRG, redirect the user to a vulnerable application (which would use Math.random() for sensitive things) and predict which number Math.random() was going to return. This blog post presents some code about how to do it (Internet Explorer 8 and below).

The ES6 specification (which had been approved as a standard on June 17, 2015) makes sure that browsers handle this case correctly.


Badly-chosen seed

Guessing the seed chosen for initializing the sequence can also allow an attacker to predict the numbers in the sequence. It's also a real world scenario, since it has been used on Facebook in 2012.


This paper published in 2008 explains different methods to leak some information thanks to the browsers' lack of randomness.


Solutions

First of all, always make sure that your browsers / servers are updated regularly.

Then, you should use cryptographic functions if needed:

  • If you're working in a browser environment, then you can use crypto.getRandomValues, part of the Web Crypto API (check the support table).
  • If you're working with Node.js, then you can use crypto.randomBytes.

Both rely on OS-level entropy, and will let you get cryptographically random values.


You can attack these using the Z3 theorem prover. I've implemented such an attack in Python in order to predict values in a lottery simulator.

As mentioned previously, XorShift128+ is used in most places now, so that is what we are attacking. You begin by implementing the normal algorithm so you can understand it.

def xs128p(state0, state1):
    s1 = state0 & 0xFFFFFFFFFFFFFFFF
    s0 = state1 & 0xFFFFFFFFFFFFFFFF
    s1 ^= (s1 << 23) & 0xFFFFFFFFFFFFFFFF
    s1 ^= (s1 >> 17) & 0xFFFFFFFFFFFFFFFF
    s1 ^= s0 & 0xFFFFFFFFFFFFFFFF
    s1 ^= (s0 >> 26) & 0xFFFFFFFFFFFFFFFF 
    state0 = state1 & 0xFFFFFFFFFFFFFFFF
    state1 = s1 & 0xFFFFFFFFFFFFFFFF
    generated = (state0 + state1) & 0xFFFFFFFFFFFFFFFF

    return state0, state1, generated

The algorithm takes in two state variables, XORs and shifts them around, then returns the sum of the updated state variables. What is also important is how each engine takes the uint64 returned and converts it to a double. I found this info by digging through each implementation's source code.

# Firefox (SpiderMonkey) nextDouble():
# (rand_uint64 & ((1 << 53) - 1)) / (1 << 53)

# Chrome (V8) nextDouble():
# ((rand_uint64 & ((1 << 52) - 1)) | 0x3FF0000000000000) - 1.0

# Safari (WebKit) weakRandom.get():
# (rand_uint64 & ((1 << 53) - 1) * (1.0 / (1 << 53)))

Each is a little different. You then can take the doubles produced by Math.random() and recover some lower bits of the uint64 produced by the algorithms.

Next is implementing the code in Z3 so that it can be symbolically executed and the state can be solved for. See the Github link for more context. It looks pretty similar to the normal code except that you tell the solver that the lower bits must equal the recovered lower bits from the browser.

def sym_xs128p(slvr, sym_state0, sym_state1, generated, browser):
    s1 = sym_state0 
    s0 = sym_state1 
    s1 ^= (s1 << 23)
    s1 ^= LShR(s1, 17)
    s1 ^= s0
    s1 ^= LShR(s0, 26) 
    sym_state0 = sym_state1
    sym_state1 = s1
    calc = (sym_state0 + sym_state1)

    condition = Bool('c%d' % int(generated * random.random()))
    if browser == 'chrome':
        impl = Implies(condition, (calc & 0xFFFFFFFFFFFFF) == int(generated))
    elif browser == 'firefox' or browser == 'safari':
        # Firefox and Safari save an extra bit
        impl = Implies(condition, (calc & 0x1FFFFFFFFFFFFF) == int(generated))

    slvr.add(impl)
    return sym_state0, sym_state1, [condition]

If you supply 3 consecutively generated doubles to Z3 it should be able to recover your state. Below is a snippet from the main function. It is calling the symbolically executed XorShift128+ algorithm on two of Z3's 64 bit integers (the unknown state variables), providing the lower (52 or 53) bits from the recovered uint64s.

If that is successful the solver will return SATISFIABLE and you can get the state variables that it solved for.

    for ea in xrange(3):
        sym_state0, sym_state1, ret_conditions = sym_xs128p(slvr, sym_state0, sym_state1, generated[ea], browser)
        conditions += ret_conditions

    if slvr.check(conditions) == sat:
        # get a solved state
        m = slvr.model()
        state0 = m[ostate0].as_long()
        state1 = m[ostate1].as_long()

There is a slightly more detailed writeup here that focuses on using this method to predict winning lottery numbers in a powerball simulator.


Math.random (and similar other functions) start from a seed and create a new number. It seems random to the user because of course the algorithm is tuned in such a way that it appears so. But the matter of fact is that there is no real source of randomness anywhere. If you know the internal state of the generator (which is 100% deterministic software and nothing special at all), you know all future (and depending on the algorithm maybe also past) numbers generated by it.

A "real" random source would be stuff like measuring decay of a radioactive particle, or in more real-world terms, any kind of electrical white noise, or more practically, things like the user moving the mouse around or minute deviations between key presses and such things. Nothing whatsoever is in Math.random.

Math.random is (like the random function of most languages/libraries) designed in this way, and the property that you can recover a string of "random" numbers from the same seed is actually a useful feature in many cases. Just not for security.