What's the chance that I'll win a door prize?

MATL, 42 bytes

:<~QXJx`J`tf1Zry0*1b(-tzq]f1=vts3e8<]6L)Ym

This uses a probabilistic (Monte Carlo) approach. The experiment is run a large number of times, and the probability is estimated from the outcomes. The number of realizations is selected to ensure that the result is correct up to the fourth decimal with probability at least 90%. However, this takes a very long time and a lot of memory. In the link below the number of realizations has been reduced by a factor of 10⁶ so that the program ends in a reasonable amount of time; as a result, only the first decimal is guaranteed to be accurate with at least 90% probability.

EDIT (July 29, 2016): due to changes in the language, 6L needs to be replaced by 3L. The link below incorporates that modification.

Try it online!

Background

Let p denote the probability to be computed. The experiment described in the challenge will be run n times. Each time, either you win the prize (“success”) or you don't. Let N be the number of successes. The desired probability can be estimated from N and n; the larger n is, the more accurate the estimate. The key question is how to select n to achieve the desired accuracy, namely, to ensure that in at least 90% of cases the error is less than 10⁻⁴.

Monte Carlo methods can be:

- Fixed-size: a value of n is fixed in advance (and then N is random);
- Variable-size: n is determined on the fly by the simulation results.

Within the second category, a commonly used method is to fix N (the desired number of successes) and keep simulating until that number of successes is reached. Thus n is random. This technique, called inverse binomial sampling or negative-binomial Monte Carlo, has the advantage that the accuracy of the estimator can be bounded. For this reason it is used here.
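To make the two stopping rules concrete, here is a minimal Python sketch. The `trial` argument is a hypothetical stand-in for one run of the experiment (any function returning True on success), not the door-prize simulation itself; `fixed_size` and `inverse_binomial` are illustrative names. The first runs a preset number of trials and returns N/n; the second keeps going until N successes and returns (N−1)/(n−1).

```python
import random
from typing import Callable


def fixed_size(trial: Callable[[], bool], n: int) -> float:
    """Fixed-size Monte Carlo: exactly n trials; the number of successes N is random."""
    successes = sum(1 for _ in range(n) if trial())
    return successes / n


def inverse_binomial(trial: Callable[[], bool], N: int) -> float:
    """Inverse binomial (negative-binomial) Monte Carlo: keep running trials
    until N successes are observed, so the number of trials n is random.
    Returns the estimator (N - 1) / (n - 1); requires N >= 2 so that n - 1 > 0."""
    if N < 2:
        raise ValueError("N must be at least 2")
    successes = trials = 0
    while successes < N:
        trials += 1
        if trial():
            successes += 1
    return (N - 1) / (trials - 1)
```

For example, with `rng = random.Random(1)` and `trial = lambda: rng.random() < 0.3` (a hypothetical coin with p = 0.3), both functions return values close to 0.3. The requirement N ≥ 2 comes from the estimator itself: with N = 1 a single lucky trial would make n − 1 = 0.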

Specifically, with negative-binomial Monte Carlo, x = (N−1)/(n−1) is an unbiased estimator of p, and the probability that x deviates from p by more than a given ratio can be upper-bounded. According to equation (1) in this paper (note also that the conditions (2) are satisfied), taking N = 2.75·10⁸ or larger ensures that p/x lies in the interval [0.9999, 1.0001] with at least 90% probability. Since p ≤ 1, a relative error this small means an absolute error of at most about 10⁻⁴; in particular, x is correct up to the 4th decimal place with at least 90% probability, as desired.
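The unbiasedness of (N−1)/(n−1) is a standard property of inverse binomial sampling and can be checked directly (this derivation is not taken from the cited paper; it assumes N ≥ 2). Write q = 1 − p; the number of trials n needed for N successes then has the negative binomial distribution P(n = m) = C(m−1, N−1) p^N q^(m−N) for m ≥ N, so

```latex
\begin{aligned}
\mathbb{E}\!\left[\frac{N-1}{n-1}\right]
  &= \sum_{m=N}^{\infty} \frac{N-1}{m-1}\binom{m-1}{N-1} p^{N} q^{m-N}
   = p^{N} \sum_{m=N}^{\infty} \binom{m-2}{N-2} q^{m-N} \\
  &= p^{N} \sum_{j=0}^{\infty} \binom{j+N-2}{N-2} q^{j}
   = p^{N} (1-q)^{-(N-1)}
   = p^{N} p^{-(N-1)} = p .
\end{aligned}
```

The second step uses (N−1)/(m−1) · C(m−1, N−1) = C(m−2, N−2), and the fourth uses the negative binomial series Σⱼ C(j+r−1, r−1) qʲ = (1−q)^(−r) with r = N − 1.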

Code explained

The code uses N = 3e8 to save one byte. Note that doing this many simulations would take a long time. The code in the link uses N = 300, which runs in a more reasonable amount of time (less than 1 minute in the online compiler for the first test cases), but this only ensures that the first decimal is correct with probability at least 90%.
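As a rough cross-check of these sample sizes, one can use a normal approximation instead of the rigorous bound from equation (1) of the cited paper. Under inverse binomial sampling the number of trials has mean N/p and variance N(1−p)/p², so its relative standard deviation is √((1−p)/N) ≤ 1/√N; a two-sided 90% guarantee on a relative error ε then needs roughly (z/ε)² successes, where z ≈ 1.645 is the 95th percentile of the standard normal. The function names below are hypothetical, and this is a heuristic sketch only.

```python
from statistics import NormalDist


def z_two_sided(confidence: float) -> float:
    """Quantile z with P(|Z| <= z) = confidence for a standard normal Z."""
    return NormalDist().inv_cdf(0.5 + confidence / 2)


def successes_needed(rel_error: float, confidence: float = 0.9) -> float:
    """Normal-approximation heuristic: smallest N with z / sqrt(N) <= rel_error,
    using the worst case 1 - p <= 1 for the relative standard deviation."""
    return (z_two_sided(confidence) / rel_error) ** 2


def rel_error_for(N: int, confidence: float = 0.9) -> float:
    """Inverse of successes_needed: relative error guaranteed (heuristically) by N successes."""
    return z_two_sided(confidence) / N ** 0.5


print(successes_needed(1e-4))  # heuristic N for a relative error of 1e-4
print(rel_error_for(300))      # heuristic relative error with N = 300
```

The heuristic gives about 2.71·10⁸ successes for ε = 10⁻⁴, just below the rigorous 2.75·10⁸ and comfortably below the 3e8 used in the code, and a relative error of about 0.095 for N = 300, which (since p ≤ 1) is consistent with only the first decimal being reliable.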

:        % Take k implicitly. Range [1 ... k]
<~       % Take n implicitly. Determine if each element in the previous array is
         % less than or equal to n
Q        % Add 1. This gives an array [2 ... 2 1 ... 1]
XJx      % Copy to clipboard J. Delete from stack
`        % Do...while. Each iteration is a Monte Carlo realization, until the
         % desired number of successes is reached
  J      %   Push previously computed array [2 ... 2 1 ... 1]
  `      %   Do...while. Each iteration picks one door and decrements it, until
         %   there is only one
    t    %     Duplicate
    f    %     Indices of non-zero elements of array
    1Zr  %     Choose one of them randomly with uniform distribution
    y0*  %     Copy of array with all values set to 0
    1b(  %     Assign 1 to chosen index
    -    %     Subtract
    tzq  %     Duplicate. Number of nonzero elements minus 1. This is falsy if
         %     there was only one nonzero value; in this case the loop is exited
  ]      %   End do...while
  f1=    %   Index of the remaining door. True (success) if it is door 1, false otherwise
  v      %   Concatenate vertically to results from previous realizations
  ts3e8< %   Duplicate. Is the sum (successes so far) less than 3e8? If so,
         %   continue looping; otherwise the loop is exited
]        % End do...while
6L)      % Remove last value (always 1, because the loop ends right after the
         % last required success)
Ym       % Compute mean. This gives (N-1)/(n-1). Implicitly display
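For readers who do not know MATL, the following Python sketch mirrors the program step by step. It is an illustrative transcription, not part of the answer: `one_realization` and `estimate` are hypothetical names, the parameters follow the MATL comments (k doors in total, the first n of them starting at 2), and the door is picked uniformly among the nonzero ones, as the comment for `1Zr` states. Counters replace the growing result array, which gives the same (N−1)/(n−1).

```python
import random


def one_realization(k: int, n: int, rng: random.Random) -> bool:
    """Inner do...while loop: start from [2 ... 2 1 ... 1] (n twos, k - n ones),
    repeatedly pick a nonzero door uniformly at random and decrement it, until
    only one nonzero door remains. Success if that door is door 1.
    Assumes k >= 2 and 0 <= n <= k."""
    if k < 2 or not 0 <= n <= k:
        raise ValueError("need k >= 2 and 0 <= n <= k")
    counts = [2 if i < n else 1 for i in range(k)]  # :<~Q
    while True:
        nonzero = [i for i, c in enumerate(counts) if c]    # f
        counts[rng.choice(nonzero)] -= 1                    # 1Zr, then y0* 1b( -
        remaining = [i for i, c in enumerate(counts) if c]
        if len(remaining) == 1:                             # tzq is falsy: exit
            return remaining[0] == 0                        # f1= (0-based index here)


def estimate(k: int, n: int, N: int, seed: int = 0) -> float:
    """Outer do...while loop: run realizations until N successes (N >= 2), then
    drop the last realization (always a success) and average the rest."""
    if N < 2:
        raise ValueError("N must be at least 2")
    rng = random.Random(seed)
    successes = trials = 0
    while successes < N:              # ts3e8< : continue while the sum is below N
        trials += 1
        successes += one_realization(k, n, rng)
    return (successes - 1) / (trials - 1)  # 6L) Ym : (N - 1) / (n - 1)
```

As a sanity check on a tiny case, take k = 2, n = 1 (array [2, 1]): door 1 loses only if it is picked on the first step (probability 1/2) and then loses the fair pick between [1, 1] (probability 1/2), so the exact answer is 3/4, and `estimate(2, 1, 3000)` should land near 0.75.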

Pyth, 34 bytes

Mc|!*HJ-GHch*J+*tHgGtH*gtGHKh-GHKG

Test suite

Defines a deterministic memoized recursive function g taking n, k as arguments. g 1000 500 returns 0.0018008560286627952 in about 18 seconds (not included in the above test suite because it times out the online interpreter).

An approximate Python 3 translation would be

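from functools import lru_cache

@lru_cache(maxsize=None)  # memoization, standing in for the undefined @memoized decorator
def g(n,k):
    return (not k*(n-k) or (1+(n-k)*((k-1)*g(n,k-1)+g(n-1,k)*(n-k+1)))/(n-k+1))/n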
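The recursive translation above nests calls roughly n + k levels deep, which may exceed CPython's default recursion limit (1000) for inputs such as g(1000, 500). A hedged alternative, not part of the original answer, is to fill the same recurrence in bottom-up order; `g_table` and `g_iter` are hypothetical names, and each entry uses the same arithmetic as the recursive version.

```python
def g_table(n_max: int, k_max: int) -> list[list[float]]:
    """Bottom-up evaluation of the recurrence for g(n, k), for 1 <= n <= n_max
    and 0 <= k <= min(n, k_max). Row 0 of the table is unused."""
    table = [[0.0] * (k_max + 1) for _ in range(n_max + 1)]
    for n in range(1, n_max + 1):
        for k in range(min(n, k_max) + 1):
            if k * (n - k) == 0:
                table[n][k] = 1 / n  # base case: k == 0 or k == n
            else:
                # g(n, k-1) is earlier in this row; g(n-1, k) is in the previous row
                table[n][k] = (1 + (n - k) * ((k - 1) * table[n][k - 1]
                                              + table[n - 1][k] * (n - k + 1))) / (n - k + 1) / n
    return table


def g_iter(n: int, k: int) -> float:
    """Same value as the recursive g(n, k), without deep recursion."""
    return g_table(n, k)[n][k]
```

For instance, `g_iter(2, 1)` gives 0.75, and `g_iter(1000, 500)` should reproduce the value quoted above, since every entry is computed with the same floating-point operations in the same order.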
