Reasons for using the set.seed function

The need is the possible desire for reproducible results, which may for example come from trying to debug your program, or of course from trying to redo what it does:

These two results we will "never" reproduce as I just asked for something "random":

R> sample(LETTERS, 5)
[1] "K" "N" "R" "Z" "G"
R> sample(LETTERS, 5)
[1] "L" "P" "J" "E" "D"

These two, however, are identical because I set the seed:

R> set.seed(42); sample(LETTERS, 5)
[1] "X" "Z" "G" "T" "O"
R> set.seed(42); sample(LETTERS, 5)
[1] "X" "Z" "G" "T" "O"
R> 

There is vast literature on all that; Wikipedia is a good start. In essence, these RNGs are called Pseudo Random Number Generators because they are in fact fully algorithmic: given the same seed, you get the same sequence. And that is a feature and not a bug.


Just adding some addition aspects. Need for setting seed: In the academic world, if one claims that his algorithm achieves, say 98.05% performance in one simulation, others need to be able to reproduce it.

?set.seed

Going through the help file of this function, these are some interesting facts:

(1) set.seed() returns NULL, invisible

(2) "Initially, there is no seed; a new one is created from the current time and the process ID when one is required. Hence different sessions will give different simulation results, by default. However, the seed might be restored from a previous session if a previously saved workspace is restored.", this is why you would want to call set.seed() with same integer values the next time you want a same sequence of random sequence.


You have to set seed every time you want to get a reproducible random result.

set.seed(1)
rnorm(4)
set.seed(1)
rnorm(4)

Tags:

Random

R