Generating Random Strings

Your performance problem comes from using the random package in the first place. It's understandable that an internet search would turn up random::randomStrings() and that it would look like a good way to generate random strings, but the random package is not intended for general-purpose programming. It works by querying the RANDOM.ORG server over the network, which is intrinsically slower than R's built-in pseudo-random number generators.

From one of the random package's vignettes:

There are a number of situations in which it is desirable to use non-deterministically determined random numbers. Examples include
- to seed distributed computing on different nodes with truly independent seeds;
- to obtain portable initializations for RNGs that do not depend on particular operating system or hardware features;
- to validate simulation results using non-deterministic random numbers;
- to provide indeterministic seeds used for lottery drawings or games ...

Note that most of these examples are about seeding or initializing (these are synonyms) R's built-in pseudo-random number generators, rather than replacing them ...
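
As a minimal sketch of what "seeding" means here: the built-in generator produces the same sequence whenever it starts from the same seed. The seed value below is an arbitrary constant chosen for illustration. In the vignette's use cases it would come from a non-deterministic source instead.

```r
# Seed the built-in pseudo-random number generator, then draw.
set.seed(20240101)  # arbitrary illustrative seed (an assumption, not from the vignette)
first <- sample(LETTERS, 5, replace = TRUE)

# Re-seeding with the same value replays the same sequence.
set.seed(20240101)
second <- sample(LETTERS, 5, replace = TRUE)

identical(first, second)
## [1] TRUE
```

The seed only picks the starting point. All subsequent draws are still made locally by R, which is why this is fast.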


Using "stringi" as suggested by @akrun will be faster, but the following is also very fast and does not require any additional packages:

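myFun <- function(n = 5000) {
  # Five random capital letters per string, pasted together element-wise
  a <- do.call(paste0, replicate(5, sample(LETTERS, n, TRUE), FALSE))
  # Append four zero-padded digits (0000-9999) and one final capital letter
  paste0(a, sprintf("%04d", sample(0:9999, n, TRUE)), sample(LETTERS, n, TRUE))
}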

Example output:

myFun(10)
##  [1] "BZHOF3737P" "EPOWI0674X" "YYWEB2825M" "HQIXJ5187K" "IYIMB2578R"
##  [6] "YSGBG6609I" "OBLBL6409Q" "PUMAL5632D" "ABRAT4481L" "FNVEN7870Q"
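
The original randomStrings() call asked for unique = TRUE, which the function above does not guarantee. If uniqueness matters, one possible approach is to drop duplicates and top up until enough strings remain. This is only a sketch: myUniqueFun is a hypothetical helper name, and myFun is redefined here (sampling 0:9999 so that "0000" can occur) to keep the block self-contained.

```r
# Generator from the answer, repeated so this block runs on its own.
myFun <- function(n = 5000) {
  a <- do.call(paste0, replicate(5, sample(LETTERS, n, TRUE), FALSE))
  paste0(a, sprintf("%04d", sample(0:9999, n, TRUE)), sample(LETTERS, n, TRUE))
}

# Hypothetical wrapper: keep generating until n distinct strings exist.
myUniqueFun <- function(n = 5000) {
  out <- unique(myFun(n))
  while (length(out) < n) {
    # Only generate as many strings as are still missing
    out <- unique(c(out, myFun(n - length(out))))
  }
  out
}

ids <- myUniqueFun(1000)
anyDuplicated(ids)  # 0 means no duplicates
```

With roughly 26^6 * 10^4 possible strings, collisions are rare at this scale, so the loop will almost never run more than once.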

We can use stri_rand_strings from the stringi package:

library(stringi)
sprintf("%s%s%s", stri_rand_strings(5, 5, '[A-Z]'),
      stri_rand_strings(5, 4, '[0-9]'), stri_rand_strings(5, 1, '[A-Z]'))

Or more compactly

do.call(paste0, Map(stri_rand_strings, n=5, length=c(5, 4, 1),
            pattern = c('[A-Z]', '[0-9]', '[A-Z]')))

Benchmarks

system.time({
    do.call(paste0, Map(stri_rand_strings, n=5000, length=c(5, 4, 1),
            pattern = c('[A-Z]', '[0-9]', '[A-Z]')))
    })
#  user  system elapsed 
#   0      0      0

I was able to reproduce the slow timings with the OP's method, even when generating only one part of the expected output (the five letters):

library(random)
system.time(string_5 <- as.vector(randomStrings(n=5000, len=5, digits=FALSE, upperalpha=TRUE,
                                              loweralpha=FALSE, unique=TRUE, check=TRUE)))
#  user  system elapsed 
#   0.86    0.24    5.52 

You can directly perform what you want:
- sample 5 random capital letters;
- sample 4 digits;
- sample 1 random capital letter.

digits <- 0:9
# Build one string: 5 capital letters, 4 digits, 1 capital letter
createRandString <- function() {
  v <- c(sample(LETTERS, 5, replace = TRUE),
         sample(digits, 4, replace = TRUE),
         sample(LETTERS, 1, replace = TRUE))
  paste0(v, collapse = "")
}

This is easier to control and won't take nearly as long.
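
Since createRandString() returns a single string, producing many of them needs a loop of some kind. A minimal sketch using vapply follows. createRandStrings is a hypothetical wrapper name, and the generator is repeated so the block runs on its own.

```r
digits <- 0:9

# Single-string generator from the answer above.
createRandString <- function() {
  v <- c(sample(LETTERS, 5, replace = TRUE),
         sample(digits, 4, replace = TRUE),
         sample(LETTERS, 1, replace = TRUE))
  paste0(v, collapse = "")
}

# Hypothetical wrapper: call the generator n times and
# collect the results in a character vector.
createRandStrings <- function(n) {
  vapply(seq_len(n), function(i) createRandString(), character(1))
}

res <- createRandStrings(10)
length(res)
## [1] 10
```

Calling the function once per string is simple to read but is likely slower for large n than sampling all letters and digits in one vectorised call.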
