Unable to assign output of nested commands to variable in bash

It's not working because you are attempting to nest unescaped backticks:

VARIA=`head -$((${RANDOM} % `wc -l < file` + 1)) file | tail -1`

That actually attempts to run head -$((${RANDOM} % as a single command first, which produces the first two errors:

$ VARIA=`head -$((${RANDOM} % `
bash: command substitution: line 1: unexpected EOF while looking for matching `)'
bash: command substitution: line 2: syntax error: unexpected end of file

Then, it tries to run

wc -l < file` + 1)) file | tail -1`

That means it tries to evaluate + 1)) file | tail -1 (the text between the backticks) as a command, which produces the next errors:

$ wc -l < file` + 1)) file | tail -1`
bash: command substitution: line 1: syntax error near unexpected token `)'
bash: command substitution: line 1: ` + 1)) file | tail -1'

You can get around this by escaping the backticks:

VARIA=`head -$((${RANDOM} % \`wc -l < file\` + 1)) file | tail -1`

However, as a general rule, it is usually better not to use backticks at all. You should almost always use $() instead. It is more robust and can be nested indefinitely with a simpler syntax:

VARIA=$(head -$((${RANDOM} % $(wc -l < file) + 1)) file | tail -1)
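As a quick side-by-side check, here is a minimal sketch showing that the escaped-backtick form and the nested $() form give the same result. It uses a throwaway file from mktemp (an assumption for the demo, not part of the question) and always picks the last line rather than a random one, so the result is predictable:

```shell
# Throwaway sample file, created only for this demo.
sample=$(mktemp)
printf '%s\n' alpha beta gamma delta > "$sample"

# Nested $(): the inner substitution needs no escaping.
with_dollar=$(head -n "$(( $(wc -l < "$sample") ))" "$sample" | tail -n 1)

# Backticks: the inner pair has to be escaped with backslashes.
with_backticks=`head -n \`wc -l < "$sample"\` "$sample" | tail -n 1`

echo "$with_dollar"
echo "$with_backticks"

rm -f "$sample"
```

Both echo lines print delta, the last line of the sample file. The difference is readability: each extra level of backtick nesting needs another layer of backslashes, while $() needs none.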

Just use this command:

VARIA=$(head -n "$((${RANDOM} % $(wc -l < test) + 1))" test | tail -n 1)

To assign the result of a command to a variable, use $(...); the older `...` form is harder to nest.
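If you need this in more than one place, the one-liner could be wrapped in a small function. This is only a sketch: random_line is a hypothetical name, and the empty-file guard is my own addition. Keep in mind that $RANDOM only ranges from 0 to 32767, so this approach cannot reach anything past line 32768 of a longer file.

```shell
# random_line FILE: print one randomly chosen line of FILE.
# (random_line is a hypothetical helper name, not a standard command.)
random_line() {
  local file=$1 count
  # The arithmetic expansion strips any padding that wc puts around the number.
  count=$(( $(wc -l < "$file") ))
  # Guard added for this sketch: avoid a division by zero on an empty file.
  [ "$count" -gt 0 ] || return 1
  head -n "$(( RANDOM % count + 1 ))" "$file" | tail -n 1
}

# Usage with a throwaway file:
sample=$(mktemp)
printf '%s\n' one two three > "$sample"
VARIA=$(random_line "$sample")
echo "$VARIA"
rm -f "$sample"
```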


As another option for reading a random line from a file (and assigning it to a variable), consider a simplified reservoir sampling method, converted from thrig's perl implementation to awk, with Peter.O's seeding improvement:

VARIA=$(awk -v seed=$RANDOM 'BEGIN { srand(seed) } { if (rand() * FNR < 1) { line=$0 } } END { print line }' /usr/share/dict/words)

Here's the awk script, wrapped nicely:

awk -v seed=$RANDOM '
BEGIN { 
  srand(seed) 
}
{ 
  if (rand() * FNR < 1) { 
    line=$0
  } 
}
END { 
  print line 
}' /usr/share/dict/words

By default, awk's srand() seeds the generator from the current time in seconds, so two runs of this script within the same second would return the same line. To avoid that, I've passed in bash's $RANDOM as the seed. I'm selecting words from /usr/share/dict/words here just as a source of text.

This method reads the file in a single pass and does not need to know how many lines it has (my local copy has 479,828 lines), so it should be pretty flexible.
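As a rough sanity check of the uniformity, one could run the sampler repeatedly over a tiny file and tally the picks. This sketch uses a three-line throwaway file and 300 runs (both arbitrary choices of mine). Each line should come up roughly a third of the time, though the exact counts vary from run to run:

```shell
# Throwaway three-line sample file, created only for this check.
sample=$(mktemp)
printf '%s\n' a b c > "$sample"

runs=300
# Collect one pick per run, using the same awk program as above.
picks=$(
  i=0
  while [ "$i" -lt "$runs" ]; do
    awk -v seed="$RANDOM" 'BEGIN { srand(seed) } { if (rand() * FNR < 1) { line = $0 } } END { print line }' "$sample"
    i=$((i + 1))
  done
)

# Tally how often each line was chosen.
printf '%s\n' "$picks" | sort | uniq -c

rm -f "$sample"
```

The tally is only a spot check, not a statistical test. A strongly lopsided result would suggest a problem with the seeding rather than with the sampling formula.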

To see the program's math in action, I wrote up a wrapper script that iterates through different line numbers and sample rand() values:

demo.sh

#!/bin/sh

# For each sample line number, show 1 / FNR and test a few sample rand() values.
for lineno in 1 2 3 4 5 20 100
do
  # Header: the threshold 1 / FNR that rand() must fall below.
  echo "0 .. 0.99999 < ( 1 / FNR == " $(printf 'scale=2\n1 / %d\n' "$lineno" | bc) ")"
  for r in 0 0.01 0.25 0.5 0.99
  do
    # Compute r * lineno; bc prints results below 1 as "0" or with a leading ".".
    result=$(printf '%f * %d\n' "$r" "$lineno" | bc)
    case $result in
      (0*|\.*) echo "Line $lineno: Result of probability $r * line $lineno is $result and is < 1, choosing line" ;;
      (*)      echo "Line $lineno: Result of probability $r * line $lineno is $result and is >= 1, not choosing line" ;;
    esac
  done
  echo
done

The results are:

0 .. 0.99999 < ( 1 / FNR ==  1.00 )
Line 1: Result of probability 0 * line 1 is 0 and is < 1, choosing line
Line 1: Result of probability 0.01 * line 1 is .010000 and is < 1, choosing line
Line 1: Result of probability 0.25 * line 1 is .250000 and is < 1, choosing line
Line 1: Result of probability 0.5 * line 1 is .500000 and is < 1, choosing line
Line 1: Result of probability 0.99 * line 1 is .990000 and is < 1, choosing line

0 .. 0.99999 < ( 1 / FNR ==  .50 )
Line 2: Result of probability 0 * line 2 is 0 and is < 1, choosing line
Line 2: Result of probability 0.01 * line 2 is .020000 and is < 1, choosing line
Line 2: Result of probability 0.25 * line 2 is .500000 and is < 1, choosing line
Line 2: Result of probability 0.5 * line 2 is 1.000000 and is >= 1, not choosing line
Line 2: Result of probability 0.99 * line 2 is 1.980000 and is >= 1, not choosing line

0 .. 0.99999 < ( 1 / FNR ==  .33 )
Line 3: Result of probability 0 * line 3 is 0 and is < 1, choosing line
Line 3: Result of probability 0.01 * line 3 is .030000 and is < 1, choosing line
Line 3: Result of probability 0.25 * line 3 is .750000 and is < 1, choosing line
Line 3: Result of probability 0.5 * line 3 is 1.500000 and is >= 1, not choosing line
Line 3: Result of probability 0.99 * line 3 is 2.970000 and is >= 1, not choosing line

0 .. 0.99999 < ( 1 / FNR ==  .25 )
Line 4: Result of probability 0 * line 4 is 0 and is < 1, choosing line
Line 4: Result of probability 0.01 * line 4 is .040000 and is < 1, choosing line
Line 4: Result of probability 0.25 * line 4 is 1.000000 and is >= 1, not choosing line
Line 4: Result of probability 0.5 * line 4 is 2.000000 and is >= 1, not choosing line
Line 4: Result of probability 0.99 * line 4 is 3.960000 and is >= 1, not choosing line

0 .. 0.99999 < ( 1 / FNR ==  .20 )
Line 5: Result of probability 0 * line 5 is 0 and is < 1, choosing line
Line 5: Result of probability 0.01 * line 5 is .050000 and is < 1, choosing line
Line 5: Result of probability 0.25 * line 5 is 1.250000 and is >= 1, not choosing line
Line 5: Result of probability 0.5 * line 5 is 2.500000 and is >= 1, not choosing line
Line 5: Result of probability 0.99 * line 5 is 4.950000 and is >= 1, not choosing line

0 .. 0.99999 < ( 1 / FNR ==  .05 )
Line 20: Result of probability 0 * line 20 is 0 and is < 1, choosing line
Line 20: Result of probability 0.01 * line 20 is .200000 and is < 1, choosing line
Line 20: Result of probability 0.25 * line 20 is 5.000000 and is >= 1, not choosing line
Line 20: Result of probability 0.5 * line 20 is 10.000000 and is >= 1, not choosing line
Line 20: Result of probability 0.99 * line 20 is 19.800000 and is >= 1, not choosing line

0 .. 0.99999 < ( 1 / FNR ==  .01 )
Line 100: Result of probability 0 * line 100 is 0 and is < 1, choosing line
Line 100: Result of probability 0.01 * line 100 is 1.000000 and is >= 1, not choosing line
Line 100: Result of probability 0.25 * line 100 is 25.000000 and is >= 1, not choosing line
Line 100: Result of probability 0.5 * line 100 is 50.000000 and is >= 1, not choosing line
Line 100: Result of probability 0.99 * line 100 is 99.000000 and is >= 1, not choosing line

The original formula:

rand() * FNR < 1

can be mathematically rewritten as:

rand() < 1 / FNR

... which is more intuitive to me, as it shows the value on the right-hand side decreasing as the line numbers go up. As the right-hand side of the inequality shrinks, there's a smaller and smaller chance that rand() will return a value below it.

For each line number, I print a representation of the formula that will be tested: the range of rand()'s output and "1 divided by the line number". I then iterate through some sample random values to see whether the line would be chosen given that random value.

A few sample cases are interesting to look at:

  • on line 1, since rand() generates values in the range 0 <= rand() < 1, the result will always be less than (1 / 1 == 1), so line 1 will always be chosen.
  • on line 2, you can see that the random value needs to be less than 0.50, indicating a 50% chance of choosing line 2.
  • on line 100, rand() now needs to generate a value less than 0.01 in order for the line to be chosen.
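The cases above show the chance of a line being picked when it is read, but not why every line ends up equally likely to be the final answer. A short derivation, assuming a file of N lines and independent rand() calls (N, k and j are my notation, not the original author's): line k is the one printed only if it is chosen when read, with probability 1/k, and no later line j replaces it, which happens with probability 1 - 1/j for each such line.

```latex
P(\text{line } k \text{ is printed})
  = \frac{1}{k} \prod_{j=k+1}^{N} \left(1 - \frac{1}{j}\right)
  = \frac{1}{k} \prod_{j=k+1}^{N} \frac{j-1}{j}
  = \frac{1}{k} \cdot \frac{k}{N}
  = \frac{1}{N}
```

The product telescopes, so every line has the same 1/N chance regardless of its position, even though early lines are picked more often at the moment they are read.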