Separate numbers and strings from one line using bash

GNU grep or compatible solution:

s="string123anotherstr456thenanotherstr789"
grep -Eo '[[:alpha:]]+|[0-9]+' <<<"$s"
  • [[:alpha:]]+|[0-9]+ - regex alternation that matches either a run of alphabetic characters or a run of digits; with -o, grep prints each match on its own line

The output:

string
123
anotherstr
456
thenanotherstr
789
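If you need to act on each piece rather than just print it, one option is to feed the grep output to a while read loop. This is only a sketch: the function name classify_tokens and the word:/number: labels are made up for illustration, and it still relies on a grep that supports -o.

```shell
s="string123anotherstr456thenanotherstr789"

# Hypothetical helper: label each token produced by grep -o.
classify_tokens() {
  # $1 is the string to split; one "kind:token" line is printed per match.
  printf '%s\n' "$1" | grep -Eo '[[:alpha:]]+|[0-9]+' |
  while IFS= read -r token; do
    case $token in
      [0-9]*) printf 'number:%s\n' "$token" ;;  # token starts with a digit
      *)      printf 'word:%s\n' "$token" ;;    # otherwise it is a run of letters
    esac
  done
}

classify_tokens "$s"
```

Because the loop runs on the right-hand side of a pipe, variables set inside it are generally not visible afterwards, so do the work inside the loop body.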

POSIXly:

string=string123anotherstr456thenanotherstr789
sed '
  s/[^[:alnum:]]//g; # remove anything other than letters and numbers
  s/[[:alpha:]]\{1,\}/&\
/g; # insert a newline after each sequence of letters
  s/[0-9]\{1,\}/&\
/g; # same for digits
  s/\n$//; # remove a trailing newline if any' << EOF
$string
EOF
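Note that the first s command throws away everything that is not a letter or a digit, so punctuation and signs do not survive. A small sketch of the same script wrapped in a function (split_alnum is a hypothetical name) shows the effect:

```shell
# Hypothetical wrapper around the sed script above.
split_alnum() {
  # $1 is the string to split; non-alphanumeric characters are dropped.
  printf '%s\n' "$1" | sed '
    s/[^[:alnum:]]//g
    s/[[:alpha:]]\{1,\}/&\
/g
    s/[0-9]\{1,\}/&\
/g
    s/\n$//'
}

split_alnum 'foo-12, bar!'
# prints:
# foo
# 12
# bar
```

The hyphen before 12 is removed along with the comma, space and exclamation mark, so this variant is not suitable when the sign matters.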

awk

Input contains only letters and numerals

Add a newline character after every [[:alpha:]]+ (sequence of letters) and after every [[:digit:]]+ (sequence of numerals):

awk '{ gsub(/([[:alpha:]]+|[[:digit:]]+)/,"&\n",$0) ; printf "%s", $0 }' filename

(In the replacement string, & stands for the text matched by the regex.)


Input contains other characters (e.g., punctuation)

As before, but now also splitting off runs of characters that match [^[:alnum:]]+ (neither letters nor numerals):

awk '{ gsub(/([[:alpha:]]+|[[:digit:]]+|[^[:alnum:]]+)/,"&\n",$0) ; printf "%s", $0 }' filename

Negative numbers and decimal fractions

Treat - (hyphen) and . (period) as part of a number:

awk '{ gsub(/([[:alpha:]]+|[[:digit:].-]+|[^[:alnum:].-]+)/,"&\n",$0) ; printf "%s", $0 }' filename

Those characters must appear in both the [[:digit:].-]+ and [^[:alnum:].-]+ expressions, so that they are added to the number class and removed from the "everything else" class. Also, to be interpreted as a literal hyphen, the - must be the last character before the closing right square bracket of each bracket expression; placed between two other characters, it indicates a range.
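To see the difference the hyphen's position makes, here is a minimal sketch using a throwaway input string (the variable names literal and range are just for illustration):

```shell
# Hyphen last in the brackets: a literal "-" alongside "a" and "c".
literal=$(printf '%s\n' 'a-c b' | awk '{ gsub(/[ac-]/, "X"); print }')

# Hyphen between two characters: the range a through c,
# so "b" is replaced and "-" is left alone.
range=$(printf '%s\n' 'a-c b' | awk '{ gsub(/[a-c]/, "X"); print }')

printf '%s\n' "$literal" "$range"
# prints:
# XXX b
# X-X X
```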

Example:

[test]$ cat file.txt 
string123another!!str456.001thenanotherstr-789

[test]$ awk '{ gsub(/([[:alpha:]]+|[[:digit:].-]+|[^[:alnum:].-]+)/,"&\n",$0) ; printf "%s", $0 }' file.txt
string
123
another
!!
str
456.001
thenanotherstr
-789

An exercise for the reader

If the input file requires it, you could modify the awk command to:

  • Ensure that - only counts as part of a number if it occurs at the start of a numeral sequence.
  • Allow numbers that are expressed in scientific notation.
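
One possible starting point for both items, offered as a sketch rather than a finished answer: describe a number as an optional leading -, digits, an optional fractional part and an optional exponent, and take - out of the "everything else" class so that a stray hyphen becomes a token of its own. The function name split_tokens is hypothetical, and this assumes an awk that supports POSIX character classes and picks the longest match among the alternatives.

```shell
# Hypothetical helper: split each input line into letters, numbers and other runs.
split_tokens() {
  awk '{
    # Alternatives, in order:
    #   run of letters
    #   number: optional "-", digits, optional ".digits", optional exponent
    #   run of other characters, excluding "-"
    #   a single "-" that is not the start of a number
    gsub(/[[:alpha:]]+|-?[[:digit:]]+(\.[[:digit:]]+)?([eE][-+]?[[:digit:]]+)?|[^[:alnum:]-]+|-/, "&\n")
    printf "%s", $0
  }' "$@"
}

printf '%s\n' 'str-789x1.5e-3!!-2' | split_tokens
# prints:
# str
# -789
# x
# 1.5e-3
# !!
# -2
```

Two design choices here are assumptions you may want to change: a - that directly follows a digit (as in 7-8) is attached to the following number, and a letter e or E between digits is always read as an exponent marker.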