Simplest way to extract a substring in a Unix shell?

cut might be useful:

$ echo hello | cut -c1,3
hl
$ echo hello | cut -c1-3
hel
$ echo hello | cut -c1-4
hell
$ echo hello | cut -c4-5
lo
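
To apply the same idea to a variable instead of a literal, one option is to wrap cut in a small function. This is only a sketch: the name substr_cut is hypothetical, and the last line shows bash's own ${var:offset:length} expansion (also available in ksh and zsh, but not in POSIX sh) for comparison.

```shell
#!/bin/bash
# Sketch: print characters START..END (1-based, inclusive) of a string.
# substr_cut is a hypothetical helper name, not a standard command.
substr_cut() {
  local str=$1 start=$2 end=$3
  printf '%s\n' "$str" | cut -c"${start}-${end}"
}

word=hello
substr_cut "$word" 2 4     # ell (cut positions count from 1)

# Bash builtin equivalent: ${var:offset:length}, where offset counts from 0.
echo "${word:1:3}"         # ell
```

The two forms count differently: cut -c takes 1-based inclusive positions, while ${word:1:3} takes a 0-based offset and a length. GNU cut also documents -c as behaving like -b for now, so multibyte characters may be split.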

Shell builtins are good for this too. Here is a sample script:

#!/bin/bash
# Demonstrates the shell's built-in ability to split strings. Saves on
# calling sed and awk in shell scripts, which can help performance.

set -o nounset   # treat references to unset variables as errors
declare -rx       FILENAME=payroll_2007-06-12.txt

# Splits
declare -rx   NAME_PORTION=${FILENAME%.*}     # Left of last .
declare -rx      EXTENSION=${FILENAME##*.}    # Right of last .
declare -rx           NAME=${NAME_PORTION%_*} # Left of last _
declare -rx           DATE=${NAME_PORTION#*_} # Right of first _
declare -rx     YEAR_MONTH=${DATE%-*}         # Left of last -
declare -rx           YEAR=${YEAR_MONTH%-*}   # Left of -
declare -rx          MONTH=${YEAR_MONTH#*-}   # Right of -
declare -rx            DAY=${DATE##*-}        # Right of last -

clear

echo "  Variable: (${FILENAME})"
echo "  Filename: (${NAME_PORTION})"
echo " Extension: (${EXTENSION})"
echo "      Name: (${NAME})"
echo "      Date: (${DATE})"
echo "Year/Month: (${YEAR_MONTH})"
echo "      Year: (${YEAR})"
echo "     Month: (${MONTH})"
echo "       Day: (${DAY})"

That outputs:

  Variable: (payroll_2007-06-12.txt)
  Filename: (payroll_2007-06-12)
 Extension: (txt)
      Name: (payroll)
      Date: (2007-06-12)
Year/Month: (2007-06)
      Year: (2007)
     Month: (06)
       Day: (12)
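
The script relies on the difference between the single and doubled operators: # and % strip the shortest matching prefix or suffix, while ## and %% strip the longest. That is why ${DATE%-*} keeps 2007-06 while ${DATE##*-} leaves only 12. A small sketch on a made-up path with two dots, where the difference is visible; these four expansions are POSIX, so they should also work in plain sh:

```shell
#!/bin/sh
# The path value is a made-up example with two dots,
# so shortest vs. longest matching makes a visible difference.
path=archive.tar.gz
echo "${path#*.}"    # tar.gz       (# removes the shortest matching prefix)
echo "${path##*.}"   # gz           (## removes the longest matching prefix)
echo "${path%.*}"    # archive.tar  (% removes the shortest matching suffix)
echo "${path%%.*}"   # archive      (%% removes the longest matching suffix)
```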

And as per Gnudif above, there are always sed/awk/perl for when the going gets really tough.
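
For example, awk's substr(string, start, length) counts from 1 like cut -c, and -F lets awk split on a delimiter much like the builtins above. A minimal sketch; the sample strings are reused from the examples above:

```shell
#!/bin/sh
# substr($0, 2, 3): 3 characters of the whole line, starting at position 2.
echo hello | awk '{ print substr($0, 2, 3) }'            # ell

# -F_ makes "_" the field separator; $2 is the second field.
echo payroll_2007-06-12.txt | awk -F_ '{ print $2 }'     # 2007-06-12.txt
```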


Unix shells do not traditionally have built-in regex support. Bash and Zsh both do, so if you use the =~ operator inside [[ ... ]] to match a string against a regex, then:

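In Bash, you can get the substrings from the $BASH_REMATCH array: element 0 holds the whole match, and element N holds the text captured by the Nth parenthesized group.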

In Zsh, if the BASH_REMATCH shell option is set, the results are in the $BASH_REMATCH array; otherwise the whole match is in the scalar $MATCH and the captured substrings are in the $match array. If the RE_MATCH_PCRE option is set, the PCRE engine is used; otherwise the system regexp library is used for an extended regexp match, as in Bash.

So, most simply, if you're using Bash:

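# The regex must be unquoted; each (...) group creates a capture.
if [[ "$variable" =~ unquoted(.*)regex ]]; then
  matched_portion="${BASH_REMATCH[0]}"   # the whole match
  first_substring="${BASH_REMATCH[1]}"   # text captured by the first (...) group
fi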
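
A runnable version of the same idea, applied to the payroll filename from the earlier script. The regex and variable names here are illustrative choices, not anything Bash requires. Keeping the regex in a variable and expanding it unquoted is a common way to avoid quoting surprises, since quoted parts of the right-hand side are matched literally.

```shell
#!/bin/bash
filename=payroll_2007-06-12.txt
# Hypothetical pattern: name, then a YYYY-MM-DD date, then ".txt".
re='^([a-z]+)_([0-9]{4})-([0-9]{2})-([0-9]{2})\.txt$'

if [[ $filename =~ $re ]]; then
  name=${BASH_REMATCH[1]}    # payroll
  year=${BASH_REMATCH[2]}    # 2007
  month=${BASH_REMATCH[3]}   # 06
  day=${BASH_REMATCH[4]}     # 12
  echo "$name $year $month $day"
else
  echo "no match" >&2
fi
```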

If you're not using Bash or Zsh, it gets more complicated, as you need to use external commands.
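
In plain POSIX sh, one common approach is to hand the string to sed and use a capture group. A minimal sketch, assuming the same filename as in the earlier script and that only the year is wanted:

```shell
#!/bin/sh
filename=payroll_2007-06-12.txt
# sed -n suppresses normal output; the trailing p prints only if s/// matched.
# \(...\) is a BRE capture group, and \1 is the text it captured.
year=$(printf '%s\n' "$filename" | sed -n 's/^[a-z]*_\([0-9]\{4\}\)-.*/\1/p')
echo "$year"   # 2007
```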


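Consider also /usr/bin/expr. The substr and match keywords used below are GNU extensions; POSIX expr only specifies the STRING : REGEX form.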

$ expr substr hello 2 3
ell

You can also match a basic regular expression anchored at the beginning of the string; expr prints the number of characters matched, or 0 if there is no match.

$ expr match hello h
1

$ expr match hello hell
4

$ expr match hello e
0

$ expr match hello 'h.*o'
5

$ expr match hello 'h.*l'
4

$ expr match hello 'h.*e'
2
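
The portable spelling of expr match is STRING : REGEX. Without a \(...\) group it prints the match length, as above; with one it prints the captured text instead, which makes it usable for substring extraction in plain sh. A short sketch; the filename is reused from the earlier script:

```shell
#!/bin/sh
# No group: prints the number of characters matched (same as "expr match").
expr hello : 'h.*l'        # 4
# With a \(...\) group: prints the captured text instead of a count.
expr hello : 'h\(.*\)o'    # ell
# Capture into a variable, e.g. the year from the payroll filename.
year=$(expr payroll_2007-06-12.txt : '[a-z]*_\([0-9]*\)')
echo "$year"               # 2007
```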
