Can IFS (Internal Field Separator) function as a single separator for multiple consecutive delimiter chars?

From bash manpage :

Any character in IFS that is not IFS whitespace, along with any adjacent IFS whitespace characters, delimits a field. A sequence of IFS whitespace characters is also treated as a delimiter.

It means that IFS whitespace (space, tab and newline) is not treated like the other separators. If you want to get exactly the same behaviour with an alternative separator, you can do some separator swapping with the help of tr or sed :

var=":abc::def:::ghi::::"
arr=($(echo -n $var | sed 's/ /%#%#%#%#%/g;s/:/ /g'))
for x in ${!arr[*]} ; do
   el=$(echo -n $arr | sed 's/%#%#%#%#%/ /g')
   echo "# arr[$x] \"$el\""
done

The %#%#%#%#% thing is a magic value to replace the possible spaces inside the fields, it is expected to be "unique" (or very unlinkely). If you are sure that no space will ever be in the fields, just drop this part).


To remove multiple (non-space) consecutive delimiter chars, two (string/array) parameter expansions can be used. The trick is to set the IFS variable to the empty string for the array parameter expansion.

This is documented in man bash under Word Splitting:

Unquoted implicit null arguments, resulting from the expansion of parameters that have no values, are removed.

(
set -f
str=':abc::def:::ghi::::'
IFS=':'
arr=(${str})
IFS=""
arr=(${arr[@]})

echo ${!arr[*]}

for ((i=0; i < ${#arr[@]}; i++)); do 
   echo "${i}: '${arr[${i}]}'"
done
)

As bash IFS does not provide an in-house way to treat consecutive delimiter chars as a single delimiter (for non-whitespace delimiters), I have put together an all bash version (vs.using an external call eg. tr, awk, sed)

It can handle mult-char IFS..

Here are its execution-time resu;ts, along with similar tests for the tr and awk options shown on this Q/A page... The tests are based on 10000 itterations of just building the arrray (with no I/O )...

pure bash     3.174s (28 char IFS)
call (awk) 0m32.210s  (1 char IFS) 
call (tr)  0m32.178s  (1 char IFS) 

Here is the output

# dlm_str  = :.~!@#$%^&()_+-=`}{][ ";></,
# original = :abc:.. def:.~!@#$%^&()_+-=`}{][ ";></,'single*quote?'..123:
# unified  = :abc::::def::::::::::::::::::::::::::::'single*quote?'::123:
# max-w 2^ = ::::::::::::::::
# shrunk.. = :abc:def:'single*quote?':123:
# arr[0] "abc"
# arr[1] "def"
# arr[2] "'single*quote?'"
# arr[3] "123"

Here is the script

#!/bin/bash

# Note: This script modifies the source string. 
#       so work with a copy, if you need the original. 
# also: Use the name varG (Global) it's required by 'shrink_repeat_chars'
#
# NOTE: * asterisk      in IFS causes a regex(?) issue,     but  *  is ok in data. 
# NOTE: ? Question-mark in IFS causes a regex(?) issue,     but  ?  is ok in data. 
# NOTE: 0..9 digits     in IFS causes empty/wacky elements, but they're ok in data.
# NOTE: ' single quote  in IFS; don't know yet,             but  '  is ok in data.
# 
function shrink_repeat_chars () # A 'tr -s' analog
{
  # Shrink repeating occurrences of char
  #
  # $1: A string of delimiters which when consecutively repeated and are       
  #     considered as a shrinkable group. A example is: "   " whitespace delimiter.
  #
  # $varG  A global var which contains the string to be "shrunk".
  #
# echo "# dlm_str  = $1" 
# echo "# original = $varG" 
  dlms="$1"        # arg delimiter string
  dlm1=${dlms:0:1} # 1st delimiter char  
  dlmw=$dlm1       # work delimiter  
  # More than one delimiter char
  # ============================
  # When a delimiter contains more than one char.. ie (different byte` values),    
  # make all delimiter-chars in string $varG the same as the 1st delimiter char.
  ix=1;xx=${#dlms}; 
  while ((ix<xx)) ; do # Where more than one delim char, make all the same in varG  
    varG="${varG//${dlms:$ix:1}/$dlm1}"
    ix=$((ix+1))
  done
# echo "# unified  = $varG" 
  #
  # Binary shrink
  # =============
  # Find the longest required "power of 2' group needed for a binary shrink
  while [[ "$varG" =~ .*$dlmw$dlmw.* ]] ; do dlmw=$dlmw$dlmw; done # double its length
# echo "# max-w 2^ = $dlmw"
  #
  # Shrik groups of delims to a single char
  while [[ ! "$dlmw" == "$dlm1" ]] ; do
    varG=${varG//${dlmw}$dlm1/$dlm1}
    dlmw=${dlmw:$((${#dlmw}/2))}
  done
  varG=${varG//${dlmw}$dlm1/$dlm1}
# echo "# shrunk.. = $varG"
}

# Main
  varG=':abc:.. def:.~!@#$%^&()_+-=`}{][ ";></,'\''single*quote?'\''..123:' 
  sfi="$IFS"; IFS=':.~!@#$%^&()_+-=`}{][ ";></,' # save original IFS and set new multi-char IFS
  set -f                                         # disable globbing
  shrink_repeat_chars "$IFS" # The source string name must be $varG
  arr=(${varG:1})    # Strip leading dlim;  A single trailing dlim is ok (strangely
  for ix in ${!arr[*]} ; do  # Dump the array
     echo "# arr[$ix] \"${arr[ix]}\""
  done
  set +f     # re-enable globbing   
  IFS="$sfi" # re-instate the original IFS
  #
exit