Grabbing the extension in a file name

You might simplify matters by just doing pattern matching on the filename rather than extracting the extension twice:

case "$filename" in
    *.tar.bz2) bunzip_then_untar ;;
    *.bz2)     bunzip_only ;;
    *.tar.gz)  untar_with -z ;;
    *.tgz)     untar_with -z ;;
    *.gz)      gunzip_only ;;
    *.zip)     unzip ;;
    *.7z)      : ;;   # placeholder: handle 7z archives here
    *)         : ;;   # not a recognized archive: do nothing
esac
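
As a minimal runnable sketch of the same idea, the function below only prints a label for each pattern instead of unpacking anything. The name archive_kind and the labels are made up for illustration. Note that the more specific patterns (*.tar.bz2) have to come before the more general ones (*.bz2), because case stops at the first match.

```shell
# Hypothetical helper (not from the original answer): print a label
# describing the archive type suggested by a file name.
archive_kind() {
    case "$1" in
        *.tar.bz2)      echo tar.bz2 ;;
        *.bz2)          echo bz2 ;;
        *.tar.gz|*.tgz) echo tar.gz ;;
        *.gz)           echo gz ;;
        *.zip)          echo zip ;;
        *.7z)           echo 7z ;;
        *)              echo unknown ;;
    esac
}

archive_kind file-1.0.tar.bz2   # prints: tar.bz2
archive_kind notes.txt          # prints: unknown
```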

If the file name is file-1.0.tar.bz2, the extension is bz2. The method you're using to extract the extension (fileext=${filename##*.}) is perfectly valid¹.
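
To make the behaviour concrete, here is a short sketch of what the four standard prefix/suffix-stripping expansions return for that file name. Only the sample value of filename is assumed.

```shell
filename=file-1.0.tar.bz2

echo "${filename##*.}"   # longest prefix matching '*.' removed  -> bz2
echo "${filename#*.}"    # shortest prefix matching '*.' removed -> 0.tar.bz2
echo "${filename%.*}"    # shortest suffix matching '.*' removed -> file-1.0.tar
echo "${filename%%.*}"   # longest suffix matching '.*' removed  -> file-1
```

None of these yields tar.bz2 on its own, which is why the specification has to be settled first.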

How do you decide that you want the extension to be tar.bz2 and not bz2 or 0.tar.bz2? You need to answer this question first. Then you can figure out what shell command matches your specification.

  • One possible specification is that extensions must begin with a letter. This heuristic fails for a few common extensions like 7z, which might be best treated as a special case. Here's a bash/ksh/zsh implementation:

    basename=$filename; fileext=
    while [[ $basename = ?*.* &&
             ( ${basename##*.} = [A-Za-z]* || ${basename##*.} = 7z ) ]]
    do
      fileext=${basename##*.}.$fileext
      basename=${basename%.*}
    done
    fileext=${fileext%.}
    

    For POSIX portability, use a case statement for the pattern matching, since [[ … ]] is not part of POSIX sh:

    while case $basename in
            ?*.*) case ${basename##*.} in [A-Za-z]*|7z) true;; *) false;; esac;;
            *) false;;
          esac
    do …
    
  • Another possible specification is that some extensions denote encodings and indicate that further stripping is needed. Here's a bash/ksh/zsh implementation (requiring shopt -s extglob under bash and setopt ksh_glob under zsh):

    basename=$filename
    fileext=
    while [[ $basename = ?*.@(bz2|gz|lzma) ]]; do
      fileext=${basename##*.}.$fileext
      basename=${basename%.*}
    done
    if [[ $basename = ?*.* ]]; then
      fileext=${basename##*.}.$fileext
      basename=${basename%.*}
    fi
    fileext=${fileext%.}
    

    Note that this considers 0 to be an extension in file-1.0.gz, so fileext ends up as 0.gz.
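
For reference, the first specification (extensions begin with a letter, plus 7z as a special case) can be put together as a complete POSIX sh function along these lines. This is a sketch: the function name split_ext is invented here, and the loop body is simply the one from the bash/ksh/zsh version above.

```shell
# Hypothetical helper: split_ext NAME
# Sets the global variables $basename and $fileext.
split_ext() {
    basename=$1
    fileext=
    # Keep peeling off the last component while it looks like an
    # extension: it starts with a letter, or it is exactly "7z".
    while case $basename in
            ?*.*) case ${basename##*.} in
                    [A-Za-z]*|7z) true ;;
                    *) false ;;
                  esac ;;
            *) false ;;
          esac
    do
        fileext=${basename##*.}.$fileext
        basename=${basename%.*}
    done
    fileext=${fileext%.}   # drop the trailing dot left by the loop
}

split_ext file-1.0.tar.bz2
echo "$basename / $fileext"   # prints: file-1.0 / tar.bz2
```

Because of the ?*.* pattern, a name such as .bashrc is left alone: the leading dot is not treated as the start of an extension.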

¹ ${VARIABLE##PATTERN} (which strips the longest prefix matching PATTERN) and related constructs are in POSIX, so they work in any non-antique Bourne-style shell such as ash, bash, ksh or zsh.


$ echo "thisfile.txt"|awk -F . '{print $NF}'

For discussion of this approach, see: http://liquidat.wordpress.com/2007/09/29/short-tip-get-file-extension-in-shell-script/
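
One thing to watch with the awk one-liner: when the name contains no dot there is only one field, so $NF is the whole name. A possible variant, sketched here with a made-up wrapper name ext_awk, prints something only when there is more than one field:

```shell
# Hypothetical wrapper around the awk approach: print the text after
# the last dot, or nothing at all if the name contains no dot.
ext_awk() {
    printf '%s\n' "$1" | awk -F . 'NF > 1 { print $NF }'
}

ext_awk thisfile.txt   # prints: txt
ext_awk README         # prints nothing
```

Like ${filename##*.}, this returns only the last component, so file-1.0.tar.bz2 gives bz2.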