Bash utility script library

Libraries for bash are out there, but they are not common. One reason bash libraries are scarce is the limitations of bash functions. I believe these limitations are best explained on "Greg's Bash Wiki":

Functions. Bash's "functions" have several issues:

  • Code reusability: Bash functions don't return anything; they only produce output streams. Every reasonable method of capturing that stream and either assigning it to a variable or passing it as an argument requires a SubShell, which breaks all assignments to outer scopes. (See also BashFAQ/084 for tricks to retrieve results from a function.) Thus, libraries of reusable functions are not feasible, as you can't ask a function to store its results in a variable whose name is passed as an argument (except by performing eval backflips).

  • Scope: Bash has a simple system of local scope which roughly resembles "dynamic scope" (e.g. Javascript, elisp). Functions see the locals of their callers (like Python's "nonlocal" keyword), but can't access a caller's positional parameters (except through BASH_ARGV if extdebug is enabled). Reusable functions can't be guaranteed free of namespace collisions unless you resort to weird naming rules to make conflicts sufficiently unlikely. This is particularly a problem if implementing functions that expect to be acting upon variable names from frame n-3 which may have been overwritten by your reusable function at n-2. Ksh93 can use the more common lexical scope rules by declaring functions with the "function name { ... }" syntax (Bash can't, but supports this syntax anyway).

  • Closures: In Bash, functions themselves are always global (have "file scope"), so no closures. Function definitions may be nested, but these are not closures, though they look very much the same. Functions are not "passable" (first-class), and there are no anonymous functions (lambdas). In fact, nothing is "passable", especially not arrays. Bash uses strictly call-by-value semantics (magic alias hack excepted).

  • There are many more complications involving: subshells; exported functions; "function collapsing" (functions that define or redefine other functions or themselves); traps (and their inheritance); and the way functions interact with stdio. Don't bite the newbie for not understanding all this. Shell functions are totally f***ed.

Source: http://mywiki.wooledge.org/BashWeaknesses
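
Two of the points quoted above can be seen in a few lines. This is only a sketch and is not taken from the wiki: the function names (inner, outer, get_answer) are made up for illustration. The second half uses bash's printf -v, which assigns to a variable whose name is passed in, as one alternative to eval that avoids the subshell.

```shell
#!/usr/bin/env bash

# Dynamic scope: the callee sees (and can modify) the caller's local.
inner() {
    echo "callee sees: $color"
    color=green
}
outer() {
    local color=red
    inner
    echo "caller now has: $color"
}
outer
# callee sees: red
# caller now has: green

# Store a result in a variable whose name is passed as an argument.
# The function is called directly (no command substitution), so the
# assignment happens in the current shell.
get_answer() {
    printf -v "$1" '%s' 42
}
get_answer result
echo "result=$result"
# result=42
```

The namespace caveat from the quote still applies: if get_answer declared a local with the same name as the one passed in, the caller's variable would not be updated.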

One example of a shell "library" is /etc/rc.d/init.d/functions on Red Hat-based systems. This file contains functions commonly used in SysV init scripts.


I see some good info and some bad info here. Let me share what I know, since bash is the primary language I use at work (and we build libraries). Google has a decent write-up on bash scripts in general that I thought was a good read: https://google.github.io/styleguide/shell.xml.

Let me start by saying you should not think of a bash library as you do libraries in other languages. There are certain practices that must be enforced to keep a library in bash simple, organized, and most importantly, reusable.

There is no concept of returning anything from a bash function except for the strings it prints and its exit status (0-255). These limitations come with a learning curve, especially if you're accustomed to functions in higher-level languages. It can be weird at first, and if you find yourself in a situation where strings just aren't cutting it, you'll want to leverage an external tool such as jq. If jq (or something like it) is available, you can have your functions print formatted output to be parsed and used as you would an object, array, etc.
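
To make the two "return channels" concrete, here is a minimal sketch. The helper get_extension is hypothetical and exists only to show a function that prints its result and signals failure through its exit status.

```shell
#!/usr/bin/env bash

# Hypothetical helper: prints a result on stdout and reports
# success or failure through its exit status.
get_extension() {
    case "$1" in
        *.*) echo "${1##*.}" ;;   # print everything after the last dot
        *)   return 1 ;;          # no dot: signal failure, print nothing
    esac
}

# Capture the printed string and branch on the exit status in one step.
if ext=$(get_extension "archive.tar.gz"); then
    echo "extension: $ext"
else
    echo "no extension" >&2
fi
# extension: gz

# The status is also available in $? immediately after the call.
get_extension "README"
echo "status: $?"
# status: 1
```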

Function Declarations

There are two ways to declare a function in bash. One uses curly braces and operates within your current shell; we'll call it Fx0. The other uses parentheses and spawns a subshell to operate in; we'll call that Fx1. Here are examples of how they're declared:

Fx0(){ echo "Hello from $FUNCNAME"; }
Fx1()( echo "Hello from $FUNCNAME" )

These two functions perform the same operation. However, there is a key difference: Fx1 cannot perform any action that alters the current shell. That rules out modifying variables, changing shell options, and declaring other functions. The last of these is what can be exploited to prevent the namespacing issues that can easily creep up on you.

# Fx1 cannot change the variable from a subshell
Fx0(){ Fx=0; }
Fx1()( Fx=1 )
Fx=foo; Fx0; echo $Fx
# 0
Fx=foo; Fx1; echo $Fx
# foo
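
The same holds for the working directory and shell options. A small sketch, assuming /tmp exists and that noglob starts out disabled; change_fx0, change_fx1 and noglob_state are hypothetical names that follow the Fx0/Fx1 convention:

```shell
#!/usr/bin/env bash

# Report whether the noglob option (set -f) is active in the current shell.
noglob_state(){ case $- in *f*) echo on ;; *) echo off ;; esac; }

# Hypothetical pair that both try to change directory and set an option.
change_fx0(){ cd /tmp; set -f; }   # braces: runs in the current shell
change_fx1()( cd /tmp; set -f )    # parentheses: runs in a subshell

cd /
change_fx1
echo "$PWD noglob=$(noglob_state)"
# / noglob=off

change_fx0
echo "$PWD noglob=$(noglob_state)"
# /tmp noglob=on

set +f   # undo the option that change_fx0 left behind
```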

That being said, the only time you should use an "Fx0" kind of function is when you want to redeclare something in the current shell. Otherwise, always use "Fx1" functions: they are safer, and you don't have to worry about the naming of any functions declared within them. As you can see below, innocent_function is overwritten inside of Fx1, yet it remains unscathed after Fx1 has run.

innocent_function()(
    echo ":)"
)
Fx1()(
    innocent_function()( true )
    innocent_function
)
Fx1 #prints nothing, just returns true
innocent_function
# :)

This would have (likely) unintended consequences if you had used curly braces. Examples of useful "Fx0" type functions would be specifically for changing the current shell, like so:

use_strict(){
    set -eEu -o pipefail
}
enable_debug(){
    set -Tx
}
disable_debug(){
    set +Tx
}
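
The flip side of these helpers is that anything a brace-bodied function defines also lands in the current shell. Here is a sketch of the unintended consequence mentioned earlier, reusing innocent_function; Fx0_leaky is a hypothetical brace-bodied counterpart of Fx1:

```shell
#!/usr/bin/env bash

innocent_function()( echo ":)" )

# Hypothetical brace-bodied counterpart of Fx1 from the earlier example.
Fx0_leaky(){
    innocent_function()( true )   # redefines the function in the current shell
    innocent_function
}

innocent_function
# :)
Fx0_leaky             # prints nothing
innocent_function     # prints nothing: the original definition is gone
echo "status: $?"
# status: 0
```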

Regarding Declarations

The use of global variables, or at least those expected to have a value, is bad practice all the way around. As you're building a library in bash, you don't ever want a function to rely on an external variable already being set. Anything the function needs should be supplied to it via the positional parameters. This is the main problem I see in libraries other folks try to build in bash. Even if I find something cool, I can't use it because I don't know the names of the variables I need to have set ahead of time. It leads to digging through all of the code and ultimately just picking out the useful pieces for myself. By far, the best functions to create for a library are extremely small and don't utilize named variables at all, even locally. Take the following for example:

serviceClient()(
    showUsage()(
        echo "This should be a help page"
    ) >&2
    isValidArg()(
        test "$(type -t "$1")" = "function"
    )
    isRunning()(
        nc -zw1 "$(getHostname)" "$(getPortNumber)"
    ) &>/dev/null
    getHostname()(
        echo localhost
    )
    getPortNumber()(
        echo 80
    )
    getStatus()(
        if isRunning
        then echo OK
        else echo DOWN
        fi
    )
    getErrorCount()(
        grep -c "ERROR" /var/log/apache2/error.log
    )
    printDetails()(
        echo "Service status: $(getStatus)"
        echo "Errors logged: $(getErrorCount)"
    )
    if isValidArg "$1"
    then "$1"
    else showUsage
    fi
)

Typically, what you would see near the top is local hostname=localhost and local port_number=80, which is fine but not necessary. In my opinion, these values should be turned into functions as you build, to prevent future pain when some logic suddenly needs to be introduced for getting a value, like: if isHttps; then echo 443; else echo 80; fi. You don't want that kind of logic placed in your main function, or you'll quickly make it ugly and unmanageable.

Now, serviceClient has internal functions that get declared upon each invocation, which adds an unnoticeable amount of overhead to each run. The benefit is that you can now have a service2Client with functions (or external functions) named the same as those in serviceClient, with absolutely no conflicts. Another important thing to keep in mind is that redirections can be applied to an entire function when declaring it; see isRunning or showUsage. This is as close to object-orientedness as I think you should bother getting with bash.

. serviceClient.sh
serviceClient
# This should be a help page
if serviceClient isRunning
then serviceClient printDetails
fi
# Service status: OK
# Errors logged: 0
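
To illustrate the no-conflicts claim, here is a sketch of what a second client might look like. service2Client, isHttps and getScheme are assumptions made up for this example; the inner names showUsage, isValidArg and getPortNumber deliberately match those in serviceClient.

```shell
#!/usr/bin/env bash

# Hypothetical second client; inner names deliberately match serviceClient's.
service2Client()(
    showUsage()(
        echo "usage: service2Client <function>"
    ) >&2                       # redirection applies to every call
    isValidArg()(
        test "$(type -t "$1")" = "function"
    )
    isHttps()(
        test "$(getScheme)" = "https"
    )
    getScheme()(
        echo https
    )
    getPortNumber()(
        if isHttps; then echo 443; else echo 80; fi
    )
    if isValidArg "$1"
    then "$1"
    else showUsage
    fi
)

service2Client getPortNumber
# 443

# The inner functions never reach the calling shell.
type -t getPortNumber || echo "getPortNumber is not defined out here"
# getPortNumber is not defined out here
```

If serviceClient.sh is sourced in the same shell, serviceClient getPortNumber should still print 80, since each client declares its own getPortNumber inside its own subshell.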

I hope this helps my fellow bash hackers out there.
