Transliteration script for linux shell

Using Awk:

#!/usr/bin/awk -f
BEGIN {
    FS = OFS = ""
    table["a"] = "e"
    table["x"] = "ch"
    # and so on...
}
{
    for (i = 1; i <= NF; ++i) {
        if ($i in table) {
            $i = table[$i]
        }
    }
}
1

Usage:

awk -f script.awk file

Test:

# echo "the quick brown fox jumps over the lazy dog" | awk -f script.awk
the quick brown foch jumps over the lezy dog

Not an answer, just to show a briefer, idiomatic way to populate the table[] array from @konsolebox's answer as discussed in the related comments:

BEGIN {
    split("a  e b", old)
    split("x ch o", new)
    for (i in old)
        table[old[i]] = new[i]
    FS = OFS = ""
}

so the mapping of old to new chars is clearly shown in that the char in the first split() is mapped to the char(s) below it and for any other mapping you want you just need to change the string(s) in the split(), not change 26-ish explicit assignments to table[].

You can even create a general script to do mappings and just pass in the old and new strings as variables:

BEGIN {
    split(o, old)
    split(n, new)
    for (i in old)
        table[old[i]] = new[i]
    FS = OFS = ""
}

then in shell anything like this:

old="a  e b"
new="x ch o"
awk -v o="$old" -v b="$new" -f script.awk file

and you can protect yourself from your own mistakes populating the strings, e.g.:

BEGIN {
    numOld = split(o, old)
    numNew = split(n, new)

    if (numOld != numNew) {
        printf "ERROR: #old vals (%d) != #new vals (%d)\n", numOld, numNew | "cat>&1"
        exit 1
    }

    for (i=1; i <= numOld; i++) {
        if (old[i] in table) {
            printf "ERROR: \"%s\" duplicated at position %d in old string\n", old[i], i | "cat>&2"
            exit 1
        }
        if (newvals[new[i]]++) {
            printf "WARNING: \"%s\" duplicated at position %d in new string\n", new[i], i | "cat>&2"
        }
        table[old[i]] = new[i]
    }
}

Wouldn't it be good to know if you wrote that b maps to x and then later mistakenly wrote that b maps to y? The above really is the best way to do this but your call of course.

Here's one complete solution as discussed in the comments below

BEGIN {
    numOld = split("a  e b", old)
    numNew = split("x ch o", new)

    if (numOld != numNew) {
        printf "ERROR: #old vals (%d) != #new vals (%d)\n", numOld, numNew | "cat>&1"
        exit 1
    }

    for (i=1; i <= numOld; i++) {
        if (old[i] in table) {
            printf "ERROR: \"%s\" duplicated at position %d in old string\n", old[i], i | "cat>&2"
            exit 1
        }
        if (newvals[new[i]]++) {
            printf "WARNING: \"%s\" duplicated at position %d in new string\n", new[i], i | "cat>&2"
        }
        map[old[i]] = new[i]
    }

    FS = OFS = ""
}
{
    for (i = 1; i <= NF; ++i) {
        if ($i in map) {
            $i = map[$i]
        }
    }
    print
}

I renamed the table array as map just because iMHO that better represents the purpose of the array.

save the above in a file script.awk and run it as awk -f script.awk inputfile


This can be done quite concisely using a Perl one-liner:

perl -pe '%h=(a=>"xy",c=>"z"); s/(.)/defined $h{$1} ? $h{$1} : $1/eg'

or equivalently (thanks jaypal):

perl -pe '%h=(a=>"xy",c=>"z"); s|(.)|$h{$1}//=$1|eg'

%h is a hash containing the characters (keys) and their substitutions (values). s is the substitution command (as in sed). The g modifier means that the substitution is global and the e means that the replacement part is evaluated as an expression. It captures each character one by one and substitutes them with the value in the hash if it exists, otherwise keeps the original value. The -p switch means that each line in the input is automatically printed.

Testing it out:

$ perl -pe '%h=(a=>"xy",c=>"z"); s|(.)|$h{$1}//=$1|eg' <<<"abc"
xybz

Tags:

Linux

Shell

Sed

Tr