How can I encode and decode percent-encoded strings on the command line?

These commands do what you want:

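python3 -c "import urllib.parse, sys; print(urllib.parse.quote(sys.argv[1]))" æ
python3 -c "import urllib.parse, sys; print(urllib.parse.unquote(sys.argv[1]))" %C3%A6

These are for Python 3. In the old Python 2 the functions live directly in the urllib module (urllib.quote, urllib.unquote).

If you want to encode spaces as +, replace urllib.parse.quote with urllib.parse.quote_plus (and urllib.parse.unquote with urllib.parse.unquote_plus when decoding).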
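To see what the _plus variants change, here is a minimal Python 3 sketch. The helper names encode and decode are made up for this illustration, and safe="" is passed so that / gets encoded too (plain quote() leaves / alone by default):

```python
from urllib.parse import quote, quote_plus, unquote, unquote_plus


def encode(text: str, plus: bool = False) -> str:
    """Percent-encode text as UTF-8; with plus=True a space becomes '+'."""
    # safe="" means only letters, digits and _.-~ are left as they are.
    return quote_plus(text, safe="") if plus else quote(text, safe="")


def decode(text: str, plus: bool = False) -> str:
    """Undo encode(); with plus=True a '+' is read back as a space."""
    return unquote_plus(text) if plus else unquote(text)


print(encode("a b/æ"))             # a%20b%2F%C3%A6
print(encode("a b/æ", plus=True))  # a+b%2F%C3%A6
print(decode("a+b"), "|", decode("a+b", plus=True))  # a+b | a b
```

So a literal + only turns into a space if you decode with unquote_plus.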

I'm guessing you will want to alias them ;-)


shell

Try the following command line:

$ echo "%C3%A6ndr%C3%BCk" | sed 's@+@ @g;s@%@\\x@g' | xargs -0 printf "%b"
ændrük

You can define it as an alias and add it to your shell's rc file (e.g. ~/.bashrc):

$ alias urldecode='sed "s@+@ @g;s@%@\\\\x@g" | xargs -0 printf "%b"'

Then, whenever you need it, simply run:

$ echo "http%3A%2F%2Fwww" | urldecode
http://www
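The pipeline first turns + into a space and then lets printf %b turn each \xHH into a raw byte, which the terminal displays as UTF-8. The same two steps in Python 3, as a rough sketch (urldecode is just an illustrative name; unlike printf %b, it leaves backslashes in the data alone):

```python
from urllib.parse import unquote_to_bytes


def urldecode(text: str) -> str:
    """Mirror the sed + printf pipeline: '+' -> space, %XX -> byte, bytes -> UTF-8."""
    with_spaces = text.replace("+", " ")
    raw = unquote_to_bytes(with_spaces)  # each %XX becomes the single byte 0xXX
    return raw.decode("utf-8", errors="replace")


print(urldecode("%C3%A6ndr%C3%BCk"))  # ændrük
print(urldecode("http%3A%2F%2Fwww"))  # http://www
```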

bash

When scripting, you can use the following syntax:

input="http%3A%2F%2Fwww"
decoded=$(printf '%b' "${input//%/\\x}")

However, the above syntax won't handle pluses (+) correctly, so you have to replace them with spaces first, e.g. via sed.
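The order of the two steps matters: + has to become a space before the %XX escapes are decoded, otherwise an encoded plus (%2B) would be turned into a space as well. A small Python 3 illustration (both function names are hypothetical):

```python
from urllib.parse import unquote


def decode_plus_first(text: str) -> str:
    # Correct order: a literal '+' means space, then decode %XX (so %2B stays '+').
    return unquote(text.replace("+", " "))


def decode_plus_last(text: str) -> str:
    # Wrong order: the '+' produced by %2B is indistinguishable from a real '+'.
    return unquote(text).replace("+", " ")


print(decode_plus_first("1%2B1+%3D+2"))  # 1+1 = 2
print(decode_plus_last("1%2B1+%3D+2"))   # 1 1 = 2
```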

You can also use the following urlencode() and urldecode() functions:

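urlencode() {
    # urlencode <string>
    # LC_ALL=C makes ${#1} and ${1:i:1} work on bytes, so multibyte
    # characters are encoded byte by byte (e.g. æ -> %C3%A6)
    local LC_ALL=C
    local length="${#1}" i c
    for (( i = 0; i < length; i++ )); do
        c="${1:i:1}"
        case $c in
            [a-zA-Z0-9.~_-]) printf '%s' "$c" ;;
            *) printf '%%%02X' "'$c" ;;
        esac
    done
}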

urldecode() {
    # urldecode <string>

    local url_encoded="${1//+/ }"
    printf '%b' "${url_encoded//%/\\x}"
}

Note that this urldecode() assumes the data contains no backslashes, because printf %b would interpret them as escape sequences.
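For comparison, here is the encoding rule of urlencode() above written in Python 3 and applied byte by byte, as the shell function does in the C locale: bytes in [a-zA-Z0-9.~_-] pass through and every other UTF-8 byte becomes %XX with uppercase hex. It is a sketch that mirrors the shell function, not a replacement for it:

```python
# Characters urlencode() leaves untouched: [a-zA-Z0-9.~_-]
UNRESERVED = frozenset(
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789.~_-"
)


def urlencode(text: str) -> str:
    """Percent-encode every UTF-8 byte outside UNRESERVED as %XX (uppercase)."""
    parts = []
    for byte in text.encode("utf-8"):
        parts.append(chr(byte) if byte in UNRESERVED else f"%{byte:02X}")
    return "".join(parts)


print(urlencode("ændrük"))  # %C3%A6ndr%C3%BCk
print(urlencode("a b\\c"))  # a%20b%5Cc
```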


bash + xxd

A Bash function using the xxd tool:

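urlencode() {
  local length="${#1}" i c
  for (( i = 0; i < length; i++ )); do
    c="${1:i:1}"
    # Safe characters are printed as-is; anything else is piped through
    # xxd -p -c1, which prints each of its bytes as two hex digits per line.
    case $c in
      [a-zA-Z0-9.~_-]) printf '%s' "$c" ;;
      *) printf '%s' "$c" | xxd -p -c1 | while read -r x; do printf '%%%s' "$x"; done ;;
    esac
  done
}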

Found in cdown's gist, also on Stack Overflow.
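One visible difference from the printf-based version: xxd -p prints hex digits in lowercase, so this function produces %c3%a6 rather than %C3%A6. RFC 3986 treats the two as equivalent (it only recommends uppercase), and decoders such as Python's urllib.parse.unquote accept both. A small Python 3 check, where xxd_style_escape is a hypothetical helper that imitates the xxd output:

```python
from urllib.parse import quote, unquote


def xxd_style_escape(char: str) -> str:
    """Escape each UTF-8 byte of char as %xx in lowercase, like xxd -p -c1."""
    return "".join(f"%{byte:02x}" for byte in char.encode("utf-8"))


lower = xxd_style_escape("æ")  # '%c3%a6'
upper = quote("æ")             # '%C3%A6'
# Both spellings decode to the same character.
print(lower, upper, unquote(lower) == unquote(upper) == "æ")
```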


Python

Try defining the following aliases:

alias urldecode='python3 -c "import sys, urllib.parse as ul; print(ul.unquote_plus(sys.argv[1]))"'
alias urlencode='python3 -c "import sys, urllib.parse as ul; print(ul.quote_plus(sys.argv[1]))"'

Usage:

$ urlencode "ændrük"
%C3%A6ndr%C3%BCk
$ urldecode "%C3%A6ndr%C3%BCk"
ændrük

Source: ruslanspivak


PHP

Using PHP you can try the following command:

$ echo oil+and+gas | php -r 'echo urldecode(fgets(STDIN));'
oil and gas

(Instead of the STDIN constant, you can also read from php://stdin.)

or just:

php -r 'echo urldecode("oil+and+gas");'

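For multi-line input use -R, which runs the code once for every input line and passes the line (without its newline) in $argn, e.g. php -R 'echo urldecode($argn), PHP_EOL;' < file.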
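Processing input line by line, roughly as -R does, is also easy to sketch in Python 3. decode_lines below is a hypothetical helper; like PHP's urldecode() it treats + as a space, which is why it uses unquote_plus:

```python
from typing import Iterable, Iterator
from urllib.parse import unquote_plus


def decode_lines(lines: Iterable[str]) -> Iterator[str]:
    """Decode each input line on its own, dropping its trailing newline."""
    for line in lines:
        yield unquote_plus(line.rstrip("\n"))
```

In a script you would feed it sys.stdin, e.g. for decoded in decode_lines(sys.stdin): print(decoded).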


Perl

In Perl you can use URI::Escape.

decoded_url=$(perl -MURI::Escape -e 'print uri_unescape($ARGV[0])' "$encoded_url")

Or to decode a file in place:

perl -pi -MURI::Escape -e '$_ = uri_unescape($_)' file
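perl -pi rewrites the file in place. A Python 3 sketch of the same operation, assuming the file fits in memory (unescape_file_in_place is a made-up name). Like uri_unescape, it only replaces %XX sequences, leaving + as it is, and it works on raw bytes, so line endings are preserved:

```python
from pathlib import Path
from urllib.parse import unquote_to_bytes


def unescape_file_in_place(path: str) -> None:
    """Replace every %XX in the file with the byte 0xXX, like uri_unescape."""
    file = Path(path)
    data = file.read_bytes()
    # unquote_to_bytes accepts bytes and leaves '+' untouched.
    file.write_bytes(unquote_to_bytes(data))
```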

sed

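With sed, this can be achieved as follows (it only matches uppercase hex digits, and xargs will mangle quotes and collapse whitespace and newlines):

sed -e 's/%\([0-9A-F][0-9A-F]\)/\\\\\x\1/g' file | xargs echo -e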
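The xargs step is what makes that pipeline fragile. A Python 3 sketch that applies the same kind of regular expression directly to the bytes of the input, so quotes and whitespace survive; it also accepts lowercase hex digits, which the sed expression does not. sed_like_decode is a hypothetical name:

```python
import re

# %XX with two hex digits, in either case (the sed version only matches A-F).
ESCAPE = re.compile(rb"%([0-9A-Fa-f]{2})")


def sed_like_decode(data: bytes) -> bytes:
    """Replace each %XX escape in data with the byte it encodes."""
    return ESCAPE.sub(lambda m: bytes.fromhex(m.group(1).decode("ascii")), data)


print(sed_like_decode(b"it's  %C3%A6 %e2%82%ac").decode("utf-8"))  # it's  æ €
```

To decode a whole file, read it in binary mode and write the result to sys.stdout.buffer.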

awk

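Try anon's solution. It needs GNU awk, because RT and -i ord (which provides chr()) are gawk extensions; with UTF-8 data you may also need LC_ALL=C, so that chr() produces raw bytes rather than Unicode characters: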

awk -niord '{printf RT?$0chr("0x"substr(RT,2)):$0}' RS=%..

See: Using awk printf to urldecode text.
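The awk trick makes every %XX a record separator: each record ($0) is printed as-is and the separator that ended it (RT) is converted back into a character. Python's re.split with a capturing group returns the separators interleaved in the same way, so the idea can be sketched like this (awk_like_decode is a hypothetical name; unlike RS=%.., the pattern only accepts hex digits, and the bytes are collected before UTF-8 decoding):

```python
import re


def awk_like_decode(text: str) -> str:
    """Split on %XX like RS=%.. and turn each separator back into a byte."""
    # With a capturing group, re.split returns: record, sep, record, sep, ..., record
    pieces = re.split(r"(%[0-9A-Fa-f]{2})", text)
    out = bytearray()
    for index, piece in enumerate(pieces):
        if index % 2:  # odd positions hold the separators (awk's RT)
            out.append(int(piece[1:], 16))
        else:          # even positions hold the records (awk's $0)
            out += piece.encode("utf-8")
    return out.decode("utf-8", errors="replace")


print(awk_like_decode("%C3%A6ndr%C3%BCk"))  # ændrük
```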


decoding file names

If you need to remove URL encoding from file names, use the deurlname tool from renameutils (e.g. deurlname *.*).

See also:

  • Can wget decode uri file names when downloading in batch?
  • How to remove URI encoding from file names?

Related:

  • How to decode URL-encoded string in shell? at SO
  • Decoding URL encoding (percent encoding) at unix SE
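If renameutils isn't available, the renaming part can be sketched in Python 3. This is not deurlname itself: deurl_names is a hypothetical helper that only undoes %XX escapes (it leaves + alone), and it skips any name whose decoded form would contain a slash or a NUL byte, be . or .., or clash with an existing file:

```python
import os
from urllib.parse import unquote


def deurl_names(directory: str) -> list[tuple[str, str]]:
    """Rename files in directory to their percent-decoded names; return (old, new) pairs."""
    renamed = []
    for name in sorted(os.listdir(directory)):
        new_name = unquote(name)
        # Leave alone names that don't change or can't be used as a file name.
        if new_name == name or new_name in (".", "..") or "/" in new_name or "\0" in new_name:
            continue
        target = os.path.join(directory, new_name)
        if os.path.exists(target):  # don't overwrite an existing file
            continue
        os.rename(os.path.join(directory, name), target)
        renamed.append((name, new_name))
    return renamed
```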

Percent-encode reserved URI characters and non-ASCII characters

jq -s -R -r @uri

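-s (--slurp) reads all inputs into one array, and -s -R (--slurp --raw-input) reads the whole input into a single string. -r (--raw-output) outputs the contents of strings instead of JSON string literals. Because the whole input is encoded, a trailing newline becomes %0A; use printf %s instead of echo if you don't want it.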
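In Python 3, urllib.parse.quote with safe="" gives the same kind of output: only letters, digits and -_.~ are left alone, and everything else, including non-ASCII text, is encoded as UTF-8 %XX. I'm assuming that matches your jq version's @uri exactly; older jq releases may leave a few more characters unencoded. uri_escape is a made-up name:

```python
from urllib.parse import quote


def uri_escape(data: str) -> str:
    """Encode everything except A-Z a-z 0-9 - _ . ~ as UTF-8 %XX escapes."""
    return quote(data, safe="")


# A trailing newline (as echo would add) is encoded as well.
print(uri_escape("a b/ä?\n"))  # a%20b%2F%C3%A4%3F%0A
```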

Percent-encode all characters

xxd -p|tr -d \\n|sed 's/../%&/g'

tr -d \\n removes the linefeeds that xxd -p adds after every 60 hex digits (30 input bytes).
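The same "encode every byte" pipeline as a Python 3 sketch: bytes.hex() never inserts line breaks, so no tr step is needed, and the join does what sed 's/../%&/g' does. percent_encode_all is a hypothetical name:

```python
def percent_encode_all(data: bytes) -> str:
    """Percent-encode every byte of data, including letters and digits."""
    hex_digits = data.hex()  # lowercase, like xxd -p, but without line wrapping
    # Prefix each pair of hex digits with '%', like sed 's/../%&/g'.
    return "".join("%" + hex_digits[i:i + 2] for i in range(0, len(hex_digits), 2))


print(percent_encode_all(b"Hi!\n"))  # %48%69%21%0a
```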

Percent-encode all characters except ASCII alphanumeric characters in Bash

eu () {
    local LC_ALL=C c
    while IFS= read -r -n1 -d '' c
    do 
        if [[ $c = [[:alnum:]] ]]
        then 
            printf %s "$c"
        else
            printf %%%02x "'$c"
        fi
    done
}

Without -d '', this would skip linefeeds and null bytes. Without IFS=, this would replace characters in IFS with %00. Without LC_ALL=C, in a UTF-8 locale this would for example replace あ with %3042 (its Unicode code point) instead of its UTF-8 bytes %e3%81%82.
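eu works on bytes because of LC_ALL=C. A Python 3 sketch of the same rule (eu_bytes is a hypothetical name): it takes bytes, so a multibyte character like あ becomes its three UTF-8 bytes, and NUL bytes and linefeeds are encoded rather than skipped:

```python
def eu_bytes(data: bytes) -> str:
    """Keep ASCII letters and digits, encode every other byte as lowercase %xx."""
    out = []
    for byte in data:
        single = bytes([byte])
        # bytes.isalnum() only recognises ASCII letters and digits
        out.append(single.decode("ascii") if single.isalnum() else f"%{byte:02x}")
    return "".join(out)


print(eu_bytes("aあ \n".encode("utf-8")))  # a%e3%81%82%20%0a
```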