sed - remove the very last occurrence of a string (a comma) in a file?

Using awk

If the comma is always at the end of the second-to-last line:

$ awk 'NR>2{print a;} {a=b; b=$0} END{sub(/,$/, "", a); print a;print b;}'  input
[11911,0,"BUILDER","2014-10-15","BUILDER",0,0],
[11912,0,"BUILDER","2014-10-15","BUILDER",0,0],
[11913,0,"BUILDER","2014-10-15","BUILDER",0,0]
]
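The awk one-liner keeps a two-line sliding window (a and b) and only edits a once the input is exhausted. Here is the same program spread out with comments, run on a small made-up sample (the sample lines are hypothetical, not the asker's file):

```shell
# Hypothetical sample: the last comma sits at the end of the second-to-last line
out=$(printf '%s\n' '[' '[11912,0],' '[11913,0],' ']' |
  awk '
    NR > 2 { print a }      # a holds line NR-2, which has left the two-line window
    { a = b; b = $0 }       # slide the window: a = line NR-1, b = line NR
    END {
      sub(/,$/, "", a)      # a is now the second-to-last line: drop its trailing comma
      print a
      print b
    }')
printf '%s\n' "$out"
```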

Using awk and bash

$ awk -v "line=$(($(wc -l <input)-1))" 'NR==line{sub(/,$/, "")} 1'  input
[11911,0,"BUILDER","2014-10-15","BUILDER",0,0],
[11912,0,"BUILDER","2014-10-15","BUILDER",0,0],
[11913,0,"BUILDER","2014-10-15","BUILDER",0,0]
]
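If reading the file twice is acceptable, awk can also count the lines itself instead of relying on wc -l, by naming the file twice on the command line. A sketch using the common NR==FNR idiom, with a made-up sample written to a temporary file:

```shell
tmp=$(mktemp)                                   # throwaway sample file
printf '%s\n' '[1,0],' '[2,0],' ']' > "$tmp"

out=$(awk '
    NR == FNR { n = FNR; next }      # first pass: n ends up as the line count
    FNR == n - 1 { sub(/,$/, "") }   # second pass: strip the comma on line n-1
    1                                # print every line of the second pass
' "$tmp" "$tmp")

rm -f "$tmp"
printf '%s\n' "$out"
```

Unlike the wc -l version, this keeps the whole job inside awk, but it still needs a real file; it cannot read the input from a pipe.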

Using sed

$ sed 'x;${s/,$//;p;x;};1d'  input
[11911,0,"BUILDER","2014-10-15","BUILDER",0,0],
[11912,0,"BUILDER","2014-10-15","BUILDER",0,0],
[11913,0,"BUILDER","2014-10-15","BUILDER",0,0]
]

For OSX and other BSD platforms, try:

sed -e x -e '$ {s/,$//;p;x;}' -e 1d  input
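The same sed script spread over several lines with comment lines, as a sketch of what each command does. Multi-line scripts with # comments should be accepted by both GNU and BSD sed; the sample input is made up:

```shell
out=$(printf '%s\n' '[' '[11912,0],' '[11913,0],' ']' | sed '
# swap: the pattern space now holds the PREVIOUS line, the hold space the current one
x
# on the last input line the pattern space holds the second-to-last line
$ {
  # drop its trailing comma and print it
  s/,$//
  p
  # swap back so the real last line is auto-printed at the end of the cycle
  x
}
# line 1 swapped in the empty hold space, so discard it
1d
')
printf '%s\n' "$out"
```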

Using bash

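n=0
while IFS= read -r line
do
    # print only once the two-line buffer (a, b) is full, so empty lines survive
    [ "$n" -ge 2 ] && printf '%s\n' "$a"
    a=$b
    b=$line
    n=$((n + 1))
done <input
printf '%s\n' "${a%,}"    # second-to-last line, minus a trailing comma
printf '%s\n' "$b"        # last line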
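The ${a%,} expansion is what drops the comma: % removes the shortest matching suffix, and the pattern here is a single literal comma, so a line without a trailing comma passes through unchanged. A tiny illustration with made-up values:

```shell
a='[11913,0,"BUILDER","2014-10-15","BUILDER",0,0],'
b=']'
printf '%s\n' "${a%,}"   # trailing comma removed
printf '%s\n' "${b%,}"   # no trailing comma, printed unchanged
```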

lcomma() { sed '
    $x;$G;/\(.*\),/!H;//!{$!d
};  $!x;$s//\1/;s/^\n//'
}

That should remove only the last comma in any input file, and it still prints input in which no comma occurs at all. Basically, it buffers sequences of lines that do not contain a comma.

When it encounters a comma, it swaps the current line with the hold buffer. That simultaneously prints all lines seen since the previous comma and resets the hold buffer to the new comma line.
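To see the difference from the line-based answers above, here is lcomma run on a made-up sample where the last comma sits inside a line rather than at its end. The function is repeated so the snippet runs on its own. One caveat, if I read the script correctly: when the very first line contains a comma, the initially empty hold space is printed as a leading blank line, so this sample starts with a comma-free line.

```shell
# lcomma as defined above, repeated so this sketch is self-contained
lcomma() { sed '
    $x;$G;/\(.*\),/!H;//!{$!d
};  $!x;$s//\1/;s/^\n//'
}

# Hypothetical sample: the last comma in the file is the one between b and c
out=$(printf '%s\n' '[' 'a,b,c' 'd' ']' | lcomma)
printf '%s\n' "$out"
```

Only the comma between b and c is removed; the earlier one on the same line is kept, and input with no comma at all passes through unchanged.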

I was just digging through my history file and found this:

lmatch(){ set "USAGE:\
        lmatch /BRE [-(((s|-sub) BRE)|(r|-ref)) REPL [-(f|-flag) FLAG]*]*
"       "${1%"${1#?}"}" "$@"
        eval "${ZSH_VERSION:+emulate sh}"; eval '
        sed "   1x;     \\$3$2!{1!H;\$!d
                };      \\$3$2{x;1!p;\$!d;x
                };      \\$3$2!x;\\$3$2!b'"
        $(      unset h;i=3 p=:-:shfr e='\033[' m=$(($#+1)) f=OPTERR
                [ -t 2 ] && f=$e\2K$e'1;41;17m}\r${h-'$f$e\0m
                f='\${$m?"\"${h-'$f':\t\${$i$e\n}\$1\""}\\c' e=} _o=
                o(){    IFS=\ ;getopts  $p a "$1"       &&
                        [ -n "${a#[?:]}" ]              &&
                        o=${a#-}${OPTARG-${1#-?}}       ||
                        ! eval "o=$f;o=\${o%%*\{$m\}*}"
        };      a(){    case ${a#[!-]}$o in (?|-*) a=;;esac; o=
                        set $* "${3-$2$}{$((i+=!${#a}))${a:+#-?}}"\
                                ${3+$2 "{$((i+=1))$e"} $2
                        IFS=$;  _o=${_o%"${3+$_o} "*}$*\
        };      while   eval "o \"\${$((i+=(OPTIND=1)))}\""
                do      case            ${o#[!$a]}      in
                        (s*|ub)         a s 2 ''        ;;
                        (r*|ef)         a s 2           ;;
                        (f*|lag)        a               ;;
                        (h*|elp)        h= o; break     ;;
                esac;   done;   set -f; printf  "\t%b\n\t" $o $_o
)\"";}

It's actually pretty good. Yes, it uses eval, but it never passes anything to it beyond a numeric reference to its arguments. It builds arbitrary sed scripts for handling a last match. I'll show you:

# d^.0         all REs are delimited with d from here on
# -r '&&&&'    -r or --ref, like: '...s//$ref/...'
# --sub \' sq  -s or --sub, like: '...s/$arg1/$arg2/...'
# --flag 4     -f or --flag, appended to the last -r or -s
# -s\" \\dq    short opts can be '-s $arg1 $arg2' or '-r$arg1'
# -fg          tacked on, so: '...s/"/dq/g...'
printf "%d\" %d' %d\" %d'\n" $(seq 5 5 200) |
    tee /dev/fd/2 |
    lmatch d^.0 \
        -r '&&&&' \
        --sub \' sq \
        --flag 4 \
        -s\" \\dq \
        -fg

The tee prints the following to stderr; it is a copy of lmatch's input:

5" 10' 15" 20'
25" 30' 35" 40'
45" 50' 55" 60'
65" 70' 75" 80'
85" 90' 95" 100'
105" 110' 115" 120'
125" 130' 135" 140'
145" 150' 155" 160'
165" 170' 175" 180'
185" 190' 195" 200'

The function's evaled subshell iterates through all of its arguments once. As it walks over them, it increments a counter by however many arguments each switch consumes and skips over that many arguments for the next iteration. Along the way it does one of a few things per argument:

  • For each option, the option parser adds $a to $o. $a is assigned based on the value of $i, which is incremented by the argument count for each argument processed. $a is first assigned one of the two following values, and is then wrapped as described in the third item:
    • a=$((i+=1)) - assigned if a short option does not have its argument appended to it, or if the option is a long one.
    • a=$i#-? - assigned if the option is a short one and does have its argument appended to it.
    • a=\${$a}${1:+$d\${$(($1))\}} - regardless of the initial assignment, $a's value is always wrapped in braces, and in an -s case $i is sometimes incremented once more and an additional delimited field is appended.

The result is that eval is never passed a string containing any unknowns. Each command-line argument is referred to by its numeric position, even the delimiter, which is extracted from the first character of the first argument; that first argument is the only place you should use that character unescaped. Basically, the function is a macro generator: it never interprets the arguments' values in any special way, because sed can (and of course will) easily handle that when it parses the script. Instead, it just arranges its arguments sensibly into a workable script.
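The core trick, building a string for eval that names arguments only by number, can be shown in isolation. This is a minimal sketch, not how lmatch itself is written: multisub is a hypothetical helper that applies s/ARG1/ARG2/, s/ARG3/ARG4/, and so on to standard input. Because eval only ever sees ${1}, ${2}, ..., shell metacharacters in the arguments are never re-parsed by the shell; only sed interprets them.

```shell
# Hypothetical helper (not part of lmatch): multisub RE1 REPL1 [RE2 REPL2 ...]
# The string handed to eval mentions the arguments only as ${1}, ${2}, ...,
# so their contents are expanded inside double quotes and never run as shell code.
multisub() {
    script= i=1
    while [ "$i" -lt "$#" ]; do
        # append s/${N}/${N+1}/; with N filled in and the ${...} left literal
        script="$script"'s/${'"$i"'}/${'"$((i + 1))"'}/;'
        i=$((i + 2))
    done
    eval 'sed "'"$script"'"'
}

printf 'x y\n' | multisub x 1 y 2            # prints: 1 2
printf 'a\n' | multisub a '$(echo oops)'     # prints $(echo oops) literally, nothing is run
```

Characters that are special to sed itself (the / delimiter, & and \ in the replacement) are still interpreted by sed, which is exactly the division of labour described above.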

Here's some debug output of the function at work:

... sed "   1x;\\$2$1!{1!H;\$!d
        };      \\$2$1{x;1!p;\$!d;x
        };      \\$2$1!x;\\$2$1!b
        s$1$1${4}$1
        s$1${6}$1${7}$1${9}
        s$1${10#-?}$1${11}$1${12#-?}
        "
++ sed '        1x;\d^.0d!{1!H;$!d
        };      \d^.0d{x;1!p;$!d;x
        };      \d^.0d!x;\d^.0d!b
        sdd&&&&d
        sd'\''dsqd4
        sd"d\dqdg
        '

And so lmatch can be used to easily apply regexes to data following the last match in a file. The result of the command I ran above is:

5" 10' 15" 20'
25" 30' 35" 40'
45" 50' 55" 60'
65" 70' 75" 80'
85" 90' 95" 100'
101010105dq 110' 115dq 120'
125dq 130' 135dq 140sq
145dq 150' 155dq 160'
165dq 170' 175dq 180'
185dq 190' 195dq 200'

...which, to the portion of the input that starts at the last line matching /^.0/, applies the following substitutions:

  • sdd&&&&d - replaces the match (the empty regex reuses /^.0/) with four copies of itself.
  • sd'dsqd4 - replaces the fourth single quote, counted from the start of the last matching line, with sq.
  • sd"d\dqdg - replaces every double quote in that portion with dq (the g flag).

And so, to demonstrate how one might use lmatch to remove the last comma in a file:

printf "%d, %d %d, %d\n" $(seq 5 5 100) |
lmatch '/\(.*\),' -r\\1

OUTPUT:

5, 10 15, 20
25, 30 35, 40
45, 50 55, 60
65, 70 75, 80
85, 90 95 100

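You could also try the following Perl one-liner:

perl -0777 -pe 's/,(?!.*,)//s' file

Explanation:

  • -0777 Reads the whole file as a single record, so the regex sees all of it at once. (With -00, paragraph mode, each blank-line-separated paragraph would be its own record, and the last comma of every paragraph would be removed.)
  • , Matches a comma.
  • (?!.*,) Negative lookahead asserting that no comma follows the matched comma anywhere later in the input, so only the last comma matches.
  • s Most importantly, the s (DOTALL) modifier makes . match newline characters too, so the lookahead can scan past the end of the current line.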
