Is there a way to remove only the nested brackets, rather than all of them?

bracket.awk:

BEGIN{quote=1}
{
    for(i=1;i<=length;i++){
        ch=substr($0,i,1)
        pr=1
        if(ch=="\""){quote=!quote}
        else if(ch=="[" && quote){brk++;pr=brk<2}
        else if(ch=="]" && quote){brk--;pr=brk<1}
        if(pr){printf "%s",ch}
    }
    print ""
}

$ awk -f bracket.awk file
["q", "0", "R", "L"], ["q", "1", "[", "]"], ["q", "2", "L", "R"], ["q", "3", "R", "L"]

The idea behind it:

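Initialize quote=1, meaning "currently outside a quoted string". Read each line character by character. Whenever a double quote is found, invert the quote variable (1 becomes 0, and vice versa).

Brackets are only counted while quote is 1. The brk counter tracks the nesting depth, and a bracket is printed only if it belongs to the outermost level: an opening bracket when brk has just become 1, a closing bracket when brk has just dropped back to 0.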

The print "" statement is just to add a newline, as the printf above does not do it.
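For anyone who finds the awk a bit terse, here is a minimal Python sketch of the same state machine. The function name and the sample input are made up for illustration; only the logic is meant to mirror bracket.awk.

```python
def strip_nested_brackets(line):
    """Drop every unquoted bracket except those at the outermost level."""
    out = []
    outside_quotes = True  # plays the role of quote=1 in the awk script
    depth = 0              # plays the role of brk
    for ch in line:
        keep = True
        if ch == '"':
            outside_quotes = not outside_quotes
        elif ch == "[" and outside_quotes:
            depth += 1
            keep = depth < 2   # only the outermost "[" is kept
        elif ch == "]" and outside_quotes:
            depth -= 1
            keep = depth < 1   # only the "]" that closes the outermost level
        if keep:
            out.append(ch)
    return "".join(out)


# Hypothetical sample line, not the asker's actual file.
print(strip_nested_brackets('[["q", "1", "[", "]"]]'))
# ["q", "1", "[", "]"]
```

Like the awk version, this assumes quotes are balanced and never escaped.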


With perl:

perl -pe '
   s{([^]["]+|"[^"]*")|\[(?0)*\]}
    {$1 // "[". ($& =~ s/("[^"]*"|[^]["]+)|./$1/gr) . "]"}ge'

That makes use of perl's recursive regexp.

The outer s{regex}{replacement-code}ge tokenises the input into either:

  • any sequence of characters other than [, ] or "
  • a quoted string
  • a [...] group (using recursion in the regexp to find the matching ])

Then we replace each token with itself if it falls into the first two categories ($1). Otherwise the token is a [...] group, and we replace it with "[", followed by its contents with every non-quoted [ and ] removed (using the same tokenising technique in the inner substitution), followed by "]".
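The tokenising idea can be sketched in Python as well. Note that Python's re module has no recursive patterns, so this sketch swaps the (?0) recursion for an explicit depth counter; it is an illustration of the token categories, not a port of the perl one-liner. The names TOKEN and strip_nested_tokens are made up.

```python
import re

# Three alternatives: a quoted string, a run of characters other than
# [ ] ", or any single leftover character (a bare bracket or stray quote).
TOKEN = re.compile(r'"[^"]*"|[^\]\["]+|.', re.S)


def strip_nested_tokens(line):
    out = []
    depth = 0
    for tok in TOKEN.findall(line):
        if tok == "[":
            depth += 1
            if depth > 1:      # nested opening bracket: drop it
                continue
        elif tok == "]":
            depth -= 1
            if depth > 0:      # closes a nested group: drop it
                continue
        out.append(tok)        # quoted strings pass through untouched
    return "".join(out)


# Hypothetical sample line.
print(strip_nested_tokens('[["a", "["], ["b"]]'))
# ["a", "[", "b"]
```

Because a quoted "[" arrives as the three-character token `"["`, it never compares equal to a bare bracket and so is never counted.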

To handle escaped quotes and backslashes within quotes (like "foo\"bar\\"), replace [^"] with (?:[^\\"]|\\.).
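To see what that change buys you, here is a small Python check comparing the two quoted-string patterns on the example string. The helper name first_quoted and the trailing ["x"] in the sample are only there for illustration.

```python
import re

SIMPLE = re.compile(r'"[^"]*"')                  # stops at the first "
ESCAPE_AWARE = re.compile(r'"(?:[^\\"]|\\.)*"')  # treats \" and \\ as pairs


def first_quoted(pattern, text):
    """Return the first quoted string the pattern finds, or None."""
    m = pattern.search(text)
    return m.group(0) if m else None


sample = r'"foo\"bar\\" ["x"]'
print(first_quoted(SIMPLE, sample))        # "foo\"
print(first_quoted(ESCAPE_AWARE, sample))  # "foo\"bar\\"
```

The simple pattern ends the string at the escaped quote, so everything after it would be tokenised out of sync; the escape-aware one consumes the whole quoted string.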

With sed:

If your sed supports the -E or -r options to work with extended regexps instead of basic ones, you could do it with a loop, replacing the innermost [...]s first:

LC_ALL=C sed -E '
  :1
  s/^(("[^"]*"|[^"])*\[("[^"]*"|[^]"])*)\[(("[^"]*"|[^]["])*)\]/\1\4/
  t1'

(LC_ALL=C speeds it up and makes it equivalent to the perl version, which also ignores the user's locale when interpreting bytes as characters.)
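The loop is perhaps easier to follow outside sed. Below is a rough Python transliteration of the same regexp, assuming one line of input at a time; the names INNERMOST and strip_nested are hypothetical. Each pass unwraps one bracket-free [...] group that sits inside a still-open [, and the loop stops when nothing matches, which is what the :1 ... t1 pair does.

```python
import re

INNERMOST = re.compile(
    r'^((?:"[^"]*"|[^"])*\[(?:"[^"]*"|[^\]"])*)'  # \1: text up to and past an unclosed [
    r'\[((?:"[^"]*"|[^\]\["])*)\]'                # \2: contents of a bracket-free [...] group
)


def strip_nested(line):
    while True:
        # Keep the prefix and the group's contents, dropping its [ and ].
        line, n = INNERMOST.subn(r"\1\2", line, count=1)
        if n == 0:      # no nested group left: same as falling through t1
            return line


# Hypothetical sample line.
print(strip_nested('[["a", "["], ["b"]]'))
# ["a", "[", "b"]
```

On that sample the first pass yields `["a", "[", ["b"]]` and the second pass removes the remaining inner pair.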

POSIXly, it should still be doable with something like:

LC_ALL=C sed '
  :1
  s/^\(\(\("[^"]*"\)*[^"]*\)*\[\(\("[^"]*"\)*[^]"]*\)*\)\[\(\(\("[^"]*"\)*[^]["]*\)*\)\]/\1\6/
  t1'

This uses \(\(a\)*\(b\)*\)* in place of (a|b)*, as basic regexps don't have an alternation operator (the BREs of some sed implementations support \| for that, but it's not POSIX and so not portable).


This gawk is inelegant to say the least and will break if you even look at it too long, so you don't need to tell me... just have a quiet and self-satisfied chuckle that you can do better.

But as it more or less works (on Wednesdays and Fridays during months with a J in them) and consumed 20 minutes of my life, I am posting it anyway.

Schroedinger's awk (Thx @edmorton)

awk -F"\\\], \\\[" '
    {printf "["; 
       for (i=1; i<=NF; i++) {
         cs=split($i,c,","); 
           for (j=1; j<=cs; j++){
             sub("^ *\\[+","",c[j]); sub("\\]+$","",c[j]);
             t=(j==cs)?"]"((i<(NF-1))?", [":""):",";
             printf "%s", c[j] t   # explicit format, so a % in the data is printed literally
       }}print ""}' file

["q", "0", "R", "L"], ["q","1", "[", "]"], ["q","2", "L", "R"], ["q","3","R", "L"]

Walkthrough

Split the fields (-F) on ], [, which needs to be escaped to hell and back, so that each field holds one final element group.

Then split each field on , to get the elements, and strip any leading [ or trailing ] from each element. Finally, rejoin the elements with , as the separator and rejoin the fields using a conditional combination of ] and , [.
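The same split, strip and rejoin steps, written out as a minimal Python sketch. The function name and the sample line are invented, and the rejoin is simplified to a plain join rather than the conditional in the gawk above. It shares the gawk's fragility: a quoted element containing a comma or the sequence ], [ will be split in the wrong place.

```python
import re


def flatten_groups(line):
    cleaned = []
    for group in line.split("], ["):            # like -F on "], ["
        elems = []
        for elem in group.split(","):           # like split($i, c, ",")
            elem = re.sub(r"^ *\[+", "", elem)  # strip leading brackets
            elem = re.sub(r"\]+$", "", elem)    # strip trailing brackets
            elems.append(elem)
        cleaned.append(",".join(elems))
    # Put the group separators and the outer pair back.
    return "[" + "], [".join(cleaned) + "]"


# Hypothetical sample line.
print(flatten_groups('[["q", "0"]], [["q", "1", "[", "]"]]'))
# ["q", "0"], ["q", "1", "[", "]"]
```

The quoted "[" and "]" elements survive because each one starts and ends with a double quote, so neither anchored pattern matches them.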

Heisenberg's sed

If you pipe to sed, it's slightly tidier:

awk 'BEGIN{FS="\\], \\["}{for (i=1; i<=NF; i++) print $i}' file | 
   sed -E "s/(^| |,)\[+(\")/\1\2/g ;s/\]+(,|$)/\1/g" | 
   awk 'BEGIN{RS=""; FS="\n";OFS="], ["}{$1=$1; print "["$0"]"}'

["q", "0", "R", "L"], ["q", "1", "[", "]"], ["q", "2", "L", "R"], ["q", "3", "R", "L"]

This does the same job as the first version: the first awk splits out the fields as before, sed drops the excess [ and ], and the final awk recomposes the elements by redefining RS, FS and OFS.