Balance the Brackets

Retina, 267 bytes (previously 254, 252, 264, 248, 240, 232)

Thank you to @AnthonyPham, @officialaimm, and @MistahFiggins for pointing out bugs

T`[]()`:;'"
+`'-*"|:-*;|{-*}|<-*>
-
+`'(\W+)"|:(\W+);|{(\W+)}|<(\W+)>
A$1$2$3$+B
+`'(\D+)"|:(\D+);|{(\D+)}|<(\D+)>
6$1$2$3$+9
(.*)(}{|"'|;:|><)
1$1
-

A6B9|6A9B
1
A6+B9+|A6+.B9+.|A+6.B+9
11
T`':{";}`<<<>
(.*)(<\W|\W>)
1$1
+`<(.*A.*B.*)?\W|\W(.*A.*B.*)?>
1$1$2
\W|6B|1

Try it Online!

Non-brute force solution! It works for all test cases, and even found an error in one.

-2 bytes thanks to @MartinEnder (${4} to $+)

+12 bytes to account for additional swapping cases

-16 bytes by making better use of character classes

-8 bytes by removing an unnecessary restriction on swapping. This also fixed a bug :)

-10 bytes by combining the swapping logic into a single regex

+2 bytes to account for consecutive swaps

+many for various bug fixes**

Explanation:

T`[]()`:;'" is used to replace special bracket types for convenience. First, we recursively replace all matched brackets with -, AB or 69 depending on whether they are adjacent or not.
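To make the first stage concrete, here is a rough Python equivalent of the transliteration. It is only a sketch of my reading of the `T` stage (`[` becomes `:`, `]` becomes `;`, `(` becomes `'`, `)` becomes `"`); the name `transliterate` is made up for illustration. Presumably the point is that the replacement characters need no escaping inside the later regexes, while `{}` and `<>` are left alone.

```python
# Character-for-character mapping, mirroring the Retina stage T`[]()`:;'"
TABLE = str.maketrans("[]()", ":;'\"")


def transliterate(s):
    """Replace square and round brackets with their stand-in characters."""
    return s.translate(TABLE)


if __name__ == "__main__":
    # Curly and angle brackets pass through unchanged.
    print(transliterate("([{<>}])"))
```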

Then, useful "swapping" is performed by removing newly matched brackets and adding a 1 to the beginning of the string. We also replace - with the empty string, as it was just being used for the above swapping.

Next, we try "replacements" by removing pairs of unmatched brackets that don't overlap already-matched brackets and adding a 1 to the string.

Finally, \W|6B|1 counts any remaining single brackets plus the number of 1s.

**I'm currently working on a shorter version that uses Retina's line splitting features, though I ran into a considerable problem so it might take quite a while.


Brain-Flak, 1350 bytes

{({}(())(<>))<>({(()()()())<{({}[()])<>}{}>}{}<>({<({}[()])>{()(<{}>)}}{}{}<>))<>}<>([[]]){([[]({}()<>)]<>)<>{(({}())<<>(({})<(({}(<()>))<>({}))([(())()()]){<>({}())}{}{<>{}<>({}()){(((({}<(({}<>)<{({}()<([(){}])>)}{}>)<>(({}(<>))<{({}()<([(){}])>)}{}<>>)><>({}))(<(((({}({})[()])[()()]<>({}))<>[({})({}){}]({}<>))<>[(({}<>)<>({}<>)<>)])<>>)))[()](<()>)<<>(({})<({}{}()){({}()<({}<>)<>>)}{}<>(({})<<>(({}<>))>)<>(())>){({}[()()]<(<([({[{}]<(({})()<>[({})]<>)>{()(<{}>)}}{}<(({})<>[()({}<(({}<<>({}<>)<>(({})<>)>)<>[(){}])<>>)]<>)>{()(<{}>)}{}(){[()](<{}>)}<<>{({}<>)<>}{}>)]({}{}))>)<>{({}<>)<>}>)}{}{}<>{}{}{({}<>)<>}{}{}(<>)<>{({}<>)<>}{}{(<{}>)<>{({}<>)<>}<>({}<{}>){({}<>)<>}}{}((({}<({}({})({})<{{}<>{}(<>)}{}(((({}<({}<>)>)<>)))<>>)<>>)<><({}<({}<<>(()())>)>)>)<<>({}<{}{({}<>)([()()()]){((({}()()<>))[()]<(({()(<{}>)}{})<>({}<(({}<<>({}[()()](()[({})({})]({[()](<{}>)}{}<>{}<(({})<>)>)<>))>)<>)>)<>)<>({}<({}<({}<({}<>)>)>)>)>)}{}{}<>}<>{}{}{}{}{}{}{}{}>)>)>)}{}({}<({}<{({}<(({}){({}())}{}{}<(({}){({}())}{}{}<>)>)>)<>}<>{((({}(()()){([{}](<({}(<()>)<>){({}<({}<>)>(())<>)}{}>({})<<>{{}({}<>)<>}{}>))([{}()]{})}{})))<>(({}))<>{<>({}[()])}{}({}<<>{}{}{<>}>)<>{}}<>(({}<>){[()](<{}>)}{})(<>)>)>)<>(<({}<>)>)<>}<>{}({}<(({}){({}())}{}{}){({}<({}<>)>(())<>)}{}{}>)<>{{}({}<>)<>}{}>)<>>)}{}<>([[]{}])}{}(([]){<{}{}>([])}{}<>){({}[()]<{}>)}{}({}<>)

Try it online!

With constant-time comparisons and pointer dereferencing, this algorithm is O(n^3). Unfortunately, Brain-Flak has neither of these, so this program runs in O(n^5) time instead. The longest test case takes about 15 minutes.

Simplifying results

To see that my algorithm works, we need to show some results that reduce the search space considerably. These results rely on the fact that the target is an entire language instead of just one specific string.

  • No insertions are needed. Instead, you can just remove the bracket that the inserted character would eventually match.

  • You will never need to remove a bracket, then swap its two neighbors. To see this, assume wlog that the removed bracket is (, so we are transforming a(c to ca in two steps. By changing c and inserting a copy, we can reach ca() in two steps without a swap. (This insertion can then be removed by the above rule.)

  • The same bracket will never need to be swapped twice. This is a standard fact about the Damerau-Levenshtein distance in general.

Another simplifying result that I didn't use, because accounting for it would cost bytes:

  • If two brackets are swapped, and they don't match each other, the eventual match to each of those brackets will never be changed or swapped.

The algorithm

When any string is reduced to a balanced string, one of the following will be true:

  • The first bracket is deleted.
  • The first bracket stays where it is and matches the bracket at some position k (possibly after changing one or both of them).
  • The first bracket is swapped with the second, which in turn matches the bracket at position k.

In the second case, the bracket at position k may have swapped with one of its neighbors. In either of the latter two cases, the string between the (possibly newly) first bracket and the bracket that started in position k must be edited to a balanced string, as does the string consisting of everything after k.

This means that a dynamic programming approach may be used. Since a swapped bracket need not be swapped again, we only need to consider contiguous substrings, as well as subsequences formed by removing the second character and/or the penultimate character from such a substring. Hence, there are only O(n^2) subsequences we need to look at. Each of those has O(n) possible ways to match (or delete) the first bracket, so the algorithm would be O(n^3) under the conditions above.
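As an illustration of the recurrence, here is a simplified Python sketch that handles only the first two cases (delete the first bracket, or match it with the bracket at position k) and ignores swaps entirely. It is not a port of the Brain-Flak program, and because swaps are left out it only gives an upper bound on the real answer. The names `pair_cost` and `balance_cost_no_swaps` are invented for this example; the pair cost follows the rule used in the code comments (add 1 if the pair is not a perfect match, add 1 more if neither bracket faces the other).

```python
from functools import lru_cache

# Opening bracket -> matching closing bracket.
PAIRS = {"(": ")", "[": "]", "{": "}", "<": ">"}


def pair_cost(a, b):
    """Changes needed so that a (on the left) and b (on the right) match."""
    if PAIRS.get(a) == b:
        return 0  # already a matched pair
    if a not in PAIRS and b in PAIRS:
        return 2  # closer on the left, opener on the right: change both
    return 1      # at least one bracket already faces the other


def balance_cost_no_swaps(s):
    """Fewest deletions and changes that balance s (swaps NOT considered)."""

    @lru_cache(maxsize=None)
    def best(i, j):
        # Cost of balancing the contiguous substring s[i:j].
        if i >= j:
            return 0
        # Case 1: delete the first bracket.
        result = 1 + best(i + 1, j)
        # Case 2: match the first bracket with the bracket at position k.
        for k in range(i + 1, j):
            cost = pair_cost(s[i], s[k]) + best(i + 1, k) + best(k + 1, j)
            result = min(result, cost)
        return result

    return best(0, len(s))
```

For example, this sketch scores `)(` as 2, whereas a single swap would fix it in 1; handling that is exactly what the third case and the extra subsequences are for.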

The data structure

The right stack includes the brackets from the original string, with two bytes per bracket. The first entry determines the entire bracket, and is chosen such that matched brackets have a difference of exactly 1. The second entry only determines whether it is an opening bracket or a closing bracket: this determines how many changes it takes for two brackets to match each other. No implicit zeros below this are ever made explicit, so that we can use [] to get the total length of this string.

Each substring under consideration is represented by two numbers in the range 0 to 2n: one for the beginning position, and one for the end. The interpretation is as follows:

  • A substring starting at 2k will start at position k (0-indexed), and the second character is not removed.
  • A substring starting at 2k+1 will start at position k, and the second character is removed due to having been swapped left.
  • A substring ending at 2k will end just before position k (i.e., the range is left-inclusive and right-exclusive.)
  • A substring ending at 2k-1 will end just before position k, and the penultimate character is removed due to having been swapped right.

Some ranges (k to k+1, 2k+1 to 2k+1, 2k+1 to 2k+3, and 2k+1 to 2k+5) make no physical sense. Some of those show up as intermediate values anyway, because it's easier than adding additional checks to avoid them.
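To make the encoding easier to follow, here is a small Python sketch that decodes a (start, end) pair into the list of original positions it covers, following the four rules above. The function name `decode_interval` is hypothetical and nothing like it exists in the Brain-Flak code, which works with the encoded numbers directly.

```python
def decode_interval(start, end):
    """Return the 0-indexed bracket positions covered by an encoded interval."""
    # start = 2k or 2k+1: begin at k; an odd start drops the second character.
    first, second_removed = divmod(start, 2)
    # end = 2k or 2k-1: stop just before k; an odd end drops the penultimate one.
    stop = (end + 1) // 2
    penultimate_removed = end % 2 == 1

    skipped = set()
    if second_removed:
        skipped.add(first + 1)
    if penultimate_removed:
        skipped.add(stop - 2)
    return [i for i in range(first, stop) if i not in skipped]


if __name__ == "__main__":
    print(decode_interval(0, 8))  # plain substring of the first four brackets
    print(decode_interval(1, 7))  # same span with both swapped brackets removed
```

The nonsensical ranges mentioned above still decode to something here, just as they show up as harmless intermediate values in the real program.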

The left stack stores the number of edits needed to convert each substring into a balanced string. The edit distance for the interval (x,y) is stored at depth x + y(y-1)/2.
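The depth formula is the usual triangular-number layout. A quick Python check, assuming 0 <= x < y (an assumption on my part; the function name `slot` is made up), shows that every such interval gets its own slot with no gaps:

```python
def slot(x, y):
    """Depth at which the cost of interval (x, y) is stored, for 0 <= x < y."""
    return x + y * (y - 1) // 2


if __name__ == "__main__":
    # Enumerate intervals by increasing end, then increasing start.
    slots = [slot(x, y) for y in range(1, 5) for x in range(y)]
    print(slots)
```

Because the slot depends only on x and y, an interval's cost can be found again later by walking that many entries down the stack.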

During the inner loop, entries are added above the left stack to denote which moves are possible. These entries are 5 bytes long. Counting from the top, the numbers are d+1, y1, x1, y2, x2, where the move costs d edit steps and divides the substring into (x1,y1) and (x2,y2).

The code

Description to come. For now, here's my working copy of the code. Some comments may be inconsistent with terminology.

# Determine bracket type for each byte of input
{({}(())(<>))<>({(()()()())<{({}[()])<>}{}>}{}<>({<({}[()])>{()(<{}>)}}{}{}<>))<>}

# For every possible interval length:
<>([[]]){

  # Compute actual length
  ([[]({}()<>)]<>)

  # Note: switching stacks in this loop costs only 2 bytes.
  # For each starting position:
  # Update/save position and length
  <>{(({}())<<>(({})<

    # Get endpoints
    (({}(<()>))<>({}))

    # If length more than 3:
    ([(())()()]){<>({}())}{}{

      # Clean up length-3 left over from comparison
      <>{}<>

      # Initialize counter at 2
      # This counter will be 1 in the loop if we're using a swap at the beginning, 0 otherwise
      ({}())

      # For each counter value:
      {

        # Decrement counter and put on third stack
        (((({}<

          # Do mod 2 for end position
          (({}<>)<{({}()<([(){}])>)}{}>)<>

          # Do mod 2 for start position
          (({}(<>))<{({}()<([(){}])>)}{}<>>)

        # Subtract 1 from counter if swap already happened
        ><>({}))(<

          # Compute start position of substrings to consider
          (((({}({})[()])[()()]<>({}))

            # Compute start position of matches to consider
            <>[({})({}){}]({}<>))<>

            # Compute end position of matches to consider
            [(({}<>)<>({}<>)<>)]

          # Push total distance of matches
          )

        # Push counter as base cost of moves
        # Also push additional copy to deal with length 5 intervals starting with an even number
        <>>)))[()](<()>)<

          # With match distance on stack
          <>(({})<

            # Move to location in input data
            ({}{}()){({}()<({}<>)<>>)}{}

            # Make copy of opening bracket to match
            <>(({})<<>(({}<>))>)

          # Mark as first comparison (swap allowed)
          <>(())>)

          # For each bracket to match with:
          {({}[()()]<

            (<([(

              # If swap is allowed in this position:
              {

                # Subtract 1 from cost
                [{}]

                # Add 1 back if swap doesn't perfectly match
                <(({})()<>[({})]<>)>{()(<{}>)}

              }{}

              # Shift copy of first bracket over, while computing differences
              <(({})<>[()({}<(({}<<>({}<>)<>(({})<>)>)<>[(){}])<>>)]<>)>

              # Add 1 if not perfectly matched
              {()(<{}>)}{}

              # Add 1 if neither bracket faces the other
              # Keep 0 on stack to return here
              (){[()](<{}>)}

              # Return to start of brackets
              <<>{({}<>)<>}{}>

            # Add to base cost and place under base cost
            )]({}{}))>)

            # Return to spot in brackets
            # Zero here means swap not allowed for next bracket
            <>{({}<>)<>}

          >)}

          # Cleanup and move everything to right stack
          {}{}<>{}{}{({}<>)<>}{}

          # Remove one copy of base cost, and move list of costs to right stack
          {}(<>)<>{({}<>)<>}{}

          # If swap at end of substring, remove second-last match
          {(<{}>)<>{({}<>)<>}<>({}<{}>){({}<>)<>}}{}

          # Put end of substring on third stack
          ((({}<({}({})({})<

            # If swap at beginning of substring, remove first match
            {{}<>{}(<>)}{}

            # Move start of substring to other stack for safekeeping
            (((({}<({}<>)>)<>)))

          # Create "deletion" record, excluding cost
          <>>)<>>)<>

          # Move data to left stack
          <({}<({}<<>

            # Add cost to deletion record
            (()())

          >)>)>)

          # Put start position on third stack under end position
          <<>({}<

            # For each matching bracket cost:
            {}{

              # Move cost to left stack
              ({}<>)

              # Make three configurations
              ([()()()]){

                # Increment counter
                ((({}()()<>))[()]<

                  # Increment cost in first and third configurations
                  (({()(<{}>)}{})<>({}<

                    # Keep last position constant
                    (({}<

                      # Beginning of second interval: 1, 2, 1 past end of first
                      <>({}[()()]

                        # End of first interval: -3, -1, 1 plus current position
                        (()[({})({})]

                          # Move current position in first and third configurations
                          ({[()](<{}>)}{}<>{}<

                            (({})<>)

                          >)

                        <>)

                      )

                    >)<>)

                  >)<>)

                  # Move data back to left stack
                  <>({}<({}<({}<({}<>)>)>)>)

                >)

              }{}

            {}<>}

            # Eliminate last entry
            # NOTE: This could remove the deletion record if no possible matches.  This is no loss (probably).
            <>{}{}{}{}{}{}{}{}

        # Restore loop variables
        >)>)>)

      }{}

      # With current endpoints on third stack:
      ({}<({}<

        # For all entries
        {

          # Compute locations and move to right stack
          ({}<(({}){({}())}{}{}<(({}){({}())}{}{}<>)>)>)<>

        }

        # For all entries (now on right stack):
        <>{

          # Cost of match
          ((({}

            # Do twice:
            (()()){([{}](

              # Add cost of resulting substrings
              <({}(<()>)<>){({}<({}<>)>(())<>)}{}>({})<<>{{}({}<>)<>}{}>

            # Evaluate as sum of two runs
            ))([{}()]{})}{}

          )))

          # Find smaller of cost and current minimum
          <>(({}))<>{<>({}[()])}{}

          # Push new minimum in place of old minimum
          ({}<<>{}{}{<>}>)

          <>{}

        }

        # Subtract 1 if nonzero
        <>(({}<>){[()](<{}>)}{})(<>)

      >)>)

      <>(<({}<>)>)<>

    # Otherwise (length 3 or less), use 1 from earlier as cost.
    # Note that length 0-1 is impossible here.
    }<>{}

    # With cost on third stack:
    ({}<

      # Find slot number to store cost of interval
      (({}){({}())}{}{})

      # Move to slot
      {({}<({}<>)>(())<>)}{}

    # Store new cost
    {}>)

    # Move other slots back where they should be
    <>{{}({}<>)<>}{}

  # Restore length/position for next iteration
  >)<>>)}

  # Clear length/position from inner loop
  {}<>([[]{}])

}{}

(([]){<{}{}>([])}{}<>){({}[()]<{}>)}{}({}<>)

Haskell, 797 bytes

import Data.Array;import Data.Function;import Data.List;
e=length;f=fst;o=map;s=listArray;u=minimum;b p=let{m=e p;x=s(1,m)p;
v=s(1,m)(listArray('(','}')[0,0..]:[v!i//[(x!i,i)]|i<-[1..m-1]]);
d q=let{n=e q;y=s(1,n)q;t(a,b)=listArray((a,b),(m,n));
c=t(1,1)[sum[1|x!i/=y!j]|i<-[1..m],j<-[1..n]];
d=t(-1,-1)[if i<0||j<0then m+n else 
if i*j<1then(i+j)else u[1+d!(i-1,j),1+d!(i,j-1),c!(i,j)+d!(i-1,j-1),
let{k=v!i!(y!j)-1;l=w!(i,j-1)-1}in-3+i+j-k-l+d!(k,l)]|i<-[-1..m],j<-[-1..n]];
w=t(1,0)[if j>0&&c!(i,j)>0then w!(i,j-1)else j|i<-[1..m],j<-[0..n]]}in d!(m,n);
a=s(0,div m 2)([(m,"")]:[(concat.take 2.groupBy(on(==)f).sort.o(\q->(d q,q)))(
[b:c++[d]|[b,d]<-words"() <> [] {}",(_,c)<-a!(l-1)]++
concat[[b++d,d++b]|k<-[1..div l 2],(_,b)<-a!k,(_,d)<-a!(l-k)])|l<-[1..div m 2]]);
}in u(o(f.head)(elems a))

Try it online!