Interpret /// (pronounced 'slashes')

J - 170 chars (earlier versions: 181, 190)

This was a nightmare. I rewrote it from scratch, twice, because it just kept bugging me. This is a function taking a single string argument, outputting to STDOUT.

(0&$`((2{.{:@>&.>)((j{.]),[email protected]=`[email protected]~:~/@[,]}.~#@p+j=.0{p [email protected]])i 5;@}.&,'/';"0;&.>)@.(2<#)@}.[4:1!:2~{:@>@p=.>@{[email protected][)@((0;(0,:~1 0,.2);'\';&<1 0)<;[email protected];:'/'&,)i=. ::](^:_)

To explain, I will break it up into subexpressions.

i =. ::](^:_)
parse =: ((0;(0,:~1 0,.2);'\';&<1 0)<;._1@;:'/'&,)
print =: 4:1!:2~{:@>@p=.>@{.@[
eval  =: 0&$`((2{.{:@>&.>)sub 5;@}.&,'/';"0;&.>)@.(2<#)@}.
sub   =: ((j{.]),[email protected]=`[email protected]~:~/@[,]}.~#@p+j=.0{p [email protected]])i

interp =: (eval [ print) @ parse i
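For readers who don't speak J, here is a rough Python analogue of the i adverb. It is a minimal sketch that assumes "error" means any Python Exception. The name iterate is made up for illustration and is not part of J. Calling iterate(f)(x, y) mirrors x f i y: the left argument stays fixed and only the right one changes.

```python
def iterate(f):
    """Rough analogue of J's  i =. ::](^:_)  (a sketch, not full J semantics).

    The returned function applies f over and over until f(y) == y (a fixed
    point) or f raises, and then returns the current value.
    """
    def run(*args):
        # run(y) or run(x, y): as with J's dyadic u^:_, only the right
        # argument y changes between steps; x is passed in unchanged.
        *left, y = args
        while True:
            try:
                nxt = f(*left, y)
            except Exception:
                # ::] : on error, hand back the argument unchanged, which
                # ^:_ then treats as a fixed point. KeyboardInterrupt is not
                # an Exception subclass, so unlike in J, Ctrl+C is not trapped.
                return y
            if nxt == y:
                return y  # ^:_ : stop at a fixed point
            y = nxt
    return run


if __name__ == "__main__":
    print(iterate(lambda n: n // 2)(100))   # 0: reaches a fixed point
    drop_a = iterate(lambda s: s[:s.index('a')] + s[s.index('a') + 1:])
    print(drop_a('banana'))                 # bnn: stops once index() raises
    print(iterate(lambda k, n: n - k if n >= k else n)(3, 10))  # 1
```

The last call is the dyadic form: 3 is passed unchanged on every step while 10 shrinks to 1.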
  • i (short for iterate) is an adverb. It takes a verb f on the left and returns a verb f i. When that verb is applied to an argument, it applies f repeatedly until one of two things happens: it reaches a fixed point (y = f y), or f throws an error. The fixed-point behaviour comes from ^:_, and ::] does the error handling. If f fails, f ::] returns its argument unchanged, which ^:_ then sees as a fixed point, so it stops.

  • parse tokenizes the input into what I call half-parsed form, and then cuts it up at the unescaped '/'. It binds each escaping backslash to the character it escapes but doesn't remove the backslash. That way we can later either revert a token to its source form or finish unescaping it, depending on which we need.

    The bulk of the interesting work occurs in ;:, J's sequential-machine primitive, which does the tokenizing. It takes a description of the machine ((0;(0,:~1 0,.2);'\';&<1 0)) on the left and the string to parse on the right. Note that this particular machine treats the first character as ordinary, even if it is a \ that should bind to the next character. I do this for a few reasons: (1) the state table is simpler, so it can be golfed further; (2) we can easily dodge the problem by adding a dummy character to the front; and (3) that dummy character gets half-parsed at no extra cost, so I can use it to set up the cutting phase, described next.

    We then use <;._1 to cut the tokenized result on unescaped / (the dummy first character, which <;._1 treats as the delimiter). This is handy for pulling the output, pattern, and replacement out of out/patt/repl/rest all in one step. Unfortunately, it also cuts up the rest of the program, where those / need to stay untouched. I splice them back in during eval, because making <;._1 leave them alone costs a lot more.

  • The fork (eval [ print) runs print on the result of parse for its side effects, then runs eval and returns eval's result. print is a simple verb: it opens the first box (the one we know for sure is output), finishes parsing it, and sends it to STDOUT. Along the way, it also defines a utility verb p.

    p is defined as >@{.@[. It takes its left argument (acting as the identity when given only one argument), takes the first item of that (the identity on a scalar), and unboxes it (the identity if already unboxed). This will come in very handy in sub.

  • eval evaluates the remainder of the processed program. If we don't have both a full pattern and a full replacement, eval throws the rest away and returns an empty list. That ends evaluation by making ;: (in parse) error out on the next iteration. Otherwise, eval finishes parsing the pattern and replacement, restores the remainder of the source, and passes both to sub. Exploded view:

                                                  @}.  NB. throw out printed part
                                           @.(2<#)     NB. if we have a pattern and repl:
          2{.                                          NB.  take the first two cuts:
                 &.>                                   NB.   in each cut:
             {:@>                                      NB.    drop escaping \ from chars
         (          )                                  NB.  (these are pattern and repl)
                                       &.>             NB.  in each cut:
                                      ;                NB.   revert to source form
                                '/';"0                 NB.  attach a / to each cut
                              &,                       NB.  linearize (/ before each cut)
                         5  }.                         NB.  drop '/pattern/repl/'
                          ;@                           NB.  splice together
        (            sub                  )            NB.  feed these into sub
       `                                               NB. else:
    0&$                                                NB.  truncate to an empty list

  • sub is where one (possibly infinite) round of substitutions happens. Because of the way we set up eval, the source is the right argument, and the pattern and replacement are bundled together in the left. The pattern and replacement don't change within a round of substitutions. That lets us use another feature of i: it only updates the right argument and keeps passing in the same left argument, so J keeps track of the state for us.

    There are two spots of trouble, though. The first is that J verbs take at most two arguments. The pattern and replacement therefore have to travel bundled together in one of them, and there is no direct way to get at each one. Thanks to the p utility we defined, this isn't a big problem. In fact, we can access the pattern in one character, just by using p, because of its >@{.@[ definition: the Unbox of the First item of the Left argument. Getting the replacement is trickier. The shortest way is p&|., which reverses both arguments before p takes the first item of the left one; that is 2 chars shorter than extracting it manually.

    The second problem is that i exits on fixed points instead of looping forever, and if the pattern and replacement are equal and you make a substitution, that looks like a fixed point to J. We handle this by entering an infinite loop of negating 1 over and over if we detect they are equal: this is the [email protected]=`[email protected]~:~/ portion, replacing p&|..

                                        p    E.]    NB. string search, patt in src
                                          [email protected]       NB. indices of matches
                                      0{            NB. take the first (error if none)
                                   j=.              NB. assign to j for later use
                               #@p+                 NB. add length of pattern
                           ]}.~                     NB. drop that many chars from src
                       /@[                          NB. between patt and repl:
                      ~                             NB.  patt as right arg, repl as left
                  @.~:                              NB.  if equal:
            [email protected]=                                    NB.   loop forever
                `p                                  NB.  else: return repl
     (j{.])                                         NB. first j chars of src
           ,              ,                         NB. append all together
    (                                           )i  NB. iterate

  • This cycle repeats because of i, until something outside of sub errors out. As far as I'm aware, this can only happen when we run out of characters, or when we throw out an incomplete set of pattern and replacement.
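To see the data flow of parse and eval without J, here is a Python sketch that mimics it on plain lists of tokens. It rests on my own assumptions:

  • a token is a string;
  • an escaped character is a two-character token (the backslash plus the escaped character);
  • a lone trailing backslash stays a one-character token;
  • the helper names half_parse, cut, finish, revert and split_command are hypothetical, not part of J.

```python
def half_parse(program):
    """Tokenize like ;: with parse's state table (sketch): each escaping
    backslash is bound to the next character but kept in the token."""
    src = '/' + program           # dummy first character, like '/'&,
    tokens = [src[0]]             # the first character is never an escape
    i = 1
    while i < len(src):
        # assumption: a lone trailing backslash becomes a one-char token
        width = 2 if src[i] == '\\' and i + 1 < len(src) else 1
        tokens.append(src[i:i + width])
        i += width
    return tokens


def cut(tokens):
    """Like <;._1 : the first token is the delimiter, and it is dropped.
    Only bare '/' tokens start a new piece; escaped '\\/' tokens do not."""
    delim, pieces = tokens[0], [[]]
    for tok in tokens[1:]:
        if tok == delim:
            pieces.append([])
        else:
            pieces[-1].append(tok)
    return pieces


def finish(piece):
    """Like {:@> on each token: keep its last character (drops escapes)."""
    return ''.join(tok[-1] for tok in piece)


def revert(piece):
    """Like ; : splice the tokens back together in source form."""
    return ''.join(piece)


def split_command(program):
    """Return (output, pattern, replacement, rest) for one step.

    pattern, replacement and rest are None when fewer than three pieces
    follow the output, i.e. when eval's 2<# test fails and it truncates.
    """
    pieces = cut(half_parse(program))
    output, rest = finish(pieces[0]), pieces[1:]
    if len(rest) <= 2:
        return output, None, None, None
    # '/' before each cut, linearize, 5 }. and ;@ amount to rejoining the
    # remaining cuts with the '/' separators that cut() removed.
    source = '/'.join(revert(p) for p in rest[2:])
    return output, finish(rest[0]), finish(rest[1]), source


if __name__ == "__main__":
    print(split_command('/ world! world!/Hello,/ world! world! world!'))
    # ('', ' world! world!', 'Hello,', ' world! world! world!')
```

Keeping the backslash inside the token is what lets the same half-parsed pieces serve both purposes. finish unescapes the pattern and replacement, while revert restores the remaining source verbatim for the next iteration.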

Fun facts about this golf:

  • For once, using ;: is shorter than manually iterating through the string.
  • 0{ should get a chance to error out before sub goes into an infinite loop. So it should work fine if the pattern equals the replacement but never shows up in the remainder of the source. However, whether 0{ really runs first may be unspecified behaviour, since I can't find a citation either way in the docs. Whoopsie.
  • Keyboard interrupts are processed as spontaneous errors inside running functions. However, due to the nature of i, those errors get trapped too. Depending on when you hit Ctrl+C, you might:
    • Exit out of the negate-forever loop, error out of the sub loop by trying to concatenate a number to a string, and then go on interpreting /// as if you finished substituting a string with itself an infinite number of times.
    • Leave sub halfway through and go on interpreting a half-subbed /// expression.
    • Break out of the interpreter and return an unevaluated /// program to the REPL (not STDOUT, though).
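The ordering question in the second fun fact can be illustrated with a Python sketch of one round of substitutions. This is my own construction, not a port of the J sub. The search runs before the equality check, so a pattern equal to its replacement only "hangs" if it actually occurs. To keep the example testable, the infinite loop is replaced by raising InfiniteLoop, a hypothetical exception that exists only in this sketch.

```python
class InfiniteLoop(Exception):
    """Hypothetical stand-in for sub's negate-forever loop, so the
    non-terminating case can be observed instead of hanging."""


def sub_round(pattern, replacement, source):
    """One round of /// substitutions (sketch, not a port of the J sub).

    The search runs before the equality check, mirroring the hope that 0{
    errors out before sub starts looping. An empty pattern always matches,
    so with a non-empty replacement this loops forever, as /// requires.
    """
    while True:
        try:
            j = source.index(pattern)   # like E. then 0{ : fails if no match
        except ValueError:
            return source               # search failed: the round is over
        if pattern == replacement:
            # Substituting would leave source unchanged, which a naive
            # fixed-point loop would mistake for termination.
            raise InfiniteLoop(pattern)
        source = source[:j] + replacement + source[j + len(pattern):]


if __name__ == "__main__":
    print(sub_round('ab', 'b', 'aaab'))   # b
    print(sub_round('x', 'x', 'abc'))     # abc: pattern absent, no hang
```

Swapping the two checks would make sub_round('x', 'x', 'abc') raise even though 'x' never occurs. That is the failure the fun fact hopes the J version avoids.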

Example usage:

   f=:(0&$`((2{.{:@>&.>)((j{.]),[email protected]=`[email protected]~:~/@[,]}.~#@p+j=.0{p [email protected]])i 5;@}.&,'/';"0;&.>)@.(2<#)@}.[4:1!:2~{:@>@p=.>@{[email protected][)@((0;(0,:~1 0,.2);'\';&<1 0)<;[email protected];:'/'&,)i=. ::](^:_)
   f 'no'
   f '/ world! world!/Hello,/ world! world! world!'
Hello, world!
   f '/foo/Hello, world!//B\/\\R/foo/B/\R'
Hello, world!
   f '//'  NB. empty string

   f '/\\/good/\/'

APL (133)

{T←''∘{(0=≢⍵)∨'/'=⊃⍵:(⊂⍺),⊂⍵⋄(⍺,N⌷⍵)∇⍵↓⍨N←1+'\'=⊃⍵}⋄⍞N←T⍵⋄p N←T 1↓N⋄r N←T 1↓N⋄''≡N:→⋄∇{⍵≡p:∇r⋄∨/Z←p⍷⍵:∇(r,⍵↓⍨N+≢p),⍨⍵↑⍨N←1-⍨Z⍳1⋄⍵}1↓N}

This is a function that takes the /// code as its right argument.

Ungolfed, with explanation:

{
   ⍝ a function to split the input string into 'current' and 'next' parts,
   ⍝ and unescape the 'current' bit
   split←''∘{
       ⍝ if the string is empty, or '/' is reached,
       ⍝ return both strings (⍺=accumulator ⍵=unprocessed)
       (0=≢⍵)∨'/'=⊃⍵:(⊂⍺),⊂⍵
       ⍝ otherwise, add current character to accumulator,
       ⍝ skipping over '\'s. (so if '\/' is reached, it skips '\',
       ⍝ adds '/' and then processes the character *after* that.)
       (⍺,n⌷⍵)∇⍵↓⍨n←1+'\'=⊃⍵
   }

   ⍞   next ← split ⍵      ⍝ output stage
   pat next ← split 1↓next ⍝ pattern stage, and eat the '/'
   rpl next ← split 1↓next ⍝ replacement stage, and eat the '/'

   ⍝ if there are no characters left, halt.
   ''≡next:→

   ⍝ otherwise, replace and continue.
   ∇{  ⍝ if the input string equals the pattern, return the replacement and loop
       ⍵≡pat:∇rpl

       ⍝ otherwise, find occurrences; if there are any, replace the first and loop
       ∨/occ←pat⍷⍵:∇(rpl, (idx+≢pat)↓⍵),⍨ (idx←(occ⍳1)-1)↑⍵

       ⍝ if no occurrences, return string
       ⍵
   }1↓next
}

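For readers who don't read APL, here is a rough Python transliteration of the function above. It is a sketch, not a faithful port:

  • loops replace the recursive ∇ calls;
  • the printed text is collected and returned instead of written with ⍞;
  • a lone trailing backslash is assumed to be dropped (the APL would hit an INDEX ERROR at N⌷⍵ there).

split mirrors the name in the ungolfed APL; substitute and run_slashes are made up for this sketch.

```python
def split(s):
    """Unescape s up to the first unescaped '/'; return (current, rest).
    rest still starts with that '/' (or is empty), like the APL split."""
    acc = []
    i = 0
    while i < len(s) and s[i] != '/':
        if s[i] == '\\':
            i += 1                   # skip the escaping backslash
            if i == len(s):          # assumption: drop a lone trailing '\'
                break
        acc.append(s[i])
        i += 1
    return ''.join(acc), s[i:]


def substitute(pat, rpl, s):
    """The inner dfn: replace the first occurrence until none is left.
    Loops forever when the substitution never terminates, as /// requires."""
    while True:
        if s == pat:                 # ⍵≡pat:∇rpl
            s = rpl
            continue
        idx = s.find(pat)            # first 1 in pat⍷⍵ (0-based here)
        if idx < 0:
            return s                 # no occurrences: return string
        s = s[:idx] + rpl + s[idx + len(pat):]


def run_slashes(program):
    """Return everything the /// program would print."""
    out = []
    src = program
    while True:
        text, src = split(src)            # output stage
        out.append(text)
        pat, src = split(src[1:])         # pattern stage, eat the '/'
        rpl, src = split(src[1:])         # replacement stage, eat the '/'
        if src == '':                     # ''≡next:→
            return ''.join(out)
        src = substitute(pat, rpl, src[1:])


if __name__ == "__main__":
    print(run_slashes('/ world! world!/Hello,/ world! world! world!'))
```

This sketch reproduces the two Hello, world! examples from the J section. It also prints no for 'no', nothing for '//', and good for '/\\/good/\/'.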

Perl - 190


Reads /// program from stdin until EOF.