Formatting a Lisp-like Syntax

Pyth, 24 20 19 18 bytes

FN.z+*ZC9N~Z-1/N\)

For each input line, we print the line indented by the current counter value in tabs. Then we add one to the counter and subtract the number of closing parentheses on that line, so the counter always holds the nesting depth of the next line.
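
For readers who don't speak Pyth, here is a minimal Python sketch of the same counter idea. The function name indent_lines is made up for illustration, and the sketch assumes the "one function per line" input format with balanced parentheses.

```python
def indent_lines(text):
    """Indent each line by the current depth, tracked with a single counter."""
    depth = 0  # plays the role of Z in the Pyth program
    out = []
    for line in text.split("\n"):
        # Indent first, using the depth accumulated from earlier lines.
        out.append("\t" * depth + line)
        # Each line opens exactly one function; every ")" closes one.
        depth += 1 - line.count(")")
    return "\n".join(out)


if __name__ == "__main__":
    print(indent_lines("(A\n(B\n(C)\n(D))\n(E))"))
```

The update happens after the line is emitted, which is why a line's own closing parentheses only affect the lines that follow it.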


Common Lisp - 486 414 bytes (Rube Goldberg version)

(labels((p(x d)(or(when(listp x)(#2=princ #\()(p(car x)d)(incf d)(dolist(a(cdr x))(format t"~%~v{   ~}"d'(t))(p a d))(#2# #\)))(#2# x))))(let((i(make-string-input-stream(with-output-to-string(o)(#1=ignore-errors(do(b c)(())(if(member(setq c(read-char))'(#\( #\) #\  #\tab #\newline):test'char=)(progn(when b(prin1(coerce(reverse b)'string)o))(#2# c o)(setq b()))(push c b))))))))(#1#(do()(())(p(read i)0)(terpri)))))

Approach

Instead of counting parentheses by hand like everybody else, let's invoke the Lisp reader and do it The Right Way :-)

  • Read from input stream and write to a temporary output stream.
  • While doing so, aggregate characters different from (, ) or whitespace as strings.
  • The intermediate output is used to build a string, which contains syntactically well-formed Common-Lisp forms: nested lists of strings.
  • Using that string as an input stream, call the standard read function to build actual lists.
  • Call p on each of those lists, which recursively writes them to the standard output with the requested format. In particular, strings are printed unquoted.
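
The same parse-then-print split can be sketched in Python. This is only an illustration of the approach, not a translation of the Lisp code: there is no Lisp reader here, so a small hand-written parser stands in for it, and the names parse and render are hypothetical. Unlike p, this sketch prints an empty list as "()".

```python
def parse(text):
    """Turn a parenthesised string into nested lists of atom strings."""
    stack = [[]]  # stack[0] collects the top-level forms
    atom = []     # characters of the atom currently being read

    def flush():
        # Emit the pending atom, if any, into the innermost open list.
        if atom:
            stack[-1].append("".join(atom))
            atom.clear()

    for ch in text:
        if ch in "() \t\n":
            flush()
            if ch == "(":
                stack.append([])
            elif ch == ")":
                if len(stack) == 1:
                    raise ValueError("unbalanced ')'")
                done = stack.pop()
                stack[-1].append(done)
        else:
            atom.append(ch)
    flush()
    if len(stack) != 1:
        raise ValueError("unbalanced '('")
    return stack[0]


def render(form, depth=0, indent="\t"):
    """Print the head on the opening line and each argument on its own line."""
    if isinstance(form, str):
        return form  # atoms are printed unquoted
    if not form:
        return "()"  # choice made for this sketch only
    parts = [render(form[0], depth, indent)]
    for arg in form[1:]:
        parts.append("\n" + indent * (depth + 1) + render(arg, depth + 1, indent))
    return "(" + "".join(parts) + ")"


if __name__ == "__main__":
    for form in parse("(a (b (c) (d)) (e))"):
        print(render(form))
```

Because render only sees nested lists, swapping in a different layout means replacing that one function and leaving parse alone.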

As a consequence of this approach:

  1. There are fewer restrictions on the input format: you can read arbitrarily formatted inputs, not just "one function per line" (ugh).
  2. Also, if the input is not well-formed, an error will be signaled.
  3. Finally, the pretty-printing function is well decoupled from parsing: you can easily switch to another way of pretty-printing S-expressions (and you should, if you value your vertical space).

Example

Reading from a file, using this wrapper:

(with-open-file (*standard-input* #P"path/to/example/file")
    ...)

Here is the result:

(!@#$%^&*
    (asdfghjklm
        (this_string_is_particularly_long
            (...))
        (123456789)))
(THIS_IS_TOP_LEVEL_AGAIN
    (HERE'S_AN_ARGUMENT))
(-:0
    (*:0
        (%:0
            (Arg:6)
            (Write:0
                (Read:0
                    (Arg:30))
                (Write:0
                    (Const:-6)
                    (Arg:10))))
        (%:0
            (Const:9)
            (/:0
                (Const:-13)
                (%:0
                    (Arg:14)
                    (Arg:0)))))
    (WriteArg:22
        (-:0
            (Const:45)
            (?:0
                (Arg:3)
                (Arg:22)
                (Arg:0)))))

(it seems that tabs are converted to spaces here)

Pretty-printed (golfed version)

Unlike the safer original version, this one expects the input to be valid.

(labels ((p (x d)
           (or
            (when (listp x)
              (princ #\()
              (p (car x) d)
              (incf d)
              (dolist (a (cdr x)) (format t "~%~v{  ~}" d '(t)) (p a d))
              (princ #\)))
            (princ x))))
  (let ((i
         (make-string-input-stream
          (with-output-to-string (o)
            (ignore-errors
             (do (b
                  c)
                 (nil)
               (if (member (setq c (read-char)) '(#\( #\) #\  #\tab #\newline)
                           :test 'char=)
                   (progn
                    (when b (prin1 (coerce (reverse b) 'string) o))
                    (princ c o)
                    (setq b nil))
                   (push c b))))))))
    (ignore-errors (do () (nil) (p (read i) 0) (terpri)))))

Retina, 89 83 bytes

s`.+
$0<tab>$0
s`(?<=<tab>.*).
<tab>
+ms`^((\()|(?<-2>\))|[^)])+^(?=\(.*^((?<-2><tab>)+))
$0$3
<tab>+$
<empty>

Where <tab> stands for an actual tab character (0x09) and <empty> stands for an empty line. After making those replacements, you can run the above code with the -s flag. However, I'm not counting that flag, because you could also just put each line in its own source file, in which case the 7 newlines would be replaced by 7 penalty bytes for the additional source files.

This is a full program, taking input on STDIN and printing the result to STDOUT.

Explanation

Every pair of lines defines a regex substitution. The basic idea is to make use of .NET's balancing groups to count the current depth up to a given (, and then insert that many tabs before that (.

s`.+
$0<tab>$0

First, we prepare the input. We can't write back a variable number of tabs unless those tabs appear somewhere in the input string, where we can capture them. So we start by duplicating the entire input, separated by a tab. Note that the s` just activates the single-line (or "dot-all") modifier, which ensures that the . also matches newlines.
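
A rough Python analogue of this stage, using Python's re module rather than Retina. The function name duplicate_with_tab is made up, and the sketch assumes the input contains no tab characters of its own.

```python
import re


def duplicate_with_tab(text):
    """Return the input followed by a tab and a second copy of the input."""
    # re.S lets "." match newlines, like the s` modifier in Retina.
    # count=1 makes sure the substitution is applied a single time.
    return re.sub(
        r".+",
        lambda m: m.group(0) + "\t" + m.group(0),
        text,
        count=1,
        flags=re.S,
    )


if __name__ == "__main__":
    print(repr(duplicate_with_tab("(A\n(B))")))
```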

s`(?<=<tab>.*).
<tab>

Now we turn every character after that tab into a tab as well. This gives us a sufficient amount of tabs at the end of the string, without modifying the original string so far.
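
Python's re module does not accept variable-length lookbehinds, so the pattern cannot be reused as-is. The sketch below gets the same effect with plain string operations. The name pad_with_tabs is hypothetical, and the sketch assumes the only tab in the string is the separator from the previous stage.

```python
def pad_with_tabs(prepared):
    """Replace every character after the first tab with a tab."""
    head, sep, tail = prepared.partition("\t")
    # head is the untouched original; tail is the duplicate we overwrite.
    return head + sep + "\t" * len(tail)


if __name__ == "__main__":
    print(repr(pad_with_tabs("(A)\t(A)")))
```

The result carries one more tab than the original has characters, which is more than any nesting depth in it could require.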

+ms`^((\()|(?<-2>\))|[^)])+^(?=\(.*^((?<-2><tab>)+))
$0$3

This is the meat of the solution. The m and s activate multi-line mode (so that ^ matches the beginnings of lines) and single-line mode. The + tells Retina to keep repeating this substitution until the output stops changing (in this case, that means until the pattern no longer matches the string).

The pattern itself matches a prefix of the input up to an unprocessed ( (that is, a ( that doesn't have any tabs before it, but should). At the same time, it determines the depth of the prefix with balancing groups, such that the height of stack 2 corresponds to the current depth, and therefore to the number of tabs we need to append. That is this part:

((\()|(?<-2>\))|[^)])+

It either matches a (, pushing it onto the 2 stack, or it matches a ), popping the last capture from the 2 stack, or it matches something else and leaves the stack untouched. Since the parentheses are guaranteed to be balanced, we don't need to worry about trying to pop from an empty stack.
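
The stack bookkeeping can be mimicked with an explicit list in Python. This is only an analogy for what the balancing group does, not how .NET implements it, and depth_after is a made-up name.

```python
def depth_after(prefix):
    """Return the nesting depth reached at the end of the given prefix."""
    stack = []  # stands in for capture stack 2
    for ch in prefix:
        if ch == "(":
            stack.append(ch)  # like (\() pushing a capture
        elif ch == ")":
            stack.pop()       # like (?<-2>\)) popping one; fails if empty
        # any other character leaves the stack untouched
    return len(stack)


if __name__ == "__main__":
    print(depth_after("(A\n(B\n(C)\n"))
```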

After we've gone through the string like this and found an unprocessed ( to stop at, the lookahead then skips ahead to the end of the string, and captures tabs into group 3 while popping from the 2 stack until it's empty:

(?=\(.*^((?<-2><tab>)+))

By using a + in there, we ensure that the pattern only matches anything if at least one tab should be inserted into the match - this avoids an infinite loop when there are multiple root-level functions.
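
To make the repeat-until-stable behaviour concrete, here is a simplified Python emulation. It is a sketch under assumptions: it skips the helper tabs and inserts the tabs directly, it processes the first unprocessed ( it finds (the regex may well pick a different one per pass, but the final result is the same), and the names indent_once and indent_all are hypothetical.

```python
def indent_once(text):
    """Indent one unprocessed "(" at a line start; return text unchanged if none."""
    depth = 0
    for i, ch in enumerate(text):
        at_line_start = i > 0 and text[i - 1] == "\n"
        # Require depth > 0, mirroring the + that demands at least one tab.
        if ch == "(" and at_line_start and depth > 0:
            return text[:i] + "\t" * depth + text[i:]
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
    return text


def indent_all(text):
    """Apply indent_once until the text stops changing, like the + option."""
    while True:
        new = indent_once(text)
        if new == text:
            return text
        text = new


if __name__ == "__main__":
    print(indent_all("(A\n(B\n(C)\n(D))\n(E))"))
```

Without the depth > 0 check, a second root-level ( would count as a match that inserts nothing. The regex needs the equivalent guard because it would otherwise match there on every pass.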

<tab>+$
<empty>

Lastly, we just get rid of those helper tabs at the end of the string to clean up the result.