Optimizing SKI compiler

Python 2, 674

exec u"""import sys
$ V#):%=V.c;V.c+=1
 c=97;p!,v,t:[s,t.u({})][v==s];r!:s;u!,d:d.get(s,s);s!:chr(%)
 def m(s,x):%=min(%,x);-(%==x)+x
$ A#,*x):%,&=x
 C='()';p!,x,y:s.__$__(%.p/,&.p/);m!,x:&.m(%.m(x));u!,d:A(%.u(d),&.u(d));s!:%.s()+s.C[0]+&.s()+s.C[1:]
 def r(s):x=%.r();y=&.r();-x.b.p(x.a,y).r()if'L'in`x`else s.__$__/
$ L(A):C='.';u!,d:L(d.setdefault(%,V()),&.u(d))
x=V();y=V();z=V()
I=L(x,x)
K=L(y,L/)
S=L(x,L(z,L(y,A(A/,A(z,y)))))
def E():
 t=I
 while 1:
    q=sys.stdin.read(1)
    if q in')\\n':-t
    t=A(t,eval(max(q,'E()')).u({}))
t=E().r()
t.m(97)
print t.s()""".translate({33:u'=lambda s',35:u':\n def __init__(s',36:u'class',37:u's.a',38:u's.b',45:u'return ',47:u'(x,y)'})

Note: after while 1:, 3 lines are indented with a tab character.

This is basically the code behind http://ski.aditsu.net/ , translated to python, greatly simplified and heavily golfed.

Reference: (this is probably less useful now that the code is compressed)

V = variable term
A = application term
L = lambda term
c = variable counter
p = replace variable with term
r = reduce
m = final variable renumbering
u = internal variable renumbering (for duplicated terms)
s = string conversion
(parameter s = self)
C = separator character(s) for string conversion
I,K,S: combinators
E = parse

Examples:

python ski.py <<< "KSK"
a.b.c.a(c)(b(c))
python ski.py <<< "SII"        
a.a(a)
python ski.py <<< "SS(SS)(SS)"
a.b.a(b)(c.b(c)(a(b)(c)))(a(d.a(d)(e.d(e)(a(d)(e))))(b))
python ski.py <<< "S(K(SI))K" 
a.b.b(a)
python ski.py <<< "S(S(KS)K)I"                   
a.b.a(a(b))
python ski.py <<< "S(S(KS)K)(S(S(KS)K)I)"        
a.b.a(a(a(b)))
python ski.py <<< "K(K(K(KK)))"
a.b.c.d.e.f.e
python ski.py <<< "SII(SII)"
[...]
RuntimeError: maximum recursion depth exceeded

(this ↑ is expected because SII(SII) is irreducible)

Thanks Mauris and Sp3000 for helping to kill a bunch of bytes :)


Ruby, 323 bytes

I can't believe this piece of crap works at all:

h={};f=96;z=gets.chop
{?S=>'s0.t0.u0.s0u0(t0u0)',?K=>'k0.l0.k0',?I=>'i0.i0'}.each{|k,v|z.gsub!k,?(+v+?)}
loop{z=~/\((?<V>\w1*0)\.(?<A>(?<B>\w1*0|[^()]|\(\g<B>+\))+)\)(?<X>\g<B>)/
s=$`;t=$';abort z.gsub(/\w1*0/){|x|h[x]=h[x]||(f+=1).chr}if !t
z=$`+?(+$~[?A].gsub($~[?V],$~[?X].gsub(/\w1*0/){|a|s[a]?a:a.gsub(?0,'10')})+?)+t}

Using regex substitution to perform β-reduction on raw strings is some Tony-the-Pony stuff. Nonetheless, its output looks correct at least for easy testcases:

$ echo 'I' | ruby ski.rb
(a.a)
$ echo 'SKK' | ruby ski.rb
(a.(a))
$ echo 'KSK' | ruby ski.rb
((a.b.c.ac(bc)))
$ echo 'SII' | ruby ski.rb
(a.(a)((a)))

Here's it handling K(K(K(KK))) with some debug output, which takes about 7 seconds on my laptop, because regular expression recursion is slow. You can see its α-conversion in action!

$ echo 'K(K(K(KK)))' | ruby ski.rb
"(l0.((k10.l10.k10)((k10.l10.k10)((k10.l10.k10)(k10.l10.k10)))))"
"(l0.((l10.((k110.l110.k110)((k110.l110.k110)(k110.l110.k110))))))"
"(l0.((l10.((l110.((k1110.l1110.k1110)(k1110.l1110.k1110)))))))"
"(l0.((l10.((l110.((l1110.(k11110.l11110.k11110))))))))"
(a.((b.((c.((d.(e.f.e))))))))

Common Lisp, 560 bytes

"Finally, I found a use for PROGV."

(macrolet((w(S Z G #1=&optional(J Z))`(if(symbolp,S),Z(destructuring-bind(a b #1#c),S(if(eq a'L),G,J)))))(labels((r(S #1#(N 97))(w S(symbol-value s)(let((v(make-symbol(coerce`(,(code-char N))'string))))(progv`(,b,v)`(,v,v)`(L,v,(r c(1+ n)))))(let((F(r a N))(U(r b N)))(w F`(,F,U)(progv`(,b)`(,U)(r c N))))))(p()(do((c()(read-char()()#\)))q u)((eql c #\))u)(setf q(case c(#\S'(L x(L y(L z((x z)(y z))))))(#\K'(L x(L u x)))(#\I'(L a a))(#\((p)))u(if u`(,u,q)q))))(o(S)(w S(symbol-name S)(#2=format()"~A.~A"b(o c))(#2#()"~A(~A)"(o a)(o b)))))(lambda()(o(r(p))))))

Ungolfed

;; Bind S, K and I symbols to their lambda-calculus equivalent.
;;
;; L means lambda, and thus:
;;
;; -  (L x S) is variable binding, i.e. "x.S"
;; -  (F x)   is function application

(define-symbol-macro S '(L x (L y (L z ((x z) (y z))))))
(define-symbol-macro K '(L x (L u x)))
(define-symbol-macro I '(L x x))

;; helper macro: used twice in R and once in O

(defmacro w (S sf lf &optional(af sf))
  `(if (symbolp ,S) ,sf
       (destructuring-bind(a b &optional c) ,S
         (if (eq a 'L)
             ,lf
             ,af))))

;; R : beta-reduction

(defun r (S &optional (N 97))
  (w S
      (symbol-value s)
      (let ((v(make-symbol(make-string 1 :initial-element(code-char N)))))
        (progv`(,b,v)`(,v,v)
              `(L ,v ,(r c (1+ n)))))
      (let ((F (r a N))
            (U (r b N)))
        (w F`(,F,U)(progv`(,b)`(,U)(r c N))))))

;; P : parse from stream to lambda tree

(defun p (&optional (stream *standard-output*))
  (loop for c = (read-char stream nil #\))
        until (eql c #\))
        for q = (case c (#\S S) (#\K K) (#\I I) (#\( (p stream)))
        for u = q then `(,u ,q)
        finally (return u)))

;; O : output lambda forms as strings

(defun o (S)
  (w S
      (princ-to-string S)
      (format nil "~A.~A" b (o c))
      (format nil (w b "(~A~A)" "(~A(~A))") (o a) (o b))))

Beta-reduction

Variables are dynamically bound during reduction with PROGV to new Common Lisp symbols, using MAKE-SYMBOL. This allows to nicely avoid naming collisions (e.g. undesired shadowing of bound variables). I could have used GENSYM, but we want to have user-friendly names for symbols. That is why symbols are named with letters from a to z (as allowed by the question). N represents the character code of the next available letter in current scope and starts with 97, a.k.a. a.

Here is a more readable version of R (without the W macro):

(defun beta-reduce (S &optional (N 97))
  (if (symbolp s)
      (symbol-value s)
      (if (eq (car s) 'L)
          ;; lambda
          (let ((v (make-symbol (make-string 1 :initial-element (code-char N)))))
            (progv (list (second s) v)(list v v)
              `(L ,v ,(beta-reduce (third s) (1+ n)))))
          (let ((fn (beta-reduce (first s) N))
                (arg (beta-reduce (second s) N)))
            (if (and(consp fn)(eq'L(car fn)))
                (progv (list (second fn)) (list arg)
                  (beta-reduce (third fn) N))
                `(,fn ,arg))))))

Intermediate results

Parse from string:

CL-USER> (p (make-string-input-stream "K(K(K(KK)))"))
((L X (L U X)) ((L X (L U X)) ((L X (L U X)) ((L X (L U X)) (L X (L U X))))))

Reduce:

CL-USER> (r *)
(L #:|a| (L #:|a| (L #:|a| (L #:|a| (L #:|a| (L #:|b| #:|a|))))))

(See trace of execution)

Pretty-print:

CL-USER> (o *)
"a.a.a.a.a.b.a"

Tests

I reuse the same test suite as the Python answer:

        Input                    Output              Python output (for comparison)

   1.   KSK                      a.b.c.a(c)(b(c))    a.b.c.a(c)(b(c))              
   2.   SII                      a.a(a)              a.a(a)                        
   3.   S(K(SI))K                a.b.b(a)            a.b.b(a)                      
   4.   S(S(KS)K)I               a.b.a(a(b))         a.b.a(a(b))                   
   5.   S(S(KS)K)(S(S(KS)K)I)    a.b.a(a(a(b)))      a.b.a(a(a(b)))                
   6.   K(K(K(KK)))              a.a.a.a.a.b.a       a.b.c.d.e.f.e 
   7.   SII(SII)                 ERROR               ERROR

The 8th test example is too large for the table above:

8.      SS(SS)(SS)
CL      a.b.a(b)(c.b(c)(a(b)(c)))(a(b.a(b)(c.b(c)(a(b)(c))))(b))      
Python  a.b.a(b)(c.b(c)(a(b)(c)))(a(d.a(d)(e.d(e)(a(d)(e))))(b))
  • EDIT I updated my answer in order to have the same grouping behavior as in aditsu's answer, because it costs less bytes to write.
  • The remaining difference can be seen for tests 6 and 8. The result a.a.a.a.a.b.a is correct and does not use as much letters as the Python answer, where bindings to a, b, c and d are not referenced.

Performance

Looping over the 7 passing tests above and collecting the results is immediate (SBCL output):

Evaluation took:
  0.000 seconds of real time
  0.000000 seconds of total run time (0.000000 user, 0.000000 system)
  100.00% CPU
  310,837 processor cycles
  129,792 bytes consed

Doing the same test a hundred of times lead to ... "Thread local storage exhausted" on SBCL, due to a known limitation regarding special variables. With CCL, calling the same test suite 10000 times takes 3.33 seconds.