Write a Clem interpreter

Python, 1684 1281 characters

Got all the basic golf stuff done. It runs all the example programs and matches the output character-for-character.

import sys,os,copy as C
L=len
S=[]
n=[S]
Q=lambda:S and S.pop()or 0
def P(o):
 if o:n[0].append(o)
def X():x=Q();P(x);P(C.deepcopy(x))
def W():S[-2::]=S[-1:-3:-1]
def R():a,b,c=Q(),Q(),Q();P(a);P(c);P(b)
def A(d):
 a=Q()
 if a and a[0]:a=[1,a[1]+d,lambda:P(a)]
 P(a)
def V():
 a=Q();P(a)
 if a and a[0]-1and L(a[2])>1:r=a[2].pop(0);P(r)
def T():
 b,a=Q(),Q()
 if a!=b:P([0,0,(a[2],[a])[a[0]]+(b[2],[b])[b[0]]])
 else:P(a);P(b)
def r():a=os.read(0,1);F(ord(a)if a else-1)
def q(f):
 a=Q()
 if a and a[0]:os.write(1,(chr(a[1]%256),str(a[1]))[f])
def e(f,x=0):f[2]()if f[0]+f[1]else([e(z)for z in f[2]]if x else P(f))
def w():
 a=Q()
 while a and S and S[-1][0]and S[-1][1]:e(a,1)
def Y():n[:0]=[[]]
def Z():
 x=n.pop(0)
 if x:n[0]+=([[0,0,x]],x)[L(x)+L(n)==2]
D={'%':Q,'#':X,'$':W,'@':R,'+':lambda:A(1),'-':lambda:A(-1),'/':V,'.':T,'<':r,'>':lambda:q(0),'c':lambda:q(1),'w':w,'(':Y,')':Z}
def g(c):D[c]()if L(n)<2or c in'()'else P([0,1,D[c]])
N=['']
def F(x):a=[1,x,lambda:P(a)];a[2]()
def E():
 if'-'==N[0]:g('-')
 elif N[0]:F(int(N[0]))
 N[0]=''
s=j=""
for c in open(sys.argv[1]).read()+' ':
 if j:j=c!="\n"
 elif'"'==c:E();s and map(F,map(ord,s[:0:-1]));s=(c,'')[L(s)>0]
 elif s:s+=c
 elif';'==c:E();j=1
 else:
    if'-'==c:E()
    if c in'-0123456789':N[0]+=c
    else:E();c in D and g(c)

Testing:

Gather clemint.py, clemtest_data.py, clemtest.py, and a compiled clem binary into a directory and run clemtest.py.

Expanation:

The most ungolfed version is this one. Follow along with that one.

S is the main stack. Each item of the stack is a 3-list, one of:

Constant: [1, value, f]
Atomic: [0, 1, f]
Compound: [0, 0, fs]

For the constants, f is a function which pushes the constant onto the stack. For the atmoics, f is a function which executes one of the operations (e.g. -, +). For the compounds, fs is a list of items.

xec executes an item. If it's a constant or an atomic, it just executes the function. If it's a compound, if there has been no recursion yet, it executes each function. So executing (10 20 - 30) will execute each of the functions 10, 20, -, and 30, leaving 10 19 30 on the stack. If there has been recursion, then it just pushes the compound function onto the stack. For example, when executing (10 20 (3 4) 30), the result should be 10 20 (3 4) 30, not 10 20 3 4 30.

Nesting was a bit tricky. What do you do while reading off (1 (2 (3 4)))? The solution is to have a stack of stacks. At each nesting level, a new stack is pushed on the stack of stacks, and all push operations go onto this stack. Further, if there has been nesting, then atomic functions are pushed instead of executed. So if you see 10 20 (- 30) 40, 10 is pushed, then 20, then a new stack is created, - and 30 are pushed onto the new stack, and ) pops off the new stack, turns it into an item, and pushes it onto the stack one level down. endnest() handles ). It was a bit tricky since there's a special case when only one item has been pushed and we're pushing back onto the main stack. That is, (10) should push the constant 10, not a composite with the one constant, because then - and + don't work. I'm not sure whether this is principled but it's the way it works...

My interpreter is a character-by-character processor - it doesn't create tokens - so numbers, strings, and comments were somewhat annoying to deal with. There's a separate stack, N, for a currently-being-processed-number, and anytime a character which isn't a number is processed, I have to call endnum() to see whether I should first complete that number and put it on the stack. Whether we're in a string or a comment is kept track of by boolean variables; when a string is closed it pushes all the innards on the stack. Negative numbers required some special handling as well.

That's about it for the overview. The rest was implementing all the built-ins, and making sure to do deep copies in +, -, and #.


C 840

Thanks to @ceilingcat for finding a much better (and shorter) version

This treats everything as simple strings - all stack items are strings, even constants are strings.

#define Q strcpy
#define F(x)bcopy(b,f,p-b);f[p-b-x]=!Q(r,p);
#define C(x,y)Q(S[s-x],S[s-y]);
#define N[99]
#define A Q(S[s++]
#define D sprintf(S[s++],"%d"
#define G(x)}if(*f==x){
#define H(x)G(x)s--;
#define R return
#define Z(x)T(t,u,v)-1||putchar(x);H(
z=1,*y;char S N N;s;c;T(b,f,r,p)char*b,*f,*r,*p;{strtol(b+=strspn(b," "),&p,0);if(p>b){F(0)R 1;}if(c=*b==40){for(p=++b;c;p++)if(z^=*p==34)c+=*p-41?*p==40:-1;F(1)R-1;}p++;F(0)*r*=!!*b;R 0;}*P(char*p){R*p-34?y=P(p+1),D,*p),y:++p;}E(int*x){char*p,c N,f N,r N,t N,u N,v N;for(Q(c,x);*c;Q(c,p)){Q(t,S[s-1]);if(T(c,f,p=r))A,f);else{{G(64)C(0,1)C(1,2)C(2,3)C(3,0)G(35)A,t);G(36)C(0,2)C(2,1)C(1,0)H(37)H(47)T(t,u,v);*v&&A,v);A,u);H(46)strcat(strcat(S[s-1]," "),t);H(43)D,atoi(t)+1);H(45)D,atoi(t)-1);G(60)D,getchar());H(62)Z(atoi(u))99)Z(*u)119)for(Q(u,t);atoi(S[s-1]);)E(u);G(34)p=P(p);}}}}

Try it online!

A less golfed version of my original (unlike the golfed version this one prints the stack when it ends if it is not empty and takes a -e parameter so you can specify the script on the command line instead of reading from a file):

#include <stdlib.h>
#include <string.h>
#include <stdio.h>
#define FIRST_REST(x) memcpy(first, b, p - b); first[p - b - x] = '\0'; strcpy(rest, p);
#define COPY(dest,src) strcpy(stack[size + dest], stack[size + src]);
char stack[9999][9999]; int size = 0;
int token(char *b, char *first, char *rest)
{
    while (*b == 32) b++;
    char *p; int x = strtol(b, &p, 0);
    if (p > b) { FIRST_REST(0) return 1; }
    if (*b == '(') { int c = 1; for (p = ++b; c; ++p) c += (*p == '(') - (*p == ')'); FIRST_REST(1) return -1; }
    p++; FIRST_REST(0) if (!*b) *rest = '\0'; return 0;
}
char *push(char *pointer)
{
    if (*pointer == '\"') return pointer+1;
    char *result = push(pointer+1);
    sprintf(stack[size++], "%d", *pointer);
    return result;
}
void eval(char *x)
{
    char program[9999], first[9999], rest[9999], tos[9999], tmp1[9999], tmp2[9999];
    char *pointer;
    for (strcpy(program, x); *program; strcpy(program, pointer))
    {
        *stack[size] = '\0';
        strcpy(tos, stack[size-1]);
        if (token(program, first, rest))
        {
            pointer = rest;
            strcpy(stack[size++], first);
        }
        else
        {
            pointer = rest;
            if (*first == '@'){
                COPY(0, -1) COPY(-1, -2) COPY(-2, -3) COPY(-3, 0) }
            if (*first == '#')
                strcpy(stack[size++], tos);
            if (*first == '$'){
                COPY(0, -2) COPY(-2, -1) COPY(-1, 0) }
            if (*first == '%')
                size--;
            if (*first == '/'){
                size--; token(tos, tmp1, tmp2); if (*tmp2) strcpy(stack[size++], tmp2); strcpy(stack[size++], tmp1); }
            if (*first == '.'){
                size--; strcat(stack[size - 1], " "); strcat(stack[size - 1], tos); }
            if (*first == '+'){
                size--; sprintf(stack[size++], "%d", atoi(tos) + 1); }
            if (*first == '-'){
                size--; sprintf(stack[size++], "%d", atoi(tos) - 1); }
            if (*first == '<')
                sprintf(stack[size++], "%d", getchar());
            if (*first == '>'){
                size--; if (token(tos, tmp1, tmp2) == 1) putchar(atoi(tmp1)); }
            if (*first == 'c'){
                size--; if (token(tos, tmp1, tmp2) == 1) printf("%s", tmp1); }
            if (*first == 'w'){
                size--; strcpy(tmp1, tos); while (atoi(stack[size - 1])) eval(tmp1); }
            if (*first == '\"')
                pointer=push(pointer);
        }
    }
}
int main(int argc, char **argv)
{
    char program[9999] = "";
    int i = 0, comment = 0, quote = 0, space = 0;
    if (!strcmp(argv[1], "-e"))
        strcpy(program, argv[2]);
    else
    {
        FILE* f = fopen(argv[1], "r");
        for (;;) {
            char ch = fgetc(f);
            if (ch < 0) break;
            if (!quote) {
                if (ch == '\n') comment = 0;
                if (ch == ';') comment = 1;
                if (comment) continue;
                if (ch <= ' ') { ch = ' '; if (space++) continue; }
                else space = 0;
            }
            if (ch == '\"') quote = 1 - quote;
            program[i++] = ch;
        }
        fclose(f);
    }
    eval(program);
    for (int i = 0; i < size; i++) printf("%03d: (%s)\r\n",size-i,stack[i]);
    return 0;
}

Haskell, 931 921 875

this isn't fully golfed yet but it will probably never be. Still, it is already shorter than all other solutions. I will golf this more soon. I don't feel like golfing it any more than this.

probably has a few subtle bugs because I didn't play with the C reference implementation.

this solution uses the type StateT [String] IO () to store a "runnable" clem program. most the program is a parser which parses the "runnable program".

in order to run this use r "<insert clem program here>".

import Text.Parsec
import Control.Monad.State
import Control.Monad.Trans.Class
import Data.Char
'#'%(x:y)=x:x:y
'%'%(x:y)=y
'@'%(x:y:z:w)=y:z:x:w
'$'%(x:y:z)=y:x:z
'/'%((a:b):s)=[a]:b:s
'+'%(a:b)=i a(show.succ)a:b
'.'%(a:b:c)=(a++b):c
_%x=x
b=concat&between(s"(")(s")")(many$many1(noneOf"()")<|>('(':)&((++")")&b))
e=choice[s"w">>c(do p<-t;let d=h>>= \x->if x=="0"then a else u p>>d in d),m&k,s"-">>(m&(' ':)&k<|>c(o(\(a:b)->i a(show.pred)a:b))),s"c">>c(do
 d<-t
 i d(j.putStr.show)a),o&(++)&map(show.ord)&between(s"\"")(s"\"")(many$noneOf"\""),(do
 s"<"
 c$j getChar>>=m.show.ord),(do
 s">"
 c$do
 g<-t
 i g(j.putChar.chr)a),m&b,o&(%)&anyChar]
k=many1 digit
i s f g|(reads s::[(Int,String)])>[]=f$(read s::Int)|0<1=g
t=h>>=(o tail>>).c
c n=return n
a=c()
h=head&get
(&)f=fmap f
m=o.(:)
o=modify
u=(\(Right r)->r).parse(sequence_&many e)""
r=(`runStateT`[]).u
s=string
j=lift