How would I extend the JavaScript language to support a new operator?

As I said in the comments of your question, sweet.js doesn't support infix operators yet. You're free to fork sweet.js and add it yourself, or you're simply SOL.

Honestly, it's not worth it to implement custom infix operators yet. Sweet.js is a well supported tool, and it's the only one I know of that tries to implement macros in JS. Adding custom infix operators with a custom preprocessor is probably not worth the gain you might have.

That said, if you're working on this alone for non-professional work, do whatever you want...

EDIT

sweet.js does now support infix operators.


Yes, it's possible and not even very hard :)


We'll need to discuss a few things:

  1. What are syntax and semantics?
  2. How are programming languages parsed? What is a syntax tree?
  3. Extending the language syntax.
  4. Extending the language semantics.
  5. How do I add an operator to the JavaScript language.

If you're lazy and just want to see it in action - I put the working code on GitHub

1. What are syntax and semantics?

Very generally - a language is composed of two things.

  • Syntax - these are the symbols in the language, such as unary operators like ++, as well as expressions like a FunctionExpression that represents an "inline" function. The syntax covers just the symbols used and not their meaning. In short, the syntax is just the drawings of letters and symbols - it holds no inherent meaning.

  • Semantics ties meaning to these symbols. Semantics is what says ++ means "increment by one"; in fact, here is the exact definition. It ties meaning to our syntax, and without it the syntax is just a list of symbols with an order.

2. How are programming languages parsed? What is a syntax tree?

At some point, when something executes your code in JavaScript or any other programming language, it needs to understand that code. A part of this, called lexing (or tokenizing, let's not go into subtle differences here), means breaking up code like:

function foo(){ return 5;}

Into its meaningful parts - that is saying that there is a function keyword here, followed by an identifier, an empty arguments list, then a block opening { containing a return keyword with the literal 5, then a semicolon, then an end block }.

This part is entirely syntactic: all it does is break the code up into parts like function,foo,(,),{,return,5,;,}. It still has no understanding of the code.
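To make the lexing step concrete, here is a minimal sketch of a toy tokenizer. It is not how a real JavaScript lexer works (the name tokenize and the regular expression are my own, and it only knows identifiers, integers and a handful of punctuators), but it shows that this stage only cuts text into pieces:

```javascript
// Toy tokenizer (illustrative assumption, not a real JS lexer):
// recognizes identifiers/keywords, integer literals and a few punctuators.
function tokenize(source) {
    var pattern = /[A-Za-z_$][\w$]*|\d+|[(){};]/g;
    // match() with a global regex returns every match in order,
    // silently skipping whitespace and anything it does not recognize.
    return source.match(pattern) || [];
}

var tokens = tokenize("function foo(){ return 5;}");
console.log(tokens.join(","));
```

Note that the tokenizer has no idea that function starts a declaration or that 5 is being returned. It only knows where one symbol ends and the next begins.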

After that - a Syntax Tree is built. A syntax tree is more aware of the grammar but is still entirely syntactic. For example, a syntax tree would see the tokens of:

function foo(){ return 5;}

And figure out "Hey! There is a function declaration here!".

It's called a tree because it's just that - trees allow nesting.

For example, the code above can produce something like:

                                        Program
                                  FunctionDeclaration (identifier = 'foo')
                                     BlockStatement
                                     ReturnStatement
                                     Literal (5)

This is rather simple. Just to show you it isn't always so linear, let's check 5 + 5:

                                        Program
                                  ExpressionStatement
                               BinaryExpression (operator +)
                            Literal (5)       Literal(5)   // notice the split here

Such splits can occur.

Basically, a syntax tree allows us to express the syntax.
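In code, such a tree is just nested objects. The sketch below hand-writes roughly the shape a parser like esprima produces for 5 + 5 (real output may carry extra fields depending on version and options) and prints the node types with indentation. The helper name nodeTypes is hypothetical:

```javascript
// Hand-written approximation of the tree for "5 + 5" (assumed shape).
var tree = {
    type: "Program",
    body: [{
        type: "ExpressionStatement",
        expression: {
            type: "BinaryExpression",
            operator: "+",
            left: { type: "Literal", value: 5, raw: "5" },
            right: { type: "Literal", value: 5, raw: "5" }
        }
    }]
};

// Collects the type of every node, indented by nesting depth.
function nodeTypes(node, depth, out) {
    out = out || [];
    depth = depth || 0;
    if (node !== null && typeof node === "object") {
        if (typeof node.type === "string") {
            out.push(new Array(depth + 1).join("  ") + node.type);
            depth += 1;
        }
        // Arrays (like Program.body) are objects too, so this covers them.
        Object.keys(node).forEach(function (key) {
            nodeTypes(node[key], depth, out);
        });
    }
    return out;
}

console.log(nodeTypes(tree).join("\n"));
```

The two Literal lines end up at the same depth under BinaryExpression, which is the "split" from the drawing above.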

This is where x ∘ y fails - the parser sees ∘ and doesn't understand the syntax.

3. Extending the language syntax.

This just requires a project that parses the syntax. What we'll do here is read the syntax of "our" language, which is not the same as JavaScript (and does not comply with the specification), and replace our operator with something the JavaScript syntax is OK with.

What we'll be making is not JavaScript. It does not follow the JavaScript specification, and a standards-compliant JS parser will throw an exception on it.

4. Extending the language semantics

This we do all the time anyway :) All we'll do here is just define a function to call when the operator is called.

5. How do I add an operator to the JavaScript language.

Let me start by saying that, strictly speaking, we won't be adding an operator to JS here. Rather, we're defining our own language (let's call it "CakeLanguage" or something) and adding the operator to it. This is because ∘ is not a part of the JS grammar, and the JS grammar does not allow arbitrary operators the way some other languages do.

We'll use two open source projects for this:

  • esprima which takes JS code and generates the syntax tree for it.
  • escodegen which does the other direction, generating JS code from the syntax tree esprima spits.

If you paid close attention, you'd know we can't use esprima as-is, since we'll be giving it syntax it does not understand.

We'll add a # operator that does x # y === 2x + y, just for fun. We'll give it the same precedence as multiplication (because operators have operator precedence).

So, after you get your copy of Esprima.js - we'll need to change the following:

First, add # to FnExprTokens (the expression tokens) so that it is recognized. Afterwards, it looks like this:

FnExprTokens = ['(', '{', '[', 'in', 'typeof', 'instanceof', 'new',
                    'return', 'case', 'delete', 'throw', 'void',
                    // assignment operators
                    '=', '+=', '-=', '*=', '/=', '%=', '<<=', '>>=', '>>>=',
                    '&=', '|=', '^=', ',',
                    // binary/unary operators
                    '+', '-', '*', '/', '%', '#', '++', '--', '<<', '>>', '>>>', '&',
                    '|', '^', '!', '~', '&&', '||', '?', ':', '===', '==', '>=',
                    '<=', '<', '>', '!=', '!=='];

In scanPunctuator, add # with its char code as a possible case: case 0x23: // #

Then add # to the character test further down, so it looks like:

 if ('<>=!+-*#%&|^/'.indexOf(ch1) >= 0) {

Instead of:

    if ('<>=!+-*%&|^/'.indexOf(ch1) >= 0) {

And then in binaryPrecedence, let's give it the same precedence as multiplication:

case '*':
case '/':
case '#': // put it elsewhere if you want to give it another precedence
case '%':
   prec = 11;
   break;
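To see what that precedence value buys us, here is a small sketch of precedence climbing over an already-tokenized expression. This is not Esprima's code: the table is a toy subset, the name parseExpression is made up, and it only handles numbers and binary operators. It shows that with # at the same level as *, an expression like 1 + 2 # 3 groups as 1 + (2 # 3):

```javascript
// Toy precedence table (assumed values): higher binds tighter.
var PRECEDENCE = { "+": 9, "-": 9, "*": 11, "/": 11, "%": 11, "#": 11 };

// Parses a flat token array such as ["1", "+", "2", "#", "3"] into a tree.
function parseExpression(tokens) {
    var pos = 0;

    function parsePrimary() {
        return { type: "Literal", value: Number(tokens[pos++]) };
    }

    function parseBinary(minPrec) {
        var left = parsePrimary();
        // Keep consuming operators that bind at least as tightly as minPrec.
        while (pos < tokens.length && PRECEDENCE[tokens[pos]] >= minPrec) {
            var op = tokens[pos++];
            // +1 makes operators of equal precedence left-associative.
            var right = parseBinary(PRECEDENCE[op] + 1);
            left = { type: "BinaryExpression", operator: op, left: left, right: right };
        }
        return left;
    }

    return parseBinary(0);
}

var parsed = parseExpression(["1", "+", "2", "#", "3"]);
console.log(parsed.operator, parsed.right.operator);
```

Here the root of the tree is the + node and its right child is the # node. Moving "#" to the level of "+" in the table would instead give (1 + 2) # 3.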

That's it! We've just extended our language syntax to support the # operator.

We're not done yet, we need to convert it back to JS.

Let's first define a short visitor function for our tree that recursively visits all its nodes.

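function visitor(tree, visit){
    for(var i in tree){
        // only visit real objects: null children (typeof null is "object")
        // would otherwise make callbacks that read el.type throw
        if(typeof tree[i] === "object" && tree[i] !== null){
            visit(tree[i]);
            visitor(tree[i], visit);
        }
    }
}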

This just goes through the Esprima generated tree and visits it. We pass it a function and it runs that on every node.

Now, let's treat our special new operator:

visitor(syntax, function(el){ // for every node in the syntax
    if(el.type === "BinaryExpression"){ // if it's a binary expression
        if(el.operator === "#"){ // with the operator #
            el.type = "CallExpression"; // it is now a call expression
            el.callee = {name:"operator_sharp", type:"Identifier"}; // for the function operator_sharp
            el.arguments = [el.left, el.right]; // with the left and right side as arguments
            delete el.operator; // remove BinaryExpression properties
            delete el.left;
            delete el.right;
        }
    }
});
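If you want to try the rewrite without a patched parser, the following sketch runs the same transformation on a hand-written tree. The tree is only my approximation of what a modified Esprima might return for 5 # 5, and walk is a hypothetical null-safe stand-in for the visitor:

```javascript
// Null-safe walk: calls visit on every object in the tree (nodes and arrays).
function walk(node, visit) {
    if (node === null || typeof node !== "object") { return; }
    visit(node);
    // Keys are read after visit(), so children added by the callback are walked too.
    Object.keys(node).forEach(function (key) { walk(node[key], visit); });
}

// Hand-written stand-in for the parsed form of "5 # 5" (assumed shape).
var syntax = {
    type: "Program",
    body: [{
        type: "ExpressionStatement",
        expression: {
            type: "BinaryExpression",
            operator: "#",
            left: { type: "Literal", value: 5, raw: "5" },
            right: { type: "Literal", value: 5, raw: "5" }
        }
    }]
};

walk(syntax, function (el) {
    if (el.type === "BinaryExpression" && el.operator === "#") {
        el.type = "CallExpression";
        el.callee = { type: "Identifier", name: "operator_sharp" };
        el.arguments = [el.left, el.right];
        delete el.operator;
        delete el.left;
        delete el.right;
    }
});

var call = syntax.body[0].expression;
console.log(call.type, call.callee.name, call.arguments.length);
```

After the walk, the node that used to be a BinaryExpression is an ordinary call to operator_sharp with the two literals as arguments, which is plain JavaScript as far as a code generator is concerned.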

So in short:

var syntax = esprima.parse("5 # 5");

visitor(syntax, function(el){ // for every node in the syntax
    if(el.type === "BinaryExpression"){ // if it's a binary expression
        if(el.operator === "#"){ // with the operator #
            el.type = "CallExpression"; // it is now a call expression
            el.callee = {name:"operator_sharp", type:"Identifier"}; // for the function operator_sharp
            el.arguments = [el.left, el.right]; // with the left and right side as arguments
            delete el.operator; // remove BinaryExpression properties
            delete el.left;
            delete el.right;
        }
    }
});

var asJS = escodegen.generate(syntax); // produces operator_sharp(5, 5);

The last thing we need to do is define the function itself:

function operator_sharp(x,y){
    return 2*x + y;
}

And include that above our code.
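As a quick sanity check, here is a sketch that runs the kind of string the generator produces against that function. The generated text is hard-coded here as an assumption, and new Function is only used to keep the example self-contained:

```javascript
function operator_sharp(x, y) {
    return 2 * x + y;
}

// Stand-in for the generated output for "5 # 5" (assumed text).
var generated = "operator_sharp(5, 5);";

// Evaluate the generated statement with operator_sharp in scope.
var run = new Function("operator_sharp", "return " + generated);
var result = run(operator_sharp);

console.log(result); // 2*5 + 5 = 15
```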

That's all there is to it! If you've read this far - you deserve a cookie :)

Here is the code on GitHub so you can play with it.