Is it important that the front-end has a built-in parser?

Is it strictly necessary? In most cases, no, but having one gives us many advantages. Most of those advantages could be approximated with various heuristics. But most importantly, without a parser it would have been very difficult to build a robust typesetting system.

First, the front end was always going to need some intermediate representation which is not simply textual, because two-dimensional typesetting has no textual representation. And at the time we implemented our typesetting system, there was only one significant language which represented typesetting...TeX. But TeX represents the typesetting with semantic ambiguities which make it difficult to understand how to turn it into a Wolfram Language expression. Does sin(x) mean Sin[x], or does it mean Times[sin,x]? The front end has TraditionalForm, which uses heuristics to attempt to navigate these ambiguities, but I think most people would agree that it's a significant step backward from a programming language which can be represented with precise, non-heuristic semantics. So, StandardForm was born out of necessity...we wanted both typesetting and precise semantics.

Let's now add an additional requirement to the typesetting system. It's a pretty small one. We would like the keystroke for "create fraction" to not simply allow for the creation of an empty fraction, but also allow us to type in a fraction inline (so that one can type, e.g., 1, Ctrl+/, 2). Consider it a nice usability tweak that maybe we could implement over the weekend.

If I type Ctrl+/ to input a fraction, how much should I pull from the left of the cursor to create that fraction? Obviously, if there's a number or symbol, we want that, and that's it. Well, maybe we might want some prefix operators. Prefix minus, maybe?

- a Ctrl+/ b
c - a Ctrl+/ b
c + - a Ctrl+/ b
c - - a Ctrl+/ b

If we want our prefix minus pulled in, then the second case looks pretty different than the others. In order to solve this problem, we need to be able to distinguish a prefix minus from an infix minus. But surely, we want the same thing to work for a prefix plus, as well. But maybe the whole prefix minus was just a bad convention, and we can ignore the problem.
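To make the distinction concrete, here is a minimal sketch of the kind of heuristic this leads to. It is not the front end's actual algorithm; the helper names (operatorQ, prefixMinusQ, numeratorTokens) are hypothetical, and the input is assumed to be already split into a list of token strings ending at the cursor.

```wolfram
(* Hypothetical helpers for illustration only. *)

(* Tokens after which a following "-" must be a prefix minus *)
operatorQ[tok_String] := MemberQ[{"+", "-", "*", "/", "("}, tok]

(* A "-" at position i is prefix if it starts the input
   or directly follows an operator or open parenthesis *)
prefixMinusQ[tokens_List, i_Integer] :=
  tokens[[i]] === "-" && (i == 1 || operatorQ[tokens[[i - 1]]])

(* Tokens to the left of the cursor that would go into the numerator:
   the last operand plus any prefix minus signs directly before it *)
numeratorTokens[tokens_List] := Module[{i = Length[tokens]},
  While[i > 1 && prefixMinusQ[tokens, i - 1], i--];
  tokens[[i ;;]]
]

numeratorTokens[{"-", "a"}]            (* {"-", "a"} *)
numeratorTokens[{"c", "-", "a"}]       (* {"a"} *)
numeratorTokens[{"c", "+", "-", "a"}]  (* {"-", "a"} *)
numeratorTokens[{"c", "-", "-", "a"}]  (* {"-", "a"} *)
```

Even this toy version needs to know which tokens are operators and what came before them, which is already a small piece of a parser.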

But...we want \[PartialD], right? People want to use our system for calculus, so that seems a pretty important operator to automatically go into the fraction. Okay, so let's consider the possibility of maybe creating a set of prefix operators that we apply special behavior to. The weekend just got a little longer.

Oh, but postfix operators surely should work, too. You want all of f' to go into a fraction, right? Okay, so another rule...all postfix operators go into the fraction. Fine...we're in great shape now.

(-Sin[Cos[x]] Ctrl+/ 2 Ctrl+space + 1)

Oops. We just blew out our weekend. By the time you apply all these heuristics we've been building up, you've just gone and built yourself a parser...but a really bad one because it was created from heuristics rather than a proper sense of operators and precedence.

Okay, so let's assume that we've built a parser. What else can we do with it?

  • Syntax coloring. That seems useful. Once you have a parser, it becomes really trivial to look for mismatched operators, or to determine whether a given swath of text represents a head or a body of a given expression. Doing local variable highlighting would be really hard without a parser living somewhere.

  • The kernel parser can only represent complete expressions. The expression a+ is going to throw a syntax error, and there's simply no way around that. We could try to put it in a string and hide the fact that it's a string-ish thing, but that's going to be tricky. But the front end parser deals with incomplete expressions all the time. Every time you type something, you're constantly creating incomplete expressions, so it really is a requirement. Now we have a way to represent those expressions that we can attach new kernel parsing rules to.

  • Structured selection. You already conceded this one, so I won't elaborate.

  • You can do automated line-breaking which is guided by the parsed structure. Instead of merely figuring out rules based on characters which allow linebreaks, you can now add rules which depend upon things like expression depth.

  • We can now create typesetting constructs with metadata so that we can provide precise semantics to the kernel while still maintaining visual fidelity to some typesetting standard. Want to say that Jn(z) maps to BesselJ? No problem. Just embed a bit of metadata into that typesetting construct (which might even include other visual distinctions so as to remove visual ambiguities), and it's all done.

Among other things.
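The point about incomplete expressions can be seen from the kernel side with a small sketch. SyntaxQ, SyntaxLength, RowBox and RawBoxes are standard functions; the variable name incomplete is just for illustration, and the RowBox shown is an assumption about how such a fragment could be held as boxes, not necessarily what the front end stores.

```wolfram
(* The kernel's string parser accepts only complete expressions *)
SyntaxQ["a+b"]   (* True *)
SyntaxQ["a+"]    (* False *)

(* Per the documentation, a SyntaxLength result past the end of the
   string signals input that is valid so far but needs continuing *)
SyntaxLength["a+"] > StringLength["a+"]

(* As boxes, the same fragment is ordinary data that can be
   stored and displayed without any syntax error *)
incomplete = RowBox[{"a", "+"}];
RawBoxes[incomplete]
```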


Not really an answer and I may have to delete it, but...I had something of a different reaction to this business of the front end having its own parser, circa 2004. For motivation I recommend the original.


I knew a front end
that swallowed a kernel.
I don't know why
it swallowed a kernel.
They taste infernal.

I knew a front end
that swallowed a debugger.
(Most of us
would think that meshugger.)
It swallowed a debugger
to fix a kernel.
But I don't know why
it swallowed a kernel.
They taste infernal.

I knew a front end
that swallowed text highlighting.
Now you, or I,
would find that quite frightening.
It swallowed the highlighter
to highlight its debugger.
It swallowed the debugger
to scare bugs from the kernel.
But no one knows why
it swallowed that kernel.
They taste infernal.

I knew a front end
that swallowed interactive graphics
(Prematurely, they say,
cause the design was half-buttoxed.)
It swallowed the graphics
to illuminate highlighting.
It swallowed a highlighter
to show off debugging.
It swallowed debugging
for a kernel kept chugging.
But I don't know why
it swallowed that kernel.
They taste infernal.

I knew a front end
that swallowed its code source.
It crashed, of course!


The result of the front-end's parsing is definitely used, but it does not need to be complete, it seems. This can be seen by constructing Cells/Boxes manually.

Try

RawBoxes@RowBox[{"1", "+", RowBox[{"2", "*", "3"}]}]
RawBoxes@RowBox[{RowBox[{"1", "+", "2"}], "*", "3"}]
RawBoxes@RowBox[{"1", "+", "2", "*", "3"}]

These all display as

1 + 2 * 3

but you can see that they behave differently in the front end by placing the cursor at "2" and pressing Ctrl+. repeatedly to extend the selection, or by looking at the cell expression (Ctrl+Shift+E).

If you send them to the kernel for evaluation, the second one will give 9, while the others give 7 as usual. So I assume the kernel will complete the parsing but will accept any pre-parsed boxes as-is.

The front end does not parse everything:
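One way to check this without going through cell evaluation is to hand the boxes to ToExpression directly. A minimal sketch; the variable names are only for illustration, and the values in the comment are the ones reported above rather than independently verified output.

```wolfram
(* Three box structures that all display as 1 + 2 * 3 *)
nestedRight = RowBox[{"1", "+", RowBox[{"2", "*", "3"}]}];
nestedLeft  = RowBox[{RowBox[{"1", "+", "2"}], "*", "3"}];
flat        = RowBox[{"1", "+", "2", "*", "3"}];

(* Interpret each as StandardForm input and evaluate.
   According to the text above, the results should be 7, 9 and 7. *)
ToExpression[#, StandardForm] & /@ {nestedRight, nestedLeft, flat}
```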

1~f~2~f~3

is understood as

RowBox[{"1", "~", "f", "~", "2", "~", "f", "~", "3"}]

so in this case Ctrl+. does not reveal how the expression will be grouped once the kernel parses it.

The kernel turns this into f[f[1, 2], 3].
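To look at the kernel's grouping without evaluating anything, the flat boxes can be passed to MakeExpression, which returns the parsed result wrapped in HoldComplete. A small sketch; flatInfix is just an illustrative name.

```wolfram
(* The flat RowBox the front end produces for 1~f~2~f~3 *)
flatInfix = RowBox[{"1", "~", "f", "~", "2", "~", "f", "~", "3"}];

(* Parse as StandardForm input but keep the result held.
   Given the grouping stated above, this should be
   HoldComplete[f[f[1, 2], 3]]. *)
MakeExpression[flatInfix, StandardForm]
```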