Parsing Context Sensitive Language

ANTLR parses only grammars which are LL(*). It can't parse using grammars for full context-sensitive languages such as the example you provided. I think what Parr meant was that ANTLR can parse some languages that require some (left) context constraints.

In particular, one can use semantic predicates on "reduction actions" (we do this for GLR parsers used by our DMS Software Reengineering Toolkit but the idea is similar for ANTLR, I think) to inspect any data collected by the parser so far, either as ad hoc side effects of other semantic actions, or in a partially-built parse tree.

For our DMS-based DMS-based Fortran front end, there's a context-sensitive check to ensure that DO-loops are properly lined up. Consider:

 DO  20, I= ...
   DO 10, J = ...
       ...
20  CONTINUE
10  CONTINUE

From the point of view of the parser, the lexical stream looks like this:

DO  <number> , <variable> =  ...
    DO <number> , <variable> = ...
         ...
<number> CONTINUE
<number> CONTINUE

How can the parser then know which DO statement goes with which CONTINUE statement? (saying that each DO matches its closest CONTINUE won't work, because FORTRAN can share a CONTINUE statement with multiple DO-heads).

We use a semantic predicate "CheckMatchingNumbers" on the reduction for the following rule:

block = 'DO' <number> rest_of_do_head newline 
         block_of_statements
         <number> 'CONTINUE' newline ; CheckMatchingNumbers

to check that the number following the DO keyword, and the number following the CONTINUE keyword match. If the semantic predicate says they match, then a reduction for this rule succeeds and we've aligned the DO head with correct CONTINUE. If the predicate fails, then no reduction is proposed (and this rule is removed from candidates for parsing the local context); some other set of rules has to parse the text.

The actual rules and semantic predicates to handle FORTRAN nesting with shared continues is more complex than this but I think this makes the point.

What you want is full context-sensitive parsing engine. I know people have built them, but I don't know of any full implementations, and don't expect them to be fast.

I did follow Quinn Taylor Jackson's MetaS grammar system for awhile; it sounded like a practical attempt to come close.