C++ vs. D, Ada and Eiffel (horrible error messages with templates)

The article Generic Programming outlines many of the pros and cons of generics in several languages, including Ada in particular. Although lacking template specialization, all Ada generic instances are "equivalent to the instance declaration…immediately followed by the instance body". As a practical matter, error messages tend to occur at compile-time, and they typically represent familiar violations of type-safety.

In general, I have found Ada compiler error messages for generics not significantly more difficult to read than any other Ada compiler error messages.

C++ template error messages, on the other hand, are notorious for being error novels. The main difference, I think, is the way C++ does template instantiation. C++ templates are much more flexible than Ada generics; they are so flexible that they are almost like a macro preprocessor. The clever folks at Boost have used this to implement things like lambdas and even whole other languages.
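As a rough illustration of that flexibility, here is a minimal sketch of a computation done entirely by the compiler through recursive template instantiation and explicit specialization (the feature Ada generics lack). The name Factorial is made up for this example and does not come from Boost or any other library.

```cpp
// Primary template: the recursive case, instantiated once per value of N.
template <unsigned N>
struct Factorial {
  static const unsigned long value = N * Factorial<N - 1>::value;
};

// Explicit specialization: the base case that stops the recursion.
template <>
struct Factorial<0> {
  static const unsigned long value = 1;
};

// Checked at compile time; no code runs for this.
static_assert(Factorial<5>::value == 120, "computed during compilation");
```

Each distinct N produces a separate instantiation, which is the same mechanism that makes the compiler redo its work for every new combination of template parameters.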

Because of that flexibility, the entire template hierarchy basically has to be compiled anew every time a particular permutation of template parameters is first encountered. Thus issues that come down to incompatibilities several layers down an API end up being presented to the poor API client to decipher.
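A minimal sketch of how that happens, assuming a made-up three-layer API (detail_length, measure and api_total are hypothetical names invented for this example). The requirement on T lives only in the innermost layer, so a bad instantiation is diagnosed there, not at the call the client wrote.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Innermost layer: silently requires that T has a size() member.
template <typename T>
std::size_t detail_length(const T& value) {
  return value.size();
}

// Middle layer: just forwards to the innermost one.
template <typename T>
std::size_t measure(const T& value) {
  return detail_length(value);
}

// Outermost layer: the only function an API client is meant to call.
template <typename T>
std::size_t api_total(const std::vector<T>& items) {
  std::size_t total = 0;
  for (const T& item : items) total += measure(item);
  return total;
}

// Compiles: std::string has size().
inline std::size_t total_of_strings() {
  std::vector<std::string> words = {"ab", "cde"};
  return api_total(words);
}

// Would not compile if uncommented: int has no size(). The error is
// reported inside detail_length, reached through measure and api_total,
// so the client has to read through all three layers.
// inline std::size_t total_of_ints() {
//   std::vector<int> numbers = {1, 2};
//   return api_total(numbers);
// }

inline void demo_api_total() {
  assert(total_of_strings() == 5);  // 2 + 3 characters
}
```

Nothing in the signature of api_total states the size() requirement, which is why the compiler cannot report the problem at the API boundary.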

In Ada, generics are actually strongly typed, and provide full information hiding to the client, just like normal packages and subroutines do. So if you do get an error message, it typically references just the one generic you are trying to instantiate, not the entire hierarchy used to implement it.

So yes, C++ template error messages are way worse than Ada's.

Now debugging is a different story entirely...

The problem, at heart, is that error recovery is difficult, whatever the context.

And when you factor in the horrid grammars of C and C++, you can only wonder that the error messages are not worse than they are! I am afraid that the C grammar was designed by people who didn't have a clue about the essential properties of a grammar: one being that the less it relies on context the better, the other being that you should strive to make it as unambiguous as possible.

Let us illustrate a common error: forgetting a semi-colon.

struct CType {
  int a;
  char b;
}
foo
bar() { /**/ }

Okay, so this is wrong, but where should the missing semi-colon go? Unfortunately it's ambiguous: it can go either before or after foo, because:

  • C considers it normal to declare a variable in stride after defining a struct
  • C considers it normal not to specify a return type for a function (in which case it defaults to int)

If we reason about it, we can see that:

  • if foo names a type, then it belongs to the function declaration
  • if not, it probably denotes a variable... unless of course we made a typo and it was meant to be written fool, which happens to be a type :/
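The two repairs can be sketched side by side. This is only an illustration under stated assumptions: the namespaces reading1 and reading2 are added so both versions fit in one file, the typedef that makes foo a type is hypothetical, and the return type int is written out in the first reading because C++, unlike old C, has no implicit int.

```cpp
#include <cassert>

// Reading 1: the semi-colon goes after foo, so foo is a variable of type
// CType declared right after the struct definition.
namespace reading1 {
struct CType {
  int a;
  char b;
}
foo;
int bar() { return 1; }
}  // namespace reading1

// Reading 2: the semi-colon goes before foo, so foo must name a type and
// becomes the return type of bar.
namespace reading2 {
typedef int foo;  // hypothetical earlier declaration making foo a type
struct CType {
  int a;
  char b;
};
foo
bar() { return 2; }
}  // namespace reading2

inline void demo_readings() {
  reading1::foo.a = 42;  // foo is an object here
  assert(reading1::foo.a == 42);
  assert(reading1::bar() == 1);
  assert(reading2::bar() == 2);  // foo is bar's return type here
}
```

Both versions are valid, and they differ from the broken snippet by a single character, which is why the parser cannot know which one was intended without looking up what foo is.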

As you can see, error recovery is downright difficult, because we need to infer what the writer meant, and the grammar is far from being receptive. It is not impossible though, and most errors can indeed be diagnosed more or less correctly, and even recovered from... it just takes considerable effort.

It seems that the people working on gcc are more interested in producing fast code (and I mean fast, search for the latest benchmarks on gcc 4.6) and adding interesting features (gcc already implements most, if not all, of C++0x) than in producing easy-to-read error messages. Can you blame them? I can't.

Fortunately, there are people who think that accurate error reporting and good error recovery are very worthy goals, and some of them have been working on Clang for quite a while, and continue to do so.

Some nice features, off the top of my head:

  • Terse but complete error messages, which include the source ranges to expose exactly where the error emanated from
  • Fix-It notes when it's obvious what was meant, in which case the compiler parses the rest of the file as if the fix had already been there, instead of spewing lines upon lines of gibberish
  • (recent) avoid including the include stack for notes, to cut out on the cruft
  • (recent) trying to show only the template parameter types that the developer actually wrote, and preserving typedefs (thus talking about std::vector<Name> instead of std::vector<std::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::basic_string<char, std::char_traits<char>, std::allocator<char> > > >, which makes all the difference)
  • (recent) recovering correctly from a missing template keyword in a call to a template method from within another template method
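For the last item, here is a minimal sketch of the situation in which the template keyword is needed. Converter, as and convert_with are names invented for this example.

```cpp
#include <cassert>

struct Converter {
  // A member function template.
  template <typename U>
  U as(int value) const { return static_cast<U>(value); }
};

template <typename T>
double convert_with(const T& converter, int value) {
  // The type of converter depends on T, so the parser cannot know that
  // `as` is a template. The `template` keyword tells it that the `<`
  // opens a template argument list. Leaving it out, as in
  //   converter.as<double>(value)
  // is the mistake in question: `<` is then parsed as less-than.
  return converter.template as<double>(value);
}

inline void demo_convert_with() {
  assert(convert_with(Converter(), 3) == 3.0);
}
```

A compiler that recovers well can suggest inserting the keyword and carry on; one that does not tends to report a confusing expression error at the `<`.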

But each of those has required several hours to days of work.

They certainly didn't come for free.

Now, concepts should (normally) have made our lives easier. But they were mostly untested, so it was deemed preferable to remove them from the draft. I must say I am glad of this. Given C++'s relative inertia, it's better not to include features that haven't been thoroughly revised, and concept maps didn't really thrill me. Neither did they thrill Bjarne or Herb, it seems, as they said that they would be rethinking Concepts from scratch for the next standard.