Can I get an XML AST dump of C/C++ code with clang without using the compiler?

For your information, the XML printer was removed in version 2.9 by Douglas Gregor (who is responsible for the Clang front end).

The issue was that the XML printer was incomplete. A number of AST nodes had never been implemented in the printer, nor had a number of properties of some nodes, which led to an inaccurate representation of the source code.

Another point Douglas raised was that the output should be suitable for consumption by external tools, not for debugging Clang itself (which is what -emit-ast is about). This requires the output to be stable from one version to the next. Notably, it should not be a one-to-one mapping of Clang's internals, but should instead translate the source code into a standardized language.

Unless there is significant work on the printer (which requires volunteers), it will not be integrated back...


I've been working on my own version of extracting XML from Clang's AST. My code uses the Python bindings of libclang in order to traverse the AST.

My code is found at https://github.com/BentleyJOakes/PCX

Edit: I should add that it is quite incomplete in terms of producing the right source code tokens for each AST node. This unfortunately needs to be coded for each AST node type. However, the code should give a basis for anyone who wants to pursue this further.
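
To give a rough idea of what the libclang approach looks like, here is a minimal sketch. It is not the PCX code, only an illustration. The function names `cursor_to_xml` and `dump_file` are made up for this example, and it assumes the `clang.cindex` Python bindings and a matching libclang library are installed. It records only each node's kind, spelling, and line number, so it is nowhere near a faithful representation of the source.

```python
import xml.etree.ElementTree as ET


def cursor_to_xml(cursor):
    """Recursively convert a libclang cursor into an XML element.

    Only the cursor kind, spelling, and line number are kept; a real
    tool would need per-node handling for types, tokens, and so on.
    """
    elem = ET.Element(cursor.kind.name)  # e.g. FUNCTION_DECL
    if cursor.spelling:
        elem.set("spelling", cursor.spelling)
    elem.set("line", str(cursor.location.line))
    for child in cursor.get_children():
        elem.append(cursor_to_xml(child))
    return elem


def dump_file(path, args=()):
    """Parse `path` with libclang and return its AST as an XML string.

    `args` are extra compiler flags, e.g. ["-std=c++11", "-Iinclude"].
    """
    # Imported here so cursor_to_xml can be tried out without libclang.
    from clang.cindex import Index

    tu = Index.create().parse(path, args=list(args))
    return ET.tostring(cursor_to_xml(tu.cursor), encoding="unicode")
```

Calling `print(dump_file("example.c"))` (where `example.c` is a placeholder for your own file) should print one nested XML element per AST node. Only parsing happens; no object code is generated.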


A custom ASTDumper would do the job, of course without compiling any source file (stop Clang after the front-end stage). However, you would have to work with the C and C++ sources of LLVM to accomplish that.