Using Boost to read and write XML files

Boost.PropertyTree uses RapidXML internally, as described in the "XML Parser" section of the documentation page "How to Populate a Property Tree":

Unfortunately, there is no XML parser in Boost as of the time of this writing. The library therefore contains the fast and tiny RapidXML parser (currently in version 1.13) to provide XML parsing support. RapidXML does not fully support the XML standard; it is not capable of parsing DTDs and therefore cannot do full entity substitution.

Please also refer to the Boost XML tutorial.

As the OP wants a "simple way to use boost to read and write xml files", here is a very basic example, starting with the XML file:

<main>
    <owner>Matt</owner>
    <cats>
        <cat>Scarface Max</cat>
        <cat>Moose</cat>
        <cat>Snowball</cat>
        <cat>Powerball</cat>
        <cat>Miss Pudge</cat>
        <cat>Needlenose</cat>
        <cat>Sweety Pie</cat>
        <cat>Peacey</cat>
        <cat>Funnyface</cat>
    </cats>
</main>

(cat names are from Matt Mahoney's homepage)

The corresponding structure in C++:

#include <set>
#include <string>

struct Catowner
{
    std::string           owner;
    std::set<std::string> cats;
};

read_xml() usage:

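#include <boost/foreach.hpp>
#include <boost/property_tree/ptree.hpp>
#include <boost/property_tree/xml_parser.hpp>

Catowner load(const std::string &file)
{
    boost::property_tree::ptree pt;
    boost::property_tree::read_xml(file, pt);  // throws xml_parser_error on failure

    Catowner co;

    // Throws ptree_bad_path if <main><owner> is missing.
    co.owner = pt.get<std::string>("main.owner");

    // Each child of <cats> is a (tag name, subtree) pair.
    BOOST_FOREACH(
        boost::property_tree::ptree::value_type &v,
        pt.get_child("main.cats"))
    {
        if (v.first == "cat")  // skip comments and unrelated elements
            co.cats.insert(v.second.data());
    }

    return co;
}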

write_xml() usage:

void save(const Catowner &co, const std::string &file)
{
    boost::property_tree::ptree pt;

    pt.put("main.owner", co.owner);

    // add() appends a new <cat> node each time; put() would overwrite the first one.
    BOOST_FOREACH(const std::string &name, co.cats)
        pt.add("main.cats.cat", name);

    boost::property_tree::write_xml(file, pt);
}

TinyXML is probably a good choice. As for Boost:

There is the Property_Tree library in the Boost Repository. It has been accepted, but support seems to be lacking at the moment (EDIT: Property_Tree is now part of Boost since version 1.41, read the documentation regarding its XML functionality).

Daniel Nuffer has implemented an XML parser for Boost Spirit.


You should try pugixml, a light-weight, simple and fast XML parser for C++.

The nicest thing about pugixml is the XPath support, which TinyXML and RapidXML lack.

Quoting RapidXML's author: "I would like to thank Arseny Kapoulkine for his work on pugixml, which was an inspiration for this project" and "5% - 30% faster than pugixml, the fastest XML parser I know of". He tested against version 0.3 of pugixml, which has recently reached version 0.42.

Here is an excerpt from pugixml documentation:

The main features are:

  • low memory consumption and fragmentation (the win over pugxml is ~1.3 times, TinyXML - ~2.5 times, Xerces (DOM) - ~4.3 times). Exact numbers can be seen in Comparison with existing parsers section.
  • extremely high parsing speed (the win over pugxml is ~6 times, TinyXML - ~10 times, Xerces-DOM - ~17.6 times)
  • extremely high parsing speed (well, I'm repeating myself, but it's so fast, that it outperforms Expat by 2.8 times on test XML)
  • more or less standard-conformant (it will parse any standard-compliant file correctly, with the exception of DTD related issues)
  • pretty much error-ignorant (it will not choke on something like You & Me, like expat will; it will parse files with data in wrong encoding; and so on)
  • clean interface (a heavily refactored pugxml's one)
  • more or less Unicode-aware (actually, it assumes UTF-8 encoding of the input data, though it will readily work with ANSI - no UTF-16 for now (see Future work), with helper conversion functions (UTF-8 <-> UTF-16/32 (whatever is the default for std::wstring & wchar_t)))
  • fully standard compliant C++ code (approved by Comeau strict mode); the library is multiplatform (see reference for platforms list)
  • high flexibility. You can control many aspects of file parsing and DOM tree building via parsing options.

Okay, you might ask - what's the catch? Everything is so cute - it's small, fast, robust, clean solution for parsing XML. What is missing? Ok, we are fair developers - so here is a misfeature list:

  • memory consumption. It beats every DOM-based parser that I know of - but when a SAX parser comes, there is no chance. You can't process a 2 Gb XML file with less than 4 Gb of memory - and do it fast. Though pugixml behaves better than all other DOM-based parsers, so if you're stuck with DOM, it's not a problem.
  • memory consumption. Ok, I'm repeating myself. Again. When other parsers will allow you to provide XML file in a constant storage (or even as a memory mapped area), pugixml will not. So you'll have to copy the entire data into a non-constant storage. Moreover, it should persist during the parser's lifetime (the reasons for that and more about lifetimes is written below). Again, if you're ok with DOM - it should not be a problem, because the overall memory consumption is less (well, though you'll need a contiguous chunk of memory, which can be a problem).
  • lack of validation, DTD processing, XML namespaces, proper handling of encoding. If you need those - go take MSXML or XercesC or anything like that.
