Grep tool for XML

XMLStarlet (Wikipedia) is a command-line tool that comes close to grep for XML. It is open-source software (MIT license) and works well on Linux and Windows.

The XMLStarlet website describes it as follows.

XMLStarlet is a set of command line utilities (tools) which can be used to transform, query, validate, and edit XML documents and files using simple set of shell commands in similar way it is done for plain text files using UNIX grep, sed, awk, diff, patch, join, etc commands.

The Debian/Ubuntu package is named xmlstarlet. But beware: contrary to what the man page says, the binary is named xmlstarlet on Debian/Ubuntu, not xml.

There are also Windows binaries on SourceForge.

For a nice little introduction, see IBM's Start working with XMLStarlet.


XPath, which is available in many languages, is the best way to find things in XML. In fact, one of the tools recommended by the makers of xgrep is basically a Perl XML parser that accepts XPath input.


A tool that works under Linux is xml_grep. It fully understands XML and is not a line-by-line tool.

xml_grep is included as a stand-alone tool in the XML::Twig package. The grepping functionality is quite powerful as it supports XPath specifications.

Sample command line (extracting posts edited after the middle of February from the trilogy Data Dump):

xml_grep -p --cond="row[@LastEditDate>'2010-02-14']"  posts.xml  > lateEditedPosts.xml

Installation is easy. Either

  • sudo cpan -i "XML::Twig", as described in the xml_grep cookbook referenced below.

or

  • Download http://xmltwig.org/xmltwig/XML-Twig-3.34.tar.gz or http://search.cpan.org/CPAN/authors/id/M/MI/MIROD/XML-Twig-3.34.tar.gz. E.g. wget http://search.cpan.org/CPAN/authors/id/M/MI/MIROD/XML-Twig-3.34.tar.gz

  • Extract: gunzip XML-Twig-3.34.tar.gz; tar -xvf XML-Twig-3.34.tar

  • Go into the folder: cd XML-Twig-3.34

  • Install: perl Makefile.PL -y. Then make, make test and sudo make install.


More information:

The best introduction I have found for xml_grep is the xml_grep cookbook, which is about two pages long. Other resources:

  • Man page for xml_grep.
  • Real home page for XML::Twig.