Filtering illegal XML characters in Java

This page includes a Java method for stripping out invalid XML characters. It tests each character against the ranges the spec allows and drops anything outside them. It does not check for the characters the spec only discourages.

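Incidentally, escaping the characters is not a general solution. XML 1.0 does not allow the invalid characters in escaped form either: a character reference such as &#x1; must still refer to a legal character. XML 1.1 does accept most control characters when written as character references, but NUL, the surrogate code points, U+FFFE and U+FFFF remain illegal in any form.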
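The point about escaped characters can be checked against a real parser. The following is a small sketch, assuming the JAXP parser bundled with the JDK; the class name EscapedCharRefDemo and the helper parses() are made up for illustration. It feeds the parser an XML 1.0 document containing a character reference to U+0001 and reports whether the document is accepted.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical demo class, not part of any library.
public final class EscapedCharRefDemo {

    private EscapedCharRefDemo() {}

    /** Returns true if the JDK's default XML parser accepts the document. */
    static boolean parses(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // Well-formedness errors, including illegal character references.
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(parses("<a>ok</a>"));
        // Same document, but with U+0001 written as a character reference.
        System.out.println(parses("<a>&#x1;</a>"));
    }
}
```

The default error handler may also print a fatal-error message to stderr for the rejected document.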


Working out the full set of invalid characters for XML is not trivial. The simplest route is to call, or reimplement, XMLChar.isInvalid() from Xerces:

http://kickjava.com/src/org/apache/xerces/util/XMLChar.java.htm
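A reimplementation along those lines might look like the sketch below. It assumes XML 1.0 and hard-codes the spec's Char production (#x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]). The names XmlCharFilter, isValidXml10 and stripInvalid are made up for illustration and are not Xerces API. Like the method described above, it does not filter the discouraged ranges.

```java
// Hypothetical helper, not part of Xerces or the JDK.
public final class XmlCharFilter {

    private XmlCharFilter() {}

    /** True if the code point matches the XML 1.0 Char production. */
    static boolean isValidXml10(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Returns the input with every code point that is invalid in XML 1.0 removed. */
    public static String stripInvalid(String in) {
        if (in == null || in.isEmpty()) {
            return in;
        }
        StringBuilder out = new StringBuilder(in.length());
        // Walk by code point, not by char, so that valid surrogate pairs
        // (characters above U+FFFF) are kept and lone surrogates are dropped.
        for (int i = 0; i < in.length(); ) {
            int cp = in.codePointAt(i);
            if (isValidXml10(cp)) {
                out.appendCodePoint(cp);
            }
            i += Character.charCount(cp);
        }
        return out.toString();
    }
}
```

Iterating by code point matters here: a loop over individual char values would see both halves of a surrogate pair as out of range and would strip legitimate supplementary characters such as emoji.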

Tags: Java, XML, Unicode