Return xml sequences where an attribute doesn't contain a specific character

An easy way to do this is to use the nodes method to get right to the address attribute and check for your @ sign.

The problem with the way you're looking now is that it's only checking that any email address has an @ in it. Parsing the XML nodes out lets you check individual emails for it.

DECLARE @x XML
    = '<xml>
<customer name="Max"><email address="[email protected]" /></customer>
<customer name="Erik"><email address="[email protected]" /></customer>
<customer name="Brent"><email address="brentcom" /></customer>
</xml>';


SELECT x.c.value('@address', 'VARCHAR(100)') AS [email]
FROM   @x.nodes('/xml/customer/email') AS x(c)
WHERE  x.c.exist('@address[contains(., "@")]') = 0;

If you need to query an actual table with an XML column like this, you'd just CROSS APPLY the nodes method like thusly:

SELECT x.c.value('@address', 'VARCHAR(100)') AS [email]
FROM @x_table AS xt
CROSS APPLY xt.x.nodes('/xml/customer/email') AS x(c)
WHERE  x.c.exist('@address[contains(., "@")]') = 0;

If you want to bring all the <customer>...</customer> XML for that "row" back, you can walk the axis back. Just be aware that walking back can make performance a bit woogy for large XML blocks.

SELECT x.c.query('..')
FROM @x_table AS xt
CROSS APPLY xt.x.nodes('/xml/customer/email') AS x(c)
WHERE  x.c.exist('@address[contains(., "@")]') = 0;

Another way of doing it is:

SELECT @x.query('/xml/customer[email/@address[not(contains(., "@"))]]') answer

Moving the square brackets to wrap around the email node effectively make that the WHERE clause applied to the customer node. Translating this XQuery to English looks like:

Get me all xml/customer nodes with an email node which has an address attribute which does not contain the @ symbol


You were oh so close. You were definitely on the right track with using the .query() function and using the contains XQuery function. What you got wrong was:

  1. Putting the = False outside of the [...] (meaning, it was not part of the contains() expression)
  2. Using the word False instead of the function false()
  3. Not specifying the parent node by adding /.. to the end of the path (so that the result will include the <customer> element and not just the <email> element)

Correcting those three things results in the following XQuery expression that gets you what you are wanting:

'/xml/customer/email[contains(@address, "@") = false()]/..'

Putting that into your original example from the question gives you:

DECLARE @x XML = '<xml>
<customer name="Max"><email address="[email protected]" /></customer>
<customer name="Erik"><email address="[email protected]" /></customer>
<customer name="Brent"><email address="brentcom" /></customer>
</xml>';

SELECT
@x.query('/xml/customer/email[contains(@address, "@")]/..') AS [WithValidEmail],
@x.query('/xml/customer/email[contains(@address, "@")=false()]/..') AS [WithInvalidEmail;

That query returns the following result set of a single row with two XML fields:

WithValidEmail                            |     WithInvalidEmail
<customer name="Max">                     |     <customer name="Brent">
  <email address="[email protected]" />          |       <email address="brentcom" />
</customer>                               |     </customer>
<customer name="Erik">                    |
  <email address="[email protected]" />   |
</customer>                               |

This is probably more efficient than breaking the document out with the .nodes() function since it can parse the XML in a single shot and not need to start and stop the parser per each node.

The other benefit of keeping it within .query() is that you get a single XML document returned. So, if you receive an XML document / value containing multiple nodes worth of stuff, you can maintain the scalar value approach of it being a single entity without having to reconstruct the resulting nodes back into a document again. This also lets you use it in a subquery / CTE without changing the number of expected rows being returned.