Can the field separator in awk encompass multiple characters?

What's being talked around here is that the field separator can be more than a fixed string of several characters: it can be a full-blown regex.

To wit: the example below strips the header and surrounding tags from an XML fragment. Note that the tags are well-formed but differ from line to line.

bash-3.2$ more xml_example 
<?xml version="1.0" encoding="UTF-8"?>
<urlset
xmlns="http://www.google.com/schemas/sitemap/0.84"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.google.com/schemas/sitemap/0.84
                  http://www.google.com/schemas/sitemap/0.84/sitemap.xsd">
<url>
<loc>http://www.foo.com/about.html</loc>
<lastmod>2006-05-15T13:43:37Z</lastmod>
<priority>0.5000</priority>
</url>
<url>
<loc>http://www.foo.com/articles/articles.html</loc>
<lastmod>2006-06-20T23:03:36Z</lastmod>
<priority>0.5000</priority>
</url>

Now we apply the awk script to print out the middle field, using a regex as the field separator:

bash-3.2$ awk -F"<(/?)[a-z]+>" '{print $2}' <xml_example




http://www.foo.com/about.html
2006-05-15T13:43:37Z
0.5000


http://www.foo.com/articles/articles.html
2006-06-20T23:03:36Z
0.5000

bash-3.2$

The blank lines come from input lines where $2 is empty: either a tag is the only thing on the line, so nothing sits between the separators, or (as in the header lines) the separator never matches, so there is no second field at all. This is really powerful because the field separator can be not only a fixed multi-character string but any regular expression.
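
If the blank lines are unwanted, one option is to print $2 only when it is non-empty. The following is a minimal sketch, not part of the original session: the `xml` variable is a hypothetical, shortened copy of the fragment above, and the separator is the same regex written without the redundant parentheses.

```shell
# Hypothetical sample input: a shortened copy of the XML fragment above.
xml='<url>
<loc>http://www.foo.com/about.html</loc>
<lastmod>2006-05-15T13:43:37Z</lastmod>
</url>'

# Same idea as before: an opening or closing tag is the field separator.
# The pattern '$2 != ""' skips lines whose second field is empty.
result=$(printf '%s\n' "$xml" | awk -F'</?[a-z]+>' '$2 != "" { print $2 }')

printf '%s\n' "$result"
```

Comparing against the empty string, rather than testing $2 for truth, keeps a field whose value happens to be 0.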


Yes, FS can be more than one character. See the test below, using your example:

kent$  echo '"School","College","City"'|awk -F'","|^"|"$' '{for(i=1;i<=NF;i++){if($i!="")print $i}}'
School
College
City
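
One thing to keep in mind, sketched here with made-up input: because a multi-character FS is treated as a regex, a plain string like `::` works as-is, but a delimiter containing regex metacharacters (such as `|`) has to be escaped or put in a bracket expression to match literally. The variable names and sample records are hypothetical.

```shell
# Hypothetical record with a two-character delimiter.
line='alpha::beta::gamma'

# '::' contains no regex metacharacters, so it matches literally.
second=$(printf '%s\n' "$line" | awk -F'::' '{ print $2 }')

# '|' is a regex metacharacter; a bracket expression makes it literal.
piped=$(printf '%s\n' 'alpha||beta||gamma' | awk -F'[|][|]' '{ print $2 }')

# NF reports how many fields the separator produced.
count=$(printf '%s\n' "$line" | awk -F'::' '{ print NF }')

printf '%s %s %s\n' "$second" "$piped" "$count"
```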
