Could sed or awk use NUL character as record separator?

By default, the record separator is the newline character, defining a record to be a single line of text. You can use a different character by changing the built-in variable RS. The value of RS is a string that says how to separate records; the default value is \n, the string containing just a newline character.

 awk 'BEGIN { RS = "/" } ; { print $0 }' BBS-list

Since version 4.2.2, GNU sed has the -z (--null-data) option to do exactly this. For example:

sed -z 's/old/new/' null_separated_infile
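
To try this without a real NUL-separated file, here is a small sketch. The sample records and the make_records helper are made up for illustration, and it assumes GNU sed (other sed implementations may not have -z):

```shell
# Hypothetical sample data: three NUL-terminated records.
make_records() { printf 'old=1\000keep=2\000old=3\000'; }

# With -z, sed splits its input on NUL instead of newline, so the
# substitution runs once per record. The final tr only turns the NULs
# into newlines so the result is readable on a terminal.
make_records | sed -z 's/old/new/' | tr '\000' '\n'
```

This should print new=1, keep=2 and new=3 on separate lines. Without -z, sed would see the whole input as a single line and replace only the first "old".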

Yes, gawk can do this: set the record separator to \0. For example, the command

gawk 'BEGIN { RS="\0"; FS="=" } $1=="LD_PRELOAD" { print $2 }' </proc/$(pidof mysqld)/environ

will print out the value of the LD_PRELOAD variable:

/usr/lib/x86_64-linux-gnu/libjemalloc.so.1

The /proc/$PID/environ file is a NUL-separated list of environment variables. I'm using it as an example because it's easy to try on a Linux system.

The BEGIN part sets the record separator to \0 and the field separator to = because I also want to extract the part after = based on the part before =.

The $1=="LD_PRELOAD" runs the block if the first field has the key I'm interested in.

The print $2 block prints out the string after =.
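
One thing to watch with FS="=": if the value itself contains =, $2 holds only the part up to the next =. A possible variation, sketched here with made-up sample data and hypothetical helper names (sample_env, get_var), splits on the first = only by using index() and substr():

```shell
# Hypothetical sample: NUL-terminated KEY=value records, one value containing "=".
sample_env() { printf 'HOME=/root\000OPTS=a=1,b=2\000'; }

# Print the full value of the variable named in $1 (requires gawk).
get_var() {
  gawk -v key="$1" '
    BEGIN { RS = "\0" }            # one record per NUL-terminated entry
    { eq = index($0, "=") }        # position of the first "=" in the record
    eq && substr($0, 1, eq - 1) == key { print substr($0, eq + 1) }
  '
}

# Run the demo only where gawk is installed.
command -v gawk >/dev/null 2>&1 && sample_env | get_var OPTS
```

With the sample above this should print a=1,b=2, whereas print $2 with FS="=" would give just a.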


But mawk cannot parse input files separated with NUL. This is documented in man mawk:

BUGS
       mawk cannot handle ascii NUL \0 in the source or data files.

mawk will stop reading the input after the first \0 character.
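
If only such an awk is available, one workaround is to translate the NULs to newlines before awk sees the data. This is just a sketch with made-up sample data (env_sample is a hypothetical helper), and it assumes that no record contains a newline of its own:

```shell
# Hypothetical sample: NUL-terminated KEY=value records.
env_sample() { printf 'HOME=/root\000LD_PRELOAD=/tmp/libdemo.so\000'; }

# tr rewrites every NUL as a newline, so any awk can read the records
# with its default record separator.
env_sample | tr '\000' '\n' | awk -F= '$1 == "LD_PRELOAD" { print $2 }'
```

On a real system, env_sample would be replaced by something like reading /proc/$$/environ.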


You can also use xargs to handle NUL-separated input, a bit non-intuitively, like this:

xargs -0 -n1 </proc/$$/environ

xargs uses echo as the default command. -0 tells it that the input is NUL-separated, and -n1 limits each echo invocation to one argument, so the output ends up separated by newlines.
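
A small sketch to see the effect on data you control (the items helper and its contents are made up for illustration; -0 is available in GNU and BSD xargs):

```shell
# Hypothetical sample: two NUL-terminated records, the first containing a space.
items() { printf 'first item\000second\000'; }

# Default command (echo), one record per invocation.
items | xargs -0 -n1

# Same idea with an explicit command; the space inside the first record
# stays part of a single argument.
items | xargs -0 -n1 printf '[%s]\n'
```

The second form is also a bit safer, because a record such as -n would be taken by echo as an option rather than printed.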


And as Graeme's answer shows, sed can do this too.


Using sed to replace the NUL characters with spaces:

sed 's/\x0/ /g' infile > outfile

Or edit the file in place (this keeps a backup of the original as infile.bak and overwrites infile with the substitutions):

sed -i.bak 's/\x0/ /g' infile

Using tr:

tr -d "\000" < infile > outfile
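
Note that deleting the NULs glues the records together, while translating them keeps the records apart. A quick sketch with made-up data (nul_data is a hypothetical helper):

```shell
# Hypothetical sample: two NUL-terminated records.
nul_data() { printf 'a\000b\000'; }

# Deleting the NULs joins the records: prints "ab".
nul_data | tr -d '\000'; echo

# Translating NUL to newline keeps one record per line: prints "a" then "b".
nul_data | tr '\000' '\n'
```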
