Unix - Need to cut a file which has multiple blanks as delimiter - awk or cut?

Other ways of doing the same thing aside, the error in your program is this: you cannot use < to redirect from the output of another program, because < reads from a file. Turn your script around and use a pipe like this:

awk -F'   ' '{ print $2 }' "${Directory_path}/test_file.txt" | while read readline

etc.

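Besides, "readline" is not a reserved word in the shell, so it works as a variable name, but it is also the name of the line-editing library that bash uses; a plainer name such as "line" avoids the confusion.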
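
A note on the -F value in the awk command above, as a small sketch on made-up input: awk treats a field separator longer than one character as a regular expression, so three blanks match exactly three blanks, while the default separator splits on any run of blanks or tabs. The sample line and the variable names (line, three, default) are invented for illustration.

```bash
# Sample line with FOUR blanks between the two fields (made-up data).
line='a    b'

# A three-blank separator consumes only three of the four blanks,
# so field 2 keeps a leading blank.
three=$(printf '%s\n' "$line" | awk -F'   ' '{ print "[" $2 "]" }')

# The default separator splits on any run of blanks or tabs.
default=$(printf '%s\n' "$line" | awk '{ print "[" $2 "]" }')

echo "three-blank FS: $three"
echo "default FS:     $default"
```

If the number of blanks between fields varies, leaving out -F (or using -F' ') is probably the safer choice.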


It depends on the version or implementation of cut on your machine. Some versions support an option, usually -i, that means 'ignore blank fields' or, equivalently, allow multiple separators between fields. If that's supported, use:

cut -i -d' ' -f 2 data.file

If not (and the option is not universal, and maybe not even widespread, since neither GNU cut nor the Mac OS X cut has it), then awk is the better and more portable choice.

You need to pipe the output of awk into your loop, though:

awk -F' ' '{print $2}' "${Directory_path}/test_file.txt" |
while read -r readline
do
    read_int=$readline
    # Count the lines in file1.txt that match the value
    cnt_exc=$(grep -c -- "$read_int" "${Directory_path}/file1.txt")
    if [ "$cnt_exc" -gt 0 ]
    then int_1=0
    else int_2=0
    fi
done

The only residual issue is whether the while loop runs in a sub-shell and is therefore not modifying your main shell script's variables, just its own copy of those variables.
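
Whether that matters can be checked with a small sketch. In bash with default settings, each part of a pipeline runs in a sub-shell, so assignments made inside the piped loop are gone once the loop ends; some other shells, such as ksh, run the last part of a pipeline in the current shell instead. The variable name count and the three-line input are made up for the demonstration.

```bash
count=0

# The loop is the last part of a pipeline, so bash runs it in a sub-shell.
printf 'a\nb\nc\n' | while read -r line
do
    count=$((count + 1))
done

# With bash defaults this prints 0: the increments happened
# in the sub-shell's own copy of count.
echo "count after piped loop: $count"
```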

With bash, you can use process substitution:

while read -r readline
do
    read_int=$readline
    # Count the lines in file1.txt that match the value
    cnt_exc=$(grep -c -- "$read_int" "${Directory_path}/file1.txt")
    if [ "$cnt_exc" -gt 0 ]
    then int_1=0
    else int_2=0
    fi
done < <(awk -F' ' '{print $2}' "${Directory_path}/test_file.txt")

This leaves the while loop in the current shell, but arranges for the output of the command to appear as if from a file.
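
A minimal sketch of the effect, assuming bash (process substitution is not part of POSIX sh). The variable name count and the three-line input are made up; the point is only that an assignment made inside the loop is still visible after it.

```bash
count=0

# The loop reads from a process substitution, so it runs in the current shell.
while read -r line
do
    count=$((count + 1))
done < <(printf 'a\nb\nc\n')

# The increments are still visible here.
echo "count after loop: $count"
```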

The blank in ${Directory path} is not normally legal — unless it is another Bash feature I've missed out on; you also had a typo (Directoty) in one place.


The job of replacing multiple delimiters with just one is left to tr:

tr -s ' ' < file_name | cut -d ' ' -f 2

tr translates or deletes characters, and is perfectly suited to prepare your data for cut to work properly.

The manual states:

-s, --squeeze-repeats
       replace each sequence of a repeated character that is listed in the last specified SET, with a single occurrence of that character
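
As a quick check of the tr and cut pipeline on made-up input, the sketch below squeezes the blanks and cuts the second field. It also shows one caveat: tr -s leaves a single leading blank in place, so for a line that starts with blanks cut sees an empty first field and everything shifts by one. The sample strings and the variable names (plain, leading) are invented.

```bash
# Fields separated by runs of blanks.
plain=$(printf 'a   b  c\n' | tr -s ' ' | cut -d ' ' -f 2)

# Same data with leading blanks: after squeezing, one leading blank remains,
# so field 1 is empty and field 2 is the first word.
leading=$(printf '   a   b  c\n' | tr -s ' ' | cut -d ' ' -f 2)

echo "plain:   $plain"
echo "leading: $leading"
```

If lines may start with blanks, stripping them first (for example with sed 's/^ *//') or using awk should avoid the shift.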