How can I tell cat (and maybe grep) to ignore a newline when it's inside double quotes?

Using csvgrep from the csvkit package to pull out all records that have a codeRegion value containing the string 01:

csvgrep -c codeRegion -m 01 file.csv

This uses a proper CSV parser, so there will be no issues with newlines or commas in properly quoted fields.

The -c option selects the column that we'd like to investigate, by number or by name, and -m designates the string to match with. One could also use -r to match with a regular expression, e.g. -r '^01$' to avoid matching strings where 01 is a substring (as in 011). See csvgrep --help.
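To see the difference between -m and -r, here is a small sketch. The file name sample.csv, its header and its values are invented for illustration (they are not the asker's data), and it assumes csvkit is installed so that csvgrep is on the PATH.

```shell
# Hypothetical sample data; the header and the values are invented for illustration.
cat > sample.csv <<'EOF'
codeRegion,comment
01,"first line
second line"
011,substring only
02,plain
EOF

# Substring match: selects the 01 record and the 011 record.
csvgrep -c codeRegion -m 01 sample.csv

# Anchored regular expression: selects only the record whose value is exactly 01.
csvgrep -c codeRegion -r '^01$' sample.csv
```

In both runs the two-line comment should come back as part of a single record, since the newline sits inside a quoted field.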


awk '/^01/||n%2{print;n+=gsub(/"/,"&")}' file

For each line,

  • /^01/||n%2 If the line begins with 01, or n (initially zero) is odd,
    • print Print the line.
    • n+=gsub(/"/,"&") Increment n by the return value of the gsub function.
      The call replaces every double quote /"/ with itself "&". The replacement alone would be pointless, but gsub also returns the number of substitutions made, so it serves as a way of counting the double quotes in the line.

Notice that if n is odd (n%2 is nonzero), a double-quoted field has been opened but not yet closed, so awk keeps printing lines until n is even again, regardless of whether those lines match /^01/.
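For readability, the same program can be spread over several lines with comments. This is only a sketch: the function name pick01 is made up for the illustration, and the input is recreated from the left-hand column of the diff shown below. Run it in a scratch directory, since it overwrites a file named file.

```shell
# Recreate the sample input (same lines as the left-hand column of the diff).
cat > file <<'EOF'
04,xde
01,abc"
cd
as"
02,dsad
03,1ad"
01,as,"as
us"
02,s
01,a
EOF

# pick01 is a hypothetical helper name; the awk program is the one-liner, expanded.
pick01() {
  awk '
    # Act on a line if it starts with 01, or if n is odd (a quoted field is still open).
    /^01/ || n % 2 {
      print
      # gsub() returns how many double quotes it replaced; "&" puts each one back unchanged.
      n += gsub(/"/, "&")
    }
  ' "$@"
}

pick01 file
```

Note that double quotes are counted only on lines that get printed, so a stray quote on a skipped line (such as 03,1ad" here) does not affect n.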

A side-by-side diff for you:

$ diff -yW 30 <(cat file) <(awk '/^01/||n%2{print;n+=gsub(/"/,"&")}' file)
04,xde        <
01,abc"         01,abc"
cd              cd
as"             as"
02,dsad       <
03,1ad"       <
01,as,"as       01,as,"as
us"             us"
02,s          <
01,a            01,a