Replace every comma outside of double quotes with a pipe

Using csvkit:

$ csvformat -D '|' file.csv
John|Tonny|345.3435,23|56th Street

The tools in csvkit know how to handle the intricacies of CSV files, and here we're using csvformat to replace the delimiting commas with | correctly. The output fields will be quoted as needed.

Example:

$ cat file.csv
John,Tonny,"345.3435,23",56th Street
The | factory,Ltd.,"0,0",meep meep

$ csvformat -D '|' file.csv
John|Tonny|345.3435,23|56th Street
"The | factory"|Ltd.|0,0|meep meep

If your sed supports the -E option (-r in some implementations):

sed -Ee :1 -e 's/^(([^",]|"[^"]*")*),/\1|/;t1' < file

The

:label
   s/pattern/replacement/
t label

is a very common sed idiom. It repeats the same substitution in a loop for as long as the substitution succeeds.

Here, each pass matches the leading part of the line, made of 0 or more quoted strings or characters other than " and , (captured in \1), followed by a , and replaces it with that \1 capture followed by a |. On your sample that means:

  • John,Tonny,"345.3435,23",56th Street -> John|Tonny,"345.3435,23",56th Street
  • John|Tonny,"345.3435,23",56th Street -> John|Tonny|"345.3435,23",56th Street
  • John|Tonny|"345.3435,23",56th Street -> John|Tonny|"345.3435,23"|56th Street
  • and we stop here, as the pattern no longer matches that line.
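
The loop above can be replayed outside sed. The following is a minimal Python sketch, not part of the original answer; the function name replace_step_by_step is made up for illustration. It applies the same pattern once per pass and records each intermediate line:

```python
import re

# Same pattern as the sed expression: a prefix made of quoted strings or
# characters other than " and , followed by exactly one comma.
PATTERN = re.compile(r'^((?:[^",]|"[^"]*")*),')

def replace_step_by_step(line):
    """Return the input line followed by every intermediate result."""
    steps = [line]
    while True:
        # count=1: one substitution per pass, like a single s/// in sed.
        new_line, count = PATTERN.subn(r'\1|', line, count=1)
        if count == 0:
            # Like sed's t command: stop looping once the substitution fails.
            return steps
        line = new_line
        steps.append(line)

for step in replace_step_by_step('John,Tonny,"345.3435,23",56th Street'):
    print(step)
```

On the sample line this should print the original line followed by the three intermediate results listed above.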

With perl, you could do it in a single substitution using the g flag:

perl -pe 's{("[^"]*"|[^",]+)|,}{$1 // "|"}ge'

Here, assuming quotes are balanced in the input, the pattern matches all of the input, breaking it up into one of:

  • quoted string
  • sequences of characters other than , or "
  • a comma

Only when the matched string is a comma (that is, when $1 is not defined in the replacement part) is it replaced with a |; anything else is put back unchanged.
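
For comparison, the same single-pass idea can be sketched in Python. This is an illustrative equivalent under the same balanced-quotes assumption, not the answer's own code, and the name commas_to_pipes is hypothetical:

```python
import re

# Alternation mirroring the perl pattern: a quoted string or a run of
# ordinary characters (captured in group 1), or else a lone comma.
TOKEN = re.compile(r'("[^"]*"|[^",]+)|,')

def commas_to_pipes(line):
    # Group 1 is None only when the match was a bare comma, which is the
    # counterpart of $1 being undefined in the perl replacement.
    return TOKEN.sub(
        lambda m: m.group(1) if m.group(1) is not None else "|",
        line,
    )

print(commas_to_pipes('John,Tonny,"345.3435,23",56th Street'))
```

Like the perl version, this leaves the quotes around quoted fields in place and only rewrites the delimiting commas.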


With perl and the Text::CSV module:

perl -MText::CSV -lne '
  BEGIN { $p = Text::CSV->new() }
  print join "|", $p->fields() if $p->parse($_)
' file.csv
John|Tonny|345.3435,23|56th Street
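
The parse-then-rejoin approach can also be sketched with Python's standard csv module. This is an alternative illustration, not the answer's method, and the function name convert is hypothetical. One difference from a plain join: csv.writer quotes any output field that itself contains a |.

```python
import csv
import io

def convert(text):
    """Re-emit comma-separated text using | as the delimiter."""
    out = io.StringIO()
    # The default QUOTE_MINIMAL quotes only fields that need it,
    # for example a field that contains the new delimiter.
    writer = csv.writer(out, delimiter="|", lineterminator="\n")
    for row in csv.reader(io.StringIO(text)):
        writer.writerow(row)
    return out.getvalue()

sample = (
    'John,Tonny,"345.3435,23",56th Street\n'
    'The | factory,Ltd.,"0,0",meep meep\n'
)
print(convert(sample), end="")
```

For a real file, the same reader and writer could be pointed at a file opened with newline="" and at sys.stdout instead of the in-memory buffers used here.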