Replacing underscore by comma and removing double quotes in CSV

A far simpler way is to use tr:

$ tr '_' ',' < input.csv | tr -d '"'
1,1,0,0,76
1,1,0,0,77
1,1,0,0,78

The way this works is that tr takes two arguments: the set of characters to be replaced, and their replacements. In this case each set contains only one character. We redirect input.csv into tr's stdin stream via the < shell operator, and pipe the resulting output to tr -d '"' to delete the double quotes.
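As a minimal sketch of the same pipeline on inline data (the sample lines are fed in with printf, so no input.csv is needed, and the function name strip_csv is made up for illustration):

```shell
# strip_csv is a hypothetical helper name, not part of tr.
# It reads stdin, turns underscores into commas, then deletes double quotes.
strip_csv() {
    tr '_' ',' | tr -d '"'
}

# Feed one sample line through the filter.
printf '"1_1_0_0_76"\n' | strip_csv

# Sets may hold more than one character: here both _ and ; map to a comma.
printf '"1_1;0_0;76"\n' | tr '_;' ',,' | tr -d '"'
```

Both commands should print 1,1,0,0,76. Characters are paired by position, so the first character of the first set maps to the first character of the second set, and so on.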

But awk can do it too.

$ cat input.csv
"1_1_0_0_76"
"1_1_0_0_77"
"1_1_0_0_78"
$ awk '{gsub(/_/,",");gsub(/\"/,"")};1' input.csv
1,1,0,0,76
1,1,0,0,77
1,1,0,0,78

The way this works is slightly different: awk reads each file line by line, and an in-line script has the form /pattern/ { code block } /another pattern/ { code block for this pattern }. Here the first block has no pattern, so it is executed for every line. The gsub() function performs global substitution within a line, so we use it to replace underscores with commas, and double quotes with an empty string (effectively deleting the character). The 1 is a pattern with no code block: it is always true, and a missing code block defaults to printing the line. In other words, the code block with gsub() does the job and 1 prints the result.
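To make the pattern/code block structure more visible, here is a small sketch on inline sample data. The variable names (sample, all_lines, only_77) are invented for the demonstration, and the /77/ pattern is just an arbitrary example:

```shell
# Sample data supplied inline so the sketch does not depend on input.csv.
sample='"1_1_0_0_76"
"1_1_0_0_77"'

# No pattern before the block: the substitutions run on every line,
# and the explicit print takes the place of the trailing 1.
all_lines=$(printf '%s\n' "$sample" | awk '{ gsub(/_/, ","); gsub(/"/, ""); print }')
printf '%s\n' "$all_lines"

# With a pattern: only lines matching /77/ are edited and printed.
only_77=$(printf '%s\n' "$sample" | awk '/77/ { gsub(/_/, ","); gsub(/"/, ""); print }')
printf '%s\n' "$only_77"
```

The first command prints both cleaned lines; the second prints only the line containing 77, because lines that do not match the pattern never reach the code block.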

Use the shell redirection (>) to send output to a new file:

$ awk '{gsub(/_/,",");gsub(/\"/,"")};1' input.csv > output.csv
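One caveat: redirecting to the same file that is being read (> input.csv) makes the shell truncate it before awk gets to read anything. A common workaround, sketched here in a throwaway directory with placeholder file names, is to write to a temporary file and then rename it:

```shell
# Work in a scratch directory so no real file is touched.
workdir=$(mktemp -d)
printf '"1_1_0_0_76"\n' > "$workdir/input.csv"

# Write to a temporary file first, then replace the original
# only if awk finished successfully.
awk '{gsub(/_/,",");gsub(/"/,"")};1' "$workdir/input.csv" > "$workdir/input.csv.tmp" &&
    mv "$workdir/input.csv.tmp" "$workdir/input.csv"

cat "$workdir/input.csv"
```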

Just as an alternative, you can also use this sed command:

$ sed -e 's/_/,/g' -e 's/"//g' input.csv
1,1,0,0,76
1,1,0,0,77
1,1,0,0,78

Perl, the "Swiss army chainsaw" of command-line text processing, can also do this. The syntax is (not coincidentally) quite similar to the tr and sed examples:

perl -pe 'tr/_"/,/d' input.csv > result.csv

or:

perl -pe 's/_/,/g; s/"//g' input.csv > result.csv

But honestly, if you don't want to take the time to learn a new programming language (which is really what awk, Perl and sed and other tools like them are) just for this basic task, you could just as well do it in any text editor that supports search and replace:

Open the CSV file in your favorite text editor (such as gedit, kate, mousepad, etc.; even plain old Notepad or Wordpad on Windows can do this).
Select "Search and Replace" from the menu (typically found under "Edit", if there isn't a separate "Search" menu).
Enter _ into the search box, and , into the replacement box.
Click "Replace All".
Repeat with " in the search box and nothing in the replacement box.
Save the file.

Now, if you need to do this for 100 or 1000 files instead of just one, then learning a new command-line tool starts to make sense. And, of course, once you do know how to use Perl or sed or whatever, then you'll save a lot of time and effort with similar tasks later. But for just a one-off job that you don't expect to need to do again, sometimes a basic interactive tool like a text editor is the simplest solution.
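For the many-files case, a shell loop around any of the commands above is usually enough. The following is only a sketch: it creates two sample files in a scratch directory and writes cleaned copies into a hypothetical clean/ subdirectory, leaving the originals untouched.

```shell
# Work in a throwaway directory so the sketch touches no real files.
workdir=$(mktemp -d)
printf '"1_1_0_0_76"\n' > "$workdir/a.csv"
printf '"2_2_0_0_77"\n' > "$workdir/b.csv"

# Hypothetical output directory for the cleaned copies.
mkdir "$workdir/clean"

for f in "$workdir"/*.csv; do
    # Same sed expressions as in the sed answer, applied to one file at a time.
    sed -e 's/_/,/g' -e 's/"//g' "$f" > "$workdir/clean/$(basename "$f")"
done

cat "$workdir/clean/a.csv" "$workdir/clean/b.csv"
```

Writing to a separate directory rather than editing in place means a mistake in the expression cannot damage the original data.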
