Turning separate lines into a comma separated list with quoted entries

You can add quotes with sed and then merge lines with paste, like this:

sed 's/^\|$/"/g'|paste -sd, -

If you are running a GNU coreutils based system (e.g. most Linux distributions), you can omit the trailing '-'.
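As a quick check of the pipeline above, here is a small sketch with sample input. The function names `quote_join_gnu` and `quote_join` and the package names fed in are only illustrative. The `\|` alternation in the sed expression is a GNU extension, so the second function uses `s/.*/"&"/` instead, which should give the same result on non-empty lines with any POSIX sed.

```shell
# GNU sed version from above: insert a double quote at the start and end of each line.
quote_join_gnu() {
  sed 's/^\|$/"/g' | paste -sd, -
}

# Variant assumed to be portable: wrap the whole matched line (&) in double quotes.
quote_join() {
  sed 's/.*/"&"/' | paste -sd, -
}

printf '%s\n' ggplot2 plotly scales | quote_join
# prints: "ggplot2","plotly","scales"
```

Both functions read from standard input, so they can be used as `quote_join < /path/to/list` as well.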

If your input data has DOS-style line endings (as @phk suggested), you can modify the command as follows:

sed 's/\r//;s/^\|$/"/g'|paste -sd, -

Using awk:
awk 'BEGIN { ORS="" } { print p"'"'"'"$0"'"'"'"; p=", " } END { print "\n" }' /path/to/list
Alternative with less shell escaping and therefore more readable:
awk 'BEGIN { ORS="" } { print p"\047"$0"\047"; p=", " } END { print "\n" }' /path/to/list
Output:
'd3heatmap', 'data.table', 'ggplot2', 'htmltools', 'htmlwidgets', 'metricsgraphics', 'networkD3', 'plotly', 'reshape2', 'scales', 'stringr'
Explanation:

The awk script itself, without all the shell escaping, is BEGIN { ORS="" } { print p"'"$0"'"; p=", " } END { print "\n" }. After printing the first entry the variable p is set (before that it acts like an empty string). Every entry (or in awk-speak: record) is prefixed with p and printed with single quotes around it. The awk output record separator variable ORS is not needed (since the prefix is doing it for you), so it is set to be empty at the BEGINing. Oh, and we might want our output to END with a newline (e.g. so it works with further text-processing tools); should this not be needed, the part with END and everything after it (inside the single quotes) can be removed.
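The prefix trick described above can be tried on a few sample records. This is a sketch rather than a drop-in replacement: the function name `join_quoted` and its two parameters are made up for illustration, and the quote character and separator are passed in with `-v` so that the awk program itself needs no shell escaping.

```shell
# join_quoted QUOTE SEPARATOR: read lines on stdin, print them as one line.
# (hypothetical helper; the name and parameters are illustrative)
join_quoted() {
  awk -v q="$1" -v sep="$2" '
    BEGIN { ORS = "" }          # no automatic separator after each print
    { print p q $0 q; p = sep } # p is empty for the first record only
    END { print "\n" }          # finish the output with a single newline
  '
}

printf '%s\n' ggplot2 plotly scales | join_quoted "'" ', '
# prints: 'ggplot2', 'plotly', 'scales'
```

Note that awk interprets backslash escapes in `-v` values, so a separator containing a backslash would need to be doubled.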

Note

If you have Windows/DOS-style line endings (\r\n), you have to convert them to UNIX style (\n) first. To do this you can put tr -d '\015' at the beginning of your pipeline:

tr -d '\015' < /path/to/input.list | awk […] > /path/to/output

(Assuming you don't have any use for \rs in your file. Very safe assumption here.)

Alternatively, simply run dos2unix /path/to/input.list once to convert the file in-place.
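To see why the conversion matters, the following sketch feeds two CRLF-terminated sample lines through the awk command with and without `tr`. The input lines and the helper names `crlf_input` and `quote_list` are made up for the demonstration, and `od -c` is used only to make the stray carriage returns visible.

```shell
# Two sample lines with DOS line endings (illustrative data).
crlf_input() { printf 'ggplot2\r\nplotly\r\n'; }

# The awk command from above, wrapped in a function for reuse.
quote_list() {
  awk 'BEGIN { ORS="" } { print p"\047"$0"\047"; p=", " } END { print "\n" }'
}

# Without tr, each \r stays in the record and lands just before the closing quote.
crlf_input | quote_list | od -c

# With tr, the carriage returns are removed before awk sees the data.
crlf_input | tr -d '\015' | quote_list
# prints: 'ggplot2', 'plotly'
```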


As @don_crissti's linked answer shows, the paste option borders on incredibly fast -- the Linux kernel's piping is more efficient than I would have believed if I hadn't just now tried it. Remarkably, if you can be happy with a single comma separating your list items rather than a comma plus space, a paste pipeline

(paste -d\' /dev/null - /dev/null | paste -sd, -) <input

is faster than even a reasonable flex program(!)

%option 8bit main fast
%%
.*  { printf("'%s'",yytext); }
\n/(.|\n) { printf(", "); }
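For readers wondering how the paste pipeline above produces the quotes, here is a sketch of its two stages on sample input. The function name `quote_paste` and the input lines are illustrative only.

```shell
# Stage 1: paste three "files" side by side with ' as the delimiter.
# /dev/null contributes an empty column on each side of stdin (-),
# so every input line comes out as 'line'.
printf '%s\n' ggplot2 plotly | paste -d\' /dev/null - /dev/null
# prints:
# 'ggplot2'
# 'plotly'

# Stage 2: -s joins all lines of its input into one, separated by commas.
quote_paste() {
  paste -d\' /dev/null - /dev/null | paste -sd, -
}

printf '%s\n' ggplot2 plotly | quote_paste
# prints: 'ggplot2','plotly'
```

Because `paste -d` uses single-character delimiters between columns and between joined lines, this approach cannot produce the comma-plus-space separator directly.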

But if just decent performance is acceptable (and if you're not running a stress test, you won't be able to measure any constant-factor differences, they're all instant) and you want both flexibility with your separators and reasonable one-liner-y-ness,

sed "s/.*/'&'/;H;1h;"'$!d;x;s/\n/, /g'

is your ticket. Yes, it looks like line noise, but the H;1h;$!d;x idiom is the right way to slurp up everything. Once you can recognize that, the whole thing actually gets easy to read: it's s/.*/'&'/ followed by a slurp and a s/\n/, /g.
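The same command can be split into its three steps to make the structure visible. In this sketch the function name `slurp_join`, the sample input, and the ` | ` separator are arbitrary choices made to show that the separator is free-form.

```shell
# Step 1: wrap each line in single quotes.
# Step 2: slurp. H appends the line to the hold space, 1h replaces the hold
#         space with the first line (so there is no leading newline), $!d
#         drops every line except the last, and x swaps the collected text
#         into the pattern space.
# Step 3: replace the embedded newlines with the separator of your choice.
slurp_join() {
  sed -e "s/.*/'&'/" \
      -e 'H;1h;$!d;x' \
      -e 's/\n/ | /g'
}

printf '%s\n' ggplot2 plotly scales | slurp_join
# prints: 'ggplot2' | 'plotly' | 'scales'
```

Since the whole input is held in memory before the final substitution, this is best suited to lists of modest size.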


edit: bordering on the absurd, it's fairly easy to get flex to beat everything else hollow; just tell stdio you don't need the built-in multithread/signal-handler sync:

%option 8bit main fast
%%
.+  { putchar_unlocked('\'');
      fwrite_unlocked(yytext,yyleng,1,stdout);
      putchar_unlocked('\''); }
\n/(.|\n) { fwrite_unlocked(", ",2,1,stdout); }

and under stress that's 2-3x quicker than the paste pipelines, which are themselves at least 5x quicker than everything else.
