Generating CSV file for Excel, how to have a newline inside a value

Recently I had a similar problem. I solved it by importing an HTML file instead; a baseline example looks like this:

<html xmlns:v="urn:schemas-microsoft-com:vml"
xmlns:o="urn:schemas-microsoft-com:office:office"
xmlns:x="urn:schemas-microsoft-com:office:excel"
xmlns="http://www.w3.org/TR/REC-html40">
  <head>
    <style>
      <!--
      br {mso-data-placement:same-cell;}
      -->
    </style>
  </head>
  <body>
    <table>
      <tr>
        <td>first line<br/>second line</td>
        <td style="white-space:normal">first line<br/>second line</td>
      </tr>
    </table>
  </body>
</html>

I know it is not a CSV, and it might behave differently in different versions of Excel, but I think it is worth a try.
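If the file is generated by a script, the same markup can be produced from row data. The following is only a sketch in Python: `cell_to_html` and `rows_to_html` are hypothetical helper names, and the template simply repeats the example above.

```python
from html import escape

# Same skeleton as the example above; doubled braces are literal braces for str.format.
TEMPLATE = """<html xmlns:v="urn:schemas-microsoft-com:vml"
xmlns:o="urn:schemas-microsoft-com:office:office"
xmlns:x="urn:schemas-microsoft-com:office:excel"
xmlns="http://www.w3.org/TR/REC-html40">
  <head>
    <style>
      <!--
      br {{mso-data-placement:same-cell;}}
      -->
    </style>
  </head>
  <body>
    <table>
{rows}
    </table>
  </body>
</html>
"""


def cell_to_html(value):
    # Escape markup characters first, then turn embedded newlines into <br/>.
    escaped = escape(str(value))
    return escaped.replace("\r\n", "\n").replace("\n", "<br/>")


def rows_to_html(rows):
    # One <tr> per row, one <td> per value.
    lines = []
    for row in rows:
        cells = "".join("<td>{}</td>".format(cell_to_html(v)) for v in row)
        lines.append("      <tr>{}</tr>".format(cells))
    return TEMPLATE.format(rows="\n".join(lines))


html_doc = rows_to_html([["first line\nsecond line", "a < b"]])
```

Escaping before inserting the `<br/>` tags matters: done the other way round, the tags themselves would be escaped and shown as text.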

I hope this helps ;-)


In Excel 365 while importing the file:

Data -> From Text/CSV

-> Select File -> Transform Data:

In the Power Query Editor, in the "Query Settings" pane on the right-hand side, under APPLIED STEPS, click the settings icon on the "Source" row.

-> In the line break dropdown, select "Ignore quoted line breaks".

Then press OK -> File -> Close & Load


You should have space characters at the start of fields ONLY where the space characters are part of the data. Excel will not strip off leading spaces, so you will get unwanted spaces in your headings and data fields. Worse, the " that should be "protecting" that line-break in the third column will be ignored because it is not at the start of the field.
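The same kind of failure can be reproduced outside Excel. The sketch below uses Python's csv reader, which is not Excel's parser but by default also treats a quote that is not at the start of a field as ordinary data. The sample strings and the `parse` helper are made up for illustration.

```python
import csv
import io


def parse(text):
    # newline="" leaves line endings untouched so the csv module sees them as written.
    return list(csv.reader(io.StringIO(text, newline="")))


# No padding after the commas: the quotes protect the embedded line break.
good = 'id,note\r\n1,"first line\nsecond line"\r\n'
# A space after each comma: the quote is no longer the first character of the field.
bad = 'id, note\r\n1, "first line\nsecond line"\r\n'

good_rows = parse(good)
bad_rows = parse(bad)

print(len(good_rows), len(bad_rows))
```

With the padded input, the record is split at the embedded line break and the stray spaces and quote characters end up in the data.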

If you have non-ASCII characters (encoded in UTF-8) in the file, you should have a UTF-8 BOM (3 bytes, hex EF BB BF) at the start of the file. Otherwise Excel will interpret the data according to your locale's default encoding (e.g. cp1252) instead of utf-8, and your non-ASCII characters will be trashed.
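As a sketch of how the BOM can be written, Python's "utf-8-sig" codec emits those three bytes automatically. The function name `write_csv_with_bom` and the sample rows are hypothetical.

```python
import csv
import os
import tempfile


def write_csv_with_bom(path, rows):
    # "utf-8-sig" writes the BOM bytes EF BB BF before the first character.
    # newline="" lets the csv module control the line endings itself.
    with open(path, "w", encoding="utf-8-sig", newline="") as f:
        csv.writer(f).writerows(rows)


path = os.path.join(tempfile.mkdtemp(), "example.csv")
write_csv_with_bom(path, [["name", "city"], ["Zoë", "Kraków"]])

# Read the file back as raw bytes to inspect what was written.
with open(path, "rb") as f:
    raw = f.read()

print(raw[:3].hex())
```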

The following comments apply to Excel 2003, 2007 and 2013; they were not tested on Excel 2000.

If you open the file by double-clicking on its name in Windows Explorer, everything works OK.

If you open it from within Excel, the results vary:

  1. You have only ASCII characters in the file (and no BOM): works.
  2. You have non-ASCII characters (encoded in UTF-8) in the file, with a UTF-8 BOM at the start: it recognises that your data is encoded in UTF-8 but it ignores the csv extension and drops you into the Text Import not-a-Wizard, unfortunately with the result that you get the line-break problem.

Options include:

  1. Train the users not to open the files from within Excel :-(
  2. Consider writing an XLS file directly ... there are packages/libraries available for doing that in Python/Perl/PHP/.NET/etc

After lots of tweaking, here's a configuration that works when generating files on Linux and reading them on Windows+Excel, though the embedded newline format does not follow the standard:

  • Newlines within a field need to be \n (and the field obviously has to be enclosed in double quotes)
  • End of record: \r\n
  • Make sure that you don't start a field with an equals sign (=), otherwise it gets treated as a formula and truncated

In Perl, I used Text::CSV to do this as follows:

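use Text::CSV;

open my $FO, ">:encoding(utf8)", $filename or die "Cannot create $filename: $!";
# binary => 1 permits embedded newlines in fields; eol => "\r\n" ends each record with CRLF
my $csv = Text::CSV->new({ binary => 1, eol => "\r\n" });

# for each row (@row holds that row's field values):
$csv->print($FO, \@row);

close $FO or die "Cannot close $filename: $!";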
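
For comparison, here is a rough Python sketch of the same configuration using the standard csv module. `check_row` and `write_rows` are hypothetical helper names, and rejecting fields that start with "=" is just one assumed way of "making sure"; the original does not say how to handle them.

```python
import csv
import io


def check_row(row):
    # Refuse fields that start with "=", since Excel would read them as formulas.
    for value in row:
        if str(value).startswith("="):
            raise ValueError("field starts with '=': {!r}".format(value))
    return row


def write_rows(stream, rows):
    # Records end with \r\n; embedded \n inside a field is written unchanged
    # and the field is quoted automatically.
    writer = csv.writer(stream, lineterminator="\r\n")
    for row in rows:
        writer.writerow(check_row(row))


# newline="" stops the stream from translating line endings.
buf = io.StringIO(newline="")
write_rows(buf, [["id", "note"], [1, "first line\nsecond line"]])
data = buf.getvalue()

print(repr(data))
```

When writing to a real file instead of a StringIO, the file should likewise be opened with newline="" so that the \n inside the field is not converted.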
