CSV separator auto-detection in JavaScript

A possible algorithm for getting the likely separator(s) is pretty simple, and assumes the data is well-formed:

  1. For every candidate delimiter,
    1. For every line,
      1. Split the line by the delimiter and count the resulting fields.
      2. If the count differs from the previous line's count, this is not a valid delimiter.

Proof of concept (doesn't handle quoted fields):

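function guessDelimiters (text, possibleDelimiters) {
    // Keep only the delimiters that split every line into the same number of fields.
    return possibleDelimiters.filter(weedOut);

    function weedOut (delimiter) {
        var cache = -1; // field count of the first non-empty line; -1 until it is seen
        // Accept both \n and \r\n line endings.
        return text.split(/\r?\n/).every(checkLength);

        function checkLength (line) {
            // Skip empty lines, such as the one after a trailing newline.
            if (!line) {
                return true;
            }

            var length = line.split(delimiter).length;
            if (cache < 0) {
                cache = length;
            }
            // Counts must match, and the split must have found the delimiter at all.
            return cache === length && length > 1;
        }
    }
}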

The length > 1 check is to make sure the split didn't just return the whole line. Note that this returns an array of possible delimiters - if there's more than one item, you have an ambiguity problem.
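
If quoted fields matter, the same idea can be extended by counting only the delimiters that appear outside double quotes. The following is a minimal sketch, not a full CSV parser. The names countFields and guessDelimitersQuoted are made up for illustration, and it assumes single-character delimiters, RFC 4180-style quoting (a doubled "" inside a quoted field is an escaped quote), and no line breaks inside quoted fields.

```javascript
// Count the fields in one line, ignoring delimiters inside double-quoted fields.
// Assumption: delimiter is a single character.
function countFields (line, delimiter) {
    var count = 1;
    var inQuotes = false;
    for (var i = 0; i < line.length; i++) {
        var ch = line[i];
        if (ch === '"') {
            // An escaped quote ("") toggles twice, so inQuotes ends up unchanged.
            inQuotes = !inQuotes;
        } else if (!inQuotes && ch === delimiter) {
            count++;
        }
    }
    return count;
}

// Same filtering as the proof of concept, but using the quote-aware field count.
function guessDelimitersQuoted (text, possibleDelimiters) {
    var lines = text.split(/\r?\n/).filter(function (line) {
        return line.length > 0;
    });

    return possibleDelimiters.filter(function (delimiter) {
        var expected = -1;
        return lines.every(function (line) {
            var n = countFields(line, delimiter);
            if (expected < 0) {
                expected = n;
            }
            return n === expected && n > 1;
        });
    });
}

var sample = 'name,note\n"Smith, J.",ok\n"a;b",x';
console.log(guessDelimitersQuoted(sample, [',', ';', '\t'])); // [ ',' ]
```

A plain split on the sample above would see three comma-separated fields in the second line and reject the comma. The ambiguity problem remains: for 'a,b;c\n1,2;3' with candidates ',' and ';', both are returned.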


Another solution is using the detect method from the csv-string package:

detect(input : String) : String

Detects the best separator.

    var CSV = require('csv-string');

    console.log(CSV.detect('a,b,c')); // OUTPUT : ","
    console.log(CSV.detect('a;b;c')); // OUTPUT : ";"
    console.log(CSV.detect('a|b|c')); // OUTPUT : "|"
    console.log(CSV.detect('a\tb\tc')); // OUTPUT : "\t"