CsvHelper: How to detect the delimiter of a given CSV file

A CSV file saved from MS Excel can use a different delimiter depending on the user's localization settings. To handle that, I ended up with the following approach:

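// Requires: using System.Collections.Generic; using System.IO;
public static string DetectDelimiter(StreamReader reader)
{
    // Candidate delimiters, checked in this order
    var possibleDelimiters = new List<string> { ",", ";", "\t", "|" };

    // ReadLine returns null when the stream is empty
    var headerLine = reader.ReadLine();

    // Rewind the reader so the caller can reuse it. Otherwise CsvHelper
    // won't find the header line, because it has already been consumed.
    // This only works when the underlying stream is seekable.
    reader.BaseStream.Position = 0;
    reader.DiscardBufferedData();

    if (headerLine != null)
    {
        foreach (var possibleDelimiter in possibleDelimiters)
        {
            // The first candidate that appears in the header line wins
            if (headerLine.Contains(possibleDelimiter))
            {
                return possibleDelimiter;
            }
        }
    }

    // Nothing matched (or the file was empty): fall back to a comma
    return possibleDelimiters[0];
}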

I also needed to reset the reader's read position, since it was the same instance I passed to the CsvReader constructor.

The usage was then as follows:

using (var textReader = new StreamReader(memoryStream))
{
    var delimiter = DetectDelimiter(textReader);

    using (var csv = new CsvReader(textReader))
    {
        csv.Configuration.Delimiter = delimiter;

        // ... rest of the CSV reading process

    }
}
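
One limitation of the first-match check above is that a stray comma inside a semicolon-separated header is enough to pick the wrong delimiter. A possible variation, shown here only as a sketch, is to count how often each candidate occurs in the first line and take the most frequent one. The class name DelimiterSniffer and the method name DetectByCount are made up for this example, the candidate list is the same assumption as above, and quoted fields are not handled.

```csharp
using System;
using System.IO;
using System.Linq;

public static class DelimiterSniffer
{
    // Candidate delimiters (an assumption, same set as in the answer above)
    private static readonly char[] Candidates = { ',', ';', '\t', '|' };

    // Returns the candidate that occurs most often in the first line.
    // Falls back to ',' when the input is empty or contains no candidate.
    // On a tie, the candidate listed first wins.
    public static char DetectByCount(TextReader reader)
    {
        var headerLine = reader.ReadLine();
        if (string.IsNullOrEmpty(headerLine))
        {
            return Candidates[0];
        }

        char best = Candidates[0];
        int bestCount = 0;

        foreach (char candidate in Candidates)
        {
            // Plain character count; delimiters inside quotes are counted too
            int count = headerLine.Count(c => c == candidate);
            if (count > bestCount)
            {
                best = candidate;
                bestCount = count;
            }
        }

        return best;
    }
}
```

Unlike DetectDelimiter, this sketch does not rewind the reader, so it is easiest to call it on a separate StringReader or to reset the stream afterwards as shown earlier.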

I found this piece of code on this site:

public static char Detect(TextReader reader, int rowCount, IList<char> separators)
{
    IList<int> separatorsCount = new int[separators.Count];

    int character;

    int row = 0;

    bool quoted = false;
    bool firstChar = true;

    while (row < rowCount)
    {
        character = reader.Read();

        switch (character)
        {
            case '"':
                if (quoted)
                {
                    if (reader.Peek() != '"')
                        // Inside a quoted value and the next character is
                        // not another quote: this quote closes the value.
                        quoted = false;
                    else
                        // Inside a quoted value and the next character is
                        // also a quote (""): skip the peeked quote.
                        reader.Read();
                }
                else
                {
                    // Treat the value as quoted only if this quote is the
                    // first character of the value.
                    if (firstChar)
                        quoted = true;
                }
                break;
            case '\n':
                if (!quoted)
                {
                    ++row;
                    firstChar = true;
                    continue;
                }
                break;
            case -1:
                row = rowCount;
                break;
            default:
                if (!quoted)
                {
                    int index = separators.IndexOf((char)character);
                    if (index != -1)
                    {
                        ++separatorsCount[index];
                        firstChar = true;
                        continue;
                    }
                }
                break;
        }

        if (firstChar)
            firstChar = false;
    }

    int maxCount = separatorsCount.Max();

    return maxCount == 0 ? '\0' : separators[separatorsCount.IndexOf(maxCount)];
}

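Here, separators is the list of candidate separator characters you expect your files to use, and rowCount is the number of rows to examine. The method returns '\0' when none of the candidates occurs.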

Hope that helps :)


CSV stands for Comma Separated Values. I don't think you can reliably detect whether a different character is used as the separator. If there is a header row, you might be able to rely on it.
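
To illustrate why detection is only ever a heuristic, here is a rough sketch of one possible plausibility check: a candidate is accepted only if it splits every sampled line into the same number of fields, and into more than one. The names DelimiterConsistency and FindConsistent are hypothetical, and the check ignores quoting, so a quoted field that contains the delimiter will defeat it.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DelimiterConsistency
{
    // Returns the first candidate that splits every non-empty sample line
    // into the same number of fields (more than one), or null if no
    // candidate does. Quoted fields are NOT taken into account.
    public static char? FindConsistent(IEnumerable<string> lines, IEnumerable<char> candidates)
    {
        var sample = lines.Where(line => !string.IsNullOrEmpty(line)).ToList();
        if (sample.Count == 0)
        {
            return null;
        }

        foreach (char candidate in candidates)
        {
            // Number of fields each line would have with this candidate
            var fieldCounts = sample
                .Select(line => line.Split(candidate).Length)
                .Distinct()
                .ToList();

            if (fieldCounts.Count == 1 && fieldCounts[0] > 1)
            {
                return candidate;
            }
        }

        return null;
    }
}
```

A null result means none of the candidates looks consistent, which is the case where you really do need to know the separator up front.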

You should know the separator that is used. You should be able to see it when opening the file. If the source of the files gives you a different separator each time and is not reliable, then I'm sorry. ;)

If you just want to parse using a different delimiter, then you can set csv.Configuration.Delimiter. http://joshclose.github.io/CsvHelper/#configuration-delimiter
