How do I read in a large flat file?

It seems to me this variant of readLines is shorter and faster than the one suggested by peterSO:

func readLines(filename string) (map[int]string, error) {
    lines := make(map[int]string)

    data, err := ioutil.ReadFile(filename)
    if err != nil {
        return nil, err
    }

    for n, line := range strings.Split(string(data), "\n") {
        lines[n] = line
    }

    return lines, nil
}

It's not clear that it's necessary to read in all the lines before parsing them and inserting them into a database. Try to avoid that.
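A minimal sketch of that idea, assuming lines fit within bufio.Scanner's default token limit: read one line, handle it, move on. The names eachLine and summarize are made up for illustration, and the handler here only counts characters where the real program would parse the line and insert it into the database.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
)

// eachLine reads r one line at a time and passes each line to handle,
// so only the current line is held in memory. It returns the number of
// lines handled. A line longer than bufio.MaxScanTokenSize stops the
// scanner with bufio.ErrTooLong.
func eachLine(r io.Reader, handle func(line string) error) (int, error) {
	sc := bufio.NewScanner(r)
	n := 0
	for sc.Scan() {
		if err := handle(sc.Text()); err != nil {
			return n, err
		}
		n++
	}
	return n, sc.Err()
}

// summarize is a demo-only handler: it stands in for "parse and insert"
// and just totals the characters seen across all lines.
func summarize(text string) (lines, chars int, err error) {
	lines, err = eachLine(strings.NewReader(text), func(line string) error {
		chars += len(line)
		return nil
	})
	return lines, chars, err
}

func main() {
	lines, chars, err := summarize("alpha\nbeta\ngamma\n")
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(lines, "lines,", chars, "characters")
}
```

For a real file, pass the *os.File returned by os.Open to eachLine instead of a strings.Reader; any io.Reader works.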

You have a small file: "a flat file that has 339276 lines of text in it for a size of 62.1 MB." If you really do need every line at once, a file that size fits comfortably in memory. For example,

package main

import (
    "bytes"
    "fmt"
    "io"
    "io/ioutil"
)

func readLines(filename string) ([]string, error) {
    var lines []string
    file, err := ioutil.ReadFile(filename)
    if err != nil {
        return lines, err
    }
    buf := bytes.NewBuffer(file)
    for {
        line, err := buf.ReadString('\n')
        if len(line) == 0 {
            if err != nil {
                if err == io.EOF {
                    break
                }
                return lines, err
            }
        }
        lines = append(lines, line)
        if err != nil && err != io.EOF {
            return lines, err
        }
    }
    return lines, nil
}

func main() {
    // a flat file that has 339276 lines of text in it for a size of 62.1 MB
    filename := "flat.file"
    lines, err := readLines(filename)
    fmt.Println(len(lines))
    if err != nil {
        fmt.Println(err)
        return
    }
}

Calling Scan() and Text() on a bufio.Scanner in a loop works perfectly for me on files much larger than that, so I suppose you have lines that exceed the buffer capacity. In that case:

  • check your line ending
  • and which Go version you use

path, err := r.ReadLine("\n") // 0x0A separator = newline?

Looks like func (b *bufio.Reader) ReadLine() (line []byte, isPrefix bool, err error) has the return value isPrefix specifically for your use case: http://golang.org/pkg/bufio/#Reader.ReadLine
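If over-long lines are indeed the problem, here is a rough sketch of two ways around it, not a definitive fix. One raises the Scanner's limit with Scanner.Buffer; the other calls ReadLine (which takes no arguments) repeatedly and joins the chunks while isPrefix is true. The function names and the 1 MiB limit are assumptions chosen for the example.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
)

// scanLines reads every line with a bufio.Scanner whose buffer is allowed
// to grow up to maxLine bytes. If a line is still too long, the returned
// error is bufio.ErrTooLong.
func scanLines(r io.Reader, maxLine int) ([]string, error) {
	sc := bufio.NewScanner(r)
	sc.Buffer(make([]byte, 0, 64*1024), maxLine)
	var lines []string
	for sc.Scan() {
		lines = append(lines, sc.Text())
	}
	return lines, sc.Err()
}

// readLines uses bufio.Reader.ReadLine and glues chunks together while
// isPrefix reports that the line did not fit in the reader's buffer.
func readLines(r io.Reader) ([]string, error) {
	br := bufio.NewReader(r)
	var lines []string
	var cur []byte
	for {
		chunk, isPrefix, err := br.ReadLine()
		if err == io.EOF {
			if len(cur) > 0 { // unterminated last line that ended on a chunk boundary
				lines = append(lines, string(cur))
			}
			return lines, nil
		}
		if err != nil {
			return lines, err
		}
		cur = append(cur, chunk...) // copy: chunk is only valid until the next read
		if !isPrefix {
			lines = append(lines, string(cur))
			cur = cur[:0]
		}
	}
}

// lineLengths is a demo harness: it reads text both ways and returns the
// length of each line as seen by each approach.
func lineLengths(text string) (viaScanner, viaReadLine []int, err error) {
	a, err := scanLines(strings.NewReader(text), 1<<20) // assumed 1 MiB limit
	if err != nil {
		return nil, nil, err
	}
	b, err := readLines(strings.NewReader(text))
	if err != nil {
		return nil, nil, err
	}
	for _, l := range a {
		viaScanner = append(viaScanner, len(l))
	}
	for _, l := range b {
		viaReadLine = append(viaReadLine, len(l))
	}
	return viaScanner, viaReadLine, nil
}

func main() {
	long := strings.Repeat("x", 100000) // longer than the Scanner's default limit
	s, r, err := lineLengths(long + "\nshort\n")
	fmt.Println(s, r, err)
}
```

Scanner.Buffer needs Go 1.6 or later, which is one reason the Go version matters here. ReadLine strips the line ending ("\n" or "\r\n"), as does the Scanner's default split function.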
