Should we do nested goroutines?

The answer depends on how processor-intensive the operation on each line is.

If the line operation is short-lived, definitely don't bother to spawn a goroutine for each line.

If it's expensive (think ~5 seconds or more), proceed with caution: you may run out of memory. As of Go 1.4, spawning a goroutine allocates a 2048-byte stack, so for 2 million lines you could allocate roughly 4 GB of RAM for the goroutine stacks alone. Consider whether that memory is worth spending.

In short, you will probably get the best results with the following setup:

for _, file := range folder {
  go processFile(file) // one goroutine per file
}

If the number of files exceeds the number of CPUs, you're likely to have enough concurrency to mask the disk I/O latency involved in reading the files from disk.


If you go with the architecture you've described, you risk exhausting CPU and memory, because you'd be creating an unbounded number of workers. Instead, I suggest an architecture that lets you throttle via channels. For example:

In your main goroutine, feed the files into a channel:

for _, file := range folder {
  fileChan <- file
}

Then, in another goroutine, break each file into lines and feed those into a second channel:

for file := range fileChan {
  // split the file into lines, e.g. with bufio.Scanner;
  // readLines is a placeholder for that step
  for _, line := range readLines(file) {
    lineChan <- line
  }
}

Then, in a third goroutine, pop the lines off the channel and do what you will with them:

for line := range lineChan {
  // process the line
}

The main advantage of this approach is that you can create as many or as few goroutines as your system can handle, pass them all the same channels, and let whichever goroutine reaches a channel first handle the work. That gives you direct control over how many resources you use.

Here is a working example: http://play.golang.org/p/-Qjd0sTtyP