tm_map has parallel::mclapply error in R 3.0.1 on Mac

I suspect you don't have the SnowballC package installed, which seems to be required. tm_map is supposed to run stemDocument on all the documents using mclapply, and mclapply hides the underlying error. Try running the stemDocument function on just one document so you can see the actual error:

stemDocument(crude[[1]])

For me, I got an error:

Error in loadNamespace(name) : there is no package called ‘SnowballC’

So I just went ahead and installed SnowballC and it worked. Clearly, SnowballC should be a dependency.
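
To check for the package up front rather than waiting for the failure, something like the following should work. It is only a minimal sketch: requireNamespace() is base R, while the variable name has_snowball and the message text are made up for illustration.

```r
# Check whether SnowballC can be loaded, without attaching it.
# 'has_snowball' is a hypothetical name used only for this example.
has_snowball <- requireNamespace("SnowballC", quietly = TRUE)

if (!has_snowball) {
  # Only report the problem; run install.packages("SnowballC") yourself.
  message("SnowballC is not installed; stemDocument will probably fail.")
}
```

If has_snowball is FALSE, installing SnowballC and rerunning the tm_map call is the fix described above.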


I just ran into this. It took me a bit of digging but I found out what was happening.

  1. I had a line of code 'rdevel <- tm_map(rdevel, asPlainTextDocument)'

  2. Running this produced the warning

    In parallel::mclapply(x, FUN, ...) :
      all scheduled cores encountered errors in user code

  3. It turns out that 'tm_map' calls some code in 'parallel' which attempts to figure out how many cores you have. To see what it's thinking, type
    > getOption("mc.cores", 2L)
    [1] 2
    >

  4. Aha moment! Tell the 'tm_map' call to only use one core!
    > rdevel <- tm_map(rdevel, asPlainTextDocument, mc.cores=1)
    Error in match.fun(FUN) : object 'asPlainTextDocument' not found
    > rdevel <- tm_map(rdevel, asPlainTextDocument, mc.cores=4)
    Warning message:
    In parallel::mclapply(x, FUN, ...) :
      all scheduled cores encountered errors in user code
    > 

So with more than one core, 'parallel' does not show you the underlying error message; it only tells you there was an error in each core. Not helpful, parallel! With one core the real error shows up: I forgot the dot. The function name is supposed to be 'as.PlainTextDocument'!

So if you get this error, add 'mc.cores=1' to the 'tm_map' call and run it again to see the real error message.
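
The same effect can be reproduced without tm, using only the 'parallel' package that ships with R. This is a rough sketch, not the tm code itself: the function bad and the name undefined_function are hypothetical stand-ins for a typo like 'asPlainTextDocument'.

```r
library(parallel)

# Hypothetical function that fails because it calls something that does not exist.
bad <- function(x) undefined_function(x)

# With mc.cores = 1, mclapply runs the function in the current process,
# so the real error propagates and can be caught and inspected.
msg <- tryCatch(
  {
    mclapply(1:2, bad, mc.cores = 1)
    NA_character_  # only reached if no error was raised
  },
  error = function(e) conditionMessage(e)
)

print(msg)
```

Here msg holds the underlying error text, which names the missing function instead of the generic "all scheduled cores encountered errors in user code" warning.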


An answer to a related question worked for me: Charles Copley indicates that he thinks the new tm package requires lazy = TRUE to be set explicitly.

So, your code would look like this:

library(tm)
data('crude')
tm_map(crude, stemDocument, lazy = TRUE)

I also tried it without SnowballC to see whether the fix was a combination of those two answers. Having SnowballC installed or not did not appear to affect the result.