Is there a faster way than fread() to read big data?

Assuming you want your file fully read into R, using a database or selecting a subset of columns/rows won't help much.

What can help in such a case is to:
- ensure that you are using a recent version of data.table
- ensure that an optimal number of threads is set: use setDTthreads(0L) to use all available threads; by default data.table uses 50% of available threads (see the sketch after this list)
- check the output of fread(..., verbose=TRUE), and possibly add it to your question here
- put your file on a fast disk, or a RAM disk, and read from there
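A minimal sketch of the first three points; the file name "big.csv" is just a placeholder:

library(data.table)
packageVersion("data.table")   # confirm you are on a recent release
setDTthreads(0L)               # use all available threads instead of the default 50%
getDTthreads()                 # check how many threads will actually be used
dt <- fread("big.csv", verbose = TRUE)   # verbose output shows where time is spent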

If your data has a lot of distinct character variables, you might not be able to get great speed, because populating R's internal global character cache is single threaded. Parsing can therefore go fast, but creating the character vector(s) will be the bottleneck.


You can use fread()'s select argument to load only the relevant columns, without saturating your memory. For example:

dt <- fread("./file.csv", select = c("column1", "column2", "column3"))

I used read.delim() to read a file that fread() could not load completely. So you could convert your data to a .txt file and use read.delim().
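For example, assuming a tab-delimited export named "data.txt" (the file name and separator are assumptions):

# base R reader; slower than fread() but sometimes more forgiving
df <- read.delim("data.txt", header = TRUE, sep = "\t", stringsAsFactors = FALSE)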

However, why don't you open a connection to the SQL server you're pulling your data from? You can open connections to SQL servers with library(odbc) and write your query as you normally would, which lets you limit how much data you pull into memory at once.
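A minimal sketch, assuming an ODBC data source name "my_dsn" and a table my_table (both placeholders; driver and credentials depend on your setup):

library(DBI)
library(odbc)

# connect via an ODBC data source name
con <- dbConnect(odbc::odbc(), dsn = "my_dsn")

# pull only the columns and rows you actually need
dt <- dbGetQuery(con, "SELECT column1, column2 FROM my_table WHERE column3 > 0")

dbDisconnect(con)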

Check out this short introduction to odbc.