Import selected columns from a tab delimited text file

You might try this:

Import["data.txt", {"Data", All, 5}]

If you want to load only the fifth column of the first two lines you can run this:

Import["data.txt", {"Data", {1, 2}, 5}]

You can find more about this here.

Update

Some testing on a relatively small data file (12 MB) suggests that this may not be the solution you're after. Loading just one column this way took much longer (~130 seconds) than loading the entire file with Import["data.txt","Table"] (~4 seconds).


On Linux you could use this simple solution.

Linux Shell

From the shell:

$ cat myfile.txt | cut -f 5 > col5.txt

Then import into Mathematica as usual.

data = Import["col5.txt","Table"]
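
If more than one column is needed, the same approach should extend naturally. The following is only a minimal sketch, assuming a tab-delimited file; the file names sample.txt, cols25.txt and cols25_awk.txt and the sample data are made up for illustration. cut -f accepts a comma-separated list of field numbers, and awk can do the same job when more control over the output is needed.

```shell
# Create a small tab-delimited sample file (hypothetical data, for illustration only).
printf '1\t2\t3\t4\t5\t6\n7\t8\t9\t10\t11\t12\n' > sample.txt

# cut splits on tabs by default; -f accepts a comma-separated list of fields.
cut -f 2,5 sample.txt > cols25.txt

# An awk equivalent: -F sets the input separator, OFS the output separator.
awk -F '\t' 'BEGIN { OFS = "\t" } { print $2, $5 }' sample.txt > cols25_awk.txt
```

Either output file can then be loaded in the same way as above, for example with Import["cols25.txt","Table"].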

ReadList

This can be condensed to one ReadList command.

data = ReadList["!cat myfile.txt | cut -f 5"];

If you are on a Unix-like system, or have installed cut or awk on another system, letting these tools do the extraction is most likely the most efficient way to solve your problem. If for some reason you are looking for a Mathematica-only way, here is one:

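readColumn[filename_, colnum_, numcols_, numlinesperchunk_Integer: 500] :=
 Module[{
   str = OpenRead[filename],
   res = {""}, (* non-empty dummy value so the loop runs at least once *)
   data
  },
  (* read numlinesperchunk rows at a time and keep only column colnum *)
  data = Reap[
     While[Length[res] > 0,
      Sow[res =
        ReadList[str, Table[Real, {numcols}], numlinesperchunk][[All, colnum]]]
      ]
     ][[2]];
  Close[str];
  (* merge the per-chunk results into one flat list *)
  Flatten[data]
 ]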

To extract the 5th of 6 columns, you would use:

readColumn[filename,5,6]

Using the 4th argument, you can play around with numlinesperchunk for further optimization. Note that there is a tradeoff between maximal speed (very large values for numlinesperchunk) and minimal memory usage (low values for numlinesperchunk). For largish files I would expect a good value to be some hundreds of lines per chunk, but your mileage may vary...

This might look a bit complicated for such a simple task, but it tries to be efficient not only in runtime but especially in memory use, which is usually the bigger problem when importing large data files into Mathematica. Note that the code here is very similar to that in my answer to this question, where you will find other ways to do similar things even more efficiently, at the price of more involved programming in Java, along with some more background information...
