Text File Parsing in Java

I'm not sure how memory-efficient it is, but my first approach would be to use a Scanner, as it is incredibly easy to use:

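import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;

public class ScannerExample {
    public static void main(String[] args) throws FileNotFoundException {
        File file = new File("/path/to/my/file.txt");
        // try-with-resources closes the Scanner (and the file) even if an exception is thrown
        try (Scanner input = new Scanner(file)) {
            // process line by line; for token-by-token processing use hasNext()/next() instead
            while (input.hasNextLine()) {
                String nextLine = input.nextLine();
                // ... work with nextLine here ...
            }
        }
    }
}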

By default Scanner splits tokens on whitespace; check the Scanner API (useDelimiter) for how to change the delimiter it uses.
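A small sketch of changing the delimiter, assuming comma-separated input: the pattern ",|\\R" (my own choice for illustration) treats both commas and line breaks as separators, so each field comes back from next() as its own token. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class DelimiterExample {
    // Splits text into tokens using commas and line breaks as delimiters.
    static List<String> tokens(String text) {
        List<String> result = new ArrayList<>();
        // useDelimiter returns the same Scanner, so it can be chained in the resource declaration
        try (Scanner input = new Scanner(text).useDelimiter(",|\\R")) {
            while (input.hasNext()) {
                result.add(input.next());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(tokens("a,b,c\nd,e")); // prints [a, b, c, d, e]
    }
}
```

Note that \R (any line break) needs Java 8 or later; on older JDKs something like ",|\\r?\\n" does the same job for typical files.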


It sounds to me like you're doing something wrong: there's a whole lotta object creation going on.

How representative is that "test" file? What are you really doing with that data? If that's typical of what you really have, I'd say there's lots of repetition in that data.
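If the data really is that repetitive, one way to cut down on object creation (my suggestion, not something you have to do) is to keep a single shared String per distinct value and reuse it, so a million rows containing "NY" don't hold a million separate "NY" objects. Here is a minimal sketch using a HashMap as the pool; the class name is hypothetical, and String.intern() is the built-in alternative.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class Deduplicator {
    // Maps each distinct value to the single String instance we keep for it.
    private final Map<String, String> pool = new HashMap<>();

    // Returns the pooled instance for value, adding value to the pool the first time it is seen.
    String canonical(String value) {
        String existing = pool.putIfAbsent(value, value);
        return existing != null ? existing : value;
    }

    int distinctCount() {
        return pool.size();
    }

    public static void main(String[] args) {
        Deduplicator dedup = new Deduplicator();
        List<String> fields = new ArrayList<>();
        for (String raw : new String[] {"NY", "CA", "NY", "NY"}) {
            // new String(...) forces a fresh object, as a parser would create for each field
            fields.add(dedup.canonical(new String(raw)));
        }
        System.out.println(fields.get(0) == fields.get(2)); // prints true: same instance
        System.out.println(dedup.distinctCount());          // prints 2
    }
}
```

The duplicate strings created while parsing become garbage right away, and only one copy per distinct value stays reachable from your List.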

If it's all going to end up as Strings anyway, start with a BufferedReader and read one line at a time. Pre-allocate your List to a size close to what you need (for an ArrayList, pass the expected size to its constructor) so it doesn't have to keep growing its backing array as you add to it. Split each line at the comma, and be sure to strip off the double quotes.
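Roughly, that could look like the sketch below. It assumes simple quoted CSV where no field contains a comma of its own (a plain split(",") can't handle embedded commas); expectedRows is a hypothetical size hint you would estimate from your own data, and the class name is made up.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class CsvLines {
    // Reads each line, splits it at commas and strips enclosing double quotes from each field.
    static List<String[]> parse(Reader source, int expectedRows) {
        // Pre-sized so the ArrayList doesn't repeatedly reallocate as rows are added
        List<String[]> rows = new ArrayList<>(expectedRows);
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                String[] fields = line.split(",", -1); // -1 keeps trailing empty fields
                for (int i = 0; i < fields.length; i++) {
                    fields[i] = stripQuotes(fields[i]);
                }
                rows.add(fields);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return rows;
    }

    // Removes one pair of enclosing double quotes, if present.
    static String stripQuotes(String field) {
        if (field.length() >= 2 && field.startsWith("\"") && field.endsWith("\"")) {
            return field.substring(1, field.length() - 1);
        }
        return field;
    }

    public static void main(String[] args) {
        String data = "\"alice\",\"30\"\n\"bob\",\"25\"\n";
        for (String[] row : parse(new StringReader(data), 2)) {
            System.out.println(String.join(" | ", row)); // alice | 30, then bob | 25
        }
    }
}
```

For a real file you would pass a FileReader (or Files.newBufferedReader) instead of the StringReader used here to keep the example self-contained.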

You might want to ask yourself: "Why do I need this whole file in memory all at once?" Can you read a little, process a little, and never have the whole thing in memory at once? Only you know your problem well enough to answer.
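As an illustration of "read a little, process a little", here is a sketch that assumes (purely hypothetically) that what you actually need is the total of the second column. Each line is read, used, and dropped, so memory use stays flat no matter how big the file is; the class and method names are made up.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

public class StreamingSum {
    // Sums the second comma-separated column one line at a time.
    // Only the current line is held in memory; the running total is the only state kept.
    static long sumSecondColumn(Reader source) {
        long total = 0;
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                String[] fields = line.split(",");
                total += Long.parseLong(fields[1].replace("\"", "").trim());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return total;
    }

    public static void main(String[] args) {
        String data = "\"alice\",\"30\"\n\"bob\",\"25\"\n";
        System.out.println(sumSecondColumn(new StringReader(data))); // prints 55
    }
}
```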

If you have JDK 6, try firing up jvisualvm and watching what's going on with memory. That would give you a great clue.

Tags: Java, File, Parsing