Where's the memory leak in this function?

As many have mentioned, this is probably just an artifact of the GC not cleaning up the memory storage as fast as you are expecting it to. This is normal for managed languages, like C#, Java, etc. You really need to find out if the memory allocated to your program is free or not if you're are interested in that usage. The questions to ask related to this are:

  1. How long is your program running? Is it a service type program that runs continuously?
  2. Over the span of execution does it continue to allocate memory from the OS or does it reach a steady-state? (Have you run it long enough to find out?)

Your code does not look like it will have a "memory-leak". In managed languages you really don't get memory leaks like you would in C/C++ (unless you are using unsafe or external libraries that are C/C++). What happens though is that you do need to watch out for references that stay around or are hidden (like a Collection class that has been told to remove an item but does not set the element of the internal array to null). Generally, objects with references on the stack (locals and parameters) cannot 'leak' unless you store the reference of the object(s) into an object/class variables.

Some comments on your code:

  1. You can reduce the allocation/deallocation of memory by pre-allocating the StringBuilder to at least the proper size. Since you know you will need to hold the entire file in memory, allocate it to the file size (this will actually give you a buffer that is just a little bigger than required since you are not storing new-line character sequences but the file probably has them):

    FileInfo fi = new FileInfo(path);
    StringBuilder fb = new StringBuilder((int) fi.Length);
    

    You may want to ensure the file exists before getting its length, using fi to check for that. Note that I just down-cast the length to an int without error checking as your files are less than 2GB based on your question text. If that is not the case then you should verify the length before casting it, perhaps throwing an exception if the file is too big.

  2. I would recommend removing all the variable = null statements in your code. These are not necessary since these are stack allocated variables. As well, in this context, it will not help the GC since the method will not live for a long time. So, by having them you create additional clutter in the code and it is more difficult to understand.

  3. In your ParseMessages method, you catch a NullReferenceException and assume that is just a non-text node. This could lead to confusing problems in the future. Since this is something you expect to normally happen as a result of something that may exist in the data you should check for the condition in the code, such as:

    if (node.Text != null)
        sb.Append(node.Text.Trim()); //Name
    

    Exceptions are for exceptional/unexpected conditions in the code. Assigning significant meaning to NullReferenceException more than that there was a null reference can (likely will) hide errors in other parts of that same try block now or with future changes.


There is no memory leak. If you are using Windows Task Manager to measure the memory used by your .NET application you are not getting a clear picture of what is going on, because the GC manages memory in a complex way that Task Manager doesn't reflect.

A MS engineer wrote a great article about why .NET applications that seem to be leaking memory probably aren't, and it has links to very in depth explanations of how the GC actually works. Every .NET programmer should read them.


I would look carefully at why you need to pass a string to parseMessages, ie fb.ToString().

Your code comment says that this returns an array of each lines content. However you are actually reading all lines from the log file into fb and then converting to a string.

If you are parsing large files in parseMessages() you could do this much more efficiently by passing the StringBuilder itself or the StreamReader into parseMessages(). This would enable only loading a portion of the file into memory at any time, as opposed to using ToString() which currently forces the entire logfile into memory.

You are less likely to have a true memory leak in a .NET application thanks to garbage collection. You do not look to be using any large resources such as files, so it seems even less likely that you have an actual memory leak.

It looks like you have disposed of resources ok, however the GC is probably struggling to allocate and then deallocate the large memory chunks in time before the next iteration starts, and so you see the increasing memory usage.

While GC.Collect() may allow you to force memory deallocation, I would strongly advise looking into the suggestions above before resorting to trying to manually manage memory via GC.

[Update] Seeing your parseMessages() and the use of HtmlAgilityPack (a very useful library, by the way) it looks likely there are some large and possibly numerous allocations of memory being performed for every logile.

HtmlAgility allocates memory for various nodes internally, when combined with your buffer array and the allocations in the main function I'm even more confident that the GC is being put under a lot of pressure to keep up.

To stop guessing and get some real metrics, I would run ProcessExplorer and add the columns to show the GC Gen 0,1,2 collections columns. Then run your application and observe the number of collections. If you're seeing large numbers in these columns then the GC is struggling and you should redesign to use less memory allocations.

Alternatively, the free CLR Profiler 2.0 from Microsoft provides nice visual representation of .NET memory allocations within your application.