Improve the performance for enumerating files and folders using .NET

A possibly faster alternative is to use the WinAPI FindNextFile function. There is an excellent Faster Directory Enumeration Tool for this, which can be used as follows:

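using System;
using System.Collections.Generic;
using System.IO; // SearchOption

// FastDirectoryEnumerator and FileData come from the Faster Directory Enumeration Tool.
HashSet<FileData> GetPast60(string dir)
{
    // Compute the cutoff once; anything last written before it counts as "old".
    DateTime cutoff = DateTime.Now.AddDays(-60);
    HashSet<FileData> oldFiles = new HashSet<FileData>();

    // Recursive search, using the same overload as the benchmark code below.
    FileData[] files = FastDirectoryEnumerator.GetFiles(dir, "*.*", SearchOption.AllDirectories);
    foreach (FileData file in files)
    {
        if (file.LastWriteTime < cutoff)
        {
            oldFiles.Add(file);
        }
    }
    return oldFiles;
}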
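For the curious, the kind of loop such a wrapper runs can be sketched directly with P/Invoke. This is a minimal sketch, not the FastDirectoryEnumerator source: the class name Win32Enumerator is hypothetical, it only works on Windows, it assumes paths shorter than MAX_PATH (260 characters), and it silently skips folders it cannot open.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Runtime.InteropServices;
using FILETIME = System.Runtime.InteropServices.ComTypes.FILETIME;

// Hypothetical, Windows-only sketch: walks a directory tree with
// FindFirstFileW/FindNextFileW and collects files last written before cutoffUtc.
static class Win32Enumerator
{
    [StructLayout(LayoutKind.Sequential, CharSet = CharSet.Unicode)]
    private struct WIN32_FIND_DATA
    {
        public FileAttributes dwFileAttributes;
        public FILETIME ftCreationTime;
        public FILETIME ftLastAccessTime;
        public FILETIME ftLastWriteTime;
        public uint nFileSizeHigh;
        public uint nFileSizeLow;
        public uint dwReserved0;
        public uint dwReserved1;
        [MarshalAs(UnmanagedType.ByValTStr, SizeConst = 260)] public string cFileName;
        [MarshalAs(UnmanagedType.ByValTStr, SizeConst = 14)] public string cAlternateFileName;
    }

    [DllImport("kernel32.dll", CharSet = CharSet.Unicode, SetLastError = true)]
    private static extern IntPtr FindFirstFileW(string lpFileName, out WIN32_FIND_DATA lpFindFileData);

    [DllImport("kernel32.dll", CharSet = CharSet.Unicode, SetLastError = true)]
    [return: MarshalAs(UnmanagedType.Bool)]
    private static extern bool FindNextFileW(IntPtr hFindFile, out WIN32_FIND_DATA lpFindFileData);

    [DllImport("kernel32.dll", SetLastError = true)]
    [return: MarshalAs(UnmanagedType.Bool)]
    private static extern bool FindClose(IntPtr hFindFile);

    private static readonly IntPtr InvalidHandle = new IntPtr(-1);

    // FILETIME is two 32-bit halves of a 64-bit count of 100 ns intervals.
    private static DateTime ToUtc(FILETIME ft) =>
        DateTime.FromFileTimeUtc(((long)(uint)ft.dwHighDateTime << 32) | (uint)ft.dwLowDateTime);

    public static List<string> GetOldFiles(string dir, DateTime cutoffUtc)
    {
        var result = new List<string>();
        var pending = new Stack<string>();   // folders still to visit
        pending.Push(dir);
        while (pending.Count > 0)
        {
            string current = pending.Pop();
            IntPtr handle = FindFirstFileW(Path.Combine(current, "*"), out WIN32_FIND_DATA data);
            if (handle == InvalidHandle) continue; // e.g. access denied: skip this folder
            try
            {
                do
                {
                    string name = data.cFileName;
                    if (name == "." || name == "..") continue;
                    string full = Path.Combine(current, name);
                    if ((data.dwFileAttributes & FileAttributes.Directory) != 0)
                    {
                        // Don't follow junctions/symlinks, to avoid cycles.
                        if ((data.dwFileAttributes & FileAttributes.ReparsePoint) == 0)
                            pending.Push(full);
                    }
                    else if (ToUtc(data.ftLastWriteTime) < cutoffUtc)
                    {
                        result.Add(full);
                    }
                } while (FindNextFileW(handle, out data));
            }
            finally
            {
                FindClose(handle);
            }
        }
        return result;
    }
}
```

Comparing against a UTC cutoff such as DateTime.UtcNow.AddDays(-60) avoids converting every FILETIME to local time.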

EDIT

So, based on the comments below, I decided to benchmark the solutions suggested here as well as a few others I could think of. Interestingly, EnumerateFiles seemed to outperform FindNextFile in C#, while EnumerateFiles with AsParallel was by far the fastest, followed, surprisingly, by the command prompt count. Do note, however, that the AsParallel version was not getting the complete file count (it missed some files counted by the other methods), so you could say the command prompt method is the best.

Applicable Config:

  • Windows 7 Service Pack 1 x64
  • Intel(R) Core(TM) i5-3210M CPU @2.50GHz 2.50GHz
  • RAM: 6GB
  • Platform Target: x64
  • No optimization (NB: compiling with optimization produced drastically worse performance)
  • Allow unsafe code
  • Start Without Debugging

[Screenshots of three runs: Run 1, Run 2, Run 3]

I have included my test code below:

static void Main(string[] args)
{
    Console.Title = "File Enumeration Performance Comparison";
    Stopwatch watch = new Stopwatch();
    watch.Start();

    var allfiles = GetPast60("C:\\Users\\UserName\\Documents");
    watch.Stop();
    Console.WriteLine("Total time to enumerate using WINAPI =" + watch.ElapsedMilliseconds + "ms.");
    Console.WriteLine("File Count: " + allfiles);

    Stopwatch watch1 = new Stopwatch();
    watch1.Start();

    var allfiles1 = GetPast60Enum("C:\\Users\\UserName\\Documents\\");
    watch1.Stop();
    Console.WriteLine("Total time to enumerate using EnumerateFiles =" + watch1.ElapsedMilliseconds + "ms.");
    Console.WriteLine("File Count: " + allfiles1);

    Stopwatch watch2 = new Stopwatch();
    watch2.Start();

    var allfiles2 = Get1("C:\\Users\\UserName\\Documents\\");
    watch2.Stop();
    Console.WriteLine("Total time to enumerate using Get1 =" + watch2.ElapsedMilliseconds + "ms.");
    Console.WriteLine("File Count: " + allfiles2);


    Stopwatch watch3 = new Stopwatch();
    watch3.Start();

    var allfiles3 = Get2("C:\\Users\\UserName\\Documents\\");
    watch3.Stop();
    Console.WriteLine("Total time to enumerate using Get2 =" + watch3.ElapsedMilliseconds + "ms.");
    Console.WriteLine("File Count: " + allfiles3);

    Stopwatch watch4 = new Stopwatch();
    watch4.Start();

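    // NB: "dir /b /s" lists folder names as well as files, so this count includes
    // directories; "/a-d" instead of "/a:" would list files only.
    var allfiles4 = RunCommand(@"dir /a: /b /s C:\Users\UserName\Documents");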
    watch4.Stop();
    Console.WriteLine("Total time to enumerate using Command Prompt =" + watch4.ElapsedMilliseconds + "ms.");
    Console.WriteLine("File Count: " + allfiles4);


    Console.WriteLine("Press any key to continue...");
    Console.ReadKey();
}

private static int RunCommand(string command)
{
    var process = new Process()
    {
        StartInfo = new ProcessStartInfo("cmd")
        {
            UseShellExecute = false,
            RedirectStandardInput = true,
            RedirectStandardOutput = true,
            CreateNoWindow = true,
            Arguments = String.Format("/c \"{0}\"", command),
        }
    };
    int count = 0;
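    // The last OutputDataReceived event carries null Data (end of stream); don't count it.
    process.OutputDataReceived += (sender, e) => { if (e.Data != null) count++; };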
    process.Start();
    process.BeginOutputReadLine();

    process.WaitForExit();
    return count;
}

static int GetPast60Enum(string dir)
{
    return new DirectoryInfo(dir).EnumerateFiles("*.*", SearchOption.AllDirectories).Count();
}

private static int Get2(string myBaseDirectory)
{
    DirectoryInfo dirInfo = new DirectoryInfo(myBaseDirectory);
    return dirInfo.EnumerateFiles("*.*", SearchOption.AllDirectories)
               .AsParallel().Count();
}

private static int Get1(string myBaseDirectory)
{
    DirectoryInfo dirInfo = new DirectoryInfo(myBaseDirectory);
    return dirInfo.EnumerateDirectories()
               .AsParallel()
               .SelectMany(di => di.EnumerateFiles("*.*", SearchOption.AllDirectories))
               .Count() + dirInfo.EnumerateFiles("*.*", SearchOption.TopDirectoryOnly).Count();
}


private static int GetPast60(string dir)
{
    return FastDirectoryEnumerator.GetFiles(dir, "*.*", SearchOption.AllDirectories).Length;
}

NB: The benchmark only counts files; it does not filter by modified date.
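If you want to time the actual task instead, here is a sketch of a hypothetical helper (DateFilterBenchmark is not part of the code above) that filters on LastWriteTimeUtc and does one untimed warm-up pass first. The warm-up matters because whichever method runs first pays for a cold file-system cache, which can make it look slower than it really is.

```csharp
using System;
using System.Diagnostics;
using System.IO;
using System.Linq;

static class DateFilterBenchmark
{
    // Runs `query` once untimed to warm the OS file-system cache, then
    // returns the best of `runs` timed executions plus the query result.
    public static (long bestMs, int count) Time(Func<int> query, int runs = 3)
    {
        int count = query(); // warm-up, not timed
        long best = long.MaxValue;
        for (int i = 0; i < runs; i++)
        {
            var watch = Stopwatch.StartNew();
            count = query();
            watch.Stop();
            best = Math.Min(best, watch.ElapsedMilliseconds);
        }
        return (best, count);
    }

    // Sequential: counts files last written before the cutoff.
    public static int CountOld(string dir, DateTime cutoffUtc) =>
        new DirectoryInfo(dir)
            .EnumerateFiles("*.*", SearchOption.AllDirectories)
            .Count(fi => fi.LastWriteTimeUtc < cutoffUtc);

    // Same query, with the filtering done by PLINQ.
    public static int CountOldParallel(string dir, DateTime cutoffUtc) =>
        new DirectoryInfo(dir)
            .EnumerateFiles("*.*", SearchOption.AllDirectories)
            .AsParallel()
            .Count(fi => fi.LastWriteTimeUtc < cutoffUtc);
}
```

For example, var (ms, n) = DateFilterBenchmark.Time(() => DateFilterBenchmark.CountOld(dir, DateTime.UtcNow.AddDays(-60))); gives the best time in milliseconds and the number of old files.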


This is (probably) as good as it's going to get:

DateTime sixtyLess = DateTime.Now.AddDays(-60);
DirectoryInfo dirInfo = new DirectoryInfo(myBaseDirectory);
FileInfo[] oldFiles = 
    dirInfo.EnumerateFiles("*.*", SearchOption.AllDirectories)
           .AsParallel()
           .Where(fi => fi.CreationTime < sixtyLess).ToArray();

Changes:

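  • Computed the "60 days ago" DateTime once, outside the query, so it is not recalculated for every file (less CPU load).
  • Used EnumerateFiles, which returns files lazily as they are found rather than building the whole array up front like GetFiles.
  • Made the query parallel with AsParallel().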

This should run faster, though I'm not sure by how much.
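A small variation on the "compute the cutoff once" idea, assuming (as I understand .NET's FileSystemInfo) that timestamps are stored in UTC and CreationTime converts to local time on each access: compare CreationTimeUtc against a UTC cutoff instead. It skips that per-file conversion and sidesteps daylight-saving edge cases. OldFileQuery is a hypothetical name.

```csharp
using System;
using System.IO;
using System.Linq;

static class OldFileQuery
{
    // Returns files under baseDir created more than `days` days ago.
    // The cutoff is computed once, in UTC, and compared against
    // CreationTimeUtc, so no per-file UTC-to-local conversion is needed.
    public static FileInfo[] CreatedBefore(string baseDir, int days)
    {
        DateTime cutoffUtc = DateTime.UtcNow.AddDays(-days);
        return new DirectoryInfo(baseDir)
            .EnumerateFiles("*.*", SearchOption.AllDirectories)
            .AsParallel()
            .Where(fi => fi.CreationTimeUtc < cutoffUtc)
            .ToArray();
    }
}
```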

Here is another solution, which might be faster or slower than the first depending on the data:

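DateTime sixtyLess = DateTime.Now.AddDays(-60);
DirectoryInfo dirInfo = new DirectoryInfo(myBaseDirectory);
FileInfo[] oldFiles = 
     dirInfo.EnumerateDirectories()
            .AsParallel()
            .SelectMany(di => di.EnumerateFiles("*.*", SearchOption.AllDirectories)
                                .Where(fi => fi.CreationTime < sixtyLess))
            // Files directly inside myBaseDirectory are not in any subfolder,
            // so add them separately (PLINQ's Concat needs a ParallelQuery here).
            .Concat(dirInfo.EnumerateFiles("*.*", SearchOption.TopDirectoryOnly)
                           .Where(fi => fi.CreationTime < sixtyLess)
                           .AsParallel())
            .ToArray();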

This moves the parallelism to the enumeration of the top-level folders, so the subfolders of the base directory are searched in parallel. Most of the changes listed above apply here too.
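One thing to watch with this shape: EnumerateDirectories only yields subfolders, so files sitting directly in myBaseDirectory are never visited unless you add them separately. Here is a sketch (QueryShapes is a hypothetical name) of both query shapes returning full paths, with the top-level files appended to the second one so both should return the same set. Note that PLINQ's Concat needs a ParallelQuery on both sides; the overload taking a plain IEnumerable is marked obsolete and throws.

```csharp
using System;
using System.IO;
using System.Linq;

static class QueryShapes
{
    // Shape 1: one recursive enumeration, parallelised afterwards.
    public static string[] Flat(string baseDir) =>
        new DirectoryInfo(baseDir)
            .EnumerateFiles("*.*", SearchOption.AllDirectories)
            .AsParallel()
            .Select(fi => fi.FullName)
            .ToArray();

    // Shape 2: top-level subfolders searched in parallel. Files directly in
    // baseDir are not under any subfolder, so they are appended explicitly.
    public static string[] PerSubfolder(string baseDir)
    {
        var dirInfo = new DirectoryInfo(baseDir);
        return dirInfo.EnumerateDirectories()
            .AsParallel()
            .SelectMany(di => di.EnumerateFiles("*.*", SearchOption.AllDirectories))
            .Select(fi => fi.FullName)
            .Concat(dirInfo.EnumerateFiles("*.*", SearchOption.TopDirectoryOnly)
                           .Select(fi => fi.FullName)
                           .AsParallel())
            .ToArray();
    }
}
```

Dropping the Concat from PerSubfolder is an easy way to see the difference on a folder that has files at its top level.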


I realize this is very late to the party, but if someone else is looking for this: you can speed things up by orders of magnitude by parsing the MFT (or FAT) of the file system directly. This requires admin privileges, as I think it returns all files regardless of security, but it can probably take your 30 minutes down to 30 seconds, for the enumeration stage at least.

A library for NTFS is available at https://github.com/LordMike/NtfsLib; there is also https://discutils.codeplex.com/, which I haven't personally used.

I would only use these methods for the initial discovery of files over x days old, and then verify them individually before deleting. It might be overkill, but I'm cautious like that.
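A minimal sketch of that verification pass, assuming the fast scan hands you a list of full paths. CautiousDelete and DeleteIfStillOld are hypothetical names; the point is that each candidate's timestamp is re-read through the normal File API right before deleting, so anything modified or removed since the scan is left alone.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static class CautiousDelete
{
    // Second pass over candidates from a fast scan: delete a file only if it
    // still exists and is still older than the cutoff. Returns how many were deleted.
    public static int DeleteIfStillOld(IEnumerable<string> candidates, DateTime cutoffUtc)
    {
        int deleted = 0;
        foreach (string path in candidates)
        {
            try
            {
                if (!File.Exists(path)) continue;                          // gone since the scan
                if (File.GetLastWriteTimeUtc(path) >= cutoffUtc) continue; // modified since the scan
                File.Delete(path);
                deleted++;
            }
            catch (IOException) { /* file in use: leave it for the next run */ }
            catch (UnauthorizedAccessException) { /* no permission: skip it */ }
        }
        return deleted;
    }
}
```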