How to collect NTFS file properties and insert into SQL Server table

SSIS is well equipped to handle CSV files and load them into SQL Server.

You can have a very simple package using the Flat File Source.

The dialog and setup is a familiar Windows wizard-like process, and most of it is automated. What you need to pay attention to is whether it has correctly guessed the column lengths and data types for your file. You can either adjust the settings in the connection manager or change data types later with SSIS tasks. Note that if a column contains, say, 10,000 rows of integers and then starts containing characters, the Flat File Source may easily assign an integer data type to that column and then fail when it encounters the characters. So with large files that may not be well structured, you have to pay closer attention to these settings. The Suggest Types... button lets you increase the number of inspected rows, but I have found that even this can still recommend the wrong data types.
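One defensive option when the file is unpredictable is to import every column as a wide string into a staging table and convert afterwards in T-SQL. A minimal sketch, assuming a staging table with hypothetical names:

-- Staging table: every column deliberately loaded as text
CREATE TABLE dbo.ImportStage (Id VARCHAR(50), Amount VARCHAR(50));

-- TRY_CONVERT (SQL Server 2012+) returns NULL instead of erroring
-- when a stray character value shows up in a "numeric" column
SELECT TRY_CONVERT(INT,   Id)     AS Id,
       TRY_CONVERT(MONEY, Amount) AS Amount
FROM   dbo.ImportStage;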


SSIS is a huge tool, and you can perform data clean-up tasks or even split data from the single CSV into different tables. For splitting into multiple tables, use tasks like Multicast or Conditional Split. You may also find that Data Conversion and Derived Column help you efficiently produce the data you need as it moves through your package.
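For instance, a Derived Column expression that produces a trimmed copy of a column might be

LTRIM(RTRIM(CustomerName))

while a Conditional Split condition that routes suspect rows down their own path might be

ISNULL(Amount) || LEN(LTRIM(RTRIM(CustomerName))) == 0

(the column names here are just placeholders).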

I wouldn't do much more than clean, split, modify, and load the data into SQL Server with SSIS, though. SQL Server is highly optimized for aggregates, sorts, and the like, while SSIS is less capable at such tasks. Tasks like Aggregate are blocking transforms, which essentially means they can stall your SSIS package and consume a lot of memory.
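For example, once the data has landed, a summary that would need a blocking Aggregate transform in SSIS is a simple set-based query in T-SQL (this uses the dbo.FileProperties table that appears later in this answer):

SELECT   Owner,
         COUNT(*)    AS FileCount,
         SUM(Length) AS TotalBytes
FROM     dbo.FileProperties
GROUP BY Owner
ORDER BY TotalBytes DESC;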

As an example, the SSIS dataflow below performs the following tasks:

  1. Reads a CSV file
  2. Creates derived columns which are just trimmed versions of the originals
  3. Performs a look-up to see if the record already exists in the destination
  4. If the record was not found then it is inserted in the destination

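Expressed in T-SQL, the lookup-then-insert portion of that dataflow is roughly equivalent to the following (the staging and destination table names here are hypothetical):

INSERT INTO dbo.Destination (Id, FirstName, LastName)
SELECT s.Id, s.FirstName, s.LastName
FROM   dbo.Staging AS s
WHERE  NOT EXISTS (SELECT 1
                   FROM   dbo.Destination AS d
                   WHERE  d.Id = s.Id);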


Another option that side-steps the external call, the CSV file, SSIS, etc., is to use SQLCLR. You would use either the DirectoryInfo.EnumerateFiles method (newer; you can start consuming results before the full list has been built) or the DirectoryInfo.GetFiles method (older; the entire array must be populated before you can iterate it). The EnumerateFiles method is new as of .NET 4.0, hence it is only available if you are using SQL Server 2012 or newer.

Those methods return a collection of FileInfo objects that give you most of the properties directly. To get the Owner, you need to do a little more work, similar to what you are doing in your PowerShell script: you call the FileInfo.GetAccessControl method, then call GetOwner on the resulting FileSecurity object. GetOwner takes a parameter of type Type, and the MSDN documentation does not have any examples, but according to this S.O. answer, Find out File Owner/Creator in C#, it should just be:

DirectoryInfo _Directory = new DirectoryInfo(@"G:\");

// SearchOption belongs on EnumerateFiles, not the DirectoryInfo constructor
foreach (FileInfo _File in _Directory.EnumerateFiles("*", SearchOption.AllDirectories))
{
    FileProperties _Obj = new FileProperties();
    _Obj.Size  = _File.Length;
    _Obj.Owner = _File.GetAccessControl()
                      .GetOwner(typeof(System.Security.Principal.NTAccount))
                      .ToString();
    // ... assign whichever other FileInfo properties you need (Name, FullName, etc.)
    yield return _Obj;
}

The code above assumes you have a struct or class named FileProperties that will be used to pass back the rows in a streaming TVF.
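Putting the pieces together, here is a minimal sketch of what the complete SQLCLR class might look like; the class name, column sizes, and the choice of properties are all assumptions, not a definitive implementation:

using System.Collections;
using System.Data.SqlTypes;
using System.IO;
using System.Security.Principal;
using Microsoft.SqlServer.Server;

public class FileProperties
{
    public string Name;
    public long   Size;
    public string Owner;
    // ... add whichever other properties you want to surface
}

public class FileFunctions
{
    // Streaming TVF: rows are yielded one at a time, and SQLCLR calls
    // FillRow (below) to turn each yielded object into a result row
    [SqlFunction(FillRowMethodName = "FillRow",
        TableDefinition = "Name NVARCHAR(260), Size BIGINT, Owner NVARCHAR(128)")]
    public static IEnumerable GetFileProperties()
    {
        DirectoryInfo _Directory = new DirectoryInfo(@"G:\");

        foreach (FileInfo _File in
                 _Directory.EnumerateFiles("*", SearchOption.AllDirectories))
        {
            FileProperties _Obj = new FileProperties();
            _Obj.Name  = _File.Name;
            _Obj.Size  = _File.Length;
            _Obj.Owner = _File.GetAccessControl()
                              .GetOwner(typeof(NTAccount))
                              .ToString();
            yield return _Obj;
        }
    }

    public static void FillRow(object row,
        out SqlString Name, out SqlInt64 Size, out SqlString Owner)
    {
        FileProperties _Props = (FileProperties)row;
        Name  = _Props.Name;
        Size  = _Props.Size;
        Owner = _Props.Owner;
    }
}

Keep in mind that reading the file system from SQLCLR requires the assembly to be created with at least PERMISSION_SET = EXTERNAL_ACCESS.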

Using this method, the values returned can (and should) be strongly-typed. Hence, you can populate your table as follows:

INSERT INTO dbo.FileProperties (Name, Length, Path, Owner, ...)
  SELECT Name, Length, Path, Owner, ...
  FROM   dbo.GetFileProperties();

And GetFileProperties can even be updated to accept an input parameter for the starting directory :-).
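If you do add that parameter, the T-SQL registration of the TVF might look something like this (the assembly and class names are placeholders, and the C# method would take a matching SqlString parameter):

CREATE FUNCTION dbo.GetFileProperties (@StartingDirectory NVARCHAR(4000))
RETURNS TABLE (Name NVARCHAR(260), Size BIGINT, Owner NVARCHAR(128))
AS EXTERNAL NAME [FilePropertiesAssembly].[FileFunctions].[GetFileProperties];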