database vs. flat files

They're faster: unless you load the entire flat file into memory, a database will allow faster access in almost all cases, because its indexes let it jump to the records it needs instead of reading the whole file.
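Here is a minimal Python sketch of the difference. It assumes a hypothetical `users` data set stored twice: once as a comma-separated flat file and once in an SQLite table. The flat-file lookup has to read lines until it hits the match. The `INTEGER PRIMARY KEY` lookup lets SQLite descend its B-tree straight to the row. File names and record layout are illustrative only.

```python
import os
import sqlite3
import tempfile
from typing import Optional


def find_in_flat_file(path: str, user_id: int) -> Optional[str]:
    # Linear scan: every line before the match is read and parsed.
    with open(path, encoding="utf-8") as f:
        for line in f:
            uid, name = line.rstrip("\n").split(",", 1)
            if int(uid) == user_id:
                return name
    return None


def find_in_sqlite(conn: sqlite3.Connection, user_id: int) -> Optional[str]:
    # INTEGER PRIMARY KEY lookup: SQLite descends its B-tree to the one row.
    row = conn.execute("SELECT name FROM users WHERE id = ?", (user_id,)).fetchone()
    return row[0] if row else None


# The same 100,000 hypothetical records, stored both ways.
records = [(i, f"user{i}") for i in range(100_000)]
workdir = tempfile.mkdtemp()

flat_path = os.path.join(workdir, "users.csv")
with open(flat_path, "w", encoding="utf-8") as f:
    f.writelines(f"{uid},{name}\n" for uid, name in records)

conn = sqlite3.connect(os.path.join(workdir, "users.db"))
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)", records)
conn.commit()

print(find_in_flat_file(flat_path, 99_999))  # user99999
print(find_in_sqlite(conn, 99_999))          # user99999
```

If you time each call with `time.perf_counter()`, the gap should grow with the file size. The exact figures depend on your disk and the OS cache, so none are claimed here.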

They're safer: databases are easier to back up safely, and they have mechanisms to detect file corruption, which flat files lack. Once corruption in your flat file migrates to your backups, you're done, and you might not even know it yet.
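With SQLite, for example, both halves of this point take only a few lines of Python. The sketch below assumes a hypothetical `notes` database and a backup path of your choosing. It runs `PRAGMA integrity_check` before copying, so a corrupted database is not silently written over a good backup. It then takes the copy with the standard library's `Connection.backup()` (Python 3.7+).

```python
import os
import sqlite3
import tempfile


def is_healthy(conn: sqlite3.Connection) -> bool:
    # PRAGMA integrity_check returns a single row containing 'ok' when it finds no problems.
    return conn.execute("PRAGMA integrity_check").fetchone()[0] == "ok"


def backup_if_healthy(src: sqlite3.Connection, dest_path: str) -> None:
    # Refuse to copy a damaged database over a good backup.
    if not is_healthy(src):
        raise RuntimeError("integrity check failed; backup skipped")
    dest = sqlite3.connect(dest_path)
    try:
        # Online backup API: works even while the source database is in use.
        src.backup(dest)
    finally:
        dest.close()


# Hypothetical source database with a single row.
src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE notes (id INTEGER PRIMARY KEY, body TEXT)")
src.execute("INSERT INTO notes (body) VALUES ('hello')")
src.commit()

backup_path = os.path.join(tempfile.mkdtemp(), "notes-backup.db")
backup_if_healthy(src, backup_path)
```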

They have more features: for example, databases allow many users to read and write at the same time.

They're much less complex to work with, once they're set up.


This is an answer I gave some time ago:

It depends entirely on your application's domain-specific needs. Direct access to text or binary files can often be extremely fast and efficient, and it gives you all the file access capabilities of your OS's file system.

Furthermore, your programming language most likely already has a built-in module for parsing your specific format (or it is easy to write one).
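In Python, for instance, the standard-library `csv` module handles headers, delimiters and quoting. Here is a minimal sketch; the sample data and column names are made up.

```python
import csv
import io

SAMPLE = "id,name,email\n1,alice,alice@example.com\n2,bob,bob@example.com\n"


def parse_rows(text: str) -> list:
    # DictReader takes field names from the header line and maps each row onto them.
    return list(csv.DictReader(io.StringIO(text)))


def serialize_rows(rows: list) -> str:
    # DictWriter handles quoting, so commas or quotes inside values stay intact.
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


rows = parse_rows(SAMPLE)
print(rows[1]["email"])  # bob@example.com
```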

If what you need is many appends (INSERTs) and sequential or infrequent reads, with little or no concurrency, files are the way to go.
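Here is a sketch of that append-heavy, sequential pattern in Python. It assumes a single writer and one JSON object per line. The JSON Lines layout and the `events.jsonl` name are illustrative choices, not requirements.

```python
import json
import os
import tempfile
from typing import Iterator

LOG_PATH = os.path.join(tempfile.mkdtemp(), "events.jsonl")


def append_event(event: dict) -> None:
    # Mode "a" sends every write to the end of the file; existing records are never rewritten.
    with open(LOG_PATH, "a", encoding="utf-8") as f:
        f.write(json.dumps(event) + "\n")


def read_events() -> Iterator[dict]:
    # One sequential pass from the start of the file, one JSON record per line.
    with open(LOG_PATH, encoding="utf-8") as f:
        for line in f:
            yield json.loads(line)


for i in range(3):
    append_event({"seq": i, "msg": f"event {i}"})
print([e["seq"] for e in read_events()])  # [0, 1, 2]
```

With several concurrent writers or random updates, this file would need its own locking and indexing, which is where a database starts to pay off.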

On the other hand, when you need concurrency, non-sequential reading/writing, atomicity, or atomic permissions, or when your data is relational by nature, you will be better off with a relational or OO database.
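Atomicity is the easiest of these to show. Here is a minimal sketch using Python's built-in `sqlite3` with a hypothetical `accounts` table. The transfer credits one account and debits the other inside a single transaction. When the debit violates the `CHECK` constraint, the credit is rolled back as well, so no money appears or disappears.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)])
conn.commit()


def transfer(src: str, dst: str, amount: int) -> None:
    # "with conn" commits if the block succeeds and rolls back if it raises.
    with conn:
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst))
        # If this debit violates the CHECK constraint, the credit above is undone too.
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src))


def balances() -> dict:
    return dict(conn.execute("SELECT name, balance FROM accounts"))


transfer("alice", "bob", 30)
try:
    transfer("alice", "bob", 500)  # would leave alice at -430
except sqlite3.IntegrityError:
    pass
print(balances())  # {'alice': 70, 'bob': 30}
```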

There is a lot that can be accomplished with SQLite3, which is extremely light (a few hundred kilobytes), ACID compliant, written in C, and highly ubiquitous (if it isn't already included in your programming language, as it is in Python, there is almost certainly a binding available). It can be useful even on database files as big as 140 terabytes, or 128 tebibytes (see SQLite's "Limits In SQLite" page), possibly more.
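The two size figures describe the same quantity, one in decimal units and one in binary units. The sketch below checks the conversion. It then asks the SQLite library bundled with Python for its page-count limit and multiplies by 65,536, the largest page size SQLite supports, to get that build's ceiling. That ceiling depends on the SQLite version and compile options, so no output is claimed for it.

```python
import sqlite3

TB = 10**12   # terabyte: SI, decimal prefix
TiB = 2**40   # tebibyte: IEC, binary prefix

# "140 terabytes, or 128 tebibytes" describe the same quantity:
print(round(128 * TiB / TB, 1))  # 140.7


def max_db_bytes() -> int:
    # Ceiling for the SQLite build bundled with this Python, assuming the largest
    # page size (65,536 bytes); the page-count limit is set at compile time.
    conn = sqlite3.connect(":memory:")
    try:
        max_pages = conn.execute("PRAGMA max_page_count").fetchone()[0]
    finally:
        conn.close()
    return max_pages * 65_536


print(sqlite3.sqlite_version)            # version of the bundled SQLite library
print(f"{max_db_bytes() / TB:,.1f} TB")  # depends on the SQLite version and build
```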

If your requirements were bigger, there wouldn't even be a discussion: go for a full-blown RDBMS.

Since you say in a comment that "the system" is merely a bunch of scripts, you should take a look at pgbash.


  1. Databases can handle querying tasks, so you don't have to walk over files manually. Databases can handle very complicated queries.
  2. Databases can handle indexing, so tasks like "get the record with id = x" can be VERY fast.
  3. Databases can handle multiprocess/multithreaded access.
  4. Databases can handle access over the network.
  5. Databases can enforce data integrity.
  6. Databases can update data easily (see 1).
  7. Databases are reliable.
  8. Databases can handle transactions and concurrent access.
  9. Databases + ORMs let you manipulate data in a very programmer-friendly way.
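Several of these points (1, 2, 5 and 8) show up in one small SQLite session from Python. The schema and data below are hypothetical:

- Constraints reject a book with a nonexistent author and a duplicate author name.
- Each insert runs in its own transaction.
- An index supports lookups by author.
- A single join-and-aggregate query replaces what would be a hand-written loop over a flat file.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce foreign keys unless asked
conn.executescript("""
CREATE TABLE authors (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE books (
    id        INTEGER PRIMARY KEY,
    author_id INTEGER NOT NULL REFERENCES authors(id),
    title     TEXT NOT NULL,
    year      INTEGER CHECK (year > 0)
);
CREATE INDEX books_by_author ON books(author_id);  -- supports lookups by author
""")
conn.executemany("INSERT INTO authors (id, name) VALUES (?, ?)", [(1, "alice"), (2, "bob")])
conn.executemany(
    "INSERT INTO books (author_id, title, year) VALUES (?, ?, ?)",
    [(1, "first", 2001), (2, "second", 2005), (2, "third", 2010)],
)
conn.commit()


def try_insert(sql: str, params: tuple) -> bool:
    # Each insert is its own transaction; returns False when the database
    # rejects the row instead of storing bad data.
    try:
        with conn:
            conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError as exc:
        print("rejected:", exc)
        return False


def books_per_author() -> list:
    # A join plus aggregation in one declarative query.
    return conn.execute("""
        SELECT a.name, COUNT(b.id)
        FROM authors AS a LEFT JOIN books AS b ON b.author_id = a.id
        GROUP BY a.id
        ORDER BY a.name
    """).fetchall()


try_insert("INSERT INTO books (author_id, title, year) VALUES (?, ?, ?)", (99, "orphan", 2020))
try_insert("INSERT INTO authors (name) VALUES (?)", ("alice",))
print(books_per_author())  # [('alice', 1), ('bob', 2)]
```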

Don't build it if you can buy it.

I heard this quote recently, and it seems a fitting guideline. Ask yourself this: how much time was spent working on the file-handling portion of your app? I suspect a fair amount of time went into optimizing that code for performance. If you had been using a relational database all along, you would have spent considerably less time on this portion of your application, and you would have had more time for the true "business" aspect of your app.
