What is a "database"?

I will quote Dictionary.com, since its definition matches what I take "database" to mean:

"a comprehensive collection of related data organized for convenient access, generally in a computer."

Under this definition, you can consider a database anything from a full-fledged RDBMS (SQL Server, Oracle, etc.) to a basic flat file. If it stores data, it technically can be considered a database.

Now, like most things in our modern world, there's the accepted meaning of a name. And in the case of "database", that will vary from person to person. A lot of people think of a database solely as something managed by a database management system.

It is worth noting @FrustratedWithFormsDesigner's comment:

"card-catalogues would also count if you removed the '...generally in a computer'."

I agree with that statement, and I don't necessarily think that a database needs to live in a "computer" or any electronic device. A card-catalogue is a perfect example of a non-computerized database.

This is a great question and a set of great answers. I think one thing that is missing from the discussion is an answer which delves into the distinction between a database and a database management system (DBMS). I like the definition of database that Shark provided from dictionary.com. I think it really shows the need for the distinction between the database and the DBMS. The database is "a comprehensive collection of related data organized for convenient access." The second part of that definition, which says "generally in a computer", is where the distinction lies. If it is stored in a computer, it may or may not be stored in a DBMS. It may be stored in an OS file system. It might be stored in a proprietary file system. Thus I agree with FrustratedWithFormsDesigner that a card catalog is a "database" (well maybe - is it comprehensive and related? More on that later). It just happens to be stored in a file cabinet. In today's world most "comprehensive collections of related data organized for convenient access" are stored on a computer, so I disagree with Shark that it is a pity Dictionary.com added that part. I think it is absolutely correct - as a definition of "database".

So how do we define DBMS? I went back to dictionary.com and found this:

"A suite of programs which typically manage large structured sets of persistent data, offering ad hoc query facilities to many users. They are widely used in business applications."

The definition continues on and is quite long. It describes common features provided by a DBMS, such as security, data integrity, transaction management, concurrency control, and most importantly - data independence. A DBMS provides an external view of the data abstracted from how it is physically stored.
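To make the data independence point a bit more concrete, here is a minimal sketch using Python's built-in sqlite3 module. The invoice table, its columns, and the index name are made up for illustration. The idea is only that the physical layout can change (here, by adding an index) while the query the user writes, and the answer it returns, stay the same.

```python
import sqlite3

# Hypothetical example data; table and column names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE invoice (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO invoice (customer, amount) VALUES (?, ?)",
    [("Alice", 120.0), ("Bob", 75.5), ("Alice", 30.0)],
)

# The logical question the user asks. It says nothing about files,
# pages, or indexes.
QUERY = (
    "SELECT customer, SUM(amount) FROM invoice "
    "GROUP BY customer ORDER BY customer"
)

before = conn.execute(QUERY).fetchall()

# Change the physical storage: add an index on the customer column.
conn.execute("CREATE INDEX idx_invoice_customer ON invoice (customer)")

# Same query text, same logical result, different physical structures.
after = conn.execute(QUERY).fetchall()

print(before == after)
```

The application never has to be rewritten when the storage details change, which is roughly what the definition means by an external view abstracted from physical storage.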

Using this definition, I think it is clear that a DBMS must provide a data model, which is how the data is organized for presentation to the user. The three common models are hierarchical (IMS), network (IDMS), and relational (DB2, Oracle, SQL Server, etc). There is also the OO model (OODBMS). Only the relational model today has broad applicability. The other models are still in use but only in niche situations. The DBMS must also provide the other features mentioned. I would refer to these collectively as data management features or capabilities.

Therefore, software products which provide data management features are DBMSs, whereas products that do not provide these are not DBMSs. NoSQL products are not DBMSs. That is not to say they are not useful, and not to say they don't store "databases". I like to think that DBMSs, as the definition says, solve a class of problems related to business applications like accounting, payroll, billing, customer relationship management, sales, etc. NoSQL products, while not DBMSs, are excellent for solving a class of problems that are unrelated to traditional business applications but now exist due to the huge amount of storage and bandwidth computing technology is capable of today. These are applications like internet search, online auctions, Twitter, and Facebook. The DBMS is not a good fit to solve these problems, as the DBMS contains data management features which, while an absolute necessity for a business application, are of no use for solving storage and retrieval of Craigslist ads or Twitter feeds (well usually anyway - that is another discussion for another time :-)). Those problems require massive scale out and extremely fast response, and the DBMS, with its feature bloat, isn't a good fit.

A data professional needs to understand all of these tools for storing data and what class of problems they are suited to solve in order to choose the right tool for the job, just like a general contractor has to know which of his or her construction tools is the right tool for the job. No tool is good or bad in and of itself. It is good if it is a good fit to solve an important problem.

I will conclude by noting two other key distinctions in the definitions of both database and DBMS that might be overlooked in the discussion thus far. The definition of database includes "comprehensive collection of related data." The definition of DBMS includes "manage large structured sets of persistent data". First, for data storage to rise to meet the definition of database, it must be "comprehensive" and "related". This is where the Excel spreadsheet of sales, or the huge customer VSAM file or flat file, do not qualify as databases. These examples are single sets of data, not multiple sets of data that are related. None of them are comprehensive over an entire subject area. The sales spreadsheet just has sales. It doesn't relate to information about customers and products beyond perhaps the customer name and the product number. Now if that spreadsheet is a workbook that contains a list of customers, a list of products, and then a list of sales that relate the customers to the products, we have a database. But if we were going to store it in a relational way we'd be better off using MS Access or some other relational DBMS. So perhaps a card catalog isn't a database after all: while it is comprehensive (it has a record of all the books in the library), it isn't related, as it only has information about books, not complete related information about authors, publishers, etc.
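As a rough illustration of the "related" part, here is a small sketch of the customers / products / sales arrangement described above, using Python's built-in sqlite3 module. The table names, column names, and sample rows are hypothetical; the point is only that the sales set refers to the other two sets instead of standing alone.

```python
import sqlite3

# Hypothetical schema and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE products  (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE sales (
        id          INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id),
        product_id  INTEGER NOT NULL REFERENCES products(id),
        quantity    INTEGER NOT NULL
    );
""")
conn.executemany("INSERT INTO customers VALUES (?, ?)",
                 [(1, "Alice"), (2, "Bob")])
conn.executemany("INSERT INTO products VALUES (?, ?)",
                 [(10, "Widget"), (20, "Gadget")])
conn.executemany(
    "INSERT INTO sales (customer_id, product_id, quantity) VALUES (?, ?, ?)",
    [(1, 10, 3), (2, 20, 1), (1, 20, 2)],
)

def sales_report(connection):
    """Resolve each sale to its customer and product by following the keys."""
    return connection.execute("""
        SELECT c.name, p.name, s.quantity
        FROM sales s
        JOIN customers c ON c.id = s.customer_id
        JOIN products  p ON p.id = s.product_id
        ORDER BY s.id
    """).fetchall()

for row in sales_report(conn):
    print(row)
```

A lone sales spreadsheet would carry only the sales rows; here each sale can be traced back to a full customer record and a full product record.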

Second, a DBMS excels at storing "structured" data. It is entirely based on a defined schema of discrete data elements with structured types. A NoSQL product, say a key-value store which is devoid of a schema, excels at storing unstructured data. That NoSQL product therefore does not meet the definition of a DBMS. But if the problem you are trying to solve is the storage of unstructured data (something we didn't even attempt to do when DBMSs were first developed), and you don't need data management features independent of the application you will write to process that unstructured data, the NoSQL product is a perfect fit.
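The contrast can be sketched in a few lines of Python. This is a toy under stated assumptions, not a model of any real NoSQL product: the key-value store is just an in-memory dict, and the employee table, the keys, and the try_insert helper are all hypothetical. The schema side uses the built-in sqlite3 module with a NOT NULL constraint standing in for "defined schema".

```python
import sqlite3

# Schema side: the store itself knows what a valid row looks like.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE employee (
        id     INTEGER PRIMARY KEY,
        name   TEXT NOT NULL,
        salary REAL NOT NULL CHECK (salary >= 0)
    )
""")

def try_insert(row):
    """Hypothetical helper: return True if the DBMS accepts the row."""
    try:
        conn.execute(
            "INSERT INTO employee (id, name, salary) VALUES (?, ?, ?)", row
        )
        return True
    except sqlite3.IntegrityError:
        # The constraint is enforced by the store, not by the application.
        return False

accepted = try_insert((1, "Alice", 50000.0))
rejected = try_insert((2, None, 42000.0))  # violates NOT NULL on name

# Schema-less side: a plain dict standing in for a key-value store.
# Any value goes in under any key; interpreting it is the application's job.
kv_store = {}
kv_store["employee:1"] = {"name": "Alice", "salary": 50000.0}
kv_store["employee:2"] = "free-form text, no fields at all"
kv_store["tweet:99"] = b"\x00\x01 raw bytes"

print(accepted, rejected, len(kv_store))
```

Neither behavior is better in the abstract. The first pushes integrity rules into the store, while the second leaves them to whatever application reads the values back.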

I hope this answer adds value to the other great answers posted here. I look forward to any comments and discussion points anyone else may have that will help us all broaden our understanding of databases and classes of technology that solve data related problems.

To me, a database is a thing that exists to store and retrieve data. We call Access a database, even though it's really just a pretty front end to a collection of files. Outlook (at least on the Mac) calls its message store a database. Some people even call Excel a database (but that kind of makes me snort - so there is a line somewhere).

I think the definition has evolved over time, and comparing dictionary.com to wiki to papers from various database professionals over the course of the last 30 years will yield a variety of definitions. And the definition will continue to evolve as well.

If you're talking about some kind of data source that you or your applications use to store or retrieve data, whether it is relational or not, I don't have a problem with you calling it a database. If it's a text file, you might get some raised eyebrows, but I'm not sure I understand the need to pin down the definition so rigidly that people get angry about it.

Some people get pretty uppity, apparently, if you come anywhere close to suggesting that BigTable (or NoSQL or Hadoop) is a "database," and claim that calling it such will give - particularly to newbies - great promise of infinite performance, immortality and Unicorns. Whereas usually you just mean that it's a place where data is stored and retrieved, without any warranties about what the actual implementation does, whether it's relational or not, or whether you could produce such a thing yourself when bored on a Sunday afternoon.

I will admit that I cringe when people talk about a relational database and call rows "records" or columns "fields." But while it irks me a bit, I don't get angry or go out of my way to correct them - what is the point? I understood what they meant, even if they aren't 100% accurate.