What is the difference between dataset and database?

Database

The definitions of the two terms are not always clear-cut. In general, a database is a set of data organized and made accessible through a database management system (DBMS). Databases are usually, but not always, composed of several linked tables, and they are often accessed, modified and updated by several users, often simultaneously.

Cambridge dictionary:

A structured set of data held in a computer, especially one that is accessible in various ways.

Merriam-Webster:

a usually large collection of data organized especially for rapid search and retrieval (as by a computer)

Data set (or dataset)

A data set sometimes refers to the contents of a single database table, but this is quite a restrictive definition. In general, as the name suggests, it is a set (or collection) of data; hence there are datasets of images, such as the Caltech-256 Object Category Dataset, or of videos, e.g. A large-scale benchmark dataset for event recognition in surveillance video. A data set is usually intended for analysis rather than for continual updating by different users, so it represents the end result of a data collection, or a snapshot taken at a specific time.

Oxford dictionary:

A collection of related sets of information that is composed of separate elements but can be manipulated as a unit by a computer.

‘all hospitals must provide a standard data set of each patient's details’

Cambridge dictionary:

a collection of separate sets of information that is treated as a single unit by a computer


In American English, database usually means "an organized collection of data". A database is usually under the control of a database management system, which is software that, among other things, manages multi-user access to the database. (Usually, but not necessarily. Some simple databases are just text files processed with interpreted languages like awk and Python.)
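To illustrate the parenthetical above, here is a minimal sketch of a "database" that is nothing more than delimited text queried from Python. The file contents, the field names and the `select` helper are all made up for this example; no DBMS is involved, so there is no multi-user access control at all.

```python
import csv
import io

# Hypothetical flat-file "database": one record per line, comma-separated fields.
# In practice this text would live in a file on disk.
FLAT_FILE = """id,name,city
1,Ada,London
2,Grace,New York
3,Linus,Helsinki
"""

def select(text, **conditions):
    """Return the records whose fields equal every given condition."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        row for row in reader
        if all(row[field] == value for field, value in conditions.items())
    ]

matches = select(FLAT_FILE, city="London")
print(matches)
```

Everything a DBMS would normally provide (locking, indexing, permissions) is missing here, which is why this only works for simple cases.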

In the SQL world, which is what I'm most familiar with, a database includes things like tables, views, stored procedures, triggers, permissions, and data.
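As a rough illustration, the sketch below builds a tiny database with Python's built-in sqlite3 module and then lists the objects it contains. The table, view and trigger names are invented for the example. SQLite is used only because it ships with Python; it has no stored procedures or user permissions, so those two items from the list are not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database

# Hypothetical schema: two linked tables, an audit table, a view and a trigger.
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(id),
    amount REAL NOT NULL
);
CREATE TABLE audit_log (order_id INTEGER, note TEXT);

CREATE VIEW customer_totals AS
    SELECT c.name AS name, SUM(o.amount) AS total
    FROM customers c JOIN orders o ON o.customer_id = c.id
    GROUP BY c.id;

CREATE TRIGGER log_order AFTER INSERT ON orders
BEGIN
    INSERT INTO audit_log VALUES (NEW.id, 'order created');
END;
""")

# The data itself is only one part of the database.
conn.execute("INSERT INTO customers VALUES (1, 'Ada')")
conn.executemany(
    "INSERT INTO orders (customer_id, amount) VALUES (?, ?)",
    [(1, 10.0), (1, 5.5)],
)
conn.commit()

# sqlite_master is SQLite's catalogue of schema objects.
objects = conn.execute(
    "SELECT type, name FROM sqlite_master ORDER BY type, name"
).fetchall()
totals = conn.execute("SELECT name, total FROM customer_totals").fetchall()
log_rows = conn.execute("SELECT COUNT(*) FROM audit_log").fetchone()[0]

print(objects)
print(totals, log_rows)
```

The catalogue query returns tables, a view and a trigger side by side, which is the sense in which a database is more than the rows stored in it.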

Again, in American English, dataset usually refers to data selected and arranged in rows and columns for processing by statistical software. The data might have come from a database, but it might not.
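A small sketch of that relationship, again using Python's sqlite3 module with an invented `measurements` table: the query result is pulled out as plain rows and columns, and from then on it is a fixed snapshot that can be analysed independently of the database it came from.

```python
import sqlite3
import statistics

conn = sqlite3.connect(":memory:")

# Hypothetical source table; the names and values are made up for illustration.
conn.execute("CREATE TABLE measurements (subject TEXT, height_cm REAL)")
conn.executemany(
    "INSERT INTO measurements VALUES (?, ?)",
    [("a", 170.0), ("b", 180.0), ("c", 175.0)],
)

# Extract a dataset: column names plus a list of rows.
cursor = conn.execute(
    "SELECT subject, height_cm FROM measurements ORDER BY subject"
)
columns = [description[0] for description in cursor.description]
dataset = cursor.fetchall()

# The database keeps changing, but the extracted dataset does not.
conn.execute("UPDATE measurements SET height_cm = 999.0 WHERE subject = 'a'")

mean_height = statistics.mean(row[1] for row in dataset)
print(columns, dataset, mean_height)
```

The same rows could just as well have been read from a CSV file or typed in by hand; the dataset does not depend on where the data originated.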