I am studying Couchbase. Can anyone explain what exactly a bucket and a vBucket are?

You can start with the Couchbase documentation, section "Architecture and Concepts": http://docs.couchbase.com/admin/admin/Concepts/concept-intro.html

For more information about buckets, see http://docs.couchbase.com/admin/admin/Concepts/concept-dataStorage.html.

For more information about vBuckets, see http://docs.couchbase.com/admin/admin/Concepts/concept-vBucket.html.

In short, a bucket is an abstraction that describes certain resources on the cluster (like RAM and disk space). From the API standpoint, it is also a namespace for the documents stored in the system, similar to a database in the SQL world.


A bucket is like a database in an RDBMS. It contains documents, views and some configuration. A vBucket is like a shard in an RDBMS. Every key in Couchbase is hashed to a vBucket number, and every vBucket number is mapped to a server name. These two mappings result in an even distribution of documents across multiple nodes and a fast get operation for a document by its id.


Short answer

A bucket is a logical keyspace of uniquely keyed documents, evenly distributed across all nodes in a cluster.

A vBucket is a subset of a bucket that is located on a single node. The union of all vBuckets is the bucket.

Slightly longer answer

Imagine you have three nodes:

+----------+         +----------+        +----------+
|          |         |          |        |          |
|          |         |          |        |          |
|          |         |          |        |          |
|          |         |          |        |          |
|          |         |          |        |          |
|          |         |          |        |          |
|          |         |          |        |          |
+----------+         +----------+        +----------+
   node1                node2               node3

A bucket is a set of documents (which can differ in structure and attributes) that is distributed over all three nodes but shares a single key space.

   +----------+         +----------+        +----------+
+---------------------------------------------------------------+
|  |          |         |          |        |          |        |
|  |          |         |          |        |          |      Bucket
|  |          |         |          |        |          |        |
+---------------------------------------------------------------+
   |          |         |          |        |          |
   |          |         |          |        |          |
   +----------+         +----------+        +----------+
      node1                node2               node3

Note that a key must be unique within a bucket. This differs from the database concept in an RDBMS, where a key only needs to be unique within a table.

The bucket is divided into 1024 segments which are evenly distributed across all the nodes in the cluster. These segments are virtual buckets, or vBuckets. So, in this case, each node holds roughly 1024/3 vBuckets, that is, 341 or 342.

   +----------+         +----------+        +----------+
+---------------------------------------------------------------+
|  |          |         |          |        |          |        |
|  |  341 vBs |         |  341 vBs |        |  342 vBs |      Bucket
|  |          |         |          |        |          |        |
+---------------------------------------------------------------+
   |          |         |          |        |          |
   |          |         |          |        |          |
   +----------+         +----------+        +----------+
      node1                node2               node3
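The 341/341/342 split can be reproduced with a minimal sketch. The round-robin assignment and the names NUM_VBUCKETS, NODES and vbucket_map are assumptions made for illustration; a real cluster builds its own map, and which node ends up with the extra vBucket is arbitrary.

```python
from collections import Counter

NUM_VBUCKETS = 1024                   # number of vBuckets per bucket, as stated above
NODES = ["node1", "node2", "node3"]   # hypothetical node names from the diagram

# Hypothetical round-robin assignment: vBucket id -> node name.
# A real cluster computes this map itself; this only shows the even split.
vbucket_map = [NODES[vb % len(NODES)] for vb in range(NUM_VBUCKETS)]

# Count how many vBuckets each node holds.
counts = Counter(vbucket_map)
for node in NODES:
    print(node, counts[node])
```

Since 1024 is not divisible by 3, one node necessarily holds one vBucket more than the other two.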

Each vBucket has its associated set of documents. When a lookup is performed, the client hashes the searched document's key to find the vBucket, and the cluster map identifies the node where that vBucket (and therefore the document) is located.
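The lookup path can be sketched roughly as follows. This is only an illustration of the key -> vBucket -> node idea: the plain CRC32-modulo hash, the round-robin map, and the function names vbucket_for_key and node_for_key are all assumptions, not the exact scheme a Couchbase client uses.

```python
import zlib

NUM_VBUCKETS = 1024
NODES = ["node1", "node2", "node3"]   # hypothetical node names

# Hypothetical cluster map: vBucket id -> node (round-robin for illustration).
VBUCKET_MAP = [NODES[vb % len(NODES)] for vb in range(NUM_VBUCKETS)]

def vbucket_for_key(key: str) -> int:
    """Hash the document key to a vBucket id (assumed: CRC32 modulo 1024)."""
    return zlib.crc32(key.encode("utf-8")) % NUM_VBUCKETS

def node_for_key(key: str) -> str:
    """Look up the node that hosts the key's vBucket in the map."""
    return VBUCKET_MAP[vbucket_for_key(key)]

key = "user::42"  # made-up document key
print(key, "-> vBucket", vbucket_for_key(key), "->", node_for_key(key))
```

Because the hash is deterministic, any client holding the same map resolves a given key to the same node without asking a central coordinator.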

references: http://training.couchbase.com/online