Is Cassandra a column oriented or columnar database

A good way of thinking about cassandra is as a map of maps, where the inner maps are sorted by key. A partition has many columns, and they are always stored together. They are sorted by clustering keys - first by the first key, then the next, then next...and so on. Partitions are then replicated amongst replicas. It's not necessarily stored as "rows" as different rows are stored on different nodes based on replication strategy and active hashing algorithm. In other words, a partition for ProductId 1 is likely not stored next to ProductId 2 if ProductId is the partition key. However the coloumns for Product Id 1, are always stored together.

As for definitions, most NoSQL stores are blurring the lines one way or the other. They usually span multiple categories. I'll leave it up to you to decide whether this qualifies as a columnar database or not :)


  • In a Column oriented or a columnar database data are stored on disk in a column wise manner.

    e.g: Table Bonuses table

       ID         Last    First   Bonus
       1          Doe     John    8000
       2          Smith   Jane    4000
       3          Beck    Sam     1000
    
  • In a row-oriented database management system, the data would be stored like this: 1,Doe,John,8000;2,Smith,Jane,4000;3,Beck,Sam,1000;

  • In a column-oriented database management system, the data would be stored like this:
    1,2,3;Doe,Smith,Beck;John,Jane,Sam;8000,4000,1000;

  • Cassandra is basically a column-family store

  • Cassandra would store the above data as:

    Bonuses: { row1: { "ID":1, "Last":"Doe", "First":"John", "Bonus":8000}, row2: { "ID":2, "Last":"Smith", "Jane":"John", "Bonus":4000} ... }

  • Vertica, VectorWise, MonetDB are some column oriented databases that I've heard of.

  • Read this for more details.

Hope this helps.


If you go to the Apache Cassandra project on GitHub, and scroll down to the "Executive Summary," you will get your answer:

Cassandra is a partitioned row store. Rows are organized into tables with a required primary key.

Partitioning means that Cassandra can distribute your data across multiple machines in an application-transparent matter. Cassandra will automatically repartition as machines are added and removed from the cluster.

Row store means that like relational databases, Cassandra organizes data by rows and columns.

"So I feel like Cassandra is a row wise data store"

And that would be correct.