Cassandra data model for simple messaging app

Yes it's always a struggle to adapt to the limitations of Cassandra when coming from a relational database background. Since we don't yet have the luxury of doing joins in Cassandra, you often want to cram as much as you can into a single table. In your case that would be the users_by_username table.

There are a few features of Cassandra that should allow you to do that.

Since you are new to Cassandra, you could probably use Cassandra 3.0, which is currently in beta release. In 3.0 there is a nice feature called materialized views. This would allow you to have users_by_username as a base table, and create the users_by_email as a materialized view. Then Cassandra will update the view automatically whenever you update the base table.

Another feature that will help you is user defined types (in C* 2.1 and later). Instead of creating separate tables for followers and messages, you can create the structure of those as UDT's, and then in the user table keep lists of those types.

So a simplified view of your schema could be like this (I'm not showing some of the fields like timestamps to keep this simple, but those are easy to add).

First create your UDT's:

CREATE TYPE user_follows (
    followed_username text,
    street text,
);

CREATE TYPE msg (
    from_user text,
    body text
);

Next we create your base table:

CREATE TABLE users_by_username (
    username text PRIMARY KEY,
    email text,
    password text,
    follows list<frozen<user_follows>>,
    followed_by list<frozen<user_follows>>,
    new_messages list<frozen<msg>>,
    old_messages list<frozen<msg>>
);

Now we create a materialized view partitioned by email:

CREATE MATERIALIZED VIEW users_by_email AS
    SELECT username, password, follows, new_messages, old_messages FROM users_by_username
    WHERE email IS NOT NULL AND password IS NOT NULL AND follows IS NOT NULL AND new_messages IS NOT NULL
    PRIMARY KEY (email, username);

Now let's take it for a spin and see what it can do. Let's create a user:

INSERT INTO users_by_username (username , email , password )
    VALUES ( 'someuser', '[email protected]', 'somepassword');

Let the user follow another user:

UPDATE users_by_username SET follows = [{followed_username: 'followme2', street: 'mystreet2'}] + follows
    WHERE username = 'someuser';

Let's send the user a message:

UPDATE users_by_username SET new_messages = [{from_user: 'auser', body: 'hi someuser!'}] + new_messages
    WHERE username = 'someuser';

Now let's see what's in the table:

SELECT * FROM users_by_username ;

 username | email             | followed_by | follows                                                 | new_messages                                 | old_messages | password
----------+-------------------+-------------+---------------------------------------------------------+----------------------------------------------+--------------+--------------
 someuser | [email protected] |        null | [{followed_username: 'followme2', street: 'mystreet2'}] | [{from_user: 'auser', body: 'hi someuser!'}] |         null | somepassword

Now let's check that our materialized view is working:

SELECT new_messages, old_messages FROM users_by_email WHERE email='[email protected]'; 

 new_messages                                 | old_messages
----------------------------------------------+--------------
 [{from_user: 'auser', body: 'hi someuser!'}] |         null

Now let's read the email and put it in the old messages:

BEGIN BATCH
    DELETE new_messages[0] FROM users_by_username WHERE username='someuser'
    UPDATE users_by_username SET old_messages = [{from_user: 'auser', body: 'hi someuser!'}] + old_messages where username = 'someuser'
APPLY BATCH;

 SELECT new_messages, old_messages FROM users_by_email WHERE email='[email protected]';

 new_messages | old_messages
--------------+----------------------------------------------
         null | [{from_user: 'auser', body: 'hi someuser!'}]

So hopefully that gives you some ideas you can use. Have a look at the documentation on collections (i.e. lists, maps, and sets), since those can really help you to keep more information in one table and are sort of like tables within a table.


For cassandra or noSQL data modelling beginners, there is a process involved in data modelling your application, like

1- Understand your data, design a concept diagram
2- List all your quires in detail
3- Map your queries using defined rules and patterns, best suitable for cassandra
4- Create a logical design, table with fields derived from queries
5- Now create a schema and test its acceptance.

if we model it well, then it is easy to handle issues such as new complex queries, data over loading, data consistency setc.

After taking this free online data modelling training, you will get more clarity

https://academy.datastax.com/courses/ds220-data-modeling

Good Luck!