Storing Likes in a Non-Relational Database

Which approach to use, (1) or (2), depends on your use case. Specifically, think about which data you will need to access more often: all products liked by a particular user (2) or all users who liked a particular product (1). It seems likely that (1) is the more frequent case. With it you can easily tell whether the user has already liked the product, and the number of likes for the product is simply the array length.

I would argue that any further improvement would likely be premature optimization; it's better to optimize with a problem in hand.

If showing the number of likes, for example, turns out to be a bottleneck, you can denormalize your data further by storing the array length under a separate key. That way, displaying the product list wouldn't require fetching the array of likes with userIds from the database.
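
A minimal in-memory sketch of the embedded-array approach with a denormalized counter. A plain Python dict stands in for a database document here; the field names likes and likeCount and the like helper are illustrative assumptions, not part of any particular database API.

```python
# Hypothetical product document; the field names are assumptions for illustration.
product = {"_id": "p1", "name": "Widget", "likes": [], "likeCount": 0}


def like(product, user_id):
    """Record a like once per user and keep the denormalized counter in sync."""
    if user_id in product["likes"]:
        return False  # the user already liked this product
    product["likes"].append(user_id)
    product["likeCount"] = len(product["likes"])
    return True


like(product, "u1")
like(product, "u2")
like(product, "u1")  # duplicate, ignored

# A product list only needs the counter, not the whole array of userIds.
print(product["likeCount"])  # 2
```

In a real database the membership check and the counter update should happen as one atomic operation, otherwise two concurrent requests could leave the counter out of sync with the array.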

Even less likely: with millions of likes on a single product, you may see a significant slowdown from looping through the likes array to check whether the userId is already in it. You could, of course, keep the likes array sorted, but database communication would still be slow (slower than looping through an array in memory, anyway). It's better to rely on database indexing for the lookup: instead of storing the likes as an array embedded in the product (or user), you can store them in a separate collection:

{
    _id: $oid1,
    productId: $oid2,
    userId: $oid3
}

Assuming the product also has a key holding the number of likes, that should be the fastest way of accessing likes, provided all 3 keys are indexed.

You can also be creative and use the concatenation of $oid2 + $oid3 as $oid1, which would automatically enforce uniqueness of the user-product pair. Then you'd just try saving the like and ignore the database error on a duplicate (this might lead to subtle bugs, so it'd be safer to check that the like exists when a save fails).
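
The compound-key idea can be sketched roughly as follows. LikesCollection is a hypothetical in-memory stand-in for a collection with a unique _id, and DuplicateKeyError, like_id and add_like are made-up names for illustration; a real driver will have its own error types. The ":" separator is also an assumption, added so that ids of varying length cannot collide after concatenation.

```python
class DuplicateKeyError(Exception):
    """Raised by the stand-in collection when an _id already exists."""


class LikesCollection:
    """Hypothetical in-memory collection that enforces a unique _id."""

    def __init__(self):
        self._docs = {}

    def insert(self, doc):
        if doc["_id"] in self._docs:
            raise DuplicateKeyError(doc["_id"])
        self._docs[doc["_id"]] = doc

    def count(self):
        return len(self._docs)


def like_id(product_id, user_id):
    # Concatenate both ids into one key; the separator is an assumption.
    return f"{product_id}:{user_id}"


def add_like(likes, product_id, user_id):
    """Try to save the like; return False if this pair was already stored."""
    doc = {
        "_id": like_id(product_id, user_id),
        "productId": product_id,
        "userId": user_id,
    }
    try:
        likes.insert(doc)
        return True
    except DuplicateKeyError:
        # Only the duplicate-key case is swallowed; other errors still propagate.
        return False


likes = LikesCollection()
first = add_like(likes, "oid2", "oid3")   # True: new like
second = add_like(likes, "oid2", "oid3")  # False: same user-product pair
```

Catching only the duplicate-key error, rather than every failure, is one way to avoid the subtle bugs mentioned above: a network or validation error is not mistaken for "already liked".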


Why not simply amend the requirements and use a relational database or an RDBMS-like solution? Basically, use the right tool for the right job:

Create another table, Likes, that keeps each pair of productId and userId as a unique key. For example:

userId1 - productId2
userId2 - productId3
userId2 - productId2
userId1 - productId5
userId3 - productId2

Then you can query by userId to get the number of likes per user, or by productId to get the number of likes per product.

Moreover, the unique key userId_productId guarantees that a user can like a given product only once.

Additionally, you can keep extra information in other columns, such as the timestamp of when the user liked the product.
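
As a rough sketch of the Likes table described above, here is a version using SQLite through Python's standard sqlite3 module. SQLite, the likedAt column name and the index name are assumptions chosen for illustration; the same idea applies to any relational database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The composite primary key makes each (userId, productId) pair unique.
conn.execute("""
    CREATE TABLE Likes (
        userId    TEXT NOT NULL,
        productId TEXT NOT NULL,
        likedAt   TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
        PRIMARY KEY (userId, productId)
    )
""")
# The primary key already covers lookups by userId; add one for productId.
conn.execute("CREATE INDEX idx_likes_product ON Likes (productId)")

pairs = [
    ("userId1", "productId2"),
    ("userId2", "productId3"),
    ("userId2", "productId2"),
    ("userId1", "productId5"),
    ("userId3", "productId2"),
]
conn.executemany("INSERT INTO Likes (userId, productId) VALUES (?, ?)", pairs)

# A second like from the same user for the same product is rejected.
try:
    conn.execute(
        "INSERT INTO Likes (userId, productId) VALUES (?, ?)",
        ("userId1", "productId2"),
    )
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

likes_for_product2 = conn.execute(
    "SELECT COUNT(*) FROM Likes WHERE productId = ?", ("productId2",)
).fetchone()[0]
likes_by_user2 = conn.execute(
    "SELECT COUNT(*) FROM Likes WHERE userId = ?", ("userId2",)
).fetchone()[0]

print(likes_for_product2, likes_by_user2, duplicate_rejected)  # 3 2 True
```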