Key-value stores for medium to large values

Depending on

  • the number of files
  • how you structure them on the FS
  • which FS you're using
  • what kind of storage you're using

you may end up running out of inodes, or find that accessing the files again becomes slow (e.g. if you put too many entries in a single directory).
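One common way to avoid the too-many-entries-in-one-directory problem is to hash the key and spread files over nested subdirectories. This is only a minimal sketch: the function name `sharded_path`, the choice of SHA-1, and the two levels of two hex characters each are my own assumptions, not something any particular store prescribes.

```python
import hashlib
from pathlib import Path


def sharded_path(root, key, levels=2, width=2):
    """Map a key to a nested path such as root/ab/cd/abcd...

    Hypothetical helper: hashing the key gives a filesystem-safe name,
    and the leading hex digits pick the subdirectories, so no single
    directory collects all the entries.
    """
    digest = hashlib.sha1(key.encode("utf-8")).hexdigest()
    # Take `levels` chunks of `width` hex characters as directory names.
    shards = [digest[i * width:(i + 1) * width] for i in range(levels)]
    return Path(root).joinpath(*shards, digest)


if __name__ == "__main__":
    print(sharded_path("store", "user:42"))
```

With the defaults this gives 256 x 256 leaf directories, which keeps each one small even with millions of keys. It does nothing about running out of inodes, since every value is still one file.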

You also have to take some care to access the files (and/or create directories) atomically, whereas a KV store will usually take care of that for you.
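As a rough illustration of what "atomically" means here, the usual trick is to write to a temporary file in the same directory and then rename it over the target. The helper name `atomic_write` is hypothetical, and the sketch assumes a filesystem where renaming within one directory is atomic (true on POSIX filesystems; `os.replace` also overwrites on Windows).

```python
import os
import tempfile


def atomic_write(path, data):
    """Write bytes to `path` so readers see either the old or the new value."""
    directory = os.path.dirname(path) or "."
    # exist_ok avoids a race when two writers create the directory at once.
    os.makedirs(directory, exist_ok=True)
    # The temp file must live in the same directory (same filesystem),
    # otherwise the final rename is not atomic.
    fd, tmp = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # push the data to disk before renaming
        os.replace(tmp, path)  # atomically swap the new file into place
    except BaseException:
        # Clean up the temp file if anything went wrong before the rename.
        try:
            os.unlink(tmp)
        except FileNotFoundError:
            pass
        raise
```

This only covers single-key writes. Anything like "read, modify, write back" still needs locking on top, which is exactly the kind of thing a real KV store gives you for free.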

I have run into all of these problems in the past with fs-as-key-value-store approaches :)

But it can be done: see, for example, Bigdis, an implementation of the Redis KV protocol as files on disk, from the Redis author himself. You just have to be a bit careful with your ops.

Depending on your problem, you may find MogileFS or plain cloud storage such as S3 to be a better solution.


Summary: For your requirements of Data Integrity, Persistence, Size & Speed I recommend Redis.

A nice intro presentation can be seen here:
https://simonwillison.net/static/2010/redis-tutorial/

n.b. More info would help but based on what you've given + what I know, here are some of the main players:

Memcached:
https://memcached.org/
A free, open source, high-performance, distributed memory object caching system, good for speeding up dynamic web applications.
+ good for web applications, free, open source.
- if the server goes down (memcached process failure or system reboot) everything cached, e.g. all sessions, is lost. Performance limitations at the higher (commercial usage) levels.

Redis:
https://redis.io/
Similar to memcached but with data persistence, supports multiple value types, counters with atomic increment/decrement and built-in key expiration.
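+ persists data to disk (via snapshots and/or an append-only log, so how much you can lose on a crash depends on configuration), very simple, fast, flexible (values can be strings, hashes, lists, sets and sorted sets), sharding, backed by VMware at the time of writing rather than by an individual.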
- limited clustering.

LevelDB:
https://google-opensource.blogspot.com/2011/07/leveldb-fast-persistent-key-value-store.html
A fast key-value storage engine written at Google that maps string keys to string values.
+ Google.
- ?possible with Google + ;)

Tokyo Cabinet:
https://fallabs.com/tokyocabinet/
Includes support for locking, ACID transactions, a binary array data type.
+ Speed and efficiency.
- Less known in some areas, e.g. US

Project Voldemort:
https://project-voldemort.com/
An advanced key-value store, written in Java. Provides multi-version concurrency control (MVCC) for updates. Updates to replicas are done asynchronously, so it does not guarantee consistent data.
+ Functionality
- Consistency

MongoDB:
https://www.mongodb.org/
A scalable, high-performance, open source, document-oriented database written in C++. Features replication and high availability with mirrors across LANs and WANs, and auto-sharding. Popular in the Ruby on Rails community.
+ Easy installation, good documentation, support.
- Relatively new.

Couch:
http://www.couchdb.org/
Similar to Mongo, aimed at document databases.
+ replication, advanced queries.
- clustering, disk space management.

Cassandra:
https://cassandra.apache.org/
Apache Cassandra is fault-tolerant and decentralized and is used at Netflix, Twitter and Reddit, among others.
+ Cluster and replication.
- More setup knowledge needed.

I can't provide all the references due to lack of time, but I hope this at least helps.