Write timeout thrown by cassandra datastax driver

We experienced similar problems on a single node in an ESX cluster with SAN storage attached (which is not recommended by datastax, but we have no other options at this moment).

Note: the settings below can be a big blow to the maximum performance Cassandra can achieve, but we chose a stable system over high performance.

While running iostat -xmt 1 we found high w_await times at the same time the WriteTimeoutExceptions occured. It turned out the memtable could not be written to disk within the default write_request_timeout_in_ms: 2000 setting.

We significantly reduced the memtable size from 512Mb (defaults to 25% of heap space, which was 2Gb in our case) to 32Mb:

# Total permitted memory to use for memtables. Cassandra will stop
# accepting writes when the limit is exceeded until a flush completes,
# and will trigger a flush based on memtable_cleanup_threshold
# If omitted, Cassandra will set both to 1/4 the size of the heap.
# memtable_heap_space_in_mb: 2048
memtable_offheap_space_in_mb: 32

We also slightly increated the write timeout to 3 seconds:

write_request_timeout_in_ms: 3000

Also make sure you write regularly to disk if you have high IO wait times:

#commitlog_sync: batch
#commitlog_sync_batch_window_in_ms: 2
#
# the other option is "periodic" where writes may be acked immediately
# and the CommitLog is simply synced every commitlog_sync_period_in_ms
# milliseconds.
commitlog_sync: periodic
commitlog_sync_period_in_ms: 10000

These settings allowed the memtable to remain small and be written often. The exceptions were solved and we survived the stress tests that were run on the sytem.


Its worth double checking your GC settings for Cassandra.

In my case I was using a semaphore to throttle async writes and still (sometimes) getting timeouts.

It transpired that I was using unsuitable GC settings, I'd been using cassandra-unit for convenience which had the unintended consequence of running with the default VM settings. Consequently we would eventually trigger hit a stop-the-world GC resulting in a write timeout. Applying the same GC settings as my running cassandra docker image and all is fine.

This might be an uncommon cause but it would have helped me so it seems worth recording here.


It is coordinator (so the server) timing out waiting for acknowledgements for the write.


While I don't understand the root cause of this issue, I was able to solve the problem by increasing the timeout value in the conf/cassandra.yaml file.

write_request_timeout_in_ms: 20000