Multiple small inserts in ClickHouse

I've had a similar problem, although not as bad: making ~20 inserts per second drove the server to a high load average and heavy memory and CPU use. I created a Buffer table, which buffers the inserts in memory and flushes them periodically to the "real" on-disk table. And just like magic, everything went quiet: load average, memory and CPU usage came back down to normal levels. The nice thing is that you can run queries against the Buffer table and get back matching rows from both memory and disk, so clients are unaffected by the buffering. See https://clickhouse.tech/docs/en/engines/table-engines/special/buffer/
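
A minimal sketch of that setup, with hypothetical database/table/column names (my_db.events is assumed to already exist as a MergeTree table with columns rid String, created DateTime, some String); the Buffer engine arguments follow the documented order (database, table, num_layers, min_time, max_time, min_rows, max_rows, min_bytes, max_bytes):

-- Buffer table in front of the existing on-disk table; flushes on the thresholds below.
CREATE TABLE my_db.events_buffer AS my_db.events
ENGINE = Buffer(my_db, events, 16, 10, 100, 10000, 1000000, 10000000, 100000000);

-- Clients insert into and select from the Buffer table;
-- SELECTs return rows from both the in-memory buffer and the disk table.
INSERT INTO my_db.events_buffer (rid, created, some) VALUES ('r0', now(), 'payload');
SELECT count() FROM my_db.events_buffer;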


This is a known issue when processing a large number of small inserts into a (non-replicated) MergeTree.

This is a bug; we need to investigate and fix it.

As a workaround, you should send inserts in larger batches, as recommended: about one batch per second: https://clickhouse.tech/docs/en/introduction/performance/#performance-when-inserting-data.
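
As a rough illustration (hypothetical table and values), batching means collecting rows on the client and sending them in one multi-row INSERT instead of many single-row ones:

-- Hypothetical table my_db.events(rid String, created DateTime, some String).
-- One INSERT like this per second instead of hundreds of single-row INSERTs.
INSERT INTO my_db.events (rid, created, some) VALUES
    ('r1', now(), 'event 1'),
    ('r2', now(), 'event 2'),
    ('r3', now(), 'event 3');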


ClickHouse has a special type of table for this: Buffer. It is stored in memory and allows many small inserts without a problem. We have nearly 200 different inserts per second, and it works fine.

Buffer table:

-- Buffer(database, table, num_layers, min_time, max_time, min_rows, max_rows, min_bytes, max_bytes)
CREATE TABLE logs.log_buffer (rid String, created DateTime, some String, d Date MATERIALIZED toDate(created))
ENGINE = Buffer('logs', 'log_main', 16, 5, 30, 1000, 10000, 1000000, 10000000);

Main table:

-- Legacy MergeTree syntax: MergeTree(date_column, sampling_expression, primary_key_tuple, index_granularity)
CREATE TABLE logs.log_main (rid String, created DateTime, some String, d Date)
ENGINE = MergeTree(d, sipHash128(rid), (created, sipHash128(rid)), 8192);

Details are in the manual: https://clickhouse.yandex/docs/en/operations/table_engines/buffer/
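
With the tables above, applications write to and read from logs.log_buffer, and ClickHouse flushes to logs.log_main according to the thresholds in the engine definition. A short usage sketch (illustrative values only):

-- The insert lands in the in-memory buffer; flushing to logs.log_main is automatic.
INSERT INTO logs.log_buffer (rid, created, some) VALUES ('abc123', now(), 'payload');

-- Selects see both buffered (memory) and already-flushed (disk) rows.
SELECT count() FROM logs.log_buffer WHERE created >= now() - INTERVAL 1 HOUR;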


Alternatively, you can use something like https://github.com/nikepan/clickhouse-bulk: it will collect multiple inserts and flush them together according to a user-defined policy.
