MongoDB using too much memory

Okay, so after following the clues given by loicmathieu and jstell, and digging into it a little, here is what I found out about MongoDB with the WiredTiger storage engine. I'm putting it here in case anyone runs into the same questions.

The memory-usage threads I mentioned all date from 2012-2014, so they pre-date WiredTiger and describe the behavior of the original MMAPv1 storage engine, which has neither a separate cache nor support for compression.

The WiredTiger cache settings control only the size of the memory directly used by the WiredTiger storage engine (not the total memory used by mongod). Many other things potentially consume memory in a MongoDB/WiredTiger deployment, such as the following:

  • WiredTiger compresses data on disk, but the data held in memory is uncompressed.

  • By default, WiredTiger does not fsync the data on each commit, so recent journal data also sits in RAM, which takes its toll on memory. It is also mentioned that, in order to use I/O efficiently, WiredTiger batches I/O requests (cache misses) together, and that also seems to take some RAM (in fact, dirty pages, i.e. pages that have been changed or updated, keep a list of their updates stored in a concurrent skip list).

  • WiredTiger keeps multiple versions of records in its cache (Multi-Version Concurrency Control: read operations see the last committed version as of the start of their operation).

  • WiredTiger keeps checksums of the data in its cache.

  • MongoDB itself consumes memory to handle open connections, aggregations, server-side code, etc.

Considering these facts, relying on show dbs was not technically correct, since it shows only the compressed (on-disk) size of the datasets.

The following commands can be used to get the full (uncompressed) dataset size:

db.getSiblingDB('data_server').stats()
// or, from within that database:
db.stats()

This results in the following:

{
    "db" : "data_server",
    "collections" : 11,
    "objects" : 266565289,
    "avgObjSize" : 224.8413545621088,
    "dataSize" : 59934900658, # 60GBs
    "storageSize" : 22959984640,
    "numExtents" : 0,
    "indexes" : 41,
    "indexSize" : 7757348864, # 7.7GBs
    "ok" : 1
}

So it seems the uncompressed dataset plus its indexes take up about 68 GB of memory.
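To make that arithmetic explicit, here is a small sketch (plain JavaScript, using the exact numbers from the db.stats() output above) that derives the in-memory estimate and the on-disk compression ratio:

```javascript
// Numbers copied from the db.stats() output above.
const stats = {
  dataSize: 59934900658,    // uncompressed documents
  storageSize: 22959984640, // compressed on-disk size
  indexSize: 7757348864,    // total index size
};

// Uncompressed data + indexes is what can end up resident in RAM.
const inMemoryGB = (stats.dataSize + stats.indexSize) / 1e9;

// Ratio of uncompressed to compressed data: what compression saves on disk.
const compressionRatio = stats.dataSize / stats.storageSize;

console.log(inMemoryGB.toFixed(1));       // "67.7"
console.log(compressionRatio.toFixed(2)); // "2.61"
```

So the data compresses roughly 2.6:1 on disk, which is exactly why show dbs under-reports what the working set can occupy in RAM.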

Considering all of this, I guess the memory usage is now pretty much expected. The good part is that it's completely okay to limit the WiredTiger cache size, since WiredTiger handles I/O operations quite efficiently (as described above).
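For reference, capping the cache is a one-line setting in mongod.conf (the 4 GB figure here is only an illustrative assumption; tune it to your workload):

```yaml
# mongod.conf: cap the WiredTiger internal cache.
# The 4 GB value is an example, not a recommendation.
storage:
  wiredTiger:
    engineConfig:
      cacheSizeGB: 4
```

The same limit can also be passed at startup via the --wiredTigerCacheSizeGB command-line option.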

There also remains the problem of OOM kills. To get around it, since we didn't have enough resources to move MongoDB to its own host, we lowered oom_score_adj for the processes we cared about, so that the OOM killer would not target them for the time being (in other words, we told the OOM killer not to kill our desired processes).


I don't think you have a problem with MongoDB here. As jstell told you, MongoDB with WiredTiger will use up to 50% of available memory, so if you increase the RAM of your server, it will take even more memory.
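To be precise, the "50% of available memory" figure refers to the default size of the WiredTiger internal cache, which the MongoDB documentation defines as the larger of 50% of (total RAM minus 1 GB) or 256 MB. A quick sketch of that formula:

```javascript
// Default WiredTiger internal cache size, per the MongoDB documentation:
// the larger of 50% of (total RAM - 1 GB) or 256 MB.
function defaultWiredTigerCacheGB(totalRamGB) {
  return Math.max(0.5 * (totalRamGB - 1), 0.25); // 0.25 GB = 256 MB floor
}

console.log(defaultWiredTigerCacheGB(16)); // 7.5  (a 16 GB host)
console.log(defaultWiredTigerCacheGB(1));  // 0.25 (the 256 MB floor)
```

This is why adding RAM makes mongod appear to grow: the cache ceiling scales with the machine.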

As for why it's using more than the size of the DB + indexes: keep in mind that WiredTiger compresses the database on disk and also uses snapshot logs to record document changes. So the real size of the WiredTiger data is roughly (the size shown by show dbs) * compression_ratio + the size of the snapshot logs, which makes it almost impossible to know the exact expected size.

Also keep in mind that tools like top, ps, and htop don't display the memory really used by an application; refer to this SO question for details: https://stackoverflow.com/questions/131303/how-to-measure-actual-memory-usage-of-an-application-or-process

Now, back to your issue. You have other tools running on the same host, and the OOM killer terminates them. I'm not familiar with the Linux OOM killer, but are you sure it kills them because of MongoDB, or just because of their own memory usage (maybe it kills Postgres because Postgres itself took too much memory)?

Anyway, as a best practice, if you have a big Mongo database, don't install it on a host shared with other databases; otherwise, when a problem like the one you describe here occurs, you will have a lot of difficulty figuring out which one is really causing the issue on the host.


Docs

You may like to read basic memory concerns for MongoDB and also this brief discussion about checking memory usage.

Memory usage overview

The command db.serverStatus() (docs) can provide an overview of memory usage, specifically:

> db.serverStatus().mem
{ "bits" : 64, "resident" : 27, "virtual" : 397, "supported" : true }

> db.serverStatus().tcmalloc
... not easy to read! ...

> db.serverStatus().tcmalloc.tcmalloc.formattedString
------------------------------------------------
MALLOC:        3416192 (    3.3 MiB) Bytes in use by application
MALLOC: +      4788224 (    4.6 MiB) Bytes in page heap freelist
MALLOC: +       366816 (    0.3 MiB) Bytes in central cache freelist
...
... a bunch of stats in an easier to read format ...

In our case, a RAM problem turned out to be caused by one of our indexes taking up too much memory, so here I will show how we tracked it down.

How big are your indexes?

db.stats() can show the total size of all indexes, but we can also get detailed info for a single collection using db.myCollection.stats().

For example, this command will compare the sizes of the indexes for every collection:

> db.getCollectionNames()
    .map(name => ({ totalIndexSize: db.getCollection(name).stats().totalIndexSize, name: name }))
    .sort((a, b) => a.totalIndexSize - b.totalIndexSize)
    .forEach(printjson)
...
{ "totalIndexSize" : 696320, "name" : "smallCollection" }
{ "totalIndexSize" : 135536640, "name" : "bigCollection" }
{ "totalIndexSize" : 382681088, "name" : "hugeCollection" }
{ "totalIndexSize" : 511901696, "name" : "massiveCollection" }

Now we can look at the details for that massive collection, to see which of its indexes are the most costly:

> db.massiveCollection.stats().indexSizes
{
        "_id_" : 230862848,
        "groupId_1_userId_1" : 49971200,
        "createTime_1" : 180301824,
        "orderId_1" : 278528,
        "userId_1" : 50155520
}

This can give us a better idea of where savings might be possible.
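As a small aside, the indexSizes map is plain JSON, so it is just as easy to rank outside the shell. This sketch (plain JavaScript, using the exact numbers from the output above) sorts the indexes by cost and prints human-readable sizes:

```javascript
// indexSizes as returned above by db.massiveCollection.stats().indexSizes.
const indexSizes = {
  "_id_": 230862848,
  "groupId_1_userId_1": 49971200,
  "createTime_1": 180301824,
  "orderId_1": 278528,
  "userId_1": 50155520,
};

// Sort descending by size so the costliest index comes first.
const ranked = Object.entries(indexSizes).sort((a, b) => b[1] - a[1]);
for (const [name, bytes] of ranked) {
  console.log(`${name}: ${(bytes / 1024 / 1024).toFixed(1)} MiB`);
}
```

Here _id_ comes out on top (which is unavoidable), with createTime_1 a close second.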

(In this case, we had an index over createTime which was rather huge - one entry per document - and we decided we could live without it.)