Mongo Collection `Size` is *larger* than `storageSize`?

storageSize is the sum of all extents for that data, excluding indexes.

So that collection takes up 2 extents of ~2GB each, hence ~4GB. size includes indexes and, I believe, a couple of other things which inflate the number. Neither figure really represents the proper on-disk size. For disk size, db.stats() has a fileSize field, which I think is closer to what you're looking for.
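To put these figures side by side, a small helper along these lines may be useful. It is only a sketch: `toGiB` and `summarizeDbStats` are hypothetical names, the input is assumed to be the plain object returned by db.stats(), and the sample numbers are invented for illustration.

```javascript
// Convert a byte count to GiB, rounded to two decimals.
function toGiB(bytes) {
  return Math.round((bytes / Math.pow(1024, 3)) * 100) / 100;
}

// Pick out the size-related fields of a db.stats() result, in GiB.
// fileSize is assumed to be absent on some storage engines, hence the null.
function summarizeDbStats(stats) {
  return {
    dataSizeGiB: toGiB(stats.dataSize),
    storageSizeGiB: toGiB(stats.storageSize),
    indexSizeGiB: toGiB(stats.indexSize),
    fileSizeGiB: stats.fileSize === undefined ? null : toGiB(stats.fileSize)
  };
}

// Illustrative values only; in the mongo shell you would pass db.stats().
const sample = {
  dataSize: 3 * Math.pow(1024, 3),
  storageSize: 4 * Math.pow(1024, 3),
  indexSize: 512 * Math.pow(1024, 2),
  fileSize: 6 * Math.pow(1024, 3)
};
console.log(summarizeDbStats(sample));
```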

The manual is somewhat better at outlining what the various fields mean, see here for collections:

http://docs.mongodb.org/manual/reference/collection-statistics/

And here for database stats:

http://docs.mongodb.org/manual/reference/database-statistics/


Some other potentially relevant information:

The compact command does not shrink any datafiles; it only defragments deleted space so that larger objects might reuse it. The compact command will never delete or shrink database files, and in general requires extra space to do its work, usually a minimum of one extra extent.

If you repair the database it will essentially rewrite the data files from scratch, which will remove padding and store them on disk as efficiently as you are going to get. However you will need to have ~2x the size on disk to do so (actually less, but it's a decent guide).
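As a rough pre-flight check for that ~2x guide, something like the following could be used. This is a sketch under assumptions: `hasRepairHeadroom` is a hypothetical helper, and the free-space figure has to come from the operating system (for example `df`), since it is not part of db.stats().

```javascript
// Conservative check before a repair, following the ~2x rule of thumb:
// repair writes a fresh copy of the data files, so the volume should have
// free space of roughly one more copy of the current files.
// fileSizeBytes: db.stats().fileSize
// freeBytes:     free space on the data volume, measured outside MongoDB
function hasRepairHeadroom(fileSizeBytes, freeBytes) {
  return freeBytes >= fileSizeBytes;
}

console.log(hasRepairHeadroom(4e9, 5e9)); // enough room for a second copy
console.log(hasRepairHeadroom(4e9, 3e9)); // not enough room
```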

One other thing to bear in mind here: repair and compact remove padding. The padding factor ranges from 1 (no moves caused by documents growing) to 2 (lots of moves caused by documents growing). Your padding factor of ~1.67 would indicate your documents are growing (and hence causing moves) quite a bit.

When you compact or repair a database you remove that padding, so subsequent document growth is going to trigger even more moves than before. Because moves are relatively expensive operations, this can have a serious impact on your performance. More info here:

http://www.mongodb.org/display/DOCS/Padding+Factor
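To make the effect of padding concrete, here is a simplified sketch. It assumes a record is allocated as document size times padding factor and ignores record headers and any allocation rounding; `allocatedBytes` and `fitsInPlace` are hypothetical names, and the sizes are made up.

```javascript
// Space allocated for a new record under the padding-factor model:
// the document size multiplied by the collection's current padding factor.
function allocatedBytes(docBytes, paddingFactor) {
  return Math.ceil(docBytes * paddingFactor);
}

// A document can be updated in place only while it still fits in its record.
function fitsInPlace(grownDocBytes, recordBytes) {
  return grownDocBytes <= recordBytes;
}

const docBytes = 1000;
const padded = allocatedBytes(docBytes, 1.67); // about 1670 bytes with padding
const unpadded = allocatedBytes(docBytes, 1);  // no slack after compact/repair

// The same growth to 1300 bytes fits the padded record,
// but forces a move once the padding has been removed.
console.log(fitsInPlace(1300, padded));
console.log(fitsInPlace(1300, unpadded));
```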


For MongoDB 3.x and later

For MMAPv1:
dataSize < storageSize

but for WiredTiger:
dataSize > storageSize (in most cases, due to compression; storageSize may
                        still be greater, depending on conditions such as
                        the compression technique and whether the
                        compact/repair command has been run)
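A quick way to see which case a collection falls into is to compare the two numbers directly. The helper below is a sketch: `dataToStorageRatio` is a hypothetical name, and it assumes the object returned by the collection's stats(), where the data size is reported in the `size` field.

```javascript
// Ratio of uncompressed data size to allocated storage for one collection.
// Above 1: data is larger than its storage, typically thanks to compression.
// Below 1: the allocation holds padding, free space, or other overhead.
function dataToStorageRatio(collStats) {
  if (!collStats.storageSize) {
    return null; // empty or missing storageSize: no meaningful ratio
  }
  return collStats.size / collStats.storageSize;
}

// Illustrative numbers only.
console.log(dataToStorageRatio({ size: 400, storageSize: 100 }));
console.log(dataToStorageRatio({ size: 100, storageSize: 200 }));
```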

For db.getCollection('name').stats()

size = total size in memory of all records in the collection, including padding (excludes index size and the record header, which is 16 bytes per record)
avgObjSize = average object size, including padding
storageSize = total amount of storage allocated to this collection for document storage (total index size excluded)
totalIndexSize = total size of all indexes on the collection (compressed in the case of WiredTiger)

For db.stats()

dataSize = document + padding
storageSize = document + padding + deleted space
fileSize = document + padding extents +  index extents + yet-unused space
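Taking the three relations above at face value, the gaps between the fields can be estimated by subtraction. This is a sketch, not an exact accounting: `estimateGaps` is a hypothetical helper, it assumes an MMAPv1-style db.stats() result that includes fileSize, and the sample values are invented.

```javascript
// Estimate where the space goes, based on:
//   dataSize    = documents + padding
//   storageSize = documents + padding + deleted space
//   fileSize    = document extents + index extents + yet-unused space
function estimateGaps(dbStats) {
  return {
    // Space inside the data extents that no longer holds live documents.
    deletedSpace: dbStats.storageSize - dbStats.dataSize,
    // Everything in the files beyond the data extents:
    // index extents plus preallocated, not yet used space.
    indexAndUnused: dbStats.fileSize - dbStats.storageSize
  };
}

// Illustrative values in bytes.
console.log(estimateGaps({ dataSize: 3000, storageSize: 4000, fileSize: 6000 }));
```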

We can reclaim unused space (holes) with the compact command:

db.getCollection('name').runCommand( "compact" )

After running the compact or repair command, we can see the exact difference between storage size and data size.
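One way to check what compact actually achieved is to compare stats() snapshots taken before and after. The helper is a sketch with a hypothetical name (`storageReclaimed`); the shell lines in the comment use 'name' as a placeholder collection.

```javascript
// Difference in storageSize between two stats() snapshots of one collection.
// A result of 0 means the allocation did not shrink.
function storageReclaimed(before, after) {
  return before.storageSize - after.storageSize;
}

// In the mongo shell this might be used as:
//   var before = db.getCollection('name').stats();
//   db.getCollection('name').runCommand("compact");
//   var after = db.getCollection('name').stats();
//   storageReclaimed(before, after);
console.log(storageReclaimed({ storageSize: 5000 }, { storageSize: 3500 }));
```

Whether the result is non-zero depends on the storage engine and version, so it is worth measuring rather than assuming.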

Compression techniques in MongoDB WiredTiger:

- snappy: good compression, low overhead
- zlib: better compression, more CPU
- none (compression can be disabled; it is enabled by default in WiredTiger)
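The compressor can be chosen per collection at creation time through the WiredTiger `block_compressor` setting. The helper below is a sketch: `compressionOptions` is a hypothetical name, and it only accepts the three values listed above.

```javascript
// Build createCollection options that select a WiredTiger block compressor.
function compressionOptions(compressor) {
  const allowed = ["snappy", "zlib", "none"];
  if (allowed.indexOf(compressor) === -1) {
    throw new Error("unsupported compressor: " + compressor);
  }
  return {
    storageEngine: {
      wiredTiger: { configString: "block_compressor=" + compressor }
    }
  };
}

// Shell usage ('name' is a placeholder collection name):
//   db.createCollection("name", compressionOptions("zlib"));
console.log(JSON.stringify(compressionOptions("zlib")));
```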

Tags:

Mongodb