Organizational Research By

Surprising Reserch Topic

total mongodb storage size

total mongodb storage size  using -'mongodb'

I have a sharded and replicated MongoDB with dozens millions of records. I know that Mongo writes data with some padding factor, to allow fast updates, and I also know that to replicate the database Mongo should store operation log which requires some (actually, a lot of) space. Even with that knowledge I have no idea how to estimate the actual size required by Mongo given a size of a typical database record. By now I have a descrepancy with a factor of 2 - 3 between weekly repairs.

So the question is: How to estimate a total storage size required by MongoDB given an average record size in bytes?

asked Sep 21, 2015 by okesh.badhiye
0 votes

Related Hot Questions

1 Answer

0 votes

The short answer is: you can't, not based solely on avg. document size (at least not in any accurate way).

To explain more verbosely:

The space needed on disk is not simply a function of the average document size. There is also the space needed for any indexes you create. Then there is the space needed if you do trigger those moves (despite padding, this does happen) - that space is placed on a list to be re-used but depending on the data you subsequently insert, it may or may not be possible to re-use that space.

You can also add into the fact that pre-allocation will mean that occasionally a handful of documents will increase your on-disk space utilization by ~2GB as a new data file is allocated. Of course, with sufficient data, this will be essentially a rounding error but it is worth bearing in mind.

The only way to estimate this type of data to size ratio, assuming a consistent usage pattern, is to trend it over time for your particular use case and track the disk space usage versus the data inserted (number of documents might be better than data volume depending on variability of doc size).

Similarly, if you track the insertion rate, doc size and the space gained back from a resync/repair. FYI - you can resync a secondary from scratch to get a "fresh" copy of the data files rather than running a repair, which can be less disruptive, and use less space depending on your set up.

answered Sep 21, 2015 by nimisha.jagtap