Track Memory Usage in C++ and evaluate memory consumption

I was finally able to solve the problem and am happy to share my findings. In my view, the best tool for evaluating a program's memory consumption is Valgrind's Massif tool. It profiles heap consumption and gives you a detailed breakdown of where the memory is allocated.

To profile the heap of your application, run valgrind --tool=massif prog. This gives you basic information about all allocations made through the typical memory allocation functions such as malloc and friends. To dig deeper, however, I activated the option --pages-as-heap=yes, which also reports the memory obtained through the underlying system calls (mmap, mremap, brk). As an example, here is an excerpt from my profiling session, as printed by ms_print:

 67  1,284,382,720      978,575,360      978,575,360             0            0
100.00% (978,575,360B) (page allocation syscalls) mmap/mremap/brk, --alloc-fns, etc.
->87.28% (854,118,400B) 0x8282419: mmap (syscall-template.S:82)
| ->84.80% (829,849,600B) 0x821DF7D: _int_malloc (malloc.c:3226)
| | ->84.36% (825,507,840B) 0x821E49F: _int_memalign (malloc.c:5492)
| | | ->84.36% (825,507,840B) 0x8220591: memalign (malloc.c:3880)
| | |   ->84.36% (825,507,840B) 0x82217A7: posix_memalign (malloc.c:6315)
| | |     ->83.37% (815,792,128B) 0x4C74F9B: std::_Rb_tree_node<std::pair<std::string const, unsigned int> >* std::_Rb_tree<std::string, std::pair<std::string const, unsigned int>, std::_Select1st<std::pair<std::string const, unsigned int> >, std::less<std::string>, StrategizedAllocator<std::pair<std::string const, unsigned int>, MemalignStrategy<4096> > >::_M_create_node<std::pair<std::string, unsigned int> >(std::pair<std::string, unsigned int>&&) (MemalignStrategy.h:13)
| | |     | ->83.37% (815,792,128B) 0x4C7529F: OrderIndifferentDictionary<std::string, MemalignStrategy<4096>, StrategizedAllocator>::addValue(std::string) (stl_tree.h:961)
| | |     |   ->83.37% (815,792,128B) 0x5458DC9: var_to_string(char***, unsigned long, unsigned long, AbstractTable*) (AbstractTable.h:341)
| | |     |     ->83.37% (815,792,128B) 0x545A466: MySQLInput::load(std::shared_ptr<AbstractTable>, std::vector<std::vector<ColumnMetadata*, std::allocator<ColumnMetadata*> >*, std::allocator<std::vector<ColumnMetadata*, std::allocator<ColumnMetadata*> >*> > const*, Loader::params const&) (MySQLLoader.cpp:161)
| | |     |       ->83.37% (815,792,128B) 0x54628F2: Loader::load(Loader::params const&) (Loader.cpp:133)
| | |     |         ->83.37% (815,792,128B) 0x4F6B487: MySQLTableLoad::executePlanOperation() (MySQLTableLoad.cpp:60)
| | |     |           ->83.37% (815,792,128B) 0x4F8F8F1: _PlanOperation::execute_throws() (PlanOperation.cpp:221)
| | |     |             ->83.37% (815,792,128B) 0x4F92B08: _PlanOperation::execute() (PlanOperation.cpp:262)
| | |     |               ->83.37% (815,792,128B) 0x4F92F00: _PlanOperation::operator()() (PlanOperation.cpp:204)
| | |     |                 ->83.37% (815,792,128B) 0x656F9B0: TaskQueue::executeTask() (TaskQueue.cpp:88)
| | |     |                   ->83.37% (815,792,128B) 0x7A70AD6: ??? (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.16)
| | |     |                     ->83.37% (815,792,128B) 0x6BAEEFA: start_thread (pthread_create.c:304)
| | |     |                       ->83.37% (815,792,128B) 0x8285F4B: clone (clone.S:112)
| | |     |                         
| | |     ->00.99% (9,715,712B) in 1+ places, all below ms_print's threshold (01.00%)
| | |     
| | ->00.44% (4,341,760B) in 1+ places, all below ms_print's threshold (01.00%)

As you can see, about 85% of my memory allocations come from a single branch. The question is why the memory consumption is so high when the plain heap profile (without --pages-as-heap) showed normal consumption. The excerpt shows why. I used posix_memalign to make sure allocations happen on useful boundaries. This allocator was passed down from the outer class to the inner member variables (a map in this case), which used it for their heap allocations. However, the boundary I chose, 4096 bytes, was too large. If you request 4 bytes with posix_memalign and a 4096-byte alignment, the block must start on a page boundary, so the system effectively reserves a full page for it. If you allocate many small values this way, you end up with lots of unused memory. Normal heap profiling does not report this memory because it only counts the bytes you requested, while the system allocation routines reserve much more and hide the rest.
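A small C++ sketch can make the page-alignment effect visible. The function below (its name, min_gap_between_blocks, is hypothetical and not from the original code) keeps many small posix_memalign blocks alive at the same time and reports the smallest distance between any two of them. Because every block starts on a multiple of the alignment and no two live blocks share an address, that distance can never be smaller than the alignment itself, so 4-byte blocks aligned to 4096 each end up on a different page.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <stdlib.h>  // posix_memalign (POSIX)
#include <vector>

// Hypothetical helper: allocates `count` blocks of `size` bytes, each aligned
// to `alignment`, keeps them all alive, and returns the smallest distance
// between two block addresses. All blocks are freed before returning.
std::size_t min_gap_between_blocks(std::size_t count, std::size_t size,
                                   std::size_t alignment) {
    // posix_memalign requires a power of two that is a multiple of sizeof(void*).
    assert(alignment % sizeof(void*) == 0 && (alignment & (alignment - 1)) == 0);

    std::vector<void*> blocks;
    for (std::size_t i = 0; i < count; ++i) {
        void* p = nullptr;
        if (posix_memalign(&p, alignment, size) != 0) break;  // stop on failure
        blocks.push_back(p);
    }

    // Sort the addresses so the closest pair is always adjacent.
    std::vector<std::uintptr_t> addrs;
    for (void* p : blocks) addrs.push_back(reinterpret_cast<std::uintptr_t>(p));
    std::sort(addrs.begin(), addrs.end());

    std::size_t gap = SIZE_MAX;  // stays SIZE_MAX if fewer than two blocks exist
    for (std::size_t i = 1; i < addrs.size(); ++i)
        gap = std::min<std::size_t>(gap, addrs[i] - addrs[i - 1]);

    for (void* p : blocks) std::free(p);
    return gap;
}
```

Calling min_gap_between_blocks(1000, 4, 4096) returns at least 4096, i.e. 1000 four-byte values touch at least 1000 distinct pages, while a small alignment such as 16 lets the allocator pack the blocks closely. Running such a test program under valgrind --tool=massif --pages-as-heap=yes should show the same kind of gap between requested and reserved memory as the excerpt above.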

To solve the problem, I switched to a smaller boundary, which drastically reduced the memory overhead.
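The original code (StrategizedAllocator with MemalignStrategy<4096>) is not shown, so the following is only a minimal sketch of the same idea under my own assumptions: a single hypothetical allocator template, MemalignAllocator, whose alignment is a compile-time parameter, plugged into a std::map<std::string, unsigned>. Switching to a smaller boundary then means changing one template argument.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <functional>
#include <map>
#include <new>
#include <stdlib.h>  // posix_memalign (POSIX)
#include <string>
#include <utility>

// Hypothetical allocator: every allocation is aligned to `Alignment` bytes
// via posix_memalign. Not the original StrategizedAllocator/MemalignStrategy.
template <typename T, std::size_t Alignment>
struct MemalignAllocator {
    static_assert((Alignment & (Alignment - 1)) == 0 && Alignment % sizeof(void*) == 0,
                  "posix_memalign needs a power of two that is a multiple of sizeof(void*)");
    static_assert(Alignment >= alignof(T), "alignment too small for T");

    using value_type = T;

    // Needed because the non-type parameter defeats the default rebind.
    template <typename U>
    struct rebind { using other = MemalignAllocator<U, Alignment>; };

    MemalignAllocator() noexcept = default;
    template <typename U>
    MemalignAllocator(const MemalignAllocator<U, Alignment>&) noexcept {}

    T* allocate(std::size_t n) {
        if (n > SIZE_MAX / sizeof(T)) throw std::bad_alloc();  // size overflow
        void* p = nullptr;
        if (posix_memalign(&p, Alignment, n * sizeof(T)) != 0) throw std::bad_alloc();
        assert(reinterpret_cast<std::uintptr_t>(p) % Alignment == 0);
        return static_cast<T*>(p);
    }
    void deallocate(T* p, std::size_t) noexcept { std::free(p); }
};

// Stateless allocators: all instances with the same alignment are interchangeable.
template <typename T, typename U, std::size_t A>
bool operator==(const MemalignAllocator<T, A>&, const MemalignAllocator<U, A>&) noexcept {
    return true;
}
template <typename T, typename U, std::size_t A>
bool operator!=(const MemalignAllocator<T, A>&, const MemalignAllocator<U, A>&) noexcept {
    return false;
}

// Hypothetical dictionary type: each map node is allocated with alignment A.
template <std::size_t A>
using AlignedDict = std::map<std::string, unsigned, std::less<std::string>,
                             MemalignAllocator<std::pair<const std::string, unsigned>, A>>;
```

With AlignedDict<4096> every tree node gets its own page, as in the profile above, whereas AlignedDict<16> keeps the nodes about as compact as the default allocator would.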

In conclusion, after hours spent in front of Massif & Co., I can only recommend this tool for deep profiling: it gives you a very good understanding of what is happening and makes it easy to track down errors. For posix_memalign the situation is different. There are cases where it is really necessary, but in most cases you will be just fine with a normal malloc.
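One reason a normal malloc is usually enough: it already returns memory suitably aligned for any fundamental type, i.e. to alignof(std::max_align_t), as long as the requested size is large enough to hold such an object. posix_memalign is therefore mainly needed for stronger alignments (for example page or SIMD boundaries). The hypothetical check below simply verifies that guarantee for a given size.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Hypothetical check: is a plain malloc'ed block of `size` bytes aligned
// for every fundamental type, i.e. to alignof(std::max_align_t)?
bool malloc_is_fundamentally_aligned(std::size_t size) {
    void* p = std::malloc(size);
    assert(p != nullptr);
    bool aligned = reinterpret_cast<std::uintptr_t>(p) % alignof(std::max_align_t) == 0;
    std::free(p);
    return aligned;
}
```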


According to this article, ps/top report how much memory your program would use if it were the only program running. If your program uses shared libraries, such as the C++ standard library (STL), that are already loaded into memory, there is a gap between the amount of memory actually allocated because of your program's execution and the amount it would allocate if it were the only process.
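On Linux you can look at this gap from inside the process. The sketch below assumes a Linux system and reads /proc/self/statm, whose first three fields are total program size, resident pages, and resident shared pages (file-backed, such as shared libraries, plus shared memory); these correspond to what top shows as VIRT, RES, and SHR. Subtracting the shared pages from the resident ones gives a rough estimate of the process's private resident memory. The struct and function names are hypothetical, and the estimate is approximate because some file-backed pages (for example the program's own binary) may not actually be shared with anyone.

```cpp
#include <cassert>
#include <fstream>
#include <unistd.h>  // sysconf

// Hypothetical holder for the first three fields of /proc/self/statm (in pages).
struct StatmInfo {
    long size_pages = 0;      // total virtual size (top: VIRT)
    long resident_pages = 0;  // resident set size (top: RES)
    long shared_pages = 0;    // resident file-backed/shared pages (top: SHR)
};

// Linux-only: parses /proc/self/statm; returns false if it cannot be read.
bool read_statm(StatmInfo& out) {
    std::ifstream f("/proc/self/statm");
    return static_cast<bool>(f >> out.size_pages >> out.resident_pages >> out.shared_pages);
}

// Rough private resident memory in bytes: RSS minus shared resident pages.
long private_resident_bytes(const StatmInfo& s) {
    assert(s.shared_pages <= s.resident_pages);
    return (s.resident_pages - s.shared_pages) * sysconf(_SC_PAGESIZE);
}
```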