Allocating a large memory block in C++

You forgot one dimension, and the overhead of allocating memory. The shown code allocates memory very inefficiently in the third dimension, resulting in way too much overhead.

float*** a = new float**[N];

This will allocate, roughly 22000 * sizeof(float **), which is rougly 176kb. Negligible.

a[m] = new float*[M - 1];

A single allocation here will be for 44099 * sizeof(float *), but you will grab 22000 of these. 22000 * 44099 * sizeof(float *), or roughly 7.7gb of additional memory. This is where you stopped counting, but your code isn't done yet. It's got a long ways to go.

a[m][n] = new float[2];

This is a single allocation of 8 bytes, but this allocation will be done 22000 * 44099 times. That's another 7.7gb flushed down the drain. You're now over 15 gigs of application-required memory, roughly, that needs to be allocated.

But each allocation does not come free, and new float[2] requires more than 8 bytes. Each individually allocated block must be tracked internally by your C++ library, so that it can be recycled by delete. The most simplistic link-list based implementation of heap allocation requires one forward pointer, one backward pointer, and the count of how many bytes are there in the allocated block. Assuming nothing needs to be padded for alignment purposes, this is at least 24 bytes of overhead per allocation, on a 64-bit platform.

Now, since your third dimension makes 22000 * 44099 allocations, 22000 allocations for the second dimension, and one allocation for the first dimension: if I count on my fingers, this will require (22000 * 44099 + 22000 + 1) * 24, or another 22 gigabytes of memory, just to consume the overhead of the most simple, basic memory allocation scheme.

We're now up to about 38 gigabytes of RAM needed using the most simple, possible, heap allocation tracking, if I did my math right. Your C++ implementation is likely to use a slightly more sophisticated heap allocation logic, with larger overhead.

Get rid of the new float[2]. Compute your matrix's size, and new a single 7.7gb chunk, then calculate where the rest of your pointers should be pointing to. Also, allocate a single chunk of memory for the second dimension of your matrix, and compute the pointers for the first dimension.

Your allocation code should execute exactly three new statements. One for the first dimension pointer, One for the second dimension pointers. And one more for the huge chunk of data that comprises your third dimension.


Just to round out one answer already given, the example below is basically an extension of the answer given here on how to create a contiguous 2D array, and illustrates the usage of only 3 calls to new[].

The advantage is that you keep the [][][] syntax you would normally use with triple pointers (although I highly advise against writing code using "3 stars" like this, but we have what we have). The disadvantage is that more memory is allocated for the pointers with the addition to the single memory pool for the data.

#include <iostream>
#include <exception>

template <typename T>
T*** create3DArray(unsigned pages, unsigned nrows, unsigned ncols, const T& val = T())
{
    T*** ptr = nullptr;  // allocate pointers to pages
    T** ptrMem = nullptr;
    T* pool = nullptr;
    try 
    {
        ptr = new T**[pages];  // allocate pointers to pages
        ptrMem = new T*[pages * nrows]; // allocate pointers to pool
        pool = new T[nrows*ncols*pages]{ val };  // allocate pool

        // Assign page pointers to point to the pages memory,
        // and pool pointers to point to each row the data pool
        for (unsigned i = 0; i < pages; ++i, ptrMem += nrows)
        {
            ptr[i] = ptrMem;
            for (unsigned j = 0; j < nrows; ++j, pool += ncols)
                ptr[i][j] = pool;
        }
        return ptr;
     }
     catch(std::bad_alloc& ex)
     {
         // rollback the previous allocations
        delete [] ptrMem;
        delete [] ptr;
        throw ex; 
    }
}

template <typename T>
void delete3DArray(T*** arr)
{
    delete[] arr[0][0]; // remove pool
    delete[] arr[0];  // remove the pointers
    delete[] arr;     // remove the pages
}

int main()
{
    double ***dPtr = nullptr;
    try 
    {
        dPtr = create3DArray<double>(4100, 5000, 2);
    }
    catch(std::bad_alloc& )
    {
        std::cout << "Could not allocate memory";
        return -1;
    }
    dPtr[0][0][0] = 10;  // for example
    std::cout << dPtr[0][0][0] << "\n";
    delete3DArray(dPtr);  // free the memory
}

Live Example


That was probably a simplified version of your problem, but the data structure you’re using (“three-star" arrays) is almost never the one you want. If you’re creating a dense matrix like here, and allocating space for every element, there is no advantage at all to making millions of tiny allocations. If you want a sparse matrix, you normally want a format like compressed sparse row.

If the array is “rectangular” (or I suppose a 3-D one would be “boxy”), and all the rows and columns are the same size, this data structure is purely wasteful compared to allocating a single memory block. You perform millions of tiny allocations, allocate space for millions of pointers, and lose locality of memory.

This boilerplate creates a zero-cost abstraction for a dynamic 3-D array. (Okay, almost: it’s redundant to store both the length of the underlying one-dimensional std::vector and the individual dimensions.) The API uses a(i, j, k) as the equivalent of a[i][j][k] and a.at(i,j,k) as the variant with bounds-checking.

This API also has an option to fill the array with a function of the indices, f(i,j,k). If you call a.generate(f), it sets each a(i,j,k) = f(i,j,k). In theory, this strength-reduces the offset calculation within the inner loop to make it much faster. The API can also pass the generating function to the constructor as array3d<float>(M, N, P, f). Extend it as you please.

#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <functional>
#include <iomanip>
#include <iostream>
#include <vector>

using std::cout;
using std::endl;
using std::ptrdiff_t;
using std::size_t;

/* In a real-world implementation, this class would be split into a
 * header file and a definitions file.
 */
template <typename T>
  class array3d {
    public:
    using value_type = T;
    using size_type = size_t;
    using difference_type = ptrdiff_t;
    using reference = T&;
    using const_reference = const T&;
    using pointer = T*;
    using const_pointer = const T*;
    using iterator = typename std::vector<T>::iterator;
    using const_iterator = typename std::vector<T>::const_iterator;
    using reverse_iterator = typename std::vector<T>::reverse_iterator;
    using const_reverse_iterator = typename
      std::vector<T>::const_reverse_iterator;

/* For this trivial example, I don’t define a default constructor or an API
 * to resize a 3D array.
 */
    array3d( const ptrdiff_t rows,
             const ptrdiff_t cols,
             const ptrdiff_t layers )
    {
      const ptrdiff_t nelements = rows*cols*layers;

      assert(rows > 0);
      assert(cols > 0);
      assert(layers > 0);
      assert(nelements > 0);

      nrows = rows;
      ncols = cols;
      nlayers = layers;
      storage.resize(static_cast<size_t>(nelements));
    }

/* Variant that initializes an array with bounds and then fills each element
 * (i,j,k) with a provided function f(i,j,k).
 */
    array3d( const ptrdiff_t rows,
             const ptrdiff_t cols,
             const ptrdiff_t layers,
             const std::function<T(ptrdiff_t, ptrdiff_t, ptrdiff_t)> f )
    {
      const ptrdiff_t nelements = rows*cols*layers;

      assert(rows > 0);
      assert(cols > 0);
      assert(layers > 0);
      assert(nelements > 0);

      nrows = rows;
      ncols = cols;
      nlayers = layers;
      storage.reserve(static_cast<size_t>(nelements));

      for ( ptrdiff_t i = 0; i < nrows; ++i )
        for ( ptrdiff_t j = 0; j < ncols; ++j )
          for ( ptrdiff_t k = 0; k < nlayers; ++k )
            storage.emplace_back(f(i,j,k));

      assert( storage.size() == static_cast<size_t>(nelements) );
    }

    // Rule of 5:
    array3d( const array3d& ) = default;
    array3d& operator= ( const array3d& ) = default;
    array3d( array3d&& ) = default;
    array3d& operator= (array3d&&) = default;

    /* a(i,j,k) is the equivalent of a[i][j][k], except that the indices are
     * signed rather than unsigned.  WARNING: It does not check bounds!
     */
    T& operator() ( const ptrdiff_t i,
                    const ptrdiff_t j,
                    const ptrdiff_t k ) noexcept
    {
      return storage[make_index(i,j,k)];
    }

    const T& operator() ( const ptrdiff_t i,
                          const ptrdiff_t j,
                          const ptrdiff_t k ) const noexcept
    {
      return const_cast<array3d&>(*this)(i,j,k);
    }

    /* a.at(i,j,k) checks bounds.  Error-checking is by assertion, rather than
     * by exception, and the indices are signed.
     */
    T& at( const ptrdiff_t i, const ptrdiff_t j, const ptrdiff_t k )
    {
      bounds_check(i,j,k);
      return (*this)(i,j,k);
    }

    const T& at( const ptrdiff_t i,
                 const ptrdiff_t j,
                 const ptrdiff_t k ) const
    {
      return const_cast<array3d&>(*this).at(i,j,k);
    }

/* Given a function or function object f(i,j,k), fills each element of the
 * container with a(i,j,k) = f(i,j,k).
 */
    void generate( const std::function<T(ptrdiff_t,
                                         ptrdiff_t,
                                         ptrdiff_t)> f )
    {
      iterator it = storage.begin();

      for ( ptrdiff_t i = 0; i < nrows; ++i )
        for ( ptrdiff_t j = 0; j < ncols; ++j )
          for ( ptrdiff_t k = 0; k < nlayers; ++k )
            *it++ = f(i,j,k);

      assert(it == storage.end());
    }

/* Could define a larger API, e.g. begin(), end(), rbegin() and rend() from the STL.
 * Whatever you need.
 */

    private:
    ptrdiff_t nrows, ncols, nlayers;
    std::vector<T> storage;

    constexpr size_t make_index( const ptrdiff_t i,
                                 const ptrdiff_t j,
                                 const ptrdiff_t k ) const noexcept
    {
      return static_cast<size_t>((i*ncols + j)*nlayers + k);
    }

    // This could instead throw std::out_of_range, like STL containers.
    constexpr void bounds_check( const ptrdiff_t i,
                                 const ptrdiff_t j,
                                 const ptrdiff_t k ) const
    {
      assert( i >=0 && i < nrows );
      assert( j >= 0 && j < ncols );
      assert( k >= 0 && k < nlayers );
    }
};

// In a real-world scenario, this test driver would be in another source file:

constexpr float f( const ptrdiff_t i, const ptrdiff_t j, const ptrdiff_t k )
{
  return static_cast<float>( k==0 ? 1.0 : -1.0 *
                             ((double)i + (double)j*1E-4));
}

int main(void)
{
  constexpr ptrdiff_t N = 2200, M = 4410, P = 2;
  const array3d<float> a(N, M, P, f);

  // Should be: -1234.4321
  cout << std::setprecision(8) << a.at(1234,4321,1) << endl;

  return EXIT_SUCCESS;
}

It’s worth noting that this code technically contains undefined behavior: it assumes that signed-integer multiplicative overflow produces a negative number, but in fact the compiler is entitled to generate completely broken code if the program requests some absurd amount of memory at runtime.

Of course, if the array bounds are constants, just declare them constexpr and use an array with fixed bounds.

It’s unfortunate that every new C++ programmer learns about char** argv first, because that makes people think that a “two-dimensioal” array is a “ragged” array of pointers to rows.

In the real world, that’s almost never the best data structure for the job.