Purpose of reference template arguments

What reference non-type template parameters allow you to do is write code that will automatically be specialized to work with a particular object of static storage duration. This is extremely useful, e.g., in environments where resources need to be statically allocated. Let's say we have some Processor class that's supposed to do processing of some sort, involving the dynamic creation of a bunch of objects. Let's, furthermore, say that storage for these objects is supposed to come from a statically allocated memory pool. We might have a very simple allocator that just contains some storage and a "pointer" to the beginning of free space:

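#include <cstddef>
#include <new>
#include <utility>

template <std::size_t SIZE>
class BumpPoolAllocator
{
    // Aligned so that offsets rounded up to alignof(T) yield correctly aligned
    // addresses (assumes alignof(T) <= alignof(std::max_align_t)).
    alignas(std::max_align_t) char pool[SIZE];

    std::size_t next = 0;  // offset of the beginning of free space

    void* alloc(std::size_t size, std::size_t alignment)
    {
        // Round the offset up to the required alignment, then bump it past
        // the new object. No bounds checking, for brevity.
        next = (next + alignment - 1) / alignment * alignment;
        void* ptr = pool + next;
        next += size;
        return ptr;
    }

public:
    template <typename T, typename... Args>
    T& alloc(Args&&... args)
    {
        return *new (alloc(sizeof(T), alignof(T))) T(std::forward<Args>(args)...);
    }
};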
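The offset update in alloc uses the usual integer round-up idiom: add alignment - 1, then truncate to a multiple of alignment. As a small compile-time sketch (the helper name round_up is hypothetical, not part of the allocator above), the arithmetic behaves like this:

```cpp
#include <cstddef>

// Hypothetical helper isolating the rounding expression used by the allocator:
// returns the smallest multiple of `alignment` that is >= `offset`.
// Assumes alignment > 0 (alignof values always are).
constexpr std::size_t round_up(std::size_t offset, std::size_t alignment)
{
    return (offset + alignment - 1) / alignment * alignment;
}

static_assert(round_up(0, 8) == 0);  // already aligned: unchanged
static_assert(round_up(1, 8) == 8);  // bumped to the next multiple
static_assert(round_up(8, 8) == 8);  // exact multiples stay put
static_assert(round_up(9, 4) == 12);
static_assert(round_up(5, 1) == 5);  // alignment 1 never moves the offset
```

Rounding only aligns the offset; the allocator also has to advance next by sizeof(T) after each allocation, or consecutive objects would overlap.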

and then statically allocate a memory pool of some size by placing an instance somewhere in static storage:

BumpPoolAllocator<1024*1024> pool_1;

Now, we could have a Processor that can work with any kind of memory pool:

template <typename T, typename Pool>
class Processor
{
    Pool& pool;

    // …

public:
    Processor(Pool& pool) : pool(pool) {}

    void process()
    {
        // …

        auto bla = &pool.template alloc<T>();

        // …
    }
};

and then also allocate one of those statically:

Processor<int, decltype(pool_1)> processor_1(pool_1);

But note how every such Processor instance now contains a field holding the address of its pool object, even though that address is a constant known at compile time. Every time our Processor does anything with its pool, it first has to load that address from memory, only to end up at the very same pool object every time. If we're already allocating everything statically, we might as well take advantage of the fact that the location of everything is known at compile time to get rid of unnecessary indirections. Using a reference template parameter, we can do just that:

template <typename T, auto& pool>
class Processor
{
    // …

public:
    void process()
    {
        // …

        auto bla = &pool.template alloc<T>();

        // …
    }
};

Processor<int, pool_1> processor_1;
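To see the layout difference between the two Processor designs, here is a reduced sketch of both side by side. StaticPool, PointerProcessor, RefProcessor, and pool_a are hypothetical stand-ins, trimmed down to what matters for the comparison:

```cpp
#include <type_traits>

// Hypothetical stand-in for a statically allocated pool.
struct StaticPool { unsigned char storage[256]; };

StaticPool pool_a;

// Design 1: every processor object stores a reference to its pool.
template <typename Pool>
class PointerProcessor
{
    Pool& pool;

public:
    explicit PointerProcessor(Pool& p) : pool(p) {}
    Pool& get() const { return pool; }  // must load the stored address
};

// Design 2: the pool is a reference template parameter; nothing is stored.
template <auto& pool>
class RefProcessor
{
public:
    static constexpr auto& get() { return pool; }  // address is part of the type
};

static_assert(std::is_empty_v<RefProcessor<pool_a>>);
static_assert(!std::is_empty_v<PointerProcessor<StaticPool>>);
```

The reference-parameter version is an empty class: which pool it uses is encoded entirely in its type. How many bytes the pointer-holding version occupies is up to the ABI, but in practice it is the size of a pointer.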

Rather than have each Processor object hold on to the address of the pool it should use, we specialize the entire Processor to directly use a particular pool object. This gets rid of the unnecessary indirection; the address of the pool to use will essentially just be inlined everywhere. At the same time, we retain the flexibility to freely compose pools and processors in whatever way we may desire:

BumpPoolAllocator<1024*1024> pool_1;  // some pool
BumpPoolAllocator<4*1024> pool_2;     // another, smaller pool


Processor<int, pool_1> processor_1;   // some processor

struct Data {};
Processor<Data, pool_1> processor_2;  // another processor using the same pool

Processor<char, pool_2> processor_3;  // another processor using the smaller pool
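Putting the pieces together, a minimal self-contained sketch of this composition might look as follows. The allocator here is a variant that returns nullptr instead of overrunning the pool and exposes a used() accessor; both changes, and the int member of Data, are illustrative assumptions:

```cpp
#include <cstddef>
#include <new>
#include <utility>

template <std::size_t SIZE>
class BumpPoolAllocator
{
    // Assumes alignof(T) <= alignof(std::max_align_t) for every T allocated.
    alignas(std::max_align_t) unsigned char pool[SIZE];
    std::size_t next = 0;  // offset of the beginning of free space

public:
    // Returns nullptr instead of overrunning the pool when it is exhausted.
    template <typename T, typename... Args>
    T* alloc(Args&&... args)
    {
        std::size_t offset = (next + alignof(T) - 1) / alignof(T) * alignof(T);
        if (offset > SIZE || SIZE - offset < sizeof(T))
            return nullptr;
        next = offset + sizeof(T);
        return new (pool + offset) T(std::forward<Args>(args)...);
    }

    std::size_t used() const { return next; }  // hypothetical accessor
};

// The pool is part of the Processor's type, so no member stores its address.
template <typename T, auto& pool>
class Processor
{
public:
    T* process() { return pool.template alloc<T>(); }
};

BumpPoolAllocator<1024 * 1024> pool_1;  // some pool
BumpPoolAllocator<4 * 1024> pool_2;     // another, smaller pool

struct Data { int value = 0; };

Processor<int, pool_1> processor_1;   // some processor
Processor<Data, pool_1> processor_2;  // another processor using the same pool
Processor<char, pool_2> processor_3;  // another processor using the smaller pool
```

Because processor_1 and processor_2 are bound to the same pool object, their allocations come out of the same storage one after the other, while processor_3 draws from pool_2 only.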

One environment where I find myself using reference template parameters in this way all the time is the GPU. There are a number of circumstances that make templates in general, and reference template parameters in particular, an extremely powerful (I would go as far as saying: essential) tool for GPU programming. First of all, the only reason to be writing GPU code to begin with is performance. Dynamic memory allocation from some global general-purpose heap is typically not an option on the GPU (massive overhead); whenever dynamic resource allocation is required, it will generally be done using some purpose-built, bounded pool. Working with offsets relative to a static base address can be beneficial (if 32-bit indices are sufficient) compared to doing the same thing with runtime-valued pointer arithmetic, because GPUs typically have 32-bit registers and the number of registers used can be a limiting factor for the level of parallelism one can achieve. Thus, statically allocating resources and getting rid of indirections is generally attractive for GPU code.

At the same time, the cost of indirect function calls is typically prohibitive on the GPU (due to the amount of state that would have to be saved and restored), which means that using runtime polymorphism for flexibility is usually out of the question. Templates with reference template parameters give us exactly what we need here: the ability to express complex operations on complex data structures in a way that is completely flexible up to the point where you hit compile, but compiles down to the most rigid and efficient binary.
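To illustrate the point about offsets relative to a static base address, here is a plain C++ sketch (not actual GPU code; Node, nodes, Handle, and NodeHandle are hypothetical names). The handle stores nothing but a 32-bit index, while the base address of the statically allocated array is supplied through a reference template parameter:

```cpp
#include <cstdint>

// Hypothetical node type; links between nodes are 32-bit indices, not pointers.
struct Node
{
    float value;
    std::uint32_t next;
};

// Statically allocated node storage (capacity chosen arbitrarily).
Node nodes[1024];

// A handle holds only an index; the base address is a template argument, so it
// is fixed in the code rather than carried around as runtime state.
template <auto& storage>
struct Handle
{
    std::uint32_t index;

    auto& operator*() const { return storage[index]; }
    auto* operator->() const { return &storage[index]; }
};

using NodeHandle = Handle<nodes>;

// On mainstream ABIs, the handle is exactly as large as its 32-bit index.
static_assert(sizeof(NodeHandle) == sizeof(std::uint32_t));
```

With 32-bit registers, such a handle typically fits in a single register, whereas a 64-bit pointer would need two.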

For similar reasons, I would imagine reference template parameters to be very useful in other environments as well, e.g., in embedded systems…


One scenario could be a strong typedef with an identity token that shouldn't be of integral type but a string instead, for ease of use when serializing. You can then leverage the empty base class optimization so that deriving from the tagged base adds nothing to the size of the derived type.

Example:

// File: id.h
#pragma once
#include <ostream>      // operator<< below streams into a std::ostream
#include <string_view>

template<const std::string_view& value>
class Id {
    // Some functionality, using the non-type template parameter...
    // (with an int parameter, we would have some ugly branching here)
    friend std::ostream& operator <<(std::ostream& os, const Id& d)
    {
        return os << value;
    }

    // Protected, non-virtual destructor: deleting a derived object through
    // an Id* (undefined behavior without a virtual destructor) won't compile.
    protected:
      ~Id() = default;
};

inline const std::string_view str1{"Some string"};
inline const std::string_view str2{"Another string"};

And in some translation unit:

#include <iostream>
#include "id.h"

// This type has a string-ish identity encoded in its static type info,
// but its size isn't augmented by the base class:
struct SomeType : public Id<str2> {};

SomeType x;

int main()
{
    std::cout << x << "\n";
}
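A condensed, self-contained sketch of the same idea follows. It repeats a trimmed-down Id so that it compiles on its own; Meters, Seconds, and tag_of are hypothetical names. The size check reflects the empty base optimization, which mainstream compilers apply here (the derived types are standard-layout, so the empty base shares its address with the first member):

```cpp
#include <ostream>
#include <sstream>
#include <string>
#include <string_view>
#include <type_traits>

// Condensed copy of Id from id.h so this sketch compiles on its own.
template <const std::string_view& value>
class Id
{
    friend std::ostream& operator<<(std::ostream& os, const Id&)
    {
        return os << value;
    }

protected:
    ~Id() = default;
};

inline const std::string_view str1{"Some string"};
inline const std::string_view str2{"Another string"};

// Hypothetical strong typedefs: same representation, different identity.
struct Meters : Id<str1> { double amount = 0.0; };
struct Seconds : Id<str2> { double amount = 0.0; };

// The empty Id base adds no storage to the derived types.
static_assert(sizeof(Meters) == sizeof(double));
// Different identity tokens yield distinct, unrelated base types.
static_assert(!std::is_same_v<Id<str1>, Id<str2>>);

// Hypothetical serialization helper: recovers the identity string via operator<<.
template <typename T>
std::string tag_of(const T& object)
{
    std::ostringstream os;
    os << object;
    return os.str();
}
```

Since Id<str1> and Id<str2> are distinct types, Meters and Seconds cannot be mixed up through a common base, even though both wrap a plain double. Note that objects are default-initialized (Meters m;) rather than brace-initialized, since aggregate initialization would have to access the protected base destructor.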
