Applying multiple tuples to the same function (i.e. `apply(f, tuples...)`) without recursion or `tuple_cat`

Here's my take on it. It doesn't use recursion and it expands those tuples in the same pack expansion, but it requires a bit of preparation:

  • We build a tuple of references to the tuples passed in: rvalue references for rvalue arguments and lvalue references for lvalue arguments, so that everything is forwarded properly in the final call (this is exactly what std::forward_as_tuple does, as T.C. noted in the comments). The tuple is built and passed around as an rvalue, so reference collapsing ensures the correct value category for each argument in the final call to f.
  • We build two flattened index sequences, both of size equal to the sum of all tuple sizes:
    • The outer indices select the tuple, so they repeat the same value (the tuple's index in the tuple pack) a number of times equal to the size of each tuple.
    • The inner ones select the element in each tuple, so they increase from 0 to one less than the tuple size for each tuple.

Once we have that in place, we just expand both index sequences in the call to f.

#include <tuple>
#include <array>
#include <cstddef>
#include <utility>
#include <type_traits>
#include <iostream>

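// Builds the two flattened index arrays: ret.first holds the outer (tuple) index and
// ret.second the inner (element) index for each of the S arguments passed to f.
template<std::size_t S, class... Ts> constexpr auto make_indices()
{
   constexpr std::size_t sizes[] = {std::tuple_size_v<std::remove_reference_t<Ts>>...};
   using arr_t = std::array<std::size_t, S>;
   std::pair<arr_t, arr_t> ret{};
   // c is the position in the flattened sequences, i selects the tuple, j the element.
   for(std::size_t c = 0, i = 0; i < sizeof...(Ts); ++i)
      for(std::size_t j = 0; j < sizes[i]; ++j, ++c)
      {
         ret.first[c] = i;
         ret.second[c] = j;
      }
   return ret;
}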

template<class F, class... Tuples, std::size_t... OuterIs, std::size_t... InnerIs> 
constexpr decltype(auto) multi_apply_imp_2(std::index_sequence<OuterIs...>, std::index_sequence<InnerIs...>, 
                                           F&& f, std::tuple<Tuples...>&& t)
{
   return std::forward<F>(f)(std::get<InnerIs>(std::get<OuterIs>(std::move(t)))...);
}

template<class F, class... Tuples, std::size_t... Is> 
constexpr decltype(auto) multi_apply_imp_1(std::index_sequence<Is...>, 
                                           F&& f, std::tuple<Tuples...>&& t)
{
   constexpr auto indices = make_indices<sizeof...(Is), Tuples...>();
   return multi_apply_imp_2(std::index_sequence<indices.first[Is]...>{}, std::index_sequence<indices.second[Is]...>{},
      std::forward<F>(f), std::move(t));
}

template<class F, class... Tuples> 
constexpr decltype(auto) multi_apply(F&& f, Tuples&&... ts)
{
   // Total number of elements across all tuples.
   constexpr std::size_t flat_s = (0U + ... + std::tuple_size_v<std::remove_reference_t<Tuples>>);
   if constexpr(flat_s != 0)
      return multi_apply_imp_1(std::make_index_sequence<flat_s>{}, 
         std::forward<F>(f), std::forward_as_tuple(std::forward<Tuples>(ts)...));
   else
      // No tuples were passed in, or all of them are empty: call f with no arguments.
      return std::forward<F>(f)();
}

int main()
{
   auto t0 = std::make_tuple(1, 2);
   auto t1 = std::make_tuple(3, 6, 4, 5);
   auto sum = [](auto... xs) { return (0 + ... + xs); };

   std::cout << multi_apply(sum, t0, t1, std::make_tuple(7)) << '\n';
}

It compiles on the trunk versions of Clang and GCC in C++1z mode (C++1z being the working name for what became C++17). In terms of generated code, GCC with -O2 optimizes the call to multi_apply down to the constant 28.


Replacing std::array with a built-in array inside make_indices, by changing the alias to `using arr_t = std::size_t[S];`, makes it compile on Clang 3.9.1 (that version of libc++ lacks constexpr on std::array's operator[]).

Further replacing std::tuple_size_v<X> with std::tuple_size<X>::value and removing the if constexpr test in multi_apply makes it compile on GCC 6.3.0. (The test handles the cases where no tuples are passed in or all the tuples passed in are empty.)

Further replacing the uses of fold expressions with calls like

sum_array({std::tuple_size_v<std::remove_reference_t<Tuples>>...})

where sum_array can be something simple like

template<class T, std::size_t S> constexpr T sum_array(const T (& a)[S], std::size_t i = 0)
{
   return i < S ? a[i] + sum_array(a, i + 1) : 0;
}

makes it compile on the latest MSVC 2017 RC (MSVC actually has std::tuple_size_v, but it needs the other changes). The generated code is still great: after replacing the body of the sum lambda with sum_array({xs...}), the resulting code is a direct call to sum_array, with the array built in place from the elements of all the tuples. In other words, the multi_apply machinery doesn't introduce any run-time overhead.


std::apply is defined in terms of INVOKE, so, to keep things consistent, the final call to f should be

std::invoke(std::forward<F>(f), std::get<InnerIs>(std::get<OuterIs>(std::move(t)))...)

Implementations may provide a noexcept-specifier on std::apply (at least libc++ does; libstdc++ and MSVC currently don't), so that may be worth considering too.