How to conditionally set compiler optimization for template headers

Basically, the compiler has to keep code size down, not to mention that having the same template instantiated twice could cause problems if it had static members. As far as I know, the compiler either instantiates the template in every source file that uses it and then keeps just one of those copies, or it postpones the actual code generation until link time. Either way, this is a problem for the AVX code: whichever copy survives is used by every caller, so AVX instructions can end up in code that was supposed to be compiled without them (or the other way round). I ended up solving it the old-fashioned way, with some global definitions that don't depend on templates at all. For more complex applications this could be a serious problem, though. The Intel compiler has a recently added pragma (I don't recall the exact name) that compiles the function defined right after it with AVX instructions only, which would solve the problem. How reliable it is, I don't know.
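For comparison, GCC and Clang have a per-function mechanism similar in spirit to that Intel pragma: `__attribute__((target("avx")))` compiles a single function with AVX enabled while the rest of the file keeps its default flags, and `__builtin_cpu_supports("avx")` lets you choose that version at run time. This is a different technique from the Intel pragma, not a description of it. A minimal sketch, assuming GCC or Clang on x86/x86-64 (other targets fall back to the generic path); `SumSquares`, `SumSquaresAvx` and `SumSquaresGeneric` are hypothetical names used only for illustration.

```cpp
#include <cstddef>

// Generic version, compiled with whatever flags the translation unit uses.
static float SumSquaresGeneric(const float* v, std::size_t n)
{
    float sum = 0.0f;
    for (std::size_t i = 0; i < n; ++i)
        sum += v[i] * v[i];
    return sum;
}

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
// Only this function is allowed to use AVX instructions; the rest of the file is not.
__attribute__((target("avx")))
static float SumSquaresAvx(const float* v, std::size_t n)
{
    float sum = 0.0f;
    for (std::size_t i = 0; i < n; ++i)
        sum += v[i] * v[i];
    return sum;
}
#endif

// Dispatches to the AVX version only when the CPU running the program supports AVX.
float SumSquares(const float* v, std::size_t n)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    if (__builtin_cpu_supports("avx"))
        return SumSquaresAvx(v, n);
#endif
    return SumSquaresGeneric(v, n);
}
```

Because the AVX code lives in a distinct, non-template function with its own name, there is no shared definition for the linker to pick between, which is the same property the "global definitions" workaround relies on.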


I've successfully worked around this problem by forcing inlining of any template function that is used from source files compiled with different compiler options. Once the function is inlined into each caller, its code is compiled with that caller's flags, and no shared out-of-line copy is left for the linker to pick. The inline keyword alone is usually not sufficient, since the compiler may ignore it for functions above some size threshold, so you have to force the compiler to do it.

In MSVC++:

template<typename T>
__forceinline int RtDouble(T number) {...}

In GCC:

template<typename T>
inline __attribute__((always_inline)) int RtDouble(T number) {...}

Keep in mind that you may also have to force-inline any other functions that RtDouble calls within the same module, so that the compiler flags stay consistent in those functions as well. Also note that MSVC++ simply ignores __forceinline when optimizations are disabled, as in debug builds; in those cases this trick won't work, so expect different behavior in non-optimized builds. Forced inlining can make debugging harder in any case, but it does work as long as the compiler allows inlining.
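The two spellings above can be wrapped in one macro so that the same header compiles with both compilers. A minimal sketch; `FORCE_INLINE` is a hypothetical macro name, and the body of `RtDouble` is only a placeholder, since the original elides it. As noted above, this relies on the compiler actually honoring the request, which MSVC++ does not do in non-optimized builds.

```cpp
// Expands to the compiler-specific "always inline" spelling.
#if defined(_MSC_VER)
#define FORCE_INLINE __forceinline
#elif defined(__GNUC__)
#define FORCE_INLINE inline __attribute__((always_inline))
#else
#define FORCE_INLINE inline  // fallback: only a hint, the compiler may ignore it
#endif

template<typename T>
FORCE_INLINE int RtDouble(T number)
{
    // Placeholder body, for illustration only.
    return static_cast<int>(number * 2);
}
```

Clang defines `__GNUC__` and accepts `always_inline`, so it takes the GCC branch; clang-cl defines `_MSC_VER` and takes the MSVC branch.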


I think the simplest solution is to let the compiler know that those functions are indeed intended to be different, by using a template parameter that does nothing but distinguish them:

File double.h:

template<bool avx, typename T>
int RtDouble(T number)
{
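    // Side effect: when this file is compiled with AVX enabled,
    // the compiler may vectorize this loop using AVX instructions.
    // (a is never read, so an optimizer is also free to drop the loop.)
    const int N = 1000;
    float a[N], b[N] = {};  // zero-initialize b: reading uninitialized floats is undefined behavior
    for (int n = 0; n < N; ++n)
    {
        a[n] = b[n] * b[n] * b[n];
    }
    return static_cast<int>(number * 2);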
}

File fn_normal.cpp:

#include "fn_normal.h"
#include "double.h"

int FnNormal(int num)
{
    return RtDouble<false>(num);
}

File fn_avx.cpp:

#include "fn_avx.h"
#include "double.h"

int FnAVX(int num)
{
    return RtDouble<true>(num);
}
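The dummy parameter works because `RtDouble<false, int>` and `RtDouble<true, int>` are distinct specializations: each gets its own symbol and its own copy of any function-local statics, so the linker has nothing to merge between the two files. The single-file sketch below illustrates only that separation, using hypothetical `CallCount` and `RtDoubleCounted` helpers. The AVX part still depends on building fn_avx.cpp with AVX enabled (for example `-mavx` with GCC or `/arch:AVX` with MSVC) and fn_normal.cpp without it.

```cpp
// One counter per instantiation: the bool tag alone makes them different functions.
template<bool avx, typename T>
int& CallCount()
{
    static int count = 0;
    return count;
}

template<bool avx, typename T>
int RtDoubleCounted(T number)
{
    ++CallCount<avx, T>();  // bumps only this instantiation's counter
    return static_cast<int>(number * 2);
}
```

If the two instantiations were merged, both calls would share one counter; with the tag they stay separate. The catch is that every caller must pass the right tag, and a non-AVX file that accidentally uses `RtDouble<true>` would bring the collision back.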