Why is valarray so slow?

I suspect that the reason c = a*b is so much slower than performing the operations an element at a time is that the

template<class T> valarray<T> operator*
    (const valarray<T>&, const valarray<T>&);

operator must allocate memory for the result and then return it by value.

Even if the copy is replaced by a swap (a "swaptimization"), that function still has the overhead of

  • allocating the new block for the resulting valarray
  • initializing the new valarray (it's possible that this might be optimized away)
  • putting the results into the new valarray
  • paging in the memory for the new valarray as it is initialized or set with result values
  • deallocating the old valarray that gets replaced by the result
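The steps listed above can be sketched with a hand-written stand-in for the operator. This is only an illustration of what a naive implementation might do: `naive_multiply` is a hypothetical name, not part of `<valarray>`, and a real library may avoid some of these steps (the standard permits the operators to return a proxy type instead of a real valarray).

```cpp
#include <cassert>
#include <cstddef>
#include <valarray>

// Hypothetical stand-in for a naive valarray operator*; not the library's code.
template <class T>
std::valarray<T> naive_multiply(const std::valarray<T>& a,
                                const std::valarray<T>& b)
{
    assert(a.size() == b.size());

    // Allocates a new block and value-initializes every element.
    std::valarray<T> result(a.size());

    // Second pass over the new block to store the products.
    for (std::size_t i = 0; i < a.size(); ++i)
        result[i] = a[i] * b[i];

    // Returned by value. The caller's assignment then copies or moves the
    // temporary into the destination, and one of the two blocks is freed.
    return result;
}
```

Writing `c = naive_multiply(a, b);` touches a freshly allocated block of N elements on every call, which the plain element-wise loop never does.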

The whole point of valarray is to be fast on vector machines, which x86 machines just aren't.

A good implementation on a nonvector machine should be able to match the performance that you get with something like

for (i=0; i < N; ++i) 
    c1[i] = a1[i] * b1[i];

and a bad one of course won't. Unless there is something in the hardware to expedite parallel processing, that is going to be pretty close to the best that you can do.
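For comparison, here is a minimal sketch of the same loop written against valarray objects, storing into a destination that already exists so that no temporary is created. The helper name `multiply_into` is an assumption made for this example and is not a standard function.

```cpp
#include <cassert>
#include <cstddef>
#include <valarray>

// Hypothetical helper: stores a[i] * b[i] into an already-allocated dst.
template <class T>
void multiply_into(std::valarray<T>& dst,
                   const std::valarray<T>& a,
                   const std::valarray<T>& b)
{
    // All three arrays must have the same length; dst is never resized.
    assert(a.size() == b.size() && dst.size() == a.size());

    // One pass, no allocation: the same work as the raw-pointer loop.
    for (std::size_t i = 0; i < a.size(); ++i)
        dst[i] = a[i] * b[i];
}
```

A call such as `multiply_into(c, a, b);` reuses the storage of `c`, which is what the element-at-a-time loops do.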


I just tried it on a Linux x86-64 system (Sandy Bridge CPU):

gcc 4.5.0:

double operator* 9.64185 ms
valarray operator* 9.36987 ms
valarray[i] operator* 9.35815 ms

Intel ICC 12.0.2:

double operator* 7.76757 ms
valarray operator* 9.60208 ms
valarray[i] operator* 7.51409 ms

In both cases I just used -O3 and no other optimisation-related flags.

So it looks like the slowdown is down to the MS C++ compiler and/or its valarray implementation.


Here's the OP's code modified for Linux:

#include <iostream>
#include <valarray>
#include <cstdlib>   // rand()
#include <ctime>

using namespace std ;

double gettime_hp();

int main()
{
    enum { N = 5*1024*1024 };
    valarray<double> a(N), b(N), c(N) ;
    int i,j;
    for(  j=0 ; j<8 ; ++j )
    {
        for(  i=0 ; i<N ; ++i )
        {
            a[i]=rand();
            b[i]=rand();
        }

        double* a1 = &a[0], *b1 = &b[0], *c1 = &c[0] ;
        double dtime=gettime_hp();
        for(  i=0 ; i<N ; ++i ) c1[i] = a1[i] * b1[i] ;
        dtime=gettime_hp()-dtime;
        cout << "double operator* " << dtime << " ms\n" ;

        dtime=gettime_hp();
        c = a*b ;
        dtime=gettime_hp()-dtime;
        cout << "valarray operator* " << dtime << " ms\n" ;

        dtime=gettime_hp();
        for(  i=0 ; i<N ; ++i ) c[i] = a[i] * b[i] ;
        dtime=gettime_hp()-dtime;
        cout << "valarray[i] operator* " << dtime<< " ms\n" ;

        cout << "------------------------------------------------------\n" ;
    }
}

double gettime_hp()
{
    struct timespec timestamp;

    clock_gettime(CLOCK_REALTIME, &timestamp);
    return timestamp.tv_sec * 1000.0 + timestamp.tv_nsec * 1.0e-6;
}
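The timer above is POSIX-only, and CLOCK_REALTIME can jump if the system clock is adjusted during a run. A portable alternative, assuming a C++11 compiler, might look like the sketch below; `gettime_ms` and `elapsed_ms` are hypothetical names and were not used for the measurements quoted above.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical replacement for gettime_hp(): milliseconds read from a
// monotonic clock, so differences are not affected by clock adjustments.
double gettime_ms()
{
    using clock = std::chrono::steady_clock;
    const std::chrono::duration<double, std::milli> t =
        clock::now().time_since_epoch();
    return t.count();
}

// Milliseconds elapsed since an earlier gettime_ms() reading.
double elapsed_ms(double start)
{
    const double now = gettime_ms();
    assert(now >= start);  // steady_clock never goes backwards
    return now - start;
}
```

Usage mirrors the original: take `double t0 = gettime_ms();` before the loop and print `elapsed_ms(t0)` after it.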
