Performance gap between vector<bool> and array

std::vector<bool> isn't like any other vector. The documentation says:

std::vector<bool> is a possibly space-efficient specialization of std::vector for the type bool.

That's why it may use up less memory than an array, because it might represent multiple boolean values with one byte, like a bitset. It also explains the performance difference, since accessing it isn't as simple anymore. According to the documentation, it doesn't even have to store it as a contiguous array.


std::vector<bool> is special case. It is specialized template. Each value is stored in single bit, so bit operations are needed. This memory compact but has couple drawbacks (like no way to have a pointer to bool inside this container).

Now bool flag[n+1]; compiler will usually allocate same memory in same manner as for char flag[n+1]; and it will do that on stack, not on heap.

Now depending on page sizes, cache misses and i values one can be faster then other. It is hard to predict (for small n array will be faster, but for larger n result may change).

As an interesting experiment you can change std::vector<bool> to std::vector<char>. In this case you will have similar memory mapping as in case of array, but it will be located at heap not a stack.


I'd like to add some remarks to the good answers already posted.

  • The performance differences between std::vector<bool> and std::vector<char> may vary (a lot) between different library implementations and different sizes of the vectors.

    See e.g. those quick benches: clang++ / libc++(LLVM) vs. g++ / libstdc++(GNU).

  • This: bool flag[n+1]; declares a Variable Length Array, which (despites some performance advantages due to it beeing allocated in the stack) has never been part of the C++ standard, even if provided as an extension by some (C99 compliant) compilers.

  • Another way to increase the performances could be to reduce the amount of calculations (and memory occupation) by considering only the odd numbers, given that all the primes except for 2 are odd.

If you can bare the less readable code, you could try to profile the following snippet.

int countPrimes(int n)
{
    if ( n < 2 )
        return 0;
    // Sieve starting from 3 up to n, the number of odd number between 3 and n are
    int sieve_size = n / 2 - 1;
    std::vector<char> sieve(sieve_size); 
    int result = 1;  // 2 is a prime.

    for (int i = 0; i < sieve_size; ++i)
    {
        if ( sieve[i] == 0 )
        {
            // It's a prime, no need to scan the vector again
            ++result;
            // Some ugly transformations are needed, here
            int prime = i * 2 + 3;
            for ( int j = prime * 3, k = prime * 2; j <= n; j += k)
                sieve[j / 2 - 1] = 1;
        }
    }

    return result;
}

Edit

As Peter Cordes noted in the comments, using an unsigned type for the variable j

the compiler can implement j/2 as cheaply as possible. C signed division by a power of 2 has different rounding semantics (for negative dividends) than a right shift, and compilers don't always propagate value-range proofs sufficiently to prove that j will always be non-negative.

It's also possible to reduce the number of candidates exploiting the fact that all primes (past 2 and 3) are one below or above a multiple of 6.