Performance gap between vector<bool> and array
std::vector<bool>
isn't like any other vector. The documentation says:
std::vector<bool>
is a possibly space-efficient specialization ofstd::vector
for the typebool
.
That's why it may use up less memory than an array, because it might represent multiple boolean values with one byte, like a bitset. It also explains the performance difference, since accessing it isn't as simple anymore. According to the documentation, it doesn't even have to store it as a contiguous array.
std::vector<bool>
is special case. It is specialized template. Each value is stored in single bit, so bit operations are needed. This memory compact but has couple drawbacks (like no way to have a pointer to bool
inside this container).
Now bool flag[n+1];
compiler will usually allocate same memory in same manner as for char flag[n+1];
and it will do that on stack, not on heap.
Now depending on page sizes, cache misses and i
values one can be faster then other. It is hard to predict (for small n
array will be faster, but for larger n
result may change).
As an interesting experiment you can change std::vector<bool>
to std::vector<char>
. In this case you will have similar memory mapping as in case of array, but it will be located at heap not a stack.
I'd like to add some remarks to the good answers already posted.
The performance differences between
std::vector<bool>
andstd::vector<char>
may vary (a lot) between different library implementations and different sizes of the vectors.See e.g. those quick benches: clang++ / libc++(LLVM) vs. g++ / libstdc++(GNU).
This:
bool flag[n+1];
declares a Variable Length Array, which (despites some performance advantages due to it beeing allocated in the stack) has never been part of the C++ standard, even if provided as an extension by some (C99 compliant) compilers.Another way to increase the performances could be to reduce the amount of calculations (and memory occupation) by considering only the odd numbers, given that all the primes except for 2 are odd.
If you can bare the less readable code, you could try to profile the following snippet.
int countPrimes(int n)
{
if ( n < 2 )
return 0;
// Sieve starting from 3 up to n, the number of odd number between 3 and n are
int sieve_size = n / 2 - 1;
std::vector<char> sieve(sieve_size);
int result = 1; // 2 is a prime.
for (int i = 0; i < sieve_size; ++i)
{
if ( sieve[i] == 0 )
{
// It's a prime, no need to scan the vector again
++result;
// Some ugly transformations are needed, here
int prime = i * 2 + 3;
for ( int j = prime * 3, k = prime * 2; j <= n; j += k)
sieve[j / 2 - 1] = 1;
}
}
return result;
}
Edit
As Peter Cordes noted in the comments, using an unsigned type for the variable j
the compiler can implement j/2 as cheaply as possible. C signed division by a power of 2 has different rounding semantics (for negative dividends) than a right shift, and compilers don't always propagate value-range proofs sufficiently to prove that j will always be non-negative.
It's also possible to reduce the number of candidates exploiting the fact that all primes (past 2 and 3) are one below or above a multiple of 6.