What's the performance penalty of weak_ptr?

From the Boost 1.42 source code (<boost/smart_ptr/weak_ptr.hpp> line 155):

shared_ptr<T> lock() const // never throws
{
    return shared_ptr<element_type>( *this, boost::detail::sp_nothrow_tag() );
}

Ergo, James McNellis's comment is correct: the cost of lock() is the cost of copy-constructing a shared_ptr.
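To make the "lock() is a shared_ptr construction" point concrete, here is a minimal sketch. It assumes std::weak_ptr from C++11 rather than the Boost class quoted above (the std version has the same lock() contract), and the helper names are hypothetical, made up for illustration.

```cpp
#include <cassert>
#include <memory>

// Returns the owner count observed while a locked copy is alive.
// lock() hands back a shared_ptr, so it bumps the count just like a copy would.
long useCountWhileLocked(const std::weak_ptr<int>& weak)
{
    std::shared_ptr<int> locked = weak.lock();
    return locked ? locked.use_count() : 0;
}

// Hypothetical helper: true when lock() yields an empty shared_ptr
// because the last owner has already gone away.
bool lockFailsAfterReset()
{
    auto owner = std::make_shared<int>(42);
    std::weak_ptr<int> weak = owner;
    owner.reset();                 // last owner released, object destroyed
    return weak.lock() == nullptr; // lock() does not throw, it returns empty
}
```

With a single owner, useCountWhileLocked reports two owners while the temporary from lock() is alive, and the count drops back once that temporary is destroyed. That increment and decrement is the cost being discussed.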


For my own project, I was able to improve performance dramatically by adding #define BOOST_DISABLE_THREADS before any Boost includes. This avoids the spinlock/mutex overhead of weak_ptr::lock, which was a major bottleneck in my project. Since the project is not multithreaded with respect to Boost, I could do this.


Using or dereferencing a shared_ptr is almost like accessing a raw pointer. Locking a weak_ptr is a heavy operation compared to regular pointer access, because the code has to be thread-aware to work correctly if another thread triggers release of the object referenced by the pointer. At minimum, it has to perform some sort of interlocked/atomic operation, which is much slower than a regular memory access.
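If you would rather measure than reason about it, a rough timing harness along these lines can help. This is only a sketch: sumViaRaw, sumViaWeak and elapsedNs are hypothetical names, and the numbers depend heavily on compiler, optimization level and hardware (an optimizer may collapse the raw-pointer loop entirely), so treat any result as indicative.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <memory>

// Sum the pointee `iterations` times through a raw pointer.
long long sumViaRaw(const int* ptr, std::size_t iterations)
{
    long long sum = 0;
    for (std::size_t i = 0; i < iterations; ++i)
        if (ptr)
            sum += *ptr;
    return sum;
}

// Same loop, but every access first locks the weak_ptr.
long long sumViaWeak(const std::weak_ptr<int>& weak, std::size_t iterations)
{
    long long sum = 0;
    for (std::size_t i = 0; i < iterations; ++i)
        if (auto p = weak.lock())
            sum += *p;
    return sum;
}

// Hypothetical timing helper: wall-clock nanoseconds taken by `fn`.
template <class Fn>
long long elapsedNs(Fn&& fn)
{
    auto start = std::chrono::steady_clock::now();
    fn();
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count();
}
```

A typical use would be to wrap each loop in a lambda, for example elapsedNs([&] { sink = sumViaWeak(weak, 1000000); }), and compare the two timings while keeping the result in a variable that is used afterwards so the work is not optimized away.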

As usual, one way to see what's going on is to inspect generated code:

#include <memory>

class Test
{
public:
    void test();
};

void callFuncShared(std::shared_ptr<Test>& ptr)
{
    if (ptr)
        ptr->test();
}

void callFuncWeak(std::weak_ptr<Test>& ptr)
{
    if (auto p = ptr.lock())
        p->test();
}

void callFuncRaw(Test* ptr)
{
    if (ptr)
        ptr->test();
}

Access through a shared_ptr and through a raw pointer is essentially the same. Because the shared_ptr was passed by reference, the referenced value has to be loaded first, so the only difference is one extra load in the shared_ptr version.

callFuncShared:

[image: generated code for callFuncShared]

callFuncWeak:

[image: generated code for callFuncWeak]

Calling through a weak_ptr produces roughly 10x more code, and at best it has to go through a locked compare-exchange, which by itself takes more than 10x the CPU time of dereferencing a raw pointer or a shared_ptr:

[image: generated code]

Only if the shared counter is non-zero can it load the pointer to the actual object and use it (by calling the object or creating a shared_ptr).
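The "increment only if non-zero" step can be sketched with a compare-exchange loop. This is an illustration of the technique, not the code of any particular standard library or of Boost. ControlBlockSketch and tryAddOwner are hypothetical names, and real implementations differ in layout and memory ordering.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-in for a control block's strong (owner) count.
struct ControlBlockSketch
{
    std::atomic<long> useCount{0};
};

// Try to add one owner, but only if at least one owner still exists.
// Returns false once the count has reached zero (object already destroyed).
bool tryAddOwner(ControlBlockSketch& block)
{
    long current = block.useCount.load(std::memory_order_relaxed);
    while (current != 0)
    {
        // On failure, compare_exchange_weak stores the freshly observed
        // value in `current`, and the loop re-checks it before retrying.
        if (block.useCount.compare_exchange_weak(current, current + 1,
                                                 std::memory_order_acq_rel,
                                                 std::memory_order_relaxed))
            return true;
    }
    return false;
}
```

The loop is what makes the operation more expensive than a plain load: even in the uncontended case it issues at least one atomic read-modify-write, and under contention it may retry.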