thread_local at block scope

I find thread_local is only useful in three cases:

  1. If you need each thread to have a unique resource so that they don't have to share, mutex, etc. for using said resource. And even so, this is only useful if the resource is large and/or expensive to create or needs to persist across function invocations (i.e. a local variable inside the function will not suffice).

  2. An offshoot of (1) - you may need special logic to run when a calling thread eventually terminates. For this, you can use the destructor of the thread_local object created in the function. The destructor of such a thread_local object is called once for each thread that entered the code block with the thread_local declaration (at the end of the thread's lifetime).

  3. You may need some other logic to be performed for each unique thread that calls it, but only once. For instance, you could write a function that registers each unique thread that called a function. This may sound bizarre, but I've found uses for this in managing garbage-collected resources in a library I'm developing. This usage is closely-related to (1) but doesn't get used after its construction. Effectively a sentry object for a thread's entire lifetime.


First note that a block-local thread-local is implicitly static thread_local. In other words, your example code is equivalent to:

int main()
{
    static thread_local int n {42};
    std::thread t(My::f, &n);
    t.join();
    std::cout << n << "\n"; // prints 43
    return 0;
}

Variables declared with thread_local inside a function are not so different from globally defined thread_locals. In both cases, you create an object that is unique per thread and whose lifetime is bound to the lifetime of the thread.

The difference is only that globally defined thread_locals will be initialized when the new thread is run before you enter any thread-specific functions. In contrast, a block-local thread-local variable is initialized the first time control passes through its declaration.

A use case would be to speed up a function by defining a local cache that is reused during the lifetime of the thread:

void foo() {
  static thread_local MyCache cache;
  // ...
}

(I used static thread_local here to make it explicit that the cache will be reused if the function is executed multiple times within the same thread, but it is a matter of taste. If you drop the static, it will not make any difference.)


A comment about your the example code. Maybe it was intentional, but the thread is not really accessing the thread_local n. Instead it operates on a copy of a pointer, which was created by the thread running main. Because of that both threads refer to the same memory.

In other words, a more verbose way would have been:

int main()
{
    thread_local int n {42};
    int* n_ = &n;
    std::thread t(My::f, n_);
    t.join();
    std::cout << n << "\n"; // prints 43
    return 0;
}

If you change the code, so the thread accesses n, it will operate on its own version, and n belonging to the main thread will not be modified:

int main()
{
    thread_local int n {42};
    std::thread t([&] { My::f(&n); });
    t.join();
    std::cout << n << "\n"; // prints 42 (not 43)
    return 0;
}

Here is a more complicated example. It calls the function two times to show that the state is preserved between the calls. Also its output shows that the threads operate on their own state:

#include <iostream>
#include <thread>

void foo() {
  thread_local int n = 1;
  std::cout << "n=" << n << " (main)" << std::endl;
  n = 100;
  std::cout << "n=" << n << " (main)" << std::endl;
  int& n_ = n;
  std::thread t([&] {
          std::cout << "t executing...\n";
          std::cout << "n=" << n << " (thread 1)\n";
          std::cout << "n_=" << n_ << " (thread 1)\n";
          n += 1;
          std::cout << "n=" << n << " (thread 1)\n";
          std::cout << "n_=" << n_ << " (thread 1)\n";
          std::cout << "t executing...DONE" << std::endl;
        });
  t.join();
  std::cout << "n=" << n << " (main, after t.join())\n";
  n = 200;
  std::cout << "n=" << n << " (main)" << std::endl;

  std::thread t2([&] {
          std::cout << "t2 executing...\n";
          std::cout << "n=" << n << " (thread 2)\n";
          std::cout << "n_=" << n_ << " (thread 2)\n";
          n += 1;
          std::cout << "n=" << n << " (thread 2)\n";
          std::cout << "n_=" << n_ << " (thread 2)\n";
          std::cout << "t2 executing...DONE" << std::endl;
        });
  t2.join();
  std::cout << "n=" << n << " (main, after t2.join())" << std::endl;
}

int main() {
  foo();
  std::cout << "---\n";
  foo();
  return 0;
}

Output:

n=1 (main)
n=100 (main)
t executing...
n=1 (thread 1)      # the thread used the "n = 1" init code
n_=100 (thread 1)   # the passed reference, not the thread_local
n=2 (thread 1)      # write to the thread_local
n_=100 (thread 1)   # did not change the passed reference
t executing...DONE
n=100 (main, after t.join())
n=200 (main)
t2 executing...
n=1 (thread 2)
n_=200 (thread 2)
n=2 (thread 2)
n_=200 (thread 2)
t2 executing...DONE
n=200 (main, after t2.join())
---
n=200 (main)        # second execution: old state is reused
n=100 (main)
t executing...
n=1 (thread 1)
n_=100 (thread 1)
n=2 (thread 1)
n_=100 (thread 1)
t executing...DONE
n=100 (main, after t.join())
n=200 (main)
t2 executing...
n=1 (thread 2)
n_=200 (thread 2)
n=2 (thread 2)
n_=200 (thread 2)
t2 executing...DONE
n=200 (main, after t2.join())