Impact of the prior loop iteration on the execution time of the current iteration

If the memory allocation is what is slowing it down, and the contents of the memory before performComputation(input) are irrelevant, you could simply reuse the allocated memory block:

int performComputation(input, std::vector<char>& memory) {
  // Note: memory must be passed by reference so every call reuses the same block.
  // (numThreads, t and the return value are as in the question.)

  auto start = std::chrono::steady_clock::now();
  for (int i = 0; i < numThreads; i++) {
    t.emplace_back(std::thread([&, i] {
      func(...); // Random access to memory
    }));
  }

  // Wait for all worker threads before stopping the clock
  for (int i = 0; i < numThreads; i++) {
    t[i].join();
  }

  auto end = std::chrono::steady_clock::now();
  double time = std::chrono::duration<double, std::milli>(end - start).count();
}


int main() {

  // A. Allocate ~1 GB of memory here, once, outside the loop
  std::vector<char> memory(1024ULL * 1024 * 1024); // 1 GiB is 1024^3 bytes, not 1028^3

  for (const auto& input : inputs)
    performComputation(input, memory);
}
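To make the reuse pattern concrete, here is a minimal, self-contained sketch. It assumes the computation only needs scratch space whose previous contents do not matter. The names computeWithScratch and runAll, the int input type, and the 4096-byte stride (a typical page size) are hypothetical stand-ins, not part of the original code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for performComputation: uses the caller's buffer as
// scratch space, overwriting whatever the previous call left behind.
std::size_t computeWithScratch(int input, std::vector<char>& scratch) {
    assert(!scratch.empty());  // the caller must have allocated the buffer
    const char tag = static_cast<char>(input);
    std::size_t touched = 0;
    // Write one byte per 4096-byte stride so the whole buffer gets used.
    for (std::size_t i = 0; i < scratch.size(); i += 4096) {
        scratch[i] = tag;
        ++touched;
    }
    return touched;
}

// Allocates the buffer once and reuses it for every input.
std::vector<std::size_t> runAll(const std::vector<int>& inputs, std::size_t bytes) {
    std::vector<char> scratch(bytes);  // single allocation, outside the loop
    std::vector<std::size_t> results;
    results.reserve(inputs.size());
    for (int input : inputs) {
        results.push_back(computeWithScratch(input, scratch));
    }
    return results;
}
```

Because the buffer is passed by reference, no call pays for a fresh allocation. If a computation does depend on the buffer starting out zeroed, it would have to reset the buffer itself, for example with std::fill.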

I can't be certain of the exact details, but this looks to me like a result of memory allocation while building the map. I replicated the behaviour you're seeing using a plain unordered_map and a single mutex, and making the map object in func static fixed it entirely. (It is now slightly slower the first time around, since no memory has been allocated for the map yet, and then faster, with a consistent time on every subsequent run.)

I'm not sure why this makes a difference, since the map has been destructed and the memory should have been freed. For some reason it seems the map's freed memory isn't reused on subsequent calls to func. Perhaps someone else more knowledgeable than I can elaborate on this.

Edit: reduced minimal, reproducible example and output

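#include <chrono>
#include <iostream>
#include <unordered_map>

void func(int num_insertions)
{
    const auto start = std::chrono::steady_clock::now();

    // Non-static: a fresh map is built and destroyed on every call
    std::unordered_map<int, int> map;
    for (int i = 0; i < num_insertions; ++i)
        map.emplace(i, i);

    const auto end = std::chrono::steady_clock::now();
    const auto diff = end - start;

    const auto time = std::chrono::duration<double, std::milli>(diff).count();
    std::cout << "i: " << num_insertions << "\ttime: " << time << "\n";
}

int main()
{
    // Call sequence matching the output below: small, large, small
    func(2048);
    func(16777216);
    func(2048);
}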

With non-static map (times in milliseconds):

i: 2048         time: 0.6035
i: 16777216     time: 4629.03
i: 2048         time: 124.44

With static map (times in milliseconds):

i: 2048         time: 0.6524
i: 16777216     time: 4828.6
i: 2048         time: 0.3802

Another edit: I should also mention that the static version requires a call to map.clear() at the end of func, so that each call starts from an empty map, though that's not really relevant to the question of the performance of the insertions.
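For reference, a minimal sketch of what the static variant could look like. The name funcStatic and the returned element count are my own additions for illustration, and the timing code is left out to keep it short.

```cpp
#include <cstddef>
#include <unordered_map>

// Variant of func with a function-local static map: the map object outlives
// each call instead of being destroyed when the function returns.
std::size_t funcStatic(int num_insertions) {
    static std::unordered_map<int, int> map;  // constructed once, on first call

    for (int i = 0; i < num_insertions; ++i)
        map.emplace(i, i);

    const std::size_t inserted = map.size();
    map.clear();  // remove the elements so the next call starts from an empty map
    return inserted;
}
```

Note that a static local is shared by every caller, so calling this concurrently from several threads would need synchronization (or a thread_local map instead). As far as I know, common standard library implementations keep the bucket array across clear(), but I would not rely on that without measuring.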