Why does the parallel for_each require forward iterators?

There is a known flaw with the C++17 iterator model in that proxy iterators can only ever be input iterators, for the reasons you point out. This has lots of downsides. The parallel algorithms don't need non-proxy iterators, but they definitely need the multi-pass guarantee. And the current iterator category model conflates the two.

With C++20 ranges, we get the idea of iterator_concept, which is a backwards-compatible shim to properly support proxy iterators. You can have an iterator_category of input_iterator_tag but an iterator_concept of forward_iterator_tag, for instance. The new forward_iterator concept (spelled ForwardIterator in the earlier working drafts) does not look at the category, it looks at the concept:

template<class I>
  concept forward_iterator =
    input_iterator<I> &&
    derived_from<ITER_CONCEPT(I), forward_iterator_tag> &&
    incrementable<I> &&
    sentinel_for<I, I>;

Whether or not the parallel algorithms will change is a different question that I can't answer.

The C++17 iterator requirements define a forward iterator as the weakest category in which multiple iterators into the same range can be used independently. That is, you're allowed to copy a forward iterator and increment the copy, and still access the original value through the original iterator.

A pure InputIterator only requires single-pass traversal. Once you increment an iterator, all other copies of it become effectively invalid.
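
To make the difference concrete, here is a small sketch that walks a range twice, once through forward iterators and once through input iterators reading from a stream. The function names are invented for the example.

```cpp
#include <forward_list>
#include <iterator>
#include <numeric>
#include <sstream>
#include <utility>

// Forward iterators are multi-pass: starting again from begin() visits
// the same elements, so both passes produce the same sum.
inline std::pair<int, int> two_passes_forward() {
    std::forward_list<int> values{1, 2, 3};
    int pass1 = std::accumulate(values.begin(), values.end(), 0);
    int pass2 = std::accumulate(values.begin(), values.end(), 0);
    return {pass1, pass2};
}

// istream_iterator is a single-pass input iterator: the first traversal
// consumes the stream, so an iterator created afterwards is already at
// the end and the second pass sees no elements.
inline std::pair<int, int> two_passes_input() {
    std::istringstream in("1 2 3");
    std::istream_iterator<int> end;
    int pass1 = std::accumulate(std::istream_iterator<int>(in), end, 0);
    int pass2 = std::accumulate(std::istream_iterator<int>(in), end, 0);
    return {pass1, pass2};
}
```

The forward version yields the pair (6, 6); the stream version yields (6, 0), because the elements are gone after the first pass.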

Parallelizing for_each ultimately requires each parallel invocation to get a distinct set of iterators and values to operate on. That means the iterators have to be copyable and independent of one another, which requires them to be forward iterators.
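
Here is a minimal sketch of why partitioning needs the multi-pass guarantee. It is not how any particular standard library implements the parallel algorithms, and it runs the chunks one after another instead of on separate threads. The helpers `split_into_chunks`, `chunked_for_each`, and `double_all` are hypothetical names for this example.

```cpp
#include <algorithm>
#include <cstddef>
#include <forward_list>
#include <iterator>
#include <utility>
#include <vector>

// Hypothetical helper: split [first, last) into at most `parts` sub-ranges.
// It walks the range once to count and again to place the boundaries,
// which is only valid for multi-pass (forward or stronger) iterators.
template <class ForwardIt>
std::vector<std::pair<ForwardIt, ForwardIt>>
split_into_chunks(ForwardIt first, ForwardIt last, std::size_t parts) {
    std::vector<std::pair<ForwardIt, ForwardIt>> chunks;
    if (parts == 0) return chunks;
    const auto total =
        static_cast<std::size_t>(std::distance(first, last));  // pass 1
    const std::size_t chunk_size = (total + parts - 1) / parts;  // round up
    while (first != last) {
        ForwardIt chunk_end = first;  // independent copy of the iterator
        for (std::size_t i = 0; i < chunk_size && chunk_end != last; ++i)
            ++chunk_end;              // pass 2 over the same elements
        chunks.emplace_back(first, chunk_end);
        first = chunk_end;
    }
    return chunks;
}

// Each chunk could be handed to a different worker; here the chunks are
// processed sequentially to keep the sketch free of threading concerns.
template <class ForwardIt, class F>
void chunked_for_each(ForwardIt first, ForwardIt last, std::size_t parts, F f) {
    for (const auto& chunk : split_into_chunks(first, last, parts)) {
        std::for_each(chunk.first, chunk.second, f);  // pass 3
    }
}

// Usage sketch: double every element of a forward_list in two chunks.
inline void double_all(std::forward_list<int>& values) {
    chunked_for_each(values.begin(), values.end(), 2,
                     [](int& x) { x *= 2; });
}
```

Every element is visited up to three times through different iterator copies (counting, boundary placement, and the actual work). With a single-pass input iterator, the counting step alone would already consume the range.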

Now yes, that means you can't use proxy iterators with parallel for_each, even if your iterators are independent of each other. That's just a limitation of the C++17 iterator model.