Is a shared mutex more efficient than an atomic of a relatively big struct?

Any std::atomic specialization for a struct that size is going to involve internal locking, so you've gained nothing. Worse, you now have a race between the load and the store that you didn't have before, since the previous version (I presume?) held an exclusive lock around the whole read-modify-write block.
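You can ask the atomic itself whether it falls back to locking (requires C++17 for `is_always_lock_free`). `BigStruct` here is just a hypothetical stand-in for your type; the comments also mark where the load/store gap bites:

```cpp
#include <atomic>
#include <cstdio>

struct BigStruct {
    double values[8];   // 64 bytes, far wider than any lock-free CAS the hardware offers
};

int main() {
    std::atomic<BigStruct> shared{BigStruct{}};

    // For a struct this size, implementations typically report false here,
    // meaning every load and store goes through an internal lock.
    std::printf("always lock-free: %d\n",
                static_cast<int>(std::atomic<BigStruct>::is_always_lock_free));
    std::printf("lock-free at runtime: %d\n",
                static_cast<int>(shared.is_lock_free()));

    // A load followed later by a store is two separate atomic operations:
    // another thread can update the object in between, and this store
    // would silently overwrite that update.
    BigStruct copy = shared.load();
    copy.values[0] += 1.0;
    shared.store(copy);
}
```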

As for the shared_mutex: it would be wise to profile a plain std::mutex against std::shared_mutex. You may well find the plain mutex performs better; it all depends on how long the locks are held.

A shared_mutex only pays off when read locks are held for extended periods and writes are rare; otherwise its extra overhead wipes out any gain over a plain mutex.
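For illustration, a minimal sketch of that read-mostly pattern; `Config` and its fields are hypothetical stand-ins for the struct in the question:

```cpp
#include <mutex>
#include <shared_mutex>
#include <string>
#include <utility>

// Hypothetical stand-in for the big struct in the question.
struct Config {
    std::string endpoint;
    int timeout_ms = 0;
    // ...more fields, too large for a lock-free std::atomic
};

class SharedConfig {
public:
    // Readers take the lock in shared mode, so they can run concurrently.
    Config get() const {
        std::shared_lock lock(mutex_);
        return config_;                  // copy out while holding the lock
    }

    // Writers take exclusive ownership, blocking all readers and writers.
    void set(Config next) {
        std::unique_lock lock(mutex_);
        config_ = std::move(next);
    }

private:
    mutable std::shared_mutex mutex_;
    Config config_;
};
```

To profile it against a plain mutex, swap std::shared_mutex for std::mutex and the std::shared_lock in get() for std::lock_guard; if writes are frequent or the critical sections are short, the plain mutex will often come out ahead.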