When is the ergodic hypothesis reasonable?
If the system is nonlinear but has a limited number of effective degrees of freedom, say $D=3,4,5$, is the ergodic hypothesis justified?
That's essentially a math question, and unfortunately there doesn't seem to be any general criterion beyond the various equivalent definitions of ergodicity themselves. Concrete proofs of ergodicity for specific systems can be found in ergodic theory textbooks; Charles Walkden's notes on ergodic theory are available online (pdf1, pdf2).
Consider a Hamiltonian system. Under which circumstances is it possible to assume that all the states belonging to the hypersurface $H=E_0$ are equally visited?
If "equally" is the keyword, then I'd say that you're really asking for the preconditions for mixing. You can read a bit about its relation to ergodicity in this answer to the post "Are there necessary and sufficient conditions for ergodicity?" (though the Stanford link is of course much better). Again, the conditions are simply those of its definition.
Is it necessary to have a very high number of degrees of freedom?
No. But, as Arnold diffusion shows (this paper (e-print) seems to be a nice introduction), it's certainly harder to confine trajectories in Hamiltonian systems with more than 2 degrees of freedom.
What about the presence of regular islands where the chaotic sea cannot penetrate? Do they shrink as $D$ increases? Are they always present?
There are systems which are fully chaotic, so no, islands are not always present. As for higher-dimensional systems, the KAM surfaces no longer separate phase space into distinct regions, which allows phenomena such as the Arnold diffusion mentioned above to occur. In that sense one could say that the regular regions become less influential in higher-dimensional systems.
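To get a feel for what a regular island inside a chaotic sea means in the lowest-dimensional case, here is a minimal sketch. It uses the Chirikov standard map, which is my choice of toy model and is not mentioned in the question; the kick strength `K = 1.5` and the two starting points are likewise assumptions picked for illustration. The function estimates a finite-time largest Lyapunov exponent: it should stay close to zero for an orbit started inside the island around the elliptic fixed point $(\theta, p) = (\pi, 0)$ and be clearly positive for an orbit started near the hyperbolic point $(0, 0)$.

```python
import math

TWO_PI = 2.0 * math.pi


def lyapunov_standard_map(theta, p, K, n_steps=20000):
    """Finite-time largest Lyapunov exponent of the Chirikov standard map

        p'     = p + K sin(theta)   (mod 2 pi)
        theta' = theta + p'         (mod 2 pi)

    obtained by iterating a tangent vector alongside the orbit.
    """
    d_theta, d_p = 1.0, 0.0  # initial tangent vector (arbitrary choice)
    log_growth = 0.0
    for _ in range(n_steps):
        # Tangent map: Jacobian evaluated at the current (pre-update) theta.
        d_p = d_p + K * math.cos(theta) * d_theta
        d_theta = d_theta + d_p
        # The map itself.
        p = (p + K * math.sin(theta)) % TWO_PI
        theta = (theta + p) % TWO_PI
        # Renormalise the tangent vector and accumulate its log growth.
        norm = math.hypot(d_theta, d_p)
        log_growth += math.log(norm)
        d_theta /= norm
        d_p /= norm
    return log_growth / n_steps


K = 1.5  # assumed kick strength with a mixed (islands + sea) phase space
lam_island = lyapunov_standard_map(math.pi + 0.05, 0.0, K)  # inside the island
lam_sea = lyapunov_standard_map(0.1, 0.0, K)                # in the chaotic sea
print("island:", lam_island, "sea:", lam_sea)
```

In this two-dimensional map the island is bounded by invariant curves, so the chaotic orbit can never enter it. That is exactly the separation that is lost when the phase space has more dimensions.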
I come from a simulation background, so this is more intuition than a rigorous derivation.
One way to conceptualize whether or not ergodicity will hold is to ask whether one can reasonably expect to sample the same portions of phase space using dynamics and using a Monte Carlo process. That is, if I just randomly rearrange the particles and then Boltzmann-weight their contribution to the partition function, will this give me the same answer in the long-time limit as forcing the particles to visit these configurations via some kind of continuous dynamics?
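In symbols, the comparison above is roughly the following. The notation is mine rather than taken from the question: $A$ is an observable, $x(t)$ the trajectory in phase space, and $\rho$ the ensemble density (for Boltzmann weighting, $\rho \propto e^{-\beta H}$).

$$\lim_{T\to\infty}\frac{1}{T}\int_0^T A\big(x(t)\big)\,dt \;=\; \int A(x)\,\rho(x)\,dx$$

The left-hand side is what the dynamics gives you, the right-hand side is what an ideal Monte Carlo sampler gives you, and the ergodic hypothesis is the statement that the two agree.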
One contrived example where this will not be the case is a one-dimensional system where one particle is on the left and the other is on the right. If these are classical particles, they will never pass through each other, so the ensemble average and the time average of a quantity will in general not be equal, because each particle only samples a subset of the total available phase space.
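This can be made concrete with a small sketch. The setup is an assumption on my part: two equal-mass point particles in a box $[0, 1]$ with elastic collisions, and made-up initial positions and velocities. Equal-mass elastic collisions in one dimension simply swap the velocities, so the two positions are those of two free particles bouncing off the walls, with the labels kept in order. The unrestricted ensemble average of the position of particle A (uniform over the box) is $1/2$, whereas the time average can only see the "A is left of B" half of configuration space and should come out near $1/3$.

```python
import math


def reflect(x0, v, t):
    """Position at time t of a free particle in [0, 1] with reflecting walls."""
    u = (x0 + v * t) % 2.0          # unfold the box into a period-2 line
    return u if u <= 1.0 else 2.0 - u


def time_average_left_particle(x_a=0.2, x_b=0.7, v_a=1.0,
                               v_b=-math.sqrt(2.0), dt=0.0137,
                               n_samples=200000):
    """Time average of the position of particle A, which starts on the left.

    Equal-mass elastic collisions exchange velocities, so A is always the
    smaller of the two free-particle positions. All default values are
    illustrative assumptions.
    """
    total = 0.0
    for k in range(n_samples):
        t = k * dt
        total += min(reflect(x_a, v_a, t), reflect(x_b, v_b, t))
    return total / n_samples


ENSEMBLE_AVERAGE = 0.5  # mean position if A could be anywhere in the box
time_avg = time_average_left_particle()
print("time average:", time_avg, "ensemble average:", ENSEMBLE_AVERAGE)
```

The mismatch comes purely from the ordering constraint, not from the sampling being too short.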
Another, more physical, example of non-ergodicity is when a system undergoes a phase transition. Consider a simple two-dimensional Ising model. As the temperature is lowered, eventually the entropic contributions cease to dominate and the system undergoes a phase transition, so that it is predominantly spin up or predominantly spin down. This takes place at a finite temperature. Clearly, in the presence of a large number of spins, it will take an immensely long time for all of these spins to flip and allow us to sample the other branch of the phase transition. Thus, as a practical matter, the ensemble average will sample more of phase space than a time average will, and the system is effectively non-ergodic. At zero temperature this becomes formally true: one will never sample the other branch of the phase transition, and the system is truly non-ergodic.
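A minimal sketch of this, assuming a plain single-spin-flip Metropolis simulation of the zero-field 2D Ising model with $J = k_B = 1$. The lattice size, temperatures, sweep count and seed are arbitrary choices of mine. By up/down symmetry the full ensemble average of the magnetization is zero at any temperature, yet below the transition (at $T \approx 2.27$ in these units) a run started in the all-up state should stay on the "up" branch for as long as one can afford to simulate.

```python
import math
import random


def ising_time_average_magnetization(L=10, T=1.5, n_sweeps=300, seed=0):
    """Time-averaged magnetization per spin from Metropolis dynamics.

    Periodic L x L lattice, J = k_B = 1, started from the all-up state.
    Parameter defaults are illustrative assumptions.
    """
    rng = random.Random(seed)
    spins = [[1] * L for _ in range(L)]
    m_sum = 0.0
    for _ in range(n_sweeps):
        for _ in range(L * L):          # one sweep = L*L attempted flips
            i = rng.randrange(L)
            j = rng.randrange(L)
            neighbours = (spins[(i + 1) % L][j] + spins[(i - 1) % L][j]
                          + spins[i][(j + 1) % L] + spins[i][(j - 1) % L])
            d_energy = 2.0 * spins[i][j] * neighbours
            # Metropolis acceptance rule.
            if d_energy <= 0.0 or rng.random() < math.exp(-d_energy / T):
                spins[i][j] = -spins[i][j]
        m_sum += sum(map(sum, spins)) / (L * L)
    return m_sum / n_sweeps


m_cold = ising_time_average_magnetization(T=1.5)  # below the transition
m_hot = ising_time_average_magnetization(T=5.0)   # well above it
print("T=1.5:", m_cold, "T=5.0:", m_hot)
```

On a lattice this small the system would eventually flip to the other branch, but the waiting time grows very quickly with system size, which is the practical non-ergodicity described above.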
To answer your question more directly based on this last example: the ergodic hypothesis is reasonable when there are no discontinuities in phase space, as these often force the system to choose one of several possible branches, which it is then extremely unlikely to leave in finite time.
As an aside, in practice one usually samples a system in discrete steps (this is true both in Markovian processes and in dynamics). It is therefore often said that systems containing very high frequencies are highly non-ergodic: resolving the high-frequency motion requires an extremely small time step, so sampling phase space takes an extremely long time. This is only a practical issue though, not a theoretical one.
I'm sure there is a more theoretical answer than this one which it seems you might be looking for, but I thought I would chime in.