Why are some stars very large (i.e., $r \geq 1000 \ R_{\odot}$) but not super massive?

The virial theorem is a way of expressing the concept of hydrostatic equilibrium in a star. In dimensional terms we can say that $$ \Omega = -3\int P\ dV,$$ where $\Omega$ is the gravitational potential energy and $P$ is the pressure.

Assuming a perfect gas and a uniform sphere (OK for a dimensional analysis), $\int P\ dV \sim NkT$ with $N = M/(\mu m_u)$ particles, so we can rewrite this as $$ -\frac{3GM^2}{5R} = -3\frac{M kT}{\mu m_u},$$ where $\mu$ is the number of atomic mass units per particle in the gas and $T$ is some characteristic interior temperature. From this, we get $$R \sim \frac{GM\mu m_u}{5kT}.$$

Now, what this simple argument shows is that the radius of a star does not just depend on its mass. It depends on $\mu$, which is composition dependent, and it depends on the interior temperature (profile).

Thus two stars with different interior compositions or internal temperatures can have quite different radii at the same mass.
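To put rough numbers on this, here is a minimal sketch that evaluates the dimensional estimate $R \sim GM\mu m_u/(5kT)$ for two stars of the same mass. The values of $\mu$ and $T$ are illustrative assumptions, not fitted to any real star, and the function name `virial_radius` is made up for this example.

```python
# Physical constants in SI units (rounded).
G = 6.674e-11      # gravitational constant, m^3 kg^-1 s^-2
K_B = 1.381e-23    # Boltzmann constant, J/K
M_U = 1.661e-27    # atomic mass unit, kg
M_SUN = 1.989e30   # solar mass, kg
R_SUN = 6.957e8    # solar radius, m


def virial_radius(mass_kg, mu, temp_k):
    """Order-of-magnitude radius R ~ G M mu m_u / (5 k T), in metres."""
    return G * mass_kg * mu * M_U / (5.0 * K_B * temp_k)


# Same mass, different assumed composition (mu) and characteristic
# interior temperature. Both pairs of values are illustrative only.
r_a = virial_radius(M_SUN, mu=0.6, temp_k=3.0e6)
r_b = virial_radius(M_SUN, mu=1.3, temp_k=1.0e6)

print(f"star A: {r_a / R_SUN:.2f} R_sun")
print(f"star B: {r_b / R_SUN:.2f} R_sun")
print(f"ratio B/A: {r_b / r_a:.2f}")
```

Because the estimate scales as $\mu/T$, the ratio of the two radii depends only on the assumed $\mu$ and $T$, not on the mass.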

The radius also crucially depends on where nuclear burning is taking place (in the core or in a shell). A general rule is that shell burning stars have much larger radii.

It is this latter point which is largely responsible for the large discrepancy you note. There is no easy hand-waving way to explain why this is, but most of the luminosity of stars like VY CMa will be coming from a H burning shell.

Shell burning begins when the temperatures at the core are insufficient to ignite the ash of the previous burning phase. A layer of fresh fuel outside the core is compressed and heated until it ignites, with a greater volume and higher burning rate than the original core. This means the luminosity of the star increases drastically. However, there is a maximum temperature gradient supportable by stellar material: the so-called adiabatic temperature gradient, beyond which the star becomes unstable to convection. This maximum to the temperature gradient means that in order to radiate away the increased luminosity at the photosphere (at a few thousand kelvin, where the atmosphere becomes optically thin), the star has to swell up, according to Stefan's law ($L= 4\pi R^2 \sigma T^4$, with $T$ here the photospheric temperature), to a much larger size.
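As a rough check on the scale involved, the sketch below inverts Stefan's law to get the radius a star needs in order to radiate a given luminosity at a given photospheric temperature. The luminosity and temperature used for the supergiant are round, illustrative assumptions rather than measured values for any particular star, and `radius_from_luminosity` is a name invented for this example.

```python
import math

SIGMA = 5.670374e-8   # Stefan-Boltzmann constant, W m^-2 K^-4
L_SUN = 3.828e26      # nominal solar luminosity, W
R_SUN = 6.957e8       # nominal solar radius, m


def radius_from_luminosity(lum_w, t_phot_k):
    """Invert L = 4 pi R^2 sigma T^4 for R (metres)."""
    return math.sqrt(lum_w / (4.0 * math.pi * SIGMA * t_phot_k**4))


# Sanity check: the Sun's luminosity at ~5772 K should give ~1 R_sun.
r_sun_check = radius_from_luminosity(L_SUN, 5772.0)

# Assumed red-supergiant-like values: 3e5 L_sun radiated at 3500 K.
r_giant = radius_from_luminosity(3.0e5 * L_SUN, 3500.0)

print(f"Sun check: {r_sun_check / R_SUN:.3f} R_sun")
print(f"Supergiant: {r_giant / R_SUN:.0f} R_sun")
```

With the photospheric temperature pinned at a few thousand kelvin, $R \propto \sqrt{L}$, so a luminosity a few hundred thousand times solar forces a radius of order a thousand solar radii.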

So that's the key: it's what the star is made of and where the nuclear burning is taking place inside the star.

This will be a short answer, not going very deep into how stars work. Basically, a star is a ball of gas which is more or less in equilibrium between collapse due to gravitation and expansion due to heat.

The radius of the star is determined by this equilibrium. A star which is more massive can have a smaller radius due to a large gravitational pull inwards. The temperature of a star, and thus the expansive force due to heat (you can imagine this like for an ideal gas: if you heat it up, it expands), is determined by nuclear fusion in its core.

Red supergiants like UY Scuti have used up the hydrogen fuel in their cores, so the core contracted for lack of an outward force and got extremely hot. Because of this heat, and the relatively low mass, the equilibrium is established at a large radius. Eta Carinae is not as hot in its core but has more mass, so its radius is smaller.

Also note that the color of a star is determined by its surface temperature, not its core temperature.

The mass of a star $M$ is given by the integral over its density distribution: $$ M=\int_0^R 4 \pi r^2 \rho (r) \, dr. $$ So just because a star is big (large radius $R$) does not necessarily mean that it is heavy. It depends on its density profile. This profile depends on the central pressure, the equation of state, the temperature and luminosity profiles, and more. The mass/radius relation of a star is a non-trivial result of many parameters. So the reason for the different mass/radius relations is in general the different internal structure and composition.
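The integral can be evaluated numerically for toy density profiles to see how little mass a very large star needs to contain. This is a minimal sketch: the densities below are assumed round numbers chosen for illustration (real stars are strongly centrally concentrated), and `enclosed_mass` is a hypothetical helper, not taken from any library.

```python
import math

M_SUN = 1.989e30   # solar mass, kg
R_SUN = 6.957e8    # solar radius, m


def enclosed_mass(density, radius, n=10_000):
    """Midpoint-rule estimate of M = integral_0^R 4 pi r^2 rho(r) dr."""
    dr = radius / n
    total = 0.0
    for i in range(n):
        r = (i + 0.5) * dr          # midpoint of the i-th shell
        total += 4.0 * math.pi * r * r * density(r) * dr
    return total


R_GIANT = 1000.0 * R_SUN
RHO_MEAN = 2.8e-5   # assumed mean density of the giant, kg/m^3

# Toy profile 1: uniform density throughout the giant.
m_uniform = enclosed_mass(lambda r: RHO_MEAN, R_GIANT)

# Toy profile 2: density falling linearly from 4 * RHO_MEAN at the
# centre to zero at the surface (same mean density as profile 1).
m_linear = enclosed_mass(lambda r: 4.0 * RHO_MEAN * (1.0 - r / R_GIANT), R_GIANT)

# For comparison: a solar-radius sphere at roughly the Sun's mean density.
m_sun_like = enclosed_mass(lambda r: 1408.0, R_SUN)

print(f"giant, uniform: {m_uniform / M_SUN:.1f} M_sun")
print(f"giant, linear:  {m_linear / M_SUN:.1f} M_sun")
print(f"Sun-like:       {m_sun_like / M_SUN:.2f} M_sun")
```

Under these assumptions the sphere with a billion times the Sun's volume holds only about twenty solar masses, because its mean density is tens of millions of times lower.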