Why does GL divide `gl_Position` by W for you rather than letting you do it yourself?

I'd like to extent on BDL's answer. It is not only about the perspective interpolation. It is also about the clipping. The space the value gl_Position is supposed to be provided in is called clip space, and this is before the division by w.

The (default) clip volume of OpenGL is defined in clip space as

-w <= x,y,z <= w   (with w varying per vertex)

After the division by w we get

-1 <= x,y,z <= 1   (in NDC coordinates).

However, if you try to do the clipping after the division by w, and would check against that cube in NDC, you get a problem, because all clip space points fullfilling this:

 w <= x,y,z <= -w (in clip space)

will also fullfill the NDC constraint.

The thing here is that points behind the camera will be transformed to somewhere in front of the camera, mirrored (since x/-1 is the same as -x/1). This also happens to the z coordinate. One might argue that this is irrelevant, because any point behind the camera is projected behind (in the sense of more far away than) the far plane, as per the construction of the typical projection matrix, so it will lie outside of the viewing volume in either case.

But if you have a primitive where at least one point is inside the view volume, and at least one point is behind the camera, you should have a primitive which intersects the near plane also. However, after the division by w, it will intersect the far plane now!. So clipping in NDC space, after the division, is much harder to get right. I tried to visualize this in this drawing:

top-down view of eye space and NDC with and without clipping (the drawing is to-scale, the depth range of projection is much shorter than anyone would typically use, to better illustrate the issue).

The clipping is done as a fixed-function stage in hardware and it has to be done before the division, hence you should provide the correct clip-space coordinates to work on.

(Note: actual GPUs might not use an extra clipping stage at all, they actually might also use a clipless rasterizer, like it is speculated in Fabian Giesen's blog article there. There are some algorithms like Olano and Greer (1997). However, this all works by doing the rasterization directly in homogenous coordinates, so we still need the w...)


The reason is, that not only gl_Position gets divided by the homogeneous coordinate, but also all other interpolated varyings. This is called perspective correct interpolation which requires the division to be after the interpolation (and thus after the rasterization). So doing the division in the vertex shader would simply not work. See also this post.


It's even simpler; the clipping happens after the vertex shading. If the vertex shader was allowed (or more strongly, mandated) to do perspective divison the clipping would have to happen in homogeneous coordinates which would be very inconvenient. The vertex attributes are still linear in clip coordinates which makes clipping a child's play instead of having to clip in homogeneous coordinates:

v' = 1.0f / (lerp(1.0 / v0, 1.0 / v1, t))

See how division-heavy that would be? In clip coordinates it is simply:

v' = lerp(v0, v1, t)

It is even better than that: the clipping limits in clip coordinates are:

-w < x < w

This means the distances to clip planes (left and right) are trivial to compute in clip coordinates:

x - w, and w - x. It's just so much simpler and efficient to clip in clip coordinates that it just makes all the sense in the world to insist that vertex shader outputs are in clip coordinates. Then let the hardware do the clipping and dividing by w-coordinate since there is no reason left to leave it to the user anymore. It's also simpler as that way we don't need post-clip vertex shader (which would also include mapping into the viewport but that is another story). The way they designed it is actually quite nice. :)