Why does my OpenCL kernel fail on the nVidia driver, but not Intel (possible driver bug)?

After discussions with nVidia, a technical rep confirmed that this is a repeatable driver bug. A bug report was submitted, but unfortunately I was told that nVidia doesn't have a dedicated OpenCL dev team, so they can't provide a timeline for a fix.

Edit: I finally heard back from nVidia. Their suggested workaround is to use pow() instead of sqrt() in the CL kernel, since they believe sqrt() is the source of the bug.