What is the maximum block count possible in CUDA?

65535 in a single dimension. Here's the complete table

In case anybody lands here based on a Google search (as I just did):

NVIDIA changed the specification after this question was asked. With compute capability 3.0 and newer, the x-dimension of a grid of thread blocks can be up to 2,147,483,647 (2^31 - 1).

See the current technical specifications.

With compute capability 3.0 or higher, you can have up to 2^31 - 1 blocks in the x-dimension, and at most 65535 blocks in the y and z dimensions. See Table H.1. Feature Support per Compute Capability of the CUDA C Programming Guide Version 9.1.
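To make the per-dimension limits concrete, here is a minimal host-side sketch in plain C++ (no CUDA toolkit needed) that encodes the numbers quoted in this thread and checks a grid configuration against them. `GridDims`, `max_grid_dims`, and `grid_fits` are hypothetical names made up for illustration, not part of the CUDA API, and the sketch assumes that the compute capability major version is enough to pick the x limit. It also ignores the fact that compute capability 1.x only supports 2D grids.

```cpp
#include <cstdint>

// Hypothetical helper type mirroring the three components of CUDA's dim3.
struct GridDims {
    std::uint64_t x, y, z;
};

// Per-dimension grid limits as quoted in this thread: x is 2^31 - 1 from
// compute capability 3.0 onward and 65535 before that; y and z are 65535.
// Simplification: compute capability 1.x (2D grids only) is not modeled.
GridDims max_grid_dims(int cc_major) {
    const std::uint64_t x_limit = (cc_major >= 3) ? 2147483647ULL : 65535ULL;
    return GridDims{x_limit, 65535ULL, 65535ULL};
}

// Returns true if every component is at least 1 and within the limits.
bool grid_fits(const GridDims& g, int cc_major) {
    const GridDims lim = max_grid_dims(cc_major);
    return g.x >= 1 && g.y >= 1 && g.z >= 1 &&
           g.x <= lim.x && g.y <= lim.y && g.z <= lim.z;
}
```

In a real program you would not hard-code these values: `cudaGetDeviceProperties` fills a `cudaDeviceProp` struct whose `maxGridSize` array reports the limits for the device you are actually running on.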

As Pavan pointed out, if you do not provide a dim3 for the grid configuration, you only use the x-dimension, so the per-dimension limit for x is the one that applies.
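For the common 1D case, the question becomes whether the number of blocks you need fits in the x-dimension. The following is a rough host-side sketch in plain C++; `blocks_needed` and `fits_in_x` are hypothetical helper names, and the limits are the ones quoted in this thread (65535 before compute capability 3.0, 2^31 - 1 from 3.0 onward).

```cpp
#include <cstdint>

// Number of blocks needed to cover n elements with threads_per_block threads
// each (ceiling division). This is the value one would pass as a plain
// integer grid size, which uses the x-dimension only.
std::uint64_t blocks_needed(std::uint64_t n, std::uint64_t threads_per_block) {
    return (n + threads_per_block - 1) / threads_per_block;
}

// True if that block count fits in the x-dimension limit for the given
// compute capability major version (assumed limits, see the lead-in).
bool fits_in_x(std::uint64_t blocks, int cc_major) {
    const std::uint64_t x_limit = (cc_major >= 3) ? 2147483647ULL : 65535ULL;
    return blocks >= 1 && blocks <= x_limit;
}
```

For example, 100,000,000 elements at 256 threads per block need 390,625 blocks. That fits in a 1D grid on compute capability 3.0 and newer, but on older devices it exceeds 65535, so you would have to spread the blocks over a second grid dimension or have each thread process several elements.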

