SQL counting distinct over partition

This is how I'd do it:

SELECT      *
FROM        #MyTable AS mt
CROSS APPLY (   SELECT COUNT(DISTINCT mt2.Col_B) AS dc
                FROM   #MyTable AS mt2
                WHERE  mt2.Col_A = mt.Col_A
                -- GROUP BY mt2.Col_A 
            ) AS ca;

The GROUP BY clause is redundant given the data provided in the question, but it may give you a better execution plan. See the follow-up Q&A "CROSS APPLY produces outer join".
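If you want to sanity-check the idea outside SQL Server, here is a minimal sketch using Python's built-in sqlite3 module. SQLite has no CROSS APPLY, so the same per-row lookup is written as a correlated scalar subquery in the SELECT list instead. The table name and the sample rows are made up for illustration, because the question's data is not reproduced here.

```python
import sqlite3

# Hypothetical sample rows (Col_A, Col_B); None is stored as NULL.
ROWS = [("A", 1), ("A", 1), ("A", 2), ("B", 3), ("B", None), ("C", None)]


def distinct_counts(rows):
    """Return (Col_A, Col_B, dc) tuples using a correlated COUNT(DISTINCT) subquery."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE MyTable (Col_A TEXT, Col_B INTEGER)")
    con.executemany("INSERT INTO MyTable VALUES (?, ?)", rows)
    # SQLite lacks CROSS APPLY, so the per-row lookup becomes a
    # correlated scalar subquery in the SELECT list.
    result = con.execute(
        """
        SELECT mt.Col_A, mt.Col_B,
               (SELECT COUNT(DISTINCT mt2.Col_B)
                FROM   MyTable AS mt2
                WHERE  mt2.Col_A = mt.Col_A) AS dc
        FROM   MyTable AS mt
        ORDER BY mt.rowid
        """
    ).fetchall()
    con.close()
    return result


for row in distinct_counts(ROWS):
    print(row)
```

COUNT(DISTINCT) ignores NULLs, so in this sample the all-NULL group C gets 0 and group B gets 1.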

Consider voting for the OVER clause enhancement request "DISTINCT clause for aggregate functions" on the feedback site if you would like that feature added to SQL Server.


This is, in a way, an extension of Lennart's solution, but it is so ugly that I dare not suggest it as an edit. The goal here is to get the results without a derived table. There may never be a need for that, and combined with the ugliness of the query, the whole endeavour may seem like wasted effort. I still wanted to do this as an exercise, though, and would now like to share the result:

SELECT
  Col_A,
  Col_B,
  DistinctCount = DENSE_RANK() OVER (PARTITION BY Col_A ORDER BY Col_B ASC )
                + DENSE_RANK() OVER (PARTITION BY Col_A ORDER BY Col_B DESC)
                - 1
                - CASE COUNT(Col_B) OVER (PARTITION BY Col_A)
                  WHEN COUNT(  *  ) OVER (PARTITION BY Col_A)
                  THEN 0
                  ELSE 1
                  END
FROM
  dbo.MyTable
;

The core part of the calculation is this (I should note first that the idea is not mine; I learned this trick elsewhere):

  DENSE_RANK() OVER (PARTITION BY Col_A ORDER BY Col_B ASC )
+ DENSE_RANK() OVER (PARTITION BY Col_A ORDER BY Col_B DESC)
- 1
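A short derivation of why the core expression works (my own reasoning, not taken from wherever the trick originated): suppose a partition holds n distinct Col_B values and a given row carries the k-th smallest of them.

```latex
\begin{aligned}
\text{DENSE\_RANK}_{\text{ASC}}  &= k \\
\text{DENSE\_RANK}_{\text{DESC}} &= n - k + 1 \\
\text{DENSE\_RANK}_{\text{ASC}} + \text{DENSE\_RANK}_{\text{DESC}} - 1 &= k + (n - k + 1) - 1 = n
\end{aligned}
```

The result does not depend on k, so every row in the partition gets n, the number of distinct values, with NULL counting as one of them if present.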

This expression can be used without any change if the values in Col_B are guaranteed never to be null. If the column can have nulls, however, you need to account for them: DENSE_RANK ranks NULL just like any other value, so the expression counts NULL as one extra distinct value, whereas COUNT(DISTINCT) ignores nulls. That is exactly what the CASE expression is there for. It compares the number of rows per partition, COUNT(*), with the number of non-null Col_B values per partition, COUNT(Col_B). If the numbers differ, some rows have a null in Col_B and, therefore, the initial calculation (DENSE_RANK() ... + DENSE_RANK() - 1) needs to be reduced by 1.
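A worked example for a single partition whose Col_B values are 1, 1, NULL and 2 (made-up values). SQL Server sorts NULL lowest, so it ranks first ascending and last descending:

```latex
\begin{aligned}
\text{Col\_B values:}              \quad & 1,\ 1,\ \text{NULL},\ 2 \\
\text{DENSE\_RANK ASC:}            \quad & 2,\ 2,\ 1,\ 3 \\
\text{DENSE\_RANK DESC:}           \quad & 2,\ 2,\ 3,\ 1 \\
\text{ASC} + \text{DESC} - 1:      \quad & 3,\ 3,\ 3,\ 3 \\
\text{COUNT(Col\_B)} = 3 \ne 4 = \text{COUNT(*)} \;\Rightarrow\; & 3 - 1 = 2
\end{aligned}
```

Without the CASE correction every row would report 3; with it, each row gets 2, matching COUNT(DISTINCT Col_B).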

Because the - 1 is part of the core formula, I chose to leave it there. However, it can actually be incorporated into the CASE expression, in a futile attempt to make the entire solution look less ugly:

SELECT
  Col_A,
  Col_B,
  DistinctCount = DENSE_RANK() OVER (PARTITION BY Col_A ORDER BY Col_B ASC )
                + DENSE_RANK() OVER (PARTITION BY Col_A ORDER BY Col_B DESC)
                - CASE COUNT(Col_B) OVER (PARTITION BY Col_A)
                  WHEN COUNT(  *  ) OVER (PARTITION BY Col_A)
                  THEN 1
                  ELSE 2
                  END
FROM
  dbo.MyTable
;

This live demo at db<>fiddle.uk can be used to test both variations of the solution.
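As a rough offline alternative to the fiddle, both variations can be run through Python's built-in sqlite3 module and compared with a plain COUNT(DISTINCT). This assumes SQLite 3.25 or later (the first release with window functions). SQLite, like SQL Server, sorts NULLs first ascending and last descending, so the queries should behave the same; the T-SQL `DistinctCount = ...` alias syntax is rewritten as `... AS DistinctCount`, and the table and rows are made up.

```python
import sqlite3

# Hypothetical sample rows (Col_A, Col_B); None is stored as NULL.
ROWS = [("A", 1), ("A", 1), ("A", 2), ("B", 3), ("B", None), ("C", None)]

# Shared core: ascending dense rank + descending dense rank.
CORE = """
    DENSE_RANK() OVER (PARTITION BY Col_A ORDER BY Col_B ASC)
  + DENSE_RANK() OVER (PARTITION BY Col_A ORDER BY Col_B DESC)
"""
# Compares COUNT(Col_B) with COUNT(*) to detect NULLs in the partition.
NULL_CHECK = "COUNT(Col_B) OVER (PARTITION BY Col_A) WHEN COUNT(*) OVER (PARTITION BY Col_A)"

VARIANTS = {
    # First variation: "- 1" kept outside the CASE expression.
    "first": f"{CORE} - 1 - CASE {NULL_CHECK} THEN 0 ELSE 1 END",
    # Second variation: "- 1" folded into the CASE expression.
    "second": f"{CORE} - CASE {NULL_CHECK} THEN 1 ELSE 2 END",
    # Reference: correlated COUNT(DISTINCT), which ignores NULLs.
    "reference": "(SELECT COUNT(DISTINCT t2.Col_B) FROM MyTable AS t2 WHERE t2.Col_A = t.Col_A)",
}


def run_variants(rows):
    """Return {variant name: [(Col_A, Col_B, DistinctCount), ...]} in insertion order."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE MyTable (Col_A TEXT, Col_B INTEGER)")
    con.executemany("INSERT INTO MyTable VALUES (?, ?)", rows)
    results = {}
    for name, expr in VARIANTS.items():
        sql = f"SELECT Col_A, Col_B, {expr} AS DistinctCount FROM MyTable AS t ORDER BY t.rowid"
        results[name] = con.execute(sql).fetchall()
    con.close()
    return results


for name, result in run_variants(ROWS).items():
    print(name, result)
```

All three entries should print the same rows; group B (one NULL) and group C (only NULL) are the cases that exercise the CASE correction.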


You can emulate it by using dense_rank and then picking the maximum rank for each partition:

select col_a, col_b, max(rnk) over (partition by col_a) as distinct_count
from (
    select col_a, col_b
        , dense_rank() over (partition by col_a order by col_b) as rnk
    from #mytable
) as t;

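You would need to exclude any nulls from col_b to get the same results as COUNT(DISTINCT), because dense_rank assigns a rank to NULL like any other value, while COUNT(DISTINCT) ignores nulls.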
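A quick sketch of that caveat, again with Python's sqlite3 module (SQLite 3.25+ assumed) and made-up rows. Without a filter, NULL gets its own rank and inflates the count. Adding `where col_b is not null` to the inner query is one possible way to exclude nulls (my assumption; the answer does not spell it out); it gives the COUNT(DISTINCT) numbers but also removes the NULL rows, and any all-NULL partition, from the output.

```python
import sqlite3

# Hypothetical sample rows (col_a, col_b); None is stored as NULL.
ROWS = [("A", 1), ("A", 1), ("A", 2), ("B", 3), ("B", None), ("C", None)]


def max_dense_rank(rows, exclude_nulls):
    """Run the max(dense_rank) query, optionally filtering NULLs in the inner query."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE mytable (col_a TEXT, col_b INTEGER)")
    con.executemany("INSERT INTO mytable VALUES (?, ?)", rows)
    # Filtering in the inner query drops NULL rows from the final output too.
    where = "where col_b is not null" if exclude_nulls else ""
    sql = f"""
        select col_a, col_b, max(rnk) over (partition by col_a) as distinct_count
        from (
            select rowid as rid, col_a, col_b
                , dense_rank() over (partition by col_a order by col_b) as rnk
            from mytable
            {where}
        ) as t
        order by rid
    """
    result = con.execute(sql).fetchall()
    con.close()
    return result


print(max_dense_rank(ROWS, exclude_nulls=False))
print(max_dense_rank(ROWS, exclude_nulls=True))
```

In the unfiltered run, group B reports 2 and group C reports 1 because NULL is ranked; in the filtered run, only the non-null rows remain and their counts match COUNT(DISTINCT).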