CosmosDb count distinct elements

I know this is an old thread.

However, just to keep the topic updated: currently (Jul 2020) you can run SELECT DISTINCT over a Cosmos DB table. Directly applying COUNT(DISTINCT ...) still doesn't give correct results, though, so you need a subquery-based workaround like the one below to get the correct distinct count.

SELECT COUNT(UniqueIDValues) AS UniqueCount
FROM (SELECT c.Id FROM c GROUP BY c.Id) AS UniqueIDValues

[Update 19 Nov 2020]

Here is another query that solves the distinct issue and works for count. Basically, you need to encapsulate the distinct and then count. We have also tested it with paging, for cases where you want the unique records and not just the count, and it works there as well.

select value count(1) from c join (select distinct value c from p in c.products)

You can also use a WHERE clause inside or outside the parentheses, depending on what your condition is based on.

This is also mentioned slightly differently in another answer here.

Check the select clause documentation for CosmosDB.

@ssmsexe brought this to my attention and I wanted to update the answer here.

[Original Answer]

Support for DISTINCT was added on 19 Oct 2018.

The following query works just fine:

SELECT distinct value c FROM c join p in c.products

However, it still doesn't work for count.

The workaround for counting distinct values is to create a stored procedure that performs the distinct count. It runs the query, follows the continuation token until there are no more results, and returns the total count.

If you pass a distinct query like the one above to the stored procedure below, you will get a distinct count:

function count(queryCommand) {
  var response = getContext().getResponse();
  var collection = getContext().getCollection();
  var count = 0;

  query(queryCommand);

  function query(queryCommand, continuation){
    var requestOptions = { continuation: continuation };
    var isAccepted = collection.queryDocuments(
        collection.getSelfLink(),
        queryCommand,
        requestOptions,
        function (err, feed, responseOptions) {
            if (err) {
                throw err;
            }

            //  Scan results
            if (feed) {
                count+=feed.length;
            }

            if (responseOptions.continuation) {
                //  Continue the query
                query(queryCommand, responseOptions.continuation)
            } else {
                //  Return the count in the response
                response.setBody(count);
            }
        });
    if (!isAccepted) throw new Error('The query was not accepted by the server.');
  }
}

The issue with that workaround is that it can potentially exceed the RU limit on your collection and fail. If that's the case, you can implement similar code on the server side, which is not that great.
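
The same count-and-continue loop can also run in application code instead of inside the stored procedure. Here is a minimal sketch in JavaScript, assuming you can wrap your SDK's paged query call in a callback. `fetchPage`, `countAllPages` and `makeFakePager` are hypothetical names used for illustration only and are not part of any Cosmos DB SDK; the fake pager just stands in for a real query so the loop can be tried locally.

```javascript
// Sketch: count the rows of a DISTINCT query in application code, page by page.
// `fetchPage` is a hypothetical callback (an assumption, not an SDK function):
// given a continuation token (undefined for the first page) it must resolve to
// { items: [...], continuation: string | undefined }.
async function countAllPages(fetchPage) {
  let count = 0;
  let continuation;
  do {
    const page = await fetchPage(continuation);
    // Add up the rows of this page, same as count += feed.length above.
    count += (page.items || []).length;
    // Keep going while the backend reports more results.
    continuation = page.continuation;
  } while (continuation);
  return count;
}

// In-memory stand-in for a paged query, used only to exercise the loop.
function makeFakePager(rows, pageSize) {
  return async function fetchPage(continuation) {
    const start = continuation ? Number(continuation) : 0;
    const end = start + pageSize;
    return {
      items: rows.slice(start, end),
      continuation: end < rows.length ? String(end) : undefined,
    };
  };
}
```

With a real SDK, `fetchPage` would issue the distinct query and pass the continuation token through. Each page is a separate request, so a single call is less likely to hit the limit, at the cost of more round trips.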


I did some investigation and found a solution. To get the count of distinct results you cannot use count(1). You need to "wrap" the subquery with AS subqueryName and then use count(subqueryName), like below:

select count(subqueryName) from (SELECT distinct r.x FROM r) as subqueryName

Cheers!


How about SELECT COUNT(1) FROM (SELECT distinct c.id FROM c) AS t;? – Evaldas Buinauskas May 30 '18 at 14:44

As of 15 May 2019, the query in the comment above works with a WHERE condition. I didn't try it with a JOIN, but the request does return the answer I'm looking for.

It also works with the 100-element limitation in Cosmos DB.

As an example with products, it would be: SELECT COUNT(1) FROM (SELECT DISTINCT c.Id FROM c WHERE c.Brand = 'Coca')