Select rows for which at least one row per set meets a condition

Basically you are looking for the expression:

client_status IS DISTINCT FROM 'n'

The column client_status should really be data type boolean, not text, which would allow the simpler expression:

client_status IS NOT FALSE

The manual has details in the chapter Comparison Operators.


Assuming your actual table has a UNIQUE or PK constraint, we arrive at:

CREATE TABLE test (
  company_id    integer NOT NULL, 
  client_id     integer NOT NULL, 
  client_status boolean,
  PRIMARY KEY (company_id, client_id)
);

Queries

All of these do the same (what you asked), which is the fastest depends on data distribution:

SELECT company_id, client_id
FROM   test t
WHERE  EXISTS (
   SELECT FROM test
   WHERE  company_id = t.company_id
   AND    client_status IS NOT FALSE
   );

Or:

SELECT company_id, client_id
FROM   test t
JOIN  (
   SELECT company_id
   FROM   test t
   GROUP  BY 1
   HAVING bool_or(client_status IS NOT FALSE)
   ) c USING (company_id);

Or:

SELECT company_id, client_id
FROM   test t
JOIN  (
   SELECT DISTINCT company_id, client_status 
   FROM   test t
   ORDER  BY company_id, client_status DESC
   ) c USING (company_id)
WHERE  c.client_status IS NOT FALSE;

Boolean values sort FALSE -> TRUE -> NULL in ascending sort order. So FALSE comes last in descending order. If there's is any other value available, then that one's picked first ...

  • Sorting null values after all others, except special

The added PK is implemented with a useful index for these queries. If you want faster, yet, add a partial index for query 1:

CREATE INDEX test_special_idx ON test (company_id, client_id)
WHERE  client_status IS NOT FALSE;

You could use window functions, too, but that would be slower. Example with first_value():

SELECT company_id, client_id
FROM  (
   SELECT company_id, client_id
        , first_value(client_status) OVER (PARTITION BY company_id
                                           ORDER BY client_status DESC) AS stat
   FROM   test t
   ) sub
WHERE  stat IS NOT FALSE;

For lots of rows per company_id, one of these techniques may be faster, still:

  • Optimize GROUP BY query to retrieve latest record per user

I think this can be simplified a bit:

select company_id 
from test 
group by company_id 
having count(*) filter (where client_status!='n' or client_status is null) > 0;

I may have misunderstood you but I imagine something like:

 select * 
 from test x 
 where exists ( 
     select 1 
     from test y 
     where x.company_id = y.company_id 
       and coalesce(client_status, 'y') <> 'n'
 );

will work. coalesce is use to map null to 'y', but anything different than 'n' should do

Using an OLAP function can save us a "join":

select company_id, client_id 
from (
    select x.*
         , count(nullif(coalesce(client_status,'y'),'n')) 
               over (partition by company_id) as cnt 
    from test x
) 
where cnt > 0;

Here we map null -> 'y' and 'n' -> null. Since count(x) will count rows where x is not null, we count rows where client_status <> 'n'. I used an OLAP function to avoid GROUP BY, which means that we only need to reference the table once.