Is there a way to ensure the WHERE clause is applied after DISTINCT?

You can do this without a subquery by using a LEFT JOIN:

SELECT  c.id, c.comment_id_no, c.text, c.show, c.inserted_at
FROM    Comments AS c
        LEFT JOIN Comments AS c2
            ON c2.comment_id_no = c.comment_id_no
            AND c2.inserted_at > c.inserted_at
WHERE   c2.id IS NULL
AND     c.show = 'true';
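One caveat, as a sketch rather than a definitive fix: if two versions of the same comment share the same inserted_at, the join above keeps both of them. Assuming id grows with every insert (an assumption about your table, as are the sample rows below), comparing (inserted_at, id) as a row breaks the tie:

```sql
-- Made-up sample rows: versions 2 and 3 of comment 1 share a timestamp.
WITH comments (id, comment_id_no, text, show, inserted_at) AS (
    VALUES
        (1, 1, 'v1',            true, timestamp '2020-01-01 10:00'),
        (2, 1, 'v2',            true, timestamp '2020-01-02 10:00'),
        (3, 1, 'v3, same time', true, timestamp '2020-01-02 10:00')
)
SELECT  c.id, c.comment_id_no, c.text, c.show, c.inserted_at
FROM    comments AS c
        LEFT JOIN comments AS c2
            ON c2.comment_id_no = c.comment_id_no
            -- Row-wise comparison: a later timestamp, or the same
            -- timestamp with a higher id, counts as "newer".
            AND (c2.inserted_at, c2.id) > (c.inserted_at, c.id)
WHERE   c2.id IS NULL
AND     c.show = 'true';
-- Returns only id 3. With the plain c2.inserted_at > c.inserted_at
-- condition, both id 2 and id 3 would come back.
```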

I think all other approaches require a subquery of some sort. This is usually done with a ranking function:

SELECT  c.id, c.comment_id_no, c.text, c.show, c.inserted_at
FROM    (   SELECT  c.id, 
                    c.comment_id_no, 
                    c.text, 
                    c.show, 
                    c.inserted_at,
                    ROW_NUMBER() OVER(PARTITION BY c.comment_id_no 
                                      ORDER BY c.inserted_at DESC) AS RowNumber
            FROM    Comments AS c
        ) AS c
WHERE   c.RowNumber = 1
AND     c.show = 'true';

Since you have tagged the question with PostgreSQL, you could also use DISTINCT ON ():

SELECT  *
FROM    (   SELECT  DISTINCT ON (c.comment_id_no) 
                    c.id, c.comment_id_no, c.text, c.show, c.inserted_at
            FROM    Comments AS c 
            ORDER BY c.comment_id_no, c.inserted_at DESC
        ) x
WHERE   show = 'true';

Examples on DB<>Fiddle
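To make the difference the question is about concrete, here is a small self-contained comparison. The sample rows are made up: comment 1 has an older visible version and a newer hidden one, so filtering before picking the latest row wrongly resurrects the old version.

```sql
WITH comments (id, comment_id_no, text, show, inserted_at) AS (
    VALUES
        (1, 1, 'first draft',  true,  timestamp '2020-01-01 10:00'),
        (2, 1, 'hidden later', false, timestamp '2020-01-02 10:00'),
        (3, 2, 'old',          false, timestamp '2020-01-01 10:00'),
        (4, 2, 'new',          true,  timestamp '2020-01-03 10:00')
),
filter_first AS (   -- WHERE runs before DISTINCT ON
    SELECT  DISTINCT ON (c.comment_id_no) c.id, c.comment_id_no
    FROM    comments AS c
    WHERE   c.show
    ORDER BY c.comment_id_no, c.inserted_at DESC
),
filter_last AS (    -- WHERE runs after DISTINCT ON
    SELECT  latest.id, latest.comment_id_no
    FROM    (   SELECT  DISTINCT ON (c.comment_id_no)
                        c.id, c.comment_id_no, c.show
                FROM    comments AS c
                ORDER BY c.comment_id_no, c.inserted_at DESC
            ) AS latest
    WHERE   latest.show
)
SELECT 'filter first' AS variant, id, comment_id_no FROM filter_first
UNION ALL
SELECT 'filter last', id, comment_id_no FROM filter_last
ORDER BY variant, comment_id_no;
--    variant    | id | comment_id_no
-- --------------+----+---------------
--  filter first |  1 |             1   <- stale version of comment 1
--  filter first |  4 |             2
--  filter last  |  4 |             2
```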


As I said in the comments, I don't advise polluting data tables with history/audit data.

And no: the "double versioning" suggested by @Josh_Eller in his comment isn't a good solution either. Not only does it complicate queries unnecessarily, it is also much more expensive in terms of processing and tablespace fragmentation.

Keep in mind that UPDATE operations never update anything in place. Instead, they write a whole new version of the row and mark the old one as deleted. That's why vacuum processes are needed to defragment tablespaces and recover that space.
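You can watch this happen through the ctid system column, which holds the physical location of a row version. The scratch table below is hypothetical and the exact ctid values may differ on your system:

```sql
-- Hypothetical scratch table, unrelated to the real schema.
create temp table mvcc_demo (id integer primary key, body text);
insert into mvcc_demo values (1, 'v1');

-- ctid is the (block, offset) position of the current row version.
select ctid, id, body from mvcc_demo;   -- typically (0,1)

update mvcc_demo set body = 'v2' where id = 1;

-- Same logical row, different ctid: a new version was written and
-- the old one is left behind for vacuum to clean up.
select ctid, id, body from mvcc_demo;   -- typically (0,2)
```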

In any case, apart from being suboptimal, that approach forces you to write more complex queries to read and write data, while in fact I suppose that most of the time you will only need to select, insert, update, or even delete a single row, and only occasionally look up its history.

So the best solution (IMHO) is simply to implement the schema you actually need for your main task, and to keep the audit trail aside in a separate table maintained by a trigger.

This would be much more:

  • Robust and simple: you focus on a single thing at a time (Single Responsibility and KISS principles).

  • Fast: auditing can be performed in an AFTER trigger, which fires only once the INSERT, UPDATE, or DELETE on the row has been applied, so the database engine already knows that the outcome of the statement won't change. (The trigger still runs inside the same transaction, and locks are held until that transaction ends.)

  • Efficient: an update will, of course, still insert a new row version and mark the old one as deleted, but this is done at a low level by the database engine. More importantly, your audit data will be free of that fragmentation, because you only ever insert there and never update. So the overall fragmentation should be much lower.

That being said, how do you implement it?

Suppose this simple schema:

create table comments (
    text text,
    mtime timestamp not null default now(),
    id serial primary key
);

create table comments_audit ( -- Or audit.comments if using separate schema
    text text,
    mtime timestamp not null,
    id integer,
    rev integer not null,
    primary key (id, rev)
);

...and then this function and trigger:

create or replace function fn_comments_audit()
returns trigger
language plpgsql
security definer
    -- This lets you restrict permissions on the audit table, because
    -- the function runs with the privileges of the user who defined it
    -- rather than those of the user whose statement fired the trigger.
as $$
DECLARE
BEGIN

    if TG_OP = 'DELETE' then
        raise exception 'FATAL: Deletion is not allowed for %', TG_TABLE_NAME;
        -- If you want to allow deletion there are a few more decisions to take...
        -- So here I block it for the sake of simplicity ;-)
    end if;

    insert into comments_audit (
        text
        , mtime
        , id
        , rev
    ) values (
        NEW.text
        , NEW.mtime
        , NEW.id
        , coalesce (
            (select max(rev) + 1 from comments_audit where id = NEW.id)
            , 0
        )
    );

    return NULL;

END;
$$;

create trigger tg_comments_audit
    after insert or update or delete
    on public.comments
    for each row
    execute procedure fn_comments_audit()
;

And that's all.
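A quick way to check it is a short session like this sketch. It assumes both tables start out empty and that the function and trigger above are installed; the sample texts are made up.

```sql
-- Assumption: comments and comments_audit are empty before this runs.
insert into comments (text) values ('first version');

-- mtime is not refreshed automatically on update, so set it explicitly.
update comments
set    text = 'second version', mtime = now()
where  text = 'first version';

-- One audit row per write, numbered from 0.
select rev, text from comments_audit order by rev;
--  rev |      text
-- -----+----------------
--    0 | first version
--    1 | second version

-- The trigger rejects deletes: this statement fails with the
-- 'Deletion is not allowed' exception and the row stays in place.
delete from comments;
```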

Notice that with this approach you will always have a copy of your current comments data in comments_audit. To avoid that, you could instead have used the OLD record and defined the trigger only for UPDATE (and DELETE) operations.
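For comparison, that OLD-based variant could look roughly like the sketch below. The function and trigger names are hypothetical, and it is meant to replace tg_comments_audit, not to run alongside it. Here each audit row holds a superseded version, so the current text lives only in comments.

```sql
-- Only one of the two audit triggers should be active.
drop trigger if exists tg_comments_audit on comments;

create or replace function fn_comments_audit_old()
returns trigger
language plpgsql
security definer
as $$
begin
    -- Store the version that is being replaced (or removed).
    insert into comments_audit (text, mtime, id, rev)
    values (
        OLD.text
        , OLD.mtime
        , OLD.id
        , coalesce (
            (select max(rev) + 1 from comments_audit where id = OLD.id)
            , 0
        )
    );
    return NULL;
end;
$$;

create trigger tg_comments_audit_old
    after update or delete
    on comments
    for each row
    execute procedure fn_comments_audit_old()
;
```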

But I prefer this approach, not only because it gives us extra redundancy (if rows were accidentally deleted from the master table, whether because deletion was allowed or because the trigger had been accidentally disabled, we could recover all the data from the audit table), but also because it simplifies (and optimises) querying the history when it's needed.

Now you only need to insert, update, or select (or even delete, if you develop this schema a little further, e.g. by inserting a row of nulls...) in a fully transparent manner, just as if there were no audit system at all. And when you need the history, you simply query the audit table instead.
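Reading the history back is then plain SQL against comments_audit. A few sketches follow; the id and timestamp values are made up, and the last query assumes mtime is set on every write.

```sql
-- Full history of one comment (id 1 is just an example value).
select rev, mtime, text
from   comments_audit
where  id = 1
order  by rev;

-- Latest stored revision of every comment.
select distinct on (id) id, rev, mtime, text
from   comments_audit
order  by id, rev desc;

-- The comment as it stood at a given moment.
select text
from   comments_audit
where  id = 1
and    mtime <= timestamp '2020-01-01 12:00'
order  by rev desc
limit  1;
```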

NOTE: You might additionally want to include a creation timestamp (ctime). In that case it would be worth preventing it from being modified, using a BEFORE trigger. I omitted it (for the sake of simplicity again) because you can already work it out from the mtimes in the audit table, although if you are going to use it in your application, adding it is very advisable.
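A minimal sketch of that idea, assuming a new ctime column and hypothetical function and trigger names (none of them are part of the schema above):

```sql
-- Assumption: ctime is added on top of the schema shown earlier.
alter table comments
    add column ctime timestamp not null default now();

create or replace function fn_comments_keep_ctime()
returns trigger
language plpgsql
as $$
begin
    -- Whatever the UPDATE tried to write, keep the original creation time.
    NEW.ctime := OLD.ctime;
    return NEW;
end;
$$;

create trigger tg_comments_keep_ctime
    before update
    on comments
    for each row
    execute procedure fn_comments_keep_ctime()
;
```

If you go this way, comments_audit and fn_comments_audit would need the extra column too, should you want ctime in the history.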


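If you are running Postgres 8.4 or higher, ROW_NUMBER() is usually an efficient solution. Note that the show filter has to sit outside the subquery, so that it is applied only after the latest version of each comment has been picked:

SELECT *
FROM (
    SELECT c.*, ROW_NUMBER() OVER(PARTITION BY comment_id_no ORDER BY inserted_at DESC) rn
    FROM comments c
) x
WHERE x.rn = 1          -- keep only the latest version of each comment
  AND x.show = 'true';  -- then check whether that version may be shown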

Otherwise, the same can be achieved with a NOT EXISTS condition, which ensures that only the latest version of each comment is shown:

SELECT c.*
FROM comments c
WHERE 
    c.show = 'true'
    AND NOT EXISTS (
        SELECT 1 
        FROM comments c1 
        WHERE c1.comment_id_no = c.comment_id_no AND c1.inserted_at > c.inserted_at
    )

I think you want:

select c.*
from comments c
where c.inserted_at = (select max(c2.inserted_at)
                       from comments c2
                       where c2.comment_id_no = c.comment_id_no
                      ) and
      c.show = 'true';

I don't understand what this has to do with select distinct. You simply want the last version of a comment, and then to check if you can show that.

EDIT:

In Postgres, I would do:

select c.*
from (select distinct on (comment_id_no) c.*
      from comments c
      order by c.comment_id_no, c.inserted_at desc
     ) c
where c.show = 'true'

distinct on usually has pretty good performance characteristics.
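How well it performs depends a lot on indexing. An index whose columns mirror the ORDER BY gives the planner a chance to avoid a full sort. This is only a sketch: the index name is made up, and whether the planner picks it depends on your data.

```sql
-- Assumption: hypothetical index name; columns follow the ORDER BY above.
create index comments_latest_idx
    on comments (comment_id_no, inserted_at desc);

-- Check what the planner actually does on your data.
explain
select distinct on (comment_id_no) c.*
from   comments c
order  by c.comment_id_no, c.inserted_at desc;
```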