PostgreSQL: Generate a series of dates for each group in a table

1. CROSS JOIN, LEFT JOIN LATERAL to subquery

SELECT a.user_id, COALESCE(b.balance, 0) AS balance, d.as_of_date
FROM   (
   SELECT d::date AS as_of_date  -- cast to date right away
   FROM   generate_series(timestamp '2016-01-01', '2016-01-03', interval '1 day') d
   ) d
JOIN   accounts a ON a.create_date <= d.as_of_date
LEFT   JOIN LATERAL (
   SELECT balance
   FROM   balances
   WHERE  user_id = a.user_id
   AND    as_of_date <= d.as_of_date
   ORDER  BY as_of_date DESC
   LIMIT  1
   ) b ON true
ORDER  BY a.user_id, d.as_of_date;

Returns your desired result - except that as_of_date is an actual date, not a timestamp like in your example. That should be more appropriate.

Users that are created already, but don't have any transactions, yet, are listed with a balance of 0. You did not define how to deal with the corner case.

Rather use timestamp input for generate_series():

  • Generating time series between two dates in PostgreSQL

It's crucial for performance that you back this up with a multicolumn index:

CREATE INDEX balances_multi_idx ON balances (user_id, as_of_date DESC, balance);

We have had a very similar case on SO just this week:

  • Aggregating the most recent joined records per week

Find more explanation there.

2. CROSS JOIN, LEFT JOIN , window functions

SELECT user_id
     , COALESCE(max(balance) OVER (PARTITION BY user_id, grp
                                   ORDER BY as_of_date), 0) AS balance
     , as_of_date
FROM  (
   SELECT a.user_id, b.balance, d.as_of_date
        , count(b.user_id) OVER (PARTITION BY user_id ORDER BY as_of_date) AS grp
   FROM   (
      SELECT d::date AS as_of_date  -- cast to date right away
      FROM   generate_series(timestamp '2016-01-01', '2016-01-03', interval '1 day') d
      ) d
   JOIN   accounts a ON a.create_date <= d.as_of_date
   LEFT   JOIN balances b USING (user_id, as_of_date)
   ) sub
ORDER  BY user_id, as_of_date;

Same result. If you have the multicolumn index mentioned above and can get index-only scans out of it, the first solution is most probably faster.

The main feature is the running count of values to form groups. since count() does not count NULL values, all dates without balance fall into the same group (grp) as the most recent balance. Then use a simple max() over the same window frame extended by grp to copy the last balance for dangling gaps.

Related:

  • Select longest continuous sequence
  • How to get max sequence of rows?