PostgreSQL randomising combinations with LATERAL

I'd write the query like this, using LIMIT (3) instead of DISTINCT ON.

The generate_series(0, 9) is used to get all the distinct bins. You could use (SELECT DISTINCT line % 10 FROM foo) AS g (bin) instead, if the "bins" are not all the integers from 0 up to 9:

SELECT 
    g.bin, 
    ROW_NUMBER() OVER (PARTITION BY g.bin ORDER BY d.x) AS i,
    d.* 
FROM 
    generate_series(0, 9) AS g (bin), 
  LATERAL 
    ( SELECT f.*, random() x 
      FROM foo AS f 
      WHERE  f.line % 10 = g.bin 
      ORDER BY x 
      LIMIT 3
    ) AS d
ORDER BY 
    bin, x ;

Also, if you don't need the random() number in the output, you could use ORDER BY random() in the subquery and remove x from the select and order by clauses - or replace ORDER BY d.x with ORDER BY d.line.


What I'd like to do is get random picks from each bin several times.

There are lots of ways you can solve this problem. Each one introduces more randomness and takes more time.

  1. TABLESAMPLE SYSTEM and tsm_system_rows
  2. TABLESAMPLE BERNOULLI
  3. Creating ad-hoc bins and rolling the dice with statistics.
  4. Creating ad-hoc bins and ordering them randomly and picking.

In most circumstances, TABLEAMPLE SYSTEM and tsm_system_rows is plenty to get a "fair" sampling of the table. It has the added advantage of not having to visit the whole table.

In the event you need a more evenly spaced out sample, TABLESAMPLE BERNOULLI will visit the whole table and select from all of the pages inside.

In the event you want to proceed going ad-hoc, I think this will do you want too.

SELECT *
FROM (
  SELECT dense_rank() OVER (PARTITION BY bin ORDER BY random()), *
  FROM (
    SELECT line % 10 AS bin, line
    FROM foo                          
  ) AS t
) AS t                       
WHERE dense_rank <= 3
ORDER BY line;