Slow nested loop left join with index scan 130k times in loop

The execution plan looks pretty good in my opinion.

What you could try is to see if performance improves when a nested loop join is avoided.
To test this, run

SET enable_nestloop = off;

before your query and see if that improves the speed.

Other than that, I cannot think of any improvements.


Your query and explain output don't look so bad. Still, a couple of observations:

  1. An index-only scan instead of an index scan on matches_loosing_champion_ids_index would be faster. The reason you don't see that: the useless count(match_id).

  2. 5 bitmap index scans + BitmapOR step are pretty fast but a single bitmap index scan would be faster.

  3. The most expensive part in this query plan is the Nested Loop Left Join. Might be different for other players.

With your schema

Query 1: LEFT JOIN LATERAL

This way, we aggregate before we join and don't need another GROUP BY. Also fewer join operations. And count(*) should unblock index-only scans:

SELECT player1, player2, player3, player4, player5
     , ((win * 1.0) / (win + loss))::numeric(5,5) AS winrate
     , win + loss AS matches
FROM  (
   SELECT winning_champion_one_id   AS player1
        , winning_champion_two_id   AS player2
        , winning_champion_three_id AS player3
        , winning_champion_four_id  AS player4
        , winning_champion_five_id  AS player5
        , COUNT(*) AS win           -- see below
   FROM   matches
   WHERE  157 IN (winning_champion_one_id
                , winning_champion_two_id
                , winning_champion_three_id
                , winning_champion_four_id
                , winning_champion_five_id)
   GROUP  BY 1,2,3,4,5
   ) w
LEFT  JOIN LATERAL (
   SELECT COUNT(*) AS loss          -- see below
   FROM   matches
   WHERE  loosing_champion_one_id   = w.player1
   AND    loosing_champion_two_id   = w.player2
   AND    loosing_champion_three_id = w.player3
   AND    loosing_champion_four_id  = w.player4
   AND    loosing_champion_five_id  = w.player5
   GROUP  BY loosing_champion_one_id
           , loosing_champion_two_id
           , loosing_champion_three_id
           , loosing_champion_four_id
           , loosing_champion_five_id
   ) l ON true
WHERE  win + loss > 19
ORDER  BY winrate DESC, matches DESC
LIMIT  1;

count(*):
is slightly shorter and faster in Postgres, doing the same as count(match_id) here, because match_id is never NULL.

Removing the only reference to match_id allows an index-only scan on matches_loosing_champion_ids_index! Some other preconditions must be met ...

Query 2: UNION ALL

Another way around the expensive Nested Loop Left Join, and a single GROUP BY. But we add 5 more bitmap index scans. May or may not be faster:

SELECT player1, player2, player3, player4, player5
     ,(count(win) * 1.0) / count(*) AS winrate  -- I would round ...
     , count(*)                     AS matches
FROM  (
   SELECT winning_champion_one_id   AS player1
        , winning_champion_two_id   AS player2
        , winning_champion_three_id AS player3
        , winning_champion_four_id  AS player4
        , winning_champion_five_id  AS player5
        , TRUE AS win
   FROM   matches
   WHERE  157 IN (winning_champion_one_id
                , winning_champion_two_id
                , winning_champion_three_id
                , winning_champion_four_id
                , winning_champion_five_id)
   UNION ALL 
   SELECT loosing_champion_one_id
        , loosing_champion_two_id
        , loosing_champion_three_id
        , loosing_champion_four_id
        , loosing_champion_five_id
        , NULL AS win              -- following "count(win)" ignores NULL values
   FROM   matches
   WHERE  157 IN (loosing_champion_one_id
                , loosing_champion_two_id
                , loosing_champion_three_id
                , loosing_champion_four_id
                , loosing_champion_five_id)
   ) m
GROUP  BY 1,2,3,4,5
HAVING count(*) > 19    -- min 20 matches
-- AND    count(win) > 0   -- min 1 win --  see below!
ORDER  BY winrate DESC, matches DESC
LIMIT  1;

AND count(win) > 0 is commented out, because it's redundant, while you pick the single best winrate anyways.

Different schema

I really would use a different schema to begin with:

CREATE TABLE team (
   team_id serial PRIMARY KEY --  or bigserial if you expect > 2^31 distinct teams
   -- more attributes?
);

CREATE TABLE player (
   player_id smallserial PRIMARY KEY
   -- more attributes?
);

CREATE TABLE team_member (
   team_id   integer  REFERENCES team
 , player_id smallint REFERENCES player
 , team_pos  smallint NOT NULL CHECK (team_pos BETWEEN 1 AND 5) -- only if position matters
 , PRIMARY KEY (team_id, player_id)
 , UNIQUE      (team_id, team_pos)
);
CREATE INDEX team_member_player_team_idx on team_member (player_id, team_id);

--  Enforce 5 players per team. Various options, different question.

CREATE TABLE match (
   match_id bigserial PRIMARY KEY
 , winner   integer NOT NULL REFERENCES team
 , loser    integer NOT NULL REFERENCES team
 , CHECK (winner <> loser)    --  wouldn't make sense
);
CREATE INDEX match_winner_loser_idx ON match (winner, loser);
CREATE INDEX match_loser_winner_idx ON match (loser, winner);

Subsidiary tables add to the disk footprint, but the main table is a bit smaller. And most importantly, you need fewer indexes, which should be substantially smaller overall for your cardinalities.

Query

We don't need any other indexes for this equivalent query. Much simpler and presumably faster now:

SELECT winner
     , win * 1.0/ (win + loss) AS winrate
     , win + loss AS matches
FROM  (
   SELECT w.winner, count(*) AS win
   FROM   team_member tm
   JOIN   match w ON w.winner = tm.team_id
   WHERE  tm.player_id = 157
   GROUP  BY w.winner
   ) w
LEFT   JOIN LATERAL (
   SELECT count(*) AS loss
   FROM   match l
   WHERE  l.loser = w.winner
   GROUP  BY l.loser
   ) l ON true
WHERE  win + loss > 19
ORDER  BY winrate DESC, matches DESC
LIMIT  1;

Join the result to team_member to get individual players.

You can also try the corresponding UNION ALL technique from above:

WITH t AS (
   SELECT team_id
   FROM   team_member
   WHERE  player_id = 157  -- provide player here
   )
SELECT team_id
     ,(count(win) * 1.0) / count(*) AS winrate
     , count(*)                     AS matches
FROM  (
   SELECT t.team_id, TRUE AS win
   FROM   t JOIN match ON winner = t.team_id
   UNION ALL 
   SELECT t.team_id, NULL AS win
   FROM   t JOIN match ON loser  = t.team_id
   ) m
GROUP  BY 1
HAVING count(*) > 19    -- min 20 matches
ORDER  BY winrate DESC, matches DESC
LIMIT  1;

bloom index

I briefly considered a bloom index for your predicate:

   WHERE  157 IN (loosing_champion_one_id
                , loosing_champion_two_id
                , loosing_champion_three_id
                , loosing_champion_four_id
                , loosing_champion_five_id)

Didn't test, probably won't pay for just 5 smallint columns.