How should I index a UUID in Postgres?

Use PostgreSQL's built-in uuid data type, and create a regular b-tree index on it.

There is no need to do anything special. This will result in an optimal index, and will also store the uuid field in as compact a form as is currently practical.
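To make "as compact a form as is currently practical" concrete, here is a small sketch using Python's standard uuid module rather than PostgreSQL itself. It compares the 16-byte binary form of a UUID (the 128 bits that the uuid type holds) with its text spellings. The example value is arbitrary and chosen only for illustration.

```python
import uuid

# An arbitrary example value, used only for illustration.
u = uuid.UUID("a0eebc99-9c0b-4ef8-bb6d-6bb9bd380a11")

binary_form = u.bytes  # the raw 128 bits: 16 bytes
text_form = str(u)     # canonical hyphenated text: 36 characters
hex_form = u.hex       # hex digits without hyphens: 32 characters

print(len(binary_form), len(text_form), len(hex_form))
```

A text column would have to hold at least the 36 characters, so both the table and any index on it carry more than twice the key data of the native type.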

(Hash indexes in PostgreSQL prior to version 10 were not crash-safe and were really a historical relic that tended to perform no better than a b-tree anyway. Avoid them on those versions. From PostgreSQL 10 onward they are crash-safe and have had some performance improvements, so you may wish to consider them.)

If for some reason you could not use the uuid type, you would generally create a b-tree on the text representation or, preferably, a bytea representation of the uuid.
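One plausible reason to prefer the bytea form, sketched below with Python's standard uuid module: the same UUID can be spelled several ways as text, while its 16-byte form is unique. The helper name to_bytea_key is hypothetical, and the conversion is assumed to happen in application code before the value reaches the database.

```python
import uuid

# Four text spellings of the same UUID (arbitrary example value).
variants = [
    "a0eebc99-9c0b-4ef8-bb6d-6bb9bd380a11",
    "A0EEBC99-9C0B-4EF8-BB6D-6BB9BD380A11",
    "{a0eebc99-9c0b-4ef8-bb6d-6bb9bd380a11}",
    "a0eebc999c0b4ef8bb6d6bb9bd380a11",
]

def to_bytea_key(text_value: str) -> bytes:
    """Hypothetical helper: normalize any accepted spelling to 16 raw bytes."""
    return uuid.UUID(text_value).bytes

distinct_text = set(variants)                   # compared as plain strings
distinct_keys = {to_bytea_key(v) for v in variants}  # compared as bytes

print(len(distinct_text), len(distinct_keys))
```

Indexed as plain text, these would be four different keys unless you normalize them yourself. Indexed as bytes, they collapse to one.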


BRIN index? If you use time-based (version 1) UUIDs, their embedded timestamp increases as they are generated. BRIN may then be suitable, provided the stored values really do end up correlated with the physical order of the rows.
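One caveat worth checking before relying on this: in the version 1 layout the low 32 bits of the timestamp come first, so a byte-by-byte comparison (which, to my understanding, is how the uuid type sorts) does not follow generation time across a wrap of those low bits. The sketch below builds two version 1 UUIDs by hand with Python's standard uuid module. The helper uuid1_from_timestamp, the node value, and the timestamps are all made up for illustration.

```python
import uuid

def uuid1_from_timestamp(ts_100ns: int, clock_seq: int = 0,
                         node: int = 0x123456789ABC) -> uuid.UUID:
    """Hypothetical helper: build a version 1 UUID from a 60-bit timestamp
    counted in 100 ns units. The node value is an arbitrary placeholder."""
    time_low = ts_100ns & 0xFFFFFFFF                        # low 32 bits come FIRST
    time_mid = (ts_100ns >> 32) & 0xFFFF                    # next 16 bits
    time_hi_version = ((ts_100ns >> 48) & 0x0FFF) | 0x1000  # top 12 bits + version 1
    clock_seq_hi_variant = ((clock_seq >> 8) & 0x3F) | 0x80
    clock_seq_low = clock_seq & 0xFF
    return uuid.UUID(fields=(time_low, time_mid, time_hi_version,
                             clock_seq_hi_variant, clock_seq_low, node))

# Two timestamps 51.2 microseconds apart, straddling a wrap of the low 32 bits.
earlier_ts = (0x1EE << 48) | (0x1234 << 32) | 0xFFFFFF00
later_ts = earlier_ts + 0x200

earlier = uuid1_from_timestamp(earlier_ts)
later = uuid1_from_timestamp(later_ts)

print(earlier, later)
print("later in time:", later.time > earlier.time)
print("sorts first by bytes:", later.bytes < earlier.bytes)

# How often the leading 32 bits wrap around, in seconds.
wrap_seconds = (2 ** 32) * 100e-9
```

The leading 32 bits wrap roughly every seven minutes, so within such a window the values do increase, but over a table that spans days the min/max of each block range would cover nearly the whole value space. It is worth measuring the correlation on real data before choosing BRIN for version 1 UUIDs.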

https://www.postgresql.org/docs/9.5/brin-intro.html :

BRIN stands for Block Range Index. BRIN is designed for handling very large tables in which certain columns have some natural correlation with their physical location within the table. A block range is a group of pages that are physically adjacent in the table; for each block range, some summary info is stored by the index. For example, a table storing a store's sale orders might have a date column on which each order was placed, and most of the time the entries for earlier orders will appear earlier in the table as well; a table storing a ZIP code column might have all codes for a city grouped together naturally.

BRIN indexes can satisfy queries via regular bitmap index scans, and will return all tuples in all pages within each range if the summary info stored by the index is consistent with the query conditions. The query executor is in charge of rechecking these tuples and discarding those that do not match the query conditions — in other words, these indexes are lossy. Because a BRIN index is very small, scanning the index adds little overhead compared to a sequential scan, but may avoid scanning large parts of the table that are known not to contain matching tuples.

The specific data that a BRIN index will store, as well as the specific queries that the index will be able to satisfy, depend on the operator class selected for each column of the index. Data types having a linear sort order can have operator classes that store the minimum and maximum value within each block range, for instance; geometrical types might store the bounding box for all the objects in the block range.

The size of the block range is determined at index creation time by the pages_per_range storage parameter. The number of index entries will be equal to the size of the relation in pages divided by the selected value for pages_per_range. Therefore, the smaller the number, the larger the index becomes (because of the need to store more index entries), but at the same time the summary data stored can be more precise and more data blocks can be skipped during an index scan.

Perfect for huge and "mostly" ordered data.
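The min/max summaries and the pages_per_range trade-off described above can be imitated with a toy model. This is only a sketch in plain Python, not how PostgreSQL implements BRIN: pages are lists of integers, and the function names build_summaries and ranges_to_scan are invented for the example.

```python
from typing import List, Tuple

def build_summaries(pages: List[List[int]],
                    pages_per_range: int) -> List[Tuple[int, int]]:
    """Return one (min, max) summary per block range of adjacent pages."""
    summaries = []
    for start in range(0, len(pages), pages_per_range):
        values = [v for page in pages[start:start + pages_per_range] for v in page]
        summaries.append((min(values), max(values)))
    return summaries

def ranges_to_scan(summaries: List[Tuple[int, int]], lo: int, hi: int) -> List[int]:
    """Indexes of block ranges whose summary overlaps the query interval [lo, hi].
    Every page in such a range must be read and its rows rechecked (lossy)."""
    return [i for i, (mn, mx) in enumerate(summaries) if mn <= hi and mx >= lo]

# 64 toy pages of 10 values each, stored in insertion order (perfectly correlated).
pages = [list(range(p * 10, p * 10 + 10)) for p in range(64)]

coarse = build_summaries(pages, pages_per_range=16)  # few entries, wide ranges
fine = build_summaries(pages, pages_per_range=4)     # more entries, narrow ranges

coarse_hits = ranges_to_scan(coarse, 95, 105)
fine_hits = ranges_to_scan(fine, 95, 105)

print(len(coarse), "entries ->", len(coarse_hits) * 16, "pages to read")
print(len(fine), "entries ->", len(fine_hits) * 4, "pages to read")
```

The smaller pages_per_range stores four times as many summaries but reads a quarter of the pages for the same query. If the values were shuffled across pages, every summary would overlap the query and nothing could be skipped.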

See this post for some benchmarks:

https://www.percona.com/blog/2019/07/16/brin-index-for-postgresql-dont-forget-the-benefits/

They generated a 1.3 GB table of naturally ordered data (incrementing timestamps). Then they created a BRIN index (with pages_per_range = 32) and a B-Tree index on this table, and compared the SELECT execution time and the size of the two indexes. What they got:

B-Tree:

Planning Time: 22.225 ms Execution Time: 2.657 ms

public | testtab_date_idx | index | postgres | testtab | 171 MB

BRIN:

Planning Time: 0.272 ms Execution Time: 87.703 ms

public | testtab_date_brin_idx | index | postgres | testtab | 64 kB

Meanwhile, with no index at all it would be:

Planning Time: 0.296 ms Execution Time: 1766.454 ms

Just to give a sense of the orders of magnitude.

Also worth discussing is the cost of updating each index after an INSERT. For BRIN on append-only data it is effectively O(1): the row is written to the next free space at the end of the table, and only the summary of the current block range is updated or a new one is created. For a B-Tree it is O(log N), because the insert has to descend the tree to find the right leaf (the taller the tree, the longer it takes).
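To get a feel for how slowly the O(log N) term grows, here is a rough sketch that counts the levels a B-Tree needs for a given number of entries. The fanout of 256 entries per page is a made-up round number for illustration, not a measured PostgreSQL value, and the function name btree_levels is hypothetical.

```python
def btree_levels(n_entries: int, fanout: int) -> int:
    """Smallest number of levels such that fanout ** levels >= n_entries.
    Integer arithmetic is used to avoid floating-point rounding in log()."""
    levels = 1
    capacity = fanout
    while capacity < n_entries:
        capacity *= fanout
        levels += 1
    return levels

ASSUMED_FANOUT = 256  # hypothetical entries per page

for n in (1_000, 1_000_000, 1_000_000_000):
    print(n, "entries ->", btree_levels(n, ASSUMED_FANOUT), "levels")
```

Under that assumption, going from a million to a billion entries adds a single level, so the per-insert difference from BRIN comes less from tree depth than from touching scattered leaf pages and maintaining a far larger index.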


Hash indexes are missing in action in PostgreSQL. PostgreSQL knows it needs hash indexes, and that its code for hash indexes is old and moldy, but they don't remove it because they are waiting for someone to come along and overhaul hash indexing. See this thread:

http://www.postgresql.org/message-id/[email protected]