Indexing array of strings column type in PostgreSQL

I think the best way to handle this would be to normalize your model. Untested, but the idea should be clear:

CREATE TABLE users  (id INTEGER PRIMARY KEY, name VARCHAR(100) UNIQUE);
CREATE TABLE groups (id INTEGER PRIMARY KEY, name VARCHAR(100) UNIQUE);
-- join table; "user" and "group" are reserved words, so use user_id/group_id
CREATE TABLE user_group (
    user_id  INTEGER NOT NULL REFERENCES users,
    group_id INTEGER NOT NULL REFERENCES groups);
CREATE UNIQUE INDEX user_group_unique ON user_group (user_id, group_id);
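
With this schema, a membership is just a row in user_group. Hypothetical sample data (the names are made up for illustration):

INSERT INTO users VALUES (1, 'Alice');
INSERT INTO groups VALUES (1, 'Engineering');
INSERT INTO user_group VALUES (1, 1); -- Alice is a member of Engineering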

SELECT users.name
    FROM user_group
    INNER JOIN users ON user_group.user_id = users.id
    INNER JOIN groups ON user_group.group_id = groups.id
    WHERE groups.name = 'Engineering';

The resulting execution plan should already be fairly efficient; you can optimize further by indexing user_group (group_id), which allows an index scan rather than a sequential scan when looking up the members of a particular group.
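
For example (the index name here is arbitrary):

CREATE INDEX user_group_group_id_idx ON user_group (group_id);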


A GIN index can be used:

CREATE TABLE users (
    name VARCHAR(100),
    groups text[]
);

CREATE INDEX idx_users ON users USING GIN(groups);

-- disable sequential scan in this test:
SET enable_seqscan TO off;

EXPLAIN ANALYZE
SELECT name FROM users WHERE groups @> ARRAY['Engineering'];

Result:

"Bitmap Heap Scan on users  (cost=4.26..8.27 rows=1 width=218) (actual time=0.021..0.021 rows=0 loops=1)"
"  Recheck Cond: (groups @> '{Engineering}'::text[])"
"  ->  Bitmap Index Scan on idx_users  (cost=0.00..4.26 rows=1 width=0) (actual time=0.016..0.016 rows=0 loops=1)"
"        Index Cond: (groups @> '{Engineering}'::text[])"
"Total runtime: 0.074 ms"

Using aggregate functions over the array contents is another problem; the function unnest() can help there, as sketched below.
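
For example, a sketch (against the users table above) that counts members per group by expanding the array:

SELECT group_name, count(*) AS members
-- expand each user's groups array into one row per (user, group)
FROM (SELECT unnest(groups) AS group_name FROM users) AS g
GROUP BY group_name
ORDER BY members DESC;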

Why don't you normalize your data? That would fix all these problems, including many you haven't encountered yet.