How to design indexes for columns with NULL values in MySQL?

It is the NOT NULL:

CREATE TEMPORARY TABLE `myTab` (`notnul` FLOAT, `nul` FLOAT);
INSERT INTO `myTab` VALUES (1, NULL), (1, 2), (1, NULL), (1, 2), (1, NULL), (1, 2), (1, NULL), (1, 2), (1, NULL), (1, 2), (1, NULL), (1, 2);
SELECT * FROM `myTab`;

gives:

+--------+------+
| notnul | nul  |
+--------+------+
|      1 | NULL |
|      1 |    2 |
|      1 | NULL |
|      1 |    2 |
|      1 | NULL |
|      1 |    2 |
|      1 | NULL |
|      1 |    2 |
|      1 | NULL |
|      1 |    2 |
|      1 | NULL |
|      1 |    2 |
+--------+------+

Create the index:

CREATE INDEX `notnul_nul` ON `myTab` (`notnul`, `nul`);
CREATE INDEX `nul_notnul` ON `myTab` (`nul`, `notnul`);

SHOW INDEX FROM `myTab`;

gives:

+-------+------------+------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name   | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-------+------------+------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| myTab |          1 | notnul_nul |            1 | notnul      | A         |          12 |     NULL | NULL   | YES  | BTREE      |         |               |
| myTab |          1 | notnul_nul |            2 | nul         | A         |          12 |     NULL | NULL   | YES  | BTREE      |         |               |
| myTab |          1 | nul_notnul |            1 | nul         | A         |          12 |     NULL | NULL   | YES  | BTREE      |         |               |
| myTab |          1 | nul_notnul |            2 | notnul      | A         |          12 |     NULL | NULL   | YES  | BTREE      |         |               |
+-------+------------+------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+

now explain the selects. It seems that MySQL uses the index, even if You use NOT NULL:

EXPLAIN SELECT * FROM `myTab` WHERE `notnul` IS NOT NULL;
+----+-------------+-------+-------+---------------+------------+---------+------+------+--------------------------+ 
| id | select_type | table | type  | possible_keys | key        | key_len | ref  | rows | Extra                    |
+----+-------------+-------+-------+---------------+------------+---------+------+------+--------------------------+ 
|  1 | SIMPLE      | myTab | index | notnul_nul    | notnul_nul | 10      | NULL |   12 | Using where; Using index |
+----+-------------+-------+-------+---------------+------------+---------+------+------+--------------------------+


EXPLAIN SELECT * FROM `myTab` WHERE `nul` IS NOT NULL;
+----+-------------+-------+-------+---------------+------------+---------+------+------+--------------------------+
| id | select_type | table | type  | possible_keys | key        | key_len | ref  | rows | Extra                    |
+----+-------------+-------+-------+---------------+------------+---------+------+------+--------------------------+
|  1 | SIMPLE      | myTab | range | nul_notnul    | nul_notnul | 5       | NULL |    6 | Using where; Using index |
+----+-------------+-------+-------+---------------+------------+---------+------+------+--------------------------+

But, when comparing NOT NULL and NULL, it seems that MySQL preferrs other indexes when using NOT NULL. Although this does obviously not add any information. This is because MySQL interprets NOT NULL as a range as you can see in type-column. I'm not sure If there is a workaround:

EXPLAIN SELECT * FROM `myTab` WHERE `nul` IS NULL && notnul=2;
+----+-------------+-------+------+-----------------------+------------+---------+-------------+------+--------------------------+
| id | select_type | table | type | possible_keys         | key        | key_len | ref         | rows | Extra                    |
+----+-------------+-------+------+-----------------------+------------+---------+-------------+------+--------------------------+
|  1 | SIMPLE      | myTab | ref  | notnul_nul,nul_notnul | notnul_nul | 10      | const,const |    1 | Using where; Using index |
+----+-------------+-------+------+-----------------------+------------+---------+-------------+------+--------------------------+


EXPLAIN SELECT * FROM `myTab` WHERE `nul` IS NOT NULL && notnul=2;
+----+-------------+-------+-------+-----------------------+------------+---------+------+------+--------------------------+
| id | select_type | table | type  | possible_keys         | key        | key_len | ref  | rows | Extra                    |
+----+-------------+-------+-------+-----------------------+------------+---------+------+------+--------------------------+
|  1 | SIMPLE      | myTab | range | notnul_nul,nul_notnul | notnul_nul | 10      | NULL |    1 | Using where; Using index |
+----+-------------+-------+-------+-----------------------+------------+---------+------+------+--------------------------+

I think there could be a better implementation in MySQL, because NULL is a special value. Probably most people are interested in NOT NULL values.


The issue isn't the NULL values. It is the selectivity of the index. In your example, the selectivity of source, pop1 is better than the selectivity of just pop1. It covers more of the conditions in the where clause, so it is more likely to reduce page hits.

You may think that reducing the number of rows by 50% is enough, but it really isn't. The benefit of indexes in a where clause is to reduce the number of pages being read. If a page has, on average, at least one record with a non-NULL value, then there is no gain to using the index. And, if there are 10 records per page, then almost every page will have one of those records.

You might try an index on (pop1, vt, source). The optimizer should pick that one up.

In the end, though, if the where clause is keeping lost of records -- there is not rule but let's say 20% -- then the index probably won't help. One exception would be when the index contains all the columns needed by the query. Then it can satisfy the query without bringing in the data page for each record.

And, if an index gets used and the selectivity is high, then performance with the index could be worse than performance without it.

Tags:

Mysql

Index