mysql slow count in join query

I downloaded your data and tried a few experiments. I'm running MySQL 5.6.12 on a CentOS virtual machine on a Macbook Pro. The times I observed can be used for comparison, but your system may have different performance.

Base case

First I tried without the USE INDEX clauses, because I avoid optimizer overrides where possible. In most cases, a simple query like this should use the correct index if it's available. Hard-coding the index choice in a query makes it harder to use a better index later.

I also use correlation names (table aliases) to make the query more clear.

mysql> EXPLAIN SELECT COUNT(*) FROM content_en AS e  
INNER JOIN content_page_categories AS c ON c.itemid = e.id 
WHERE c.catid = 1 AND e.active = 1\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: c
         type: ref
possible_keys: combo,combo2
          key: combo
      key_len: 4
          ref: const
         rows: 71198
        Extra: Using index
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: e
         type: eq_ref
possible_keys: PRIMARY,combo2,combo
          key: PRIMARY
      key_len: 4
          ref: test.c.itemid
         rows: 1
        Extra: Using where
  • This executed in 0.36 seconds.

Covering index

I'd like to get "Using index" on the second table as well, so I need an index on (active, id) in that order. I had to USE INDEX in this case to persuade the optimizer not to use the primary key.

mysql> ALTER TABLE content_en ADD KEY combo2 (active, id);

mysql> explain SELECT COUNT(*) FROM content_en AS e USE INDEX (combo2) 
INNER JOIN content_page_categories AS c ON c.itemid = e.id 
WHERE c.catid = 1 AND e.active = 1\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: c
         type: ref
possible_keys: combo,combo2
          key: combo
      key_len: 4
          ref: const
         rows: 71198
        Extra: Using index
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: e
         type: ref
possible_keys: combo2
          key: combo2
      key_len: 6
          ref: const,test.c.itemid
         rows: 1
        Extra: Using where; Using index

The rows reported by EXPLAIN is an important indicator of how much work it's going to take to execute the query. Notice the rows in the above EXPLAIN is only 71k, much smaller than the 125k rows you got when you scanned the content_en table first.

  • This executed in 0.44 seconds. This is unexpected, because usually a query using a covering index is an improvement.

Convert tables to InnoDB

I tried the same covering index solution as above, but with InnoDB as the storage engine.

mysql> ALTER TABLE content_en ENGINE=InnoDB;
mysql> ALTER TABLE content_page_categories ENGINE=InnoDB;

This had the same EXPLAIN report. It took 1 or 2 iterations to warm the buffer pool, but then the performance of the query tripled.

  • This executed in 0.16 seconds.

  • I also tried removing the USE INDEX, and the time increased slightly, to 0.17 seconds.

@Matthew's solution with STRAIGHT_JOIN

mysql> SELECT straight_join count(*) 
 FROM content_en 
 INNER JOIN content_page_categories use index (combo) 
  ON (id = itemid) 
 WHERE catid = 1 AND active = 1;
  • This executed in 0.20 - 0.22 seconds.

@bobwienholt's solution, denormalization

I tried the solution proposed by @bobwienholt, using denormalization to copy the active attribute to the content_page_categories table.

mysql> ALTER TABLE content_page_categories ADD COLUMN active TINYINT(1);
mysql> UPDATE content_en JOIN content_page_categories ON id = itemid 
    SET content_page_categories.active = content_en.active;
mysql> ALTER TABLE content_page_categories ADD KEY combo3 (catid,active);
mysql> SELECT COUNT(*) FROM content_page_categories WHERE catid = 1 and active = 1;

This executed in 0.037 - 0.044 seconds. So this is better, if you can maintain the redundant active column in sync with the value in the content_en table.

@Quassnoi's solution, summary table

I tried the solution proposed by @Quassnoi, to maintain a table with precomputed counts per catid and active. The table should have very few rows, and looking up the counts you need are primary key lookups and require no JOINs.

mysql> CREATE TABLE page_active_category (
 active INT NOT NULL, 
 catid INT NOT NULL, 
 cnt BIGINT NOT NULL,
 PRIMARY KEY (active, catid) 
) ENGINE=InnoDB;

mysql> INSERT INTO page_active_category
 SELECT  e.active, c.catid, COUNT(*)
 FROM    content_en AS e
 JOIN    content_page_categories AS c ON c.itemid = e.id
 GROUP BY e.active, c.catid

mysql> SELECT cnt FROM page_active_category WHERE active = 1 AND catid = 1

This executed in 0.0007 - 0.0017 seconds. So this is the best solution by an order of magnitude, if you can maintain the table with aggregate counts.

You can see from this that different types of denormalization (including a summary table) is an extremely powerful tool for the sake of performance, though it has drawbacks because maintaining the redundant data can be inconvenient and makes your application more complex.


The problem is the "active" column in content_en. Obviously, if you just needed to know how many content records were related to a particular category (active or not) all you would have to do is:

SELECT count(1)
FROM content_page_categories
WHERE catid = 1;

Having to join back to every content_en record just to read the "active" flag is really what is slowing this query down.

I recommend adding "active" to content_page_categories and making it a copy of the related value in content_en... you can keep this column up to date using triggers or in your code. Then you can change the combo index to be:

KEY combo (catid,active,itemid)

and rewrite your query to:

SELECT count(1)
FROM content_page_categories USE INDEX (combo)
WHERE catid = 1 AND active = 1;

Also, you may have much better luck using InnoDB tables instead of MyISAM. Just be sure to tune your InnoDB settings: http://www.mysqlperformanceblog.com/2007/11/01/innodb-performance-optimization-basics/


There are too many records to count.

If you want a faster solution, you'll have to store aggregate data.

MySQL does not support materialized views (or indexed views in SQL Server's terms) so you would need to create and maintain them yourself.

Create a table:

CREATE TABLE
        page_active_category
        (
        active INT NOT NULL,
        catid INT NOT NULL,
        cnt BIGINT NOT NULL,
        PRIMARY KEY
                (active, catid)
        ) ENGINE=InnoDB;

then populate it:

INSERT
INTO    page_active_category
SELECT  active, catid, COUNT(*)
FROM    content_en
JOIN    content_page_categories
ON      itemid = id
GROUP BY
        active, catid

Now, each time you insert, delete or update a record in either content_en or content_page_categories, you should update the appropriate record in page_active_category.

This is doable with two simple triggers on both content_en and content_page_categories.

This way, your original query may be rewritten as mere:

SELECT  cnt
FROM    page_active_category
WHERE   active = 1
        AND catid = 1

which is a single primary key lookup and hence instant.

Tags:

Mysql

Sql