mysqldump --single-transaction, yet update queries are waiting for the backup

The --single-transaction option of mysqldump does do a FLUSH TABLES WITH READ LOCK prior to starting the backup job but only under certain conditions. One of those conditions is when you also specify the --master-data option.

In the source code, from mysql-5.6.19/client/mysqldump.c at line 5797:

if ((opt_lock_all_tables || opt_master_data ||
     (opt_single_transaction && flush_logs)) &&
    do_flush_tables_read_lock(mysql))
  goto err;

To get a solid lock on the precise binlog coordinates prior to starting the repeatable-read transaction, the --master-data option triggers this lock to be obtained and then released once the binlog coordinates have been obtained.
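The condition quoted above can be restated as a small predicate, which makes it easier to see which option combinations take the global read lock. This is only a sketch: the struct and the function name takes_global_read_lock are hypothetical, and the fields simply mirror the option variables from the excerpt.

```c
#include <stdbool.h>

/* Hypothetical stand-in for mysqldump's option globals. The field names
   mirror opt_lock_all_tables, opt_master_data, opt_single_transaction
   and flush_logs from the excerpt; this is not mysqldump's own code. */
struct dump_options {
  bool lock_all_tables;     /* --lock-all-tables */
  bool master_data;         /* --master-data */
  bool single_transaction;  /* --single-transaction */
  bool flush_logs;          /* --flush-logs */
};

/* Returns true when the quoted condition would go on to call
   do_flush_tables_read_lock(). */
static bool takes_global_read_lock(const struct dump_options *o)
{
  return o->lock_all_tables || o->master_data ||
         (o->single_transaction && o->flush_logs);
}
```

With only single_transaction set the predicate is false; adding master_data or flush_logs makes it true, which is the case discussed here.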

In fact, mysqldump does a FLUSH TABLES followed by a FLUSH TABLES WITH READ LOCK because doing both things allows the read lock to be obtained faster in cases where the initial flush takes some time.

...however...

As soon as it has obtained the binlog coordinates, mysqldump issues an UNLOCK TABLES statement, so there shouldn't be anything blocking as a result of the flush you started. Neither should any threads be Waiting for table flush as a result of the transaction that mysqldump is holding.

When you see a thread in the Waiting for table flush state, that should mean that the FLUSH TABLES [WITH READ LOCK] statement was issued and was still running when the query started -- so the query has to wait for the table flush, before it can execute. In the case of the processlist you've posted, mysqldump is reading from this same table, and the query has been running for a while, yet the blocking queries haven't been blocking for all that long.

This all suggests that something else has happened.

There's a long-standing issue, explained in Bug #44884, with the way FLUSH TABLES works internally. I would not be surprised if the issue still persists, and I would be surprised if it is ever "fixed", because it is a very complex issue to resolve -- virtually impossible to truly fix in a high-concurrency environment -- and any attempt at fixing it carries a significant risk of breaking something else or creating new, different, and still undesirable behavior.

It seems likely that this will be the explanation for what you're seeing.

Specifically:

  • if you have a long-running query running against a table, and issue FLUSH TABLES, then the FLUSH TABLES will block until the long-running query completes.

  • additionally, any queries that begin after the FLUSH TABLES is issued will block until the FLUSH TABLES is complete.

  • additionally, if you kill the FLUSH TABLES query, the queries that are blocking will still block on the original long-running query, the one that was blocking the FLUSH TABLES query, because even though the killed FLUSH TABLES query didn't finish, that table (the one, or more, involved with the long-running query) is still in the process of being flushed, and that pending flush is going to happen as soon as the long-running query finishes -- but not before.

The likely conclusion here is that another process -- perhaps another mysqldump, or an ill-advised query, or a poorly-written monitoring process -- tried to flush a table.

That query was subsequently killed or timed out by an unknown mechanism, but its after-effects lingered until mysqldump finished reading from the table in question.

You can replicate this condition by issuing FLUSH TABLES while a long-running query is in progress. Then start another query, which will block. Then kill the FLUSH TABLES query, which won't unblock the latest query. Then kill the first query, or let it finish, and the final query will run successfully.
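The sequence can also be walked through with a toy state model. To be clear, this is not how the server implements table flushing; it is a minimal sketch of the behaviour described in this answer, and every name in it (table_state, flush_pending, and the four functions) is made up for illustration.

```c
#include <stdbool.h>

/* Toy model of a single table, NOT MySQL's real data structures. */
struct table_state {
  int  running_queries;  /* queries that opened the table before the flush */
  bool flush_pending;    /* a FLUSH TABLES has marked the table for flushing */
};

/* A new query can start only while no flush is pending on the table. */
static bool query_can_start(const struct table_state *t)
{
  return !t->flush_pending;
}

/* FLUSH TABLES: returns true if it completes immediately, false if it
   has to wait for running queries (and leaves the table marked). */
static bool flush_tables(struct table_state *t)
{
  if (t->running_queries == 0)
    return true;
  t->flush_pending = true;
  return false;
}

/* Killing the waiting FLUSH TABLES removes that session, but in this
   model the pending-flush mark stays on the table. */
static void kill_flush(struct table_state *t)
{
  (void)t;  /* deliberately leaves flush_pending untouched */
}

/* A running query finishes or is killed; once the last one is gone,
   the pending flush goes through and the mark is cleared. */
static void finish_query(struct table_state *t)
{
  if (t->running_queries > 0 && --t->running_queries == 0)
    t->flush_pending = false;
}
```

Starting from one running query, calling flush_tables, then kill_flush, leaves query_can_start false; only finish_query makes it true again, matching the steps above.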


As an afterthought, this is unrelated:

Trx read view will not see trx with id >= 1252538405, sees < 1252538391

That's normal, because mysqldump --single-transaction issues a START TRANSACTION WITH CONSISTENT SNAPSHOT, which prevents it from dumping data that was changed while the dump was in progress. Without that, the binlog coordinates obtained at the start would be meaningless, since the --single-transaction would not be what it claims to be. This should not in any sense be related to the Waiting for table flush issue, as this transaction obviously holds no locks.


The --single-transaction option of mysqldump does not do FLUSH TABLES WITH READ LOCK. It causes mysqldump to set up a repeatable-read transaction for all tables being dumped.

In your question, you stated that mysqldump's SELECT on the db_external_notification table is holding up hundreds of INSERT commands to that same table. Why is this happening?

The most likely cause is a lock on the gen_clust_index (better known as the Clustered Index). This paradigm causes data and index pages for a table to coexist. Those index pages are based on either the PRIMARY KEY or an auto-generated RowID index (in the event there is no PRIMARY KEY).

You should be able to spot this by running SHOW ENGINE INNODB STATUS\G and looking for any page from the gen_clust_index that has an exclusive lock. Doing INSERTs into a table with a Clustered Index requires an exclusive lock for handling the PRIMARY KEY's BTREE, as well as the serialization of the auto_increment.

I have discussed this phenomenon before:

  • Aug 08, 2011 : Are InnoDB Deadlocks exclusive to INSERT/UPDATE/DELETE?
  • Dec 22, 2011 : MySQL deadlock - cannot restart normally?
  • Dec 13, 2012 : MySQL InnoDB locks primary key on delete even in READ COMMITTED

UPDATE 2014-07-21 15:03 EDT

Please look at lines 614-617 of your PasteBin:

mysql tables in use 1, locked 0
MySQL thread id 6155315, OS thread handle 0x85f11b70, query id 367774810 localhost root Sending data
SELECT /*!40001 SQL_NO_CACHE */ * FROM `db_external_notification`
Trx read view will not see trx with id >= 1252538405, sees < 1252538391

Note that line 617 says

Trx read view will not see trx with id >= 1252538405, sees < 1252538391

What does this tell me? You have some PRIMARY KEY with an auto_increment on id.

Your max id for the table db_external_notification was less than 1252538391 when the mysqldump was launched. Subtracting 1252538391 from 1252538405 gives 14, which means that 14 or more INSERT commands have been attempted. Internally, this would need to move the auto_increment of this table at least 14 times. Yet nothing can be committed or even pushed into the Log Buffer because of managing this id gap.

Now, look at the processlist from your PasteBin. Unless I miscounted, I saw 38 DB Connections doing an INSERT (19 Before the mysqldump process (process id 6155315), 19 After). I am sure 14 or more of those connections are frozen because of managing the auto_increment gap.


I submitted a feature request: https://support.oracle.com/epmos/faces/BugDisplay?id=27103902.

I also wrote a patch against 5.6.37 that makes the --single-transaction --dump-slave combination use the same method as the --single-transaction --master-data combination. It is provided as-is with no warranty. Use at your own risk.

--- mysql-5.6.37/client/mysqldump.c.bak 2017-11-14 12:24:41.846647514 -0600
+++ mysql-5.6.37/client/mysqldump.c 2017-11-14 14:17:51.187050091 -0600
@@ -4900,10 +4900,10 @@
   return 0;
 }

+/*
 static int do_stop_slave_sql(MYSQL *mysql_con)
 {
   MYSQL_RES *slave;
-  /* We need to check if the slave sql is running in the first place */
   if (mysql_query_with_error_report(mysql_con, &slave, "SHOW SLAVE STATUS"))
     return(1);
   else
@@ -4911,23 +4911,21 @@
     MYSQL_ROW row= mysql_fetch_row(slave);
     if (row && row[11])
     {
-      /* if SLAVE SQL is not running, we don't stop it */
       if (!strcmp(row[11],"No"))
       {
         mysql_free_result(slave);
-        /* Silently assume that they don't have the slave running */
         return(0);
       }
     }
   }
   mysql_free_result(slave);

-  /* now, stop slave if running */
   if (mysql_query_with_error_report(mysql_con, 0, "STOP SLAVE SQL_THREAD"))
     return(1);

   return(0);
 }
+*/

 static int add_stop_slave(void)
 {
@@ -5841,10 +5839,12 @@
   if (!path)
     write_header(md_result_file, *argv);

+  /*
   if (opt_slave_data && do_stop_slave_sql(mysql))
     goto err;
+  */

-  if ((opt_lock_all_tables || opt_master_data ||
+  if ((opt_lock_all_tables || opt_master_data || opt_slave_data ||
        (opt_single_transaction && flush_logs)) &&
       do_flush_tables_read_lock(mysql))
     goto err;
@@ -5853,7 +5853,7 @@
     Flush logs before starting transaction since
     this causes implicit commit starting mysql-5.5.
   */
-  if (opt_lock_all_tables || opt_master_data ||
+  if (opt_lock_all_tables || opt_master_data || opt_slave_data ||
       (opt_single_transaction && flush_logs) ||
       opt_delete_master_logs)
   {

I tested it with the following process, using slaves of a very busy master with lots of InnoDB tables with FK relationships:

  1. Stop slave A.
  2. Wait ~15 minutes.
  3. Dump DB 1 from slave B with options --single-transaction and --dump-slave=2.
  4. Start slave A until coordinates in dump from step 3.
  5. Drop DB 1 and 2 from slave A.
  6. Create empty DB 1 and 2 on slave A.
  7. Load dump from step 3 into slave A.
  8. Dump DB 2 from slave B with the same options. DB 2 has FK relationships to DB 1.
  9. Add replicate_ignore_db for DB 2 and skip_slave_start on slave A.
  10. Restart slave A.
  11. Start slave until coordinates from dump in step 8 on slave A.
  12. Load dump from step 8 into slave A.
  13. Remove replicate_ignore_db and skip_slave_start options from slave A.
  14. Restart slave A.
  15. Wait ~1 week.
  16. Use pt-table-checksum to verify data integrity.

Oracle's patch submission process is rather intensive, which is why I went this route. I may try Percona and/or MariaDB to get it integrated.