MySQL replication slave hangs after encountering SET @@SESSION.GTID_NEXT= 'ANONYMOUS';

TL;DR: This is probably caused by poor table design combined with ROW-based replication.

I just ran into this problem. I was asked to move an old database to a new server and set up replication.

I found that it's not actually the statement in the subject that causes the slave to hang (SET @@SESSION.GTID_NEXT= 'ANONYMOUS'). This statement is issued at the beginning of a transaction.

mysql> SHOW BINLOG EVENTS IN 'mysql-bin.000196' FROM 96754384 LIMIT 5000;
+------------------+-----------+----------------+-----------+-------------+----------------------------------------------+
| Log_name         | Pos       | Event_type     | Server_id | End_log_pos | Info                                         |
+------------------+-----------+----------------+-----------+-------------+----------------------------------------------+
| mysql-bin.000196 |  96754384 | Anonymous_Gtid |         1 |    96754449 | SET @@SESSION.GTID_NEXT= 'ANONYMOUS'         |
| mysql-bin.000196 |  96754449 | Query          |         1 |    96754537 | BEGIN                                        |
| mysql-bin.000196 |  96754537 | Table_map      |         1 |    96754608 | table_id: 241 (db.bad_table)                 |
| mysql-bin.000196 |  96754608 | Delete_rows    |         1 |    96762805 | table_id: 241                                |
| mysql-bin.000196 |  96762805 | Delete_rows    |         1 |    96771002 | table_id: 241                                |
...
| mysql-bin.000196 | 106681175 | Delete_rows    |         1 |   106689372 | table_id: 241                                |
| mysql-bin.000196 | 106689372 | Delete_rows    |         1 |   106697569 | table_id: 241                                |
| mysql-bin.000196 | 106697569 | Delete_rows    |         1 |   106697626 | table_id: 241 flags: STMT_END_F              |
| mysql-bin.000196 | 106697626 | Xid            |         1 |   106697657 | COMMIT /* xid=28382600 */                    |
| mysql-bin.000196 | 106697657 | Rotate         |         1 |   106697704 | mysql-bin.000197;pos=4                       |
+------------------+-----------+----------------+-----------+-------------+----------------------------------------------+
1219 rows in set (0.02 sec)

This table has 66 million rows. I found that it has no primary key or unique key. The query responsible for this uses an index scan on the master.

mysql> SHOW VARIABLES LIKE '%binlog_format%';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| binlog_format | ROW   |
+---------------+-------+
1 row in set (0.00 sec)

For the slave to replicate this with ROW-based replication, it needs to perform approximately 1200 full table scans on the slave. 1200 is probably a fairly small number here. It could be in the hundreds of thousands. The replication does actually work, but with this design, 'seconds_behind_master' will grow indefinitely.

I will add a primary key and partitioning to this table. I will also ask my colleagues to rewrite their code so bulk deletes are no longer necessary. This probably requires adding an additional column.

EDIT: I don't have enough points to comment on other posts, so I will add my comments here for now. I believe that issuing 'SET GLOBAL sql_slave_skip_counter = 1', as mentioned by others, will skip the entire transaction and lead to data inconsistencies. Correct me if I'm wrong.

A quick fix would be to change the binlog format to QUERY or MIXED. These formats can also lead to data inconsistencies, so I would recommend finding and fixing the root cause instead of changing the binlog format.