MySQL LOAD DATA INFILE slows by 80% after a few gigs of input with InnoDB engine

OBSERVATION #1

I noticed you turned off autocommit. That will pile up so much data in ibdata1. Why?

There are seven(7) classes of information that is stored in ibdata1:

  • Data Pages for InnoDB Tables
  • Index Pages for InnoDB Tables
  • Data Dictionary
  • Double Write Buffer
    • Safety Net to Prevent Data Corruption
    • Helps Bypass OS for Caching
  • Insert Buffer (Streamlines Changes to Secondary Indexes)
  • Rollback Segments
  • Undo Logs
  • Click Here to see a Pictorial Representation of ibdata1

Some of this info is made visible to certain transactions depending on the isolation level. Such actions could produce unintended primary key locks and lots of phantom data. As these two things increase, you should expect a fair slow down.

Recommendation: Leave autocommit on

OBSERVATION #2

I see you have this:

alter table item_load disable keys;

DISABLE KEYS does not work with InnoDB. Here is why:

  • MyISAM : DISABLE KEYS simply shuts off Secondary Index updating for the MyISAM table. When you mass INSERT into a MyISAM table with keys disabled results in a fast table load along with a building of the PRIMARY KEY and all unique indexes. When you run ENABLE KEYS, all Secondary Indexes are built linearly on the table and appended to the .MYD.
  • InnoDB : As shown in the internals picture of InnoDB, the system tablespave ibdata1 has a structure dedicated to Secondary Index Insertions. At present, there is no provision to handle indexes the same as MyISAM.

To illustrate this, note my attempt to run DISABLE KEYS on an InnoDB table in MySQL

mysql> show create table webform\G
*************************** 1. row ***************************
       Table: webform
Create Table: CREATE TABLE `webform` (
  `nid` int(10) unsigned NOT NULL,
  `confirmation` text NOT NULL,
  `confirmation_format` tinyint(4) NOT NULL DEFAULT '0',
  `redirect_url` varchar(255) DEFAULT '<confirmation>',
  `status` tinyint(4) NOT NULL DEFAULT '1',
  `block` tinyint(4) NOT NULL DEFAULT '0',
  `teaser` tinyint(4) NOT NULL DEFAULT '0',
  `allow_draft` tinyint(4) NOT NULL DEFAULT '0',
  `submit_notice` tinyint(4) NOT NULL DEFAULT '1',
  `submit_text` varchar(255) DEFAULT NULL,
  `submit_limit` tinyint(4) NOT NULL DEFAULT '-1',
  `submit_interval` int(11) NOT NULL DEFAULT '-1',
  PRIMARY KEY (`nid`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
1 row in set (0.00 sec)

mysql> alter table webform disable keys;
Query OK, 0 rows affected, 1 warning (0.00 sec)

mysql> show warnings;
+-------+------+-------------------------------------------------------------+
| Level | Code | Message                                                     |
+-------+------+-------------------------------------------------------------+
| Note  | 1031 | Table storage engine for 'webform' doesn't have this option |
+-------+------+-------------------------------------------------------------+
1 row in set (0.00 sec)

mysql> select version();
+------------+
| version()  |
+------------+
| 5.5.27-log |
+------------+
1 row in set (0.00 sec)

mysql>

OBSERVATION #3

You noticed that MyISAM loads 20x faster than InnoDB. Would you like that to be more like 24-25 times faster? Then run the following:

ALTER TABLE item_load ROW_FORMAT=Fixed;

This will speed up INSERTs times 20-25% without any other DDL Changes. Side effect: The MyISAM table can grow 80%-100% in size, possibly larger.

You could run this on an InnoDB table as well, but the ACID-compliant behavior and MVCC of InnoDB would still be the bottleneck of its performance, especially if VARCHAR fields increase significantly are are written to ibdata1.


The final answer to this question was to not use InnoDB for a massive reference table. MyISAM is screaming fast, near full throughput of the disk speed for the entire load, InnoDB bogs down. MyISAM is simple, but in this case so are the requirements of this table. For a simple reference table with bulk loads over LOAD DATA INFILE, MyISAM is the way to go, so far so good.

But note that if you run MyISAM and InnoDB tables both you're going to need to consider memory allocation for 2 caching mechanisms, each engine has it's own unique caching that needs separate memory allocation.


You could try to split your input files into smaller chunks.

I personally use http://www.percona.com/doc/percona-toolkit/2.1/pt-fifo-split.html for this.

What happens if you get a table lock for table during the import? Maybe the rowlevel locking of InnoDB slows it down (MyISAM uses a table lock).

You could also read here for further ideas: http://derwiki.tumblr.com/post/24490758395/loading-half-a-billion-rows-into-mysql