InnoDB insertion faster

SUGGESTION #1

If your machine has multiple cores, you need to increase the following:

[mysqld]
innodb_read_io_threads = 64
innodb_write_io_threads = 64
innodb_io_capacity = 5000

What are these?

  • innodb_read_io_threads - The number of I/O threads for read operations in InnoDB.
  • innodb_write_io_threads - The number of I/O threads for write operations in InnoDB.
  • innodb_io_capacity - An upper limit on the I/O activity performed by the InnoDB background tasks, such as flushing pages from the buffer pool and merging data from the insert buffer.
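
To see what the server is currently running with before editing my.cnf, something like the following can be run from the mysql client. This is only a sketch: innodb_io_capacity can be changed at runtime, while the two thread settings are read at startup, so they need the my.cnf change and a restart.

```sql
-- Show the current values of the three settings discussed above.
SHOW GLOBAL VARIABLES
WHERE Variable_name IN ('innodb_read_io_threads',
                        'innodb_write_io_threads',
                        'innodb_io_capacity');

-- innodb_io_capacity is dynamic, so it can be tried without a restart.
-- The value 5000 is the one suggested above; what the storage can
-- actually sustain is an assumption you need to verify.
SET GLOBAL innodb_io_capacity = 5000;

-- innodb_read_io_threads and innodb_write_io_threads are not dynamic:
-- set them under [mysqld] in my.cnf and restart mysqld.
```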

SUGGESTION #2

In order to separate data and indexes from the system tablespace (ibdata1), you need to perform a complete restructuring of InnoDB. It sounds complicated, but it is very straightforward. I wrote about this on the DBA StackExchange (Aug 29, 2012) and on StackOverflow (Oct 29, 2010). The basic steps are:

  • Run SET GLOBAL innodb_fast_shutdown = 0;
  • mysqldump all data to a SQL dump
  • service mysql stop
  • Delete the following files
    • ibdata1
    • ib_logfile0
    • ib_logfile1
  • service mysql start
  • Reload the SQL dump

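Before you run service mysql start, add this line to my.cnf (and make sure innodb_file_per_table is enabled, which is the default from MySQL 5.6.6 onward, so that each table gets its own .ibd file):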

innodb_open_files=32768

That way, there will be file handles dedicated to each individual table. The default is 300. File handles have been known to get cached. There will be a slowdown if you set this very high and hit the ceiling quickly. This should not be the case if you are working with a small number of tables.
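
The SQL side of the restructuring can be sketched as follows. This is a rough outline, not a full procedure: the dump, the service stop/start, and the file deletion happen outside the mysql client, and the checks at the end simply confirm that the server picked up the settings after the restart.

```sql
-- Before dumping: list the InnoDB tables so you can confirm the
-- dump covers all of them.
SELECT table_schema, table_name
FROM information_schema.tables
WHERE engine = 'InnoDB';

-- First step from the list above: request a slow shutdown, so InnoDB
-- does a full purge and change buffer merge before it stops.
SET GLOBAL innodb_fast_shutdown = 0;

-- (mysqldump, service mysql stop, file deletion, my.cnf edit,
--  service mysql start, and the dump reload happen here)

-- After the restart: confirm the settings took effect.
SHOW GLOBAL VARIABLES LIKE 'innodb_open_files';
SHOW GLOBAL VARIABLES LIKE 'innodb_file_per_table';
```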


There's an entire document dedicated to bulk loading data into InnoDB. The main points:

  1. Disable autocommit to avoid an extra log flush for each insert statement: SET autocommit=0; ... SQL import ...; COMMIT;
  2. Disable foreign key and unique checks (you can't disable all indexes completely):

    SET unique_checks=0;
    SET foreign_key_checks=0;
    
  3. Potentially set innodb_autoinc_lock_mode to 2, instead of 1 (the default before MySQL 8.0). See the MySQL documentation on this setting.

The third may or may not help you, so I suggest reading that documentation and considering how you are initially loading the data. For instance, if you are breaking the load into multiple inserts that run concurrently, it will definitely help to set the value to 2. If you are doing one large multi-row insert, it won't do much (if anything) to help.
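
A quick way to see which lock mode the server is using, and whether changing it is safe for your setup, might look like this. It is only a sketch: innodb_autoinc_lock_mode is a startup option, so it has to be set in my.cnf and needs a restart, and mode 2 (interleaved) is not safe with statement-based binary logging.

```sql
-- Current auto-increment lock mode: 0 = traditional,
-- 1 = consecutive, 2 = interleaved.
SELECT @@innodb_autoinc_lock_mode;

-- Check whether the binary log is enabled and in which format;
-- mode 2 is only safe with row-based logging or with logging off.
SELECT @@log_bin, @@binlog_format;

-- To change the mode, add this under [mysqld] in my.cnf and restart:
--   innodb_autoinc_lock_mode = 2
```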

Since you are turning off the binary log for this initial insert, you shouldn't care about the gaps in autoincrement numbers (if doing concurrent inserts).
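
Putting the points above together, one load session could look roughly like this. The table bulk_demo and its rows are made up for illustration; substitute your own import. SET sql_log_bin needs the appropriate privilege and cannot be changed inside a transaction, so it is set before the load starts.

```sql
-- Hypothetical target table, only here to make the sketch runnable.
CREATE TABLE IF NOT EXISTS bulk_demo (
  id      INT AUTO_INCREMENT PRIMARY KEY,
  payload VARCHAR(64)
) ENGINE=InnoDB;

-- Session-level settings for the load.
SET sql_log_bin = 0;         -- skip the binary log for this session
SET unique_checks = 0;       -- skip uniqueness checks on secondary indexes
SET foreign_key_checks = 0;  -- skip foreign key validation
SET autocommit = 0;          -- one commit at the end, not one per statement

-- The import itself; multi-row inserts stand in for the real data.
INSERT INTO bulk_demo (payload) VALUES ('a'), ('b'), ('c');
INSERT INTO bulk_demo (payload) VALUES ('d'), ('e'), ('f');

COMMIT;

-- Restore the defaults once the load is done.
SET autocommit = 1;
SET foreign_key_checks = 1;
SET unique_checks = 1;
SET sql_log_bin = 1;
```

With the checks off, the server trusts that the incoming data has no duplicate keys or broken references, so this only makes sense for data you already know is clean.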