Insert performance increases under load: Why?

One possible reason is that four concurrent processes produce a more favourable pattern of log flushes: typically, each log flush writes more data than it does when only a single process is executing.

To determine if transaction log throughput/flush size is a factor, monitor:

  • sys.dm_os_wait_stats for WRITELOG and LOGBUFFER waits
  • sys.dm_io_pending_io_requests for I/O requests that are still pending (an indicator of I/O performance)
  • Performance Monitor counters (or sys.dm_os_performance_counters) for:
    • Log Bytes Flushed/sec
    • Log Flushes/sec
    • Log Flush Wait Time
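A quick way to check the "bigger flushes under concurrency" theory is to compare the average number of bytes written per log flush in the single-process and four-process runs. The sketch below assumes you take two readings of the 'Log Bytes Flushed/sec' and 'Log Flushes/sec' rows for your database from sys.dm_os_performance_counters; as I understand it, the DMV reports these "/sec" counters as running totals, so the rates have to be derived from the difference between two samples. The class and function names are hypothetical, for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogCounterSample:
    """One hypothetical reading of the cumulative log counters for one database.

    Assumption: values are the raw cntr_value of the 'Log Bytes Flushed/sec'
    and 'Log Flushes/sec' rows in sys.dm_os_performance_counters, which are
    running totals rather than per-second rates.
    """
    seconds: float          # time the sample was taken, in seconds
    log_bytes_flushed: int  # cumulative value of Log Bytes Flushed/sec
    log_flushes: int        # cumulative value of Log Flushes/sec


def flush_stats(first: LogCounterSample, second: LogCounterSample) -> dict:
    """Derive per-second rates and the average flush size between two samples."""
    elapsed = second.seconds - first.seconds
    if elapsed <= 0:
        raise ValueError("second sample must be taken after the first")
    bytes_delta = second.log_bytes_flushed - first.log_bytes_flushed
    flushes_delta = second.log_flushes - first.log_flushes
    return {
        "bytes_per_sec": bytes_delta / elapsed,
        "flushes_per_sec": flushes_delta / elapsed,
        # Average flush size: this is the figure that should grow if the
        # concurrent processes are sharing log flushes.
        "avg_bytes_per_flush": bytes_delta / flushes_delta if flushes_delta else 0.0,
    }
```

If avg_bytes_per_flush is noticeably larger in the four-process test than in the single-process test, that supports the explanation above.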

Look for internal limits being reached. In SQL Server 2008 R2, there can be a maximum of 32 outstanding (asynchronous) log flush I/Os per database on 64-bit versions (only 8 on 32-bit). There is also a limit of 3840KB on the total size of the outstanding log I/Os.
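To see which of these two limits would be hit first, the arithmetic can be sketched as follows. This is a simplified model that assumes every outstanding flush has the same size (the average); the function and constant names are hypothetical, and the limit values are the 2008 R2 figures quoted above.

```python
import math

# Per-database limits quoted above for SQL Server 2008 R2.
MAX_OUTSTANDING_FLUSHES = {"64-bit": 32, "32-bit": 8}
MAX_OUTSTANDING_KB = 3840


def effective_flush_limit(avg_flush_kb: float, platform: str = "64-bit") -> tuple[int, str]:
    """Return how many equal-sized log flushes can be outstanding, and which limit binds.

    Simplified model: assumes every outstanding flush is exactly avg_flush_kb.
    """
    if avg_flush_kb <= 0:
        raise ValueError("avg_flush_kb must be positive")
    count_limit = MAX_OUTSTANDING_FLUSHES[platform]
    # Number of flushes of this size that fit under the total size cap.
    size_limit = math.floor(MAX_OUTSTANDING_KB / avg_flush_kb)
    if size_limit < count_limit:
        return size_limit, "total size (3840KB)"
    return count_limit, "outstanding I/O count"
```

On 64-bit, the two limits coincide at 120KB per flush (3840 / 32). In this model, smaller flushes run into the 32-I/O count limit first, and larger ones into the size cap.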

More information and further reading:

  • Transaction Log Monitoring
  • Trimming Transaction Log Fat
  • Diagnosing Transaction Log Performance Issues and Limits of the Log Manager
  • Optimizing Transaction Log Throughput

Everything @PaulWhite says, plus...

If you have foreign keys in place, every insert requires a check against each referenced table. It sounds to me like you do, as 360ms feels slow to me.

Anyway, those checks are massively helped by having the referenced data in RAM already, rather than having to load it from disk.

It sounds to me like loading the data into RAM is a significant part of your execution time, and that it only needs to happen once.

It could also be the effect of plan caching: your queries need to be compiled the first time they run, and subsequent calls can reuse the cached plan and skip that phase.