The mystery of stuck inactive msbuild.exe processes, a locked StyleCop.dll, a NuGet AccessViolationException and CI builds clashing with each other

After a lot of digging around and trying various things to no effect, I eventually ended up creating a new minimal solution which reproduced the issue with very little else going on. The issue turned out to be caused by msbuild's multi-core parallelisation - the 'm' parameter.

  • The 'm' parameter tells msbuild to spawn worker "nodes". By default these remain alive after the build has ended, and are then re-used by new builds!
  • The StyleCop 'ViolationCount' error was caused by a build re-using an old version of StyleCop.dll from another build's workspace, one in which ViolationCount was not supported. This was odd, because the CI workspace only contained the new version. It seems that once StyleCop.dll is loaded into a given msbuild node, it remains loaded for the next build that re-uses that node. I can only assume this is because StyleCop loads some sort of singleton into the node's process. This also explains the file locking between builds.
  • The NuGet access violation crash has now gone (with no other changes), so it was evidently related to the same node re-use issue.
  • When 'm' is passed without a value it defaults to the number of cores, so we were seeing 24 msbuild instances created on our build server for a given job.

The following posts were helpful:

  • msbuild.exe staying open, locking files
  • http://www.hanselman.com/blog/FasterBuildsWithMSBuildUsingParallelBuildsAndMulticoreCPUs.aspx
  • http://stylecop.codeplex.com/discussions/394606
  • https://github.com/Glimpse/Glimpse/issues/115
  • http://msdn.microsoft.com/en-us/library/vstudio/ms164311.aspx

The fix:

  • Add the line set MSBUILDDISABLENODEREUSE=1 to the batch file which launches msbuild
  • Launch msbuild with /m:4 /nr:false
  • The 'nr' parameter tells msbuild not to use "Node Reuse", so msbuild instances are closed once the build completes and no longer clash with each other, which is what was causing the above errors.
  • The 'm' parameter is set to 4 to stop too many nodes spawning per job.
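
For anyone launching msbuild from a script rather than a batch file, the same fix can be sketched in Python. This is only a rough equivalent of what our batch file does, not the batch file itself: the function names and the solution file name are made up for illustration, and it assumes msbuild is on the PATH.

```python
import os
import subprocess


def msbuild_invocation(solution, max_nodes=4, base_env=None):
    """Return (args, env) for an msbuild run with node reuse disabled.

    `solution` is a placeholder path; `base_env` defaults to the current
    process environment.
    """
    env = dict(os.environ if base_env is None else base_env)
    # Same effect as `set MSBUILDDISABLENODEREUSE=1` in the batch file.
    env["MSBUILDDISABLENODEREUSE"] = "1"
    # /m:<n> caps the number of nodes; /nr:false turns node reuse off.
    args = ["msbuild", solution, f"/m:{max_nodes}", "/nr:false"]
    return args, env


def run_build(solution):
    """Run the build; assumes msbuild can be found on the PATH."""
    args, env = msbuild_invocation(solution)
    # check=True raises CalledProcessError if msbuild exits non-zero.
    subprocess.run(args, env=env, check=True)
```

Calling run_build("MySolution.sln") (a hypothetical solution name) would then start the build with at most 4 nodes, all of which should exit when the build finishes.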