Stack (Haskell) build cache of source files with GitHub Actions

The culprit for this problem is that stack uses timestamp (as many other tools do) to figure out if a source file has changed or not. When you restore cache on CI and you do it correctly, none of the dependencies will get rebuild, but the problem the source files is that when the CI provider clones a repo for you, the timestamps for all of the files in the repo are set to the date and time when it was cloned.

Hopefully the cause for recompilation of unchanged source files makes sense now. What do we do about working around this problem. The only real way to get it is to restore the timestamp of the last git commit that changed a particular file. I noticed this quite a while ago and a bit of googling gave me some answers on SO, here is one of them I think: Restore a file's modification time in Git

A modified it a bit to suite my needs and that is what I ended up with:

  git ls-tree -r --name-only HEAD | while read filename; do
    TS="$(git log -1 --format="%ct" -- ${filename})"
    touch "${filename}" -mt "$(date --date="@$TS" "+%Y%m%d%H%M.%S")"
  done

That worker great for a while for me on Ubuntu CI, but solving this problem in an OS agnostic manner with bash is not something I wanted to do when I needed to setup Azure CI. For that reason I wrote a Haskell script that works for all GHC-8.2 version and newer without requiring any non-core dependencies. I use it for all of my projects and I'll embed the juice of it here, but also provide a link to a permanent gist:

main = do
  args <- getArgs
  let rev = case args of
        [] -> "HEAD"
        (x:_) -> x
  fs <- readProcess "git" ["ls-tree", "-r", "-t", "--full-name", "--name-only", rev] ""
  let iso8601 = iso8601DateFormat (Just "%H:%M:%S%z")
      restoreFileModtime fp = do
        modTimeStr <- readProcess "git" ["log", "--pretty=format:%cI", "-1", rev, "--", fp] ""
        modTime <- parseTimeM True defaultTimeLocale iso8601 modTimeStr
        setModificationTime fp modTime
        putStrLn $ "[" ++ modTimeStr ++ "] " ++ fp
  putStrLn "Restoring modification time for all these files:"
  mapM_ restoreFileModtime $ lines fs

How would you go about using it without much overhead. The trick is to:

  • use stack itself to run the script
  • use the exactly samel resolver as the one for the project.

Above two points will ensure that no redundant dependencies or ghc versions will get installed. All in all the only two things are needed are stack and something like curl or wget and it will work cross platform:

# Script for restoring source files modification time from commit to avoid recompilation.
curl -sSkL https://gist.githubusercontent.com/lehins/fd36a8cc8bf853173437b17f6b6426ad/raw/4702d0252731ad8b21317375e917124c590819ce/git-modtime.hs -o git-modtime.hs
# Restore mod time and setup ghc, if it wasn't restored from cache
stack script --resolver ${RESOLVER} git-modtime.hs --package base --package time --package directory --package process

Here is a real project that uses this approach and you can dig through it to see how it works: massiv-io

Edit @Simon Michael in the comments mentioned that he can't reproduce this issue locally. Reason for this is that not everything is the same up on CI as it is locally. Quite often an absolute path is different, for example, possibly other things that I can't think of right now. Those things, together with the source file timestamp cause the recompilation of the source files.

For example follow this steps and you will find your project will be recompiled:

~/tmp$ git clone [email protected]:fpco/safe-decimal.git
~/tmp$ cd safe-decimal
~/tmp/safe-decimal$ stack build
safe-decimal> configure (lib)
[1 of 2] Compiling Main
...
Configuring safe-decimal-0.2.0.0...
safe-decimal> build (lib)
Preprocessing library for safe-decimal-0.2.0.0..
Building library for safe-decimal-0.2.0.0..
[1 of 3] Compiling Numeric.Decimal.BoundedArithmetic
[2 of 3] Compiling Numeric.Decimal.Internal
[3 of 3] Compiling Numeric.Decimal
...
~/tmp/safe-decimal$ cd ../
~/tmp$ mv safe-decimal safe-decimal-moved
~/tmp$ cd safe-decimal-moved/
~/tmp/safe-decimal-moved$ stack build
safe-decimal-0.2.0.0: unregistering (old configure information not found)
safe-decimal> configure (lib)
[1 of 2] Compiling Main
...

You'll see that the location of the project triggered project building. Despite that the project itself was rebuild, you will notice that none of the source files were recompiled. Now if you combine that procedure with a touch of a source file, that source file will get recompiled.

To sum it up:

  • Environment can cause the project to be rebuild
  • Contents of a source file can cause the source file (and others that depend on it) to be recompiled
  • Environment together with the source file contents or timestamp change can cause the project together with that source file to be recompiled