What does git lfs migrate do?

The strange files that look like git lfs pointers and replaced the ones in your working copy (in your case, the files in test-data/) appear if you migrate:

  • after a git lfs track command. This command changes all the tracked "big files" in your working copy.
  • without having committed all the changes made to your working copy by the git lfs track command. This may happen if you followed the tutorial and committed only .gitattributes, but not all your "big files".

The only problem is that the original git objects of the binary files are still in the .git folder because you haven't garbage-collected them yet.

You should follow the git lfs migration tutorial, which explains:

The above successfully converts pre-existing git objects to lfs objects. However, the regular objects still persist in the .git directory. These will be cleaned up eventually by git, but to clean them up right away, run:

git reflog expire --expire-unreachable=now --all
git gc --prune=now

After running that, your .git should be about the same size, but if you look inside it you should see that objects is now much smaller than before the migration and that lfs holds the rest.

The even better news is that now when other developers/applications clone the repo they will only have to download the objects directory and will then fetch only the "large-files" which they check out, not the whole history.


I thought that git lfs migrate rewrote the history of a repo so that specified large files were kept in LFS.

Perfectly true.

This means that the repo should get smaller, because it doesn't directly contain all versions of large files.

Not exactly true. The promise of git lfs is not that your repo will be smaller, but that when you clone you won't have to download all the git objects, so the clone will be smaller and faster. For the files managed by git-lfs, only the versions that should appear in your working directory are downloaded during git checkout.

All of the files in the test-data/ directory are replaced with files that look like this:

That's how git-lfs works. Instead of committing the file itself to the repository, it commits a "pointer" file that contains the id of the object. The content of the file is stored in the .git/lfs/objects folder, and these objects are uploaded to the server when you git push.
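If you want to check whether a given file is a pointer or the real content, here is a minimal sketch. It relies on the pointer format: a tiny text file whose first line is "version https://git-lfs.github.com/spec/v1", followed by an "oid sha256:..." line and a "size" line. The function names are my own (they are not git-lfs commands), and the object path layout is what I understand git-lfs to use locally.

```shell
#!/bin/sh
# Hypothetical helpers, not part of git or git-lfs.

# Succeeds if the file looks like a Git LFS pointer.
is_lfs_pointer() {
    [ -f "$1" ] || return 1
    # Only the first bytes matter; strip NUL bytes in case the file is binary.
    first_line=$(head -c 100 "$1" | tr -d '\000' | head -n 1)
    [ "$first_line" = "version https://git-lfs.github.com/spec/v1" ]
}

# Prints the object id recorded in a pointer file (text after "oid sha256:").
lfs_pointer_oid() {
    sed -n 's/^oid sha256:\([0-9a-f]\{64\}\)$/\1/p' "$1"
}

# Prints where the content for that id should live under .git/lfs/objects
# (assumed layout: first two characters / next two characters / full id).
lfs_object_path() {
    oid=$1
    printf '.git/lfs/objects/%s/%s/%s\n' \
        "$(printf '%s' "$oid" | cut -c1-2)" \
        "$(printf '%s' "$oid" | cut -c3-4)" \
        "$oid"
}
```

Calling is_lfs_pointer on one of the files in test-data/ tells you whether it is a pointer, and lfs_object_path "$(lfs_pointer_oid that-file)" shows where its content should be on disk.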

And the .git folder becomes twice as large (400MB to 800MB). I am confused.

Because all the files managed by git lfs are stored in this folder, it can become huge. I also think the size of your repository doubled because the objects are stored twice for the moment: in .git/objects, until you ditch the old history (by purging the reflog and running git gc, but do that only once you are sure your lfs migration is a success), and in .git/lfs/objects, because you made the git lfs conversion.
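To see where the space actually goes, you can compare the two folders directly. This is only a sketch; report_git_sizes is a name I made up, and it simply wraps du.

```shell
#!/bin/sh
# Hypothetical helper: print the disk usage of the regular git objects
# and of the LFS objects for the repository at $1 (default: current directory).
report_git_sizes() {
    repo=${1:-.}
    for dir in "$repo/.git/objects" "$repo/.git/lfs/objects"; do
        if [ -d "$dir" ]; then
            # du -sk prints the size in kilobytes followed by the path.
            du -sk "$dir"
        else
            echo "missing $dir"
        fi
    done
}
```

Run it before and after the reflog expire and git gc step: if the explanation above is right, the .git/objects number should drop while .git/lfs/objects stays the same.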

I think (but I'm not sure) that .git/lfs/objects serves as a cache folder, so once you have pushed all the new history, and the files managed by lfs have been uploaded with it, you could delete it to reduce the size of your repository. But if I were you, I would not do that!

To see the real effect of git lfs on your local repository, once you have --force pushed the new history (and the old one is no longer in the remote repository), I would do a fresh clone. Your new local repository should then be smaller.

But the folder .git/lfs/objects will still grow in the future every time a new version of these files is downloaded (but it should always stay smaller than if you didn't use git lfs).

I hope you better understand how it works...

PS:

All of the files in the test-data/ directory are replaced with files that look like this:

I hope that what you said is only partly true: that your files in test-data/ still contain the right content and that what you report is only what a git command shows you... Could you confirm? Otherwise you have a problem, which could be explained by git lfs not being installed.
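A quick way to rule out the "not installed" case is to check whether the git-lfs executable is on your PATH. This is a small sketch; check_git_lfs is a name I made up.

```shell
#!/bin/sh
# Hypothetical helper: report whether the git-lfs executable can be found.
check_git_lfs() {
    if command -v git-lfs >/dev/null 2>&1; then
        echo "git-lfs found"
    else
        echo "git-lfs not found"
        return 1
    fi
}
```

If it is found, git lfs install sets up the filters for your user account, and git lfs checkout replaces pointer files in the working copy with the real content when the objects are available locally.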
