How do I reduce the size of a bloated Git repo by non-interactively squashing all commits except for the most recent ones?

The original poster comments:

if we take a snapshot of a commit 10004, remove all commits before it, and make commit 10004 a root commit, I'll be just fine

Here is one way to do it, assuming your current branch is called branchname. Whenever I do a large rebase I like to set a temporary tag, both to double-check afterwards that the content did not change and to mark a point I can reset to if something goes wrong (I'm not sure whether this is standard procedure, but it works for me):

git tag temp

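# 10004 stands for the hash of the commit that becomes the new root
git checkout 10004
git checkout --orphan new_root   # history-less branch whose files match 10004
git commit -m "set new root 10004"

git rebase --onto new_root 10004 branchname   # replay the commits after 10004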

git diff temp   # verification that it worked with no changes
git tag -d temp
git branch -D new_root
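If you would rather not check anything out, the orphan branch can be replaced by a single git commit-tree call, which creates a parentless commit from 10004's snapshot directly. The sketch below is a variation on the steps above, not the answer's own method. It runs in a throwaway repository; KEEP and BRANCH are hypothetical stand-ins for 10004 and branchname, and it assumes a linear history (rebase flattens merges by default).

```shell
#!/bin/sh
# Sketch: make an existing commit the new root without an orphan checkout.
# KEEP and BRANCH are hypothetical stand-ins for 10004 and branchname.
set -eu

# --- throwaway demo repository with six commits, c1..c6 ---
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name Demo
git config user.email demo@example.com
for i in 1 2 3 4 5 6; do
  echo "$i" > "file$i.txt"
  git add "file$i.txt"
  git commit -q -m "c$i"
done

BRANCH=main
KEEP=$(git rev-parse HEAD~2)   # plays the role of commit 10004 (here c4)

git tag temp                   # safety point, as in the answer

# A parentless commit whose tree is KEEP's snapshot.
new_root=$(git commit-tree "$KEEP^{tree}" -m "set new root $KEEP")

# Replay the commits after KEEP (c5, c6) on top of the new root.
git rebase -q --onto "$new_root" "$KEEP" "$BRANCH"

# Verify that the content is unchanged, then drop the safety tag.
git diff --quiet temp && echo "content unchanged"
git rev-list --count HEAD      # new root plus the two replayed commits
git tag -d temp
```

Because commit-tree only writes an object, the working tree is untouched until the rebase itself runs.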

To get rid of the old history you'll need to delete every tag and branch that still points into it; then

git prune
git gc

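will clean it from your repo, as long as no reflog entry still refers to the old commits (by default reflog entries are kept for 30 to 90 days).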
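To reclaim the space right away, the usual approach is to expire the reflogs first and then run gc with an immediate prune. This is a minimal sketch in a throwaway repository; big.bin and the other names are made up for the demo. Both commands throw away the safety net the reflog provides, so only run them once the rewritten history has been verified.

```shell
#!/bin/sh
# Sketch: plain gc keeps objects the reflog still references; expiring the
# reflog and pruning immediately actually removes the old history.
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name Demo
git config user.email demo@example.com

# An early commit carrying a large (hypothetical) file that is later deleted.
head -c 1048576 /dev/urandom > big.bin
git add big.bin
git commit -q -m "add big file"
big_blob=$(git rev-parse HEAD:big.bin)
git rm -q big.bin
git commit -q -m "remove big file"
echo keep > keep.txt
git add keep.txt
git commit -q -m "recent work"

# Squash everything into one root commit with the current snapshot.
new_root=$(git commit-tree "HEAD^{tree}" -m "squashed history")
git update-ref refs/heads/main "$new_root"

# Plain gc: the reflog still references the old commits, so the blob survives.
git gc -q
git cat-file -e "$big_blob" && echo "still present after plain gc"

# Expire every reflog entry now, then prune unreachable objects immediately.
git reflog expire --expire=now --all
git gc -q --prune=now
git cat-file -e "$big_blob" 2>/dev/null || echo "gone after reflog expire + gc --prune=now"
git count-objects -vH
```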

Note that until you have gc'd, the repository temporarily holds both the old and the rewritten commits (file contents are stored only once, because the snapshots are identical). That is unavoidable: even a standard squash and rebase keeps the original commits around until the rebase finishes.


If you count the time it takes to write the solution, the fastest route is almost certainly grafts plus filter-branch, though a hand-rolled sequence of git commit-tree calls driven by git rev-list output might execute faster.
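A hand-rolled commit-tree sequence could look roughly like the sketch below. It is an illustration under stated assumptions, not a tested production tool: it handles linear history only (no merges), KEEP, old_tip and the recommit helper are hypothetical names, and while author name, email, date and message are copied from each original commit, the committer becomes whoever runs the script.

```shell
#!/bin/sh
# Sketch: rebuild history from KEEP onward with plain commit-tree calls.
# Assumes linear history; KEEP, old_tip and recommit are hypothetical names.
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name Demo
git config user.email demo@example.com
for i in 1 2 3 4 5; do
  echo "$i" > "f$i.txt"
  git add "f$i.txt"
  git commit -q -m "c$i"
done
KEEP=$(git rev-parse HEAD~2)   # c3 becomes the new root
old_tip=$(git rev-parse main)

# Re-create one commit with the same tree, message and author,
# on top of an optional new parent. Usage: recommit OLD [NEW_PARENT]
recommit() {
  old=$1
  shift
  if [ $# -gt 0 ]; then set -- -p "$1"; fi
  git log -1 --format=%B "$old" |
    GIT_AUTHOR_NAME=$(git log -1 --format=%an "$old") \
    GIT_AUTHOR_EMAIL=$(git log -1 --format=%ae "$old") \
    GIT_AUTHOR_DATE=$(git log -1 --format=%ad --date=raw "$old") \
    git commit-tree "$old^{tree}" "$@"
}

new=$(recommit "$KEEP")                      # parentless copy of KEEP
for c in $(git rev-list --reverse "$KEEP..main"); do
  new=$(recommit "$c" "$new")                # oldest first, chaining parents
done
git update-ref refs/heads/main "$new"

git diff --quiet "$old_tip" main && echo "trees identical"
git log --oneline main
```

Unlike rebase, nothing here computes or applies diffs; each new commit simply reuses the old commit's tree, which is exactly the work rebase wastes in this situation.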

Rebase is built to apply changes on different content. What you're doing here is preserving contents and intentionally losing the change history that produced them, so pretty much all of rebase's most tedious and slow work is wasted.

The payload here is, working from your picture,

# each line of .git/info/grafts reads "<commit> <parent>...": here H gets A as its only parent
echo `git rev-parse H; git rev-parse A` > .git/info/grafts
git filter-branch -- --all

See the documentation for git rev-parse and git filter-branch.
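Recent Git versions deprecate the .git/info/grafts file in favour of replace refs, and git filter-branch honors refs under refs/replace/ just as it honors grafts, making them permanent. The sketch below shows that variant in a throwaway repository. KEEP is a hypothetical commit that should become a root; to reparent H onto A as in the grafts line, the call would be `git replace --graft H A` instead. FILTER_BRANCH_SQUELCH_WARNING=1 only suppresses the warning and pause that newer Git versions print before filter-branch runs.

```shell
#!/bin/sh
# Sketch: graft with "git replace --graft", then let filter-branch bake it in.
# KEEP is a hypothetical name for the commit that should become the new root.
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name Demo
git config user.email demo@example.com
for i in 1 2 3 4 5; do
  echo "$i" > "f$i.txt"
  git add "f$i.txt"
  git commit -q -m "c$i"
done
KEEP=$(git rev-parse HEAD~2)

# With no parents listed, the replacement for KEEP is a root commit.
git replace --graft "$KEEP"

# filter-branch follows refs/replace/ and rewrites history accordingly.
FILTER_BRANCH_SQUELCH_WARNING=1 git filter-branch -- --all

# The replace ref and filter-branch's refs/original backups are now redundant.
git replace -d "$KEEP"
git for-each-ref --format='%(refname)' refs/original/ |
  while read -r ref; do git update-ref -d "$ref"; done

git log --oneline main
```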

Filter-branch takes great care to be recoverable after a failure at any point, which is certainly the safest approach, but that care only pays off when recovering would be faster and easier than simply redoing the work if things go south. Failures are rare and restarts are usually cheap, so the better choice is an "unsafe" but very fast operation that is all but certain to work. The best option for that is to do the work on a tmpfs (the closest equivalent I know of on Windows is a ramdisk such as ImDisk), which is blazing fast and won't touch your main repo until you're sure you have the results you want.

So on Windows, say T:\wip is on a ramdisk. Note that the clone here copies no objects: --shared makes it borrow the original repository's object store through .git/objects/info/alternates. Besides reading the docs on git clone's --shared option, do look inside the clone to see the real effect; it's very straightforward.

# switch to a lightweight wip clone on a tmpfs
git clone --shared --no-checkout . /t/wip/filterwork
cd /t/wip/filterwork   # "cd !$" also works, but only in an interactive shell

# graft out the unwanted commits
echo `git rev-parse $L; git rev-parse $A` >.git/info/grafts
git filter-branch -- --all

# check that the repo history looks right
git log --graph --decorate --oneline --all

# all done with the splicing, filter-branch has integrated it
rm .git/info/grafts

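# push the rewritten histories back
# (by default, a push that updates the branch checked out in a non-bare repo is refused; see receive.denyCurrentBranch)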
git push origin --all --force

There are enough possible variations on what you might be wanting to do and what might be in your repo that almost any of the options on these commands might be useful. The above is tested and will do what it says it does, but that might not be exactly what you want.
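To see what the --shared clone actually contains, a quick inspection like the one below can help. It is a sketch that assumes a Linux-like system where /dev/shm is a tmpfs, falling back to the normal temp directory otherwise; the repository and directory names are made up for the demo.

```shell
#!/bin/sh
# Sketch: a --shared clone gets its own refs and config, but its object
# database only points back at the source via objects/info/alternates.
set -eu

src=$(mktemp -d)
cd "$src"
git init -q
git config user.name Demo
git config user.email demo@example.com
echo hello > a.txt
git add a.txt
git commit -q -m first

# /dev/shm is a tmpfs on most Linux systems; otherwise use the default tmp dir.
if [ -d /dev/shm ] && [ -w /dev/shm ]; then base=/dev/shm; else base=${TMPDIR:-/tmp}; fi
wip=$(mktemp -d "$base/filterwork.XXXXXX")
git clone -q --shared --no-checkout "$src" "$wip/filterwork"
cd "$wip/filterwork"

cat .git/objects/info/alternates                        # path to the source's objects
find .git/objects -type f ! -path '*/info/*' | wc -l    # object files copied (expect 0)
git log --oneline --all                                 # full history is still visible
```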