Git is really slow for 100,000 objects. Any fixes?

It came down to a couple of items that I can see right now.

  1. git gc --aggressive
  2. Opening up file permissions to 777

There has to be something else going on, but these were the things that clearly made the biggest impact.
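To see what the first item does, here is a minimal sketch that counts loose objects before and after a gc. It runs in a throwaway repository; the `demo` identity and the file name are placeholders, not anything from the original setup.

```shell
# Throwaway repository so the sketch does not touch real data
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q .

# Create a handful of commits; each one leaves loose objects behind
for i in 1 2 3 4 5; do
  echo "revision $i" > file.txt
  git add file.txt
  git -c user.name=demo -c user.email=demo@example.com commit -q -m "commit $i"
done

# "count:" in the verbose output is the number of loose objects
loose_before=$(git count-objects -v | sed -n 's/^count: //p')

# Repack everything (this can take a long time on a big repository)
git gc --aggressive --quiet

loose_after=$(git count-objects -v | sed -n 's/^count: //p')
echo "loose objects: $loose_before -> $loose_after"
```

On a real repository, run only the `git count-objects -v` and `git gc --aggressive` lines from inside the working tree.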


git status has to look at every file in the repository every time. You can tell it to stop looking at trees that you aren't working on with

git update-index --assume-unchanged <trees to skip>

source

From the manpage:

When these flags are specified, the object names recorded for the paths are not updated. Instead, these options set and unset the "assume unchanged" bit for the paths. When the "assume unchanged" bit is on, git stops checking the working tree files for possible modifications, so you need to manually unset the bit to tell git when you change the working tree file. This is sometimes helpful when working with a big project on a filesystem that has very slow lstat(2) system call (e.g. cifs).

This option can be also used as a coarse file-level mechanism to ignore uncommitted changes in tracked files (akin to what .gitignore does for untracked files). Git will fail (gracefully) in case it needs to modify this file in the index e.g. when merging in a commit; thus, in case the assumed-untracked file is changed upstream, you will need to handle the situation manually.

Many operations in git depend on your filesystem to have an efficient lstat(2) implementation, so that st_mtime information for working tree files can be cheaply checked to see if the file contents have changed from the version recorded in the index file. Unfortunately, some filesystems have inefficient lstat(2). If your filesystem is one of them, you can set "assume unchanged" bit to paths you have not changed to cause git not to do this check. Note that setting this bit on a path does not mean git will check the contents of the file to see if it has changed — it makes git to omit any checking and assume it has not changed. When you make changes to working tree files, you have to explicitly tell git about it by dropping "assume unchanged" bit, either before or after you modify them.

...

In order to set "assume unchanged" bit, use --assume-unchanged option. To unset, use --no-assume-unchanged.

The command looks at core.ignorestat configuration variable. When this is true, paths updated with git update-index paths… and paths updated with other git commands that update both index and working tree (e.g. git apply --index, git checkout-index -u, and git read-tree -u) are automatically marked as "assume unchanged". Note that "assume unchanged" bit is not set if git update-index --refresh finds the working tree file matches the index (use git update-index --really-refresh if you want to mark them as "assume unchanged").
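A minimal sketch of the set/unset cycle described in the manpage, run in a throwaway repository. Since git update-index works on file paths, the sketch expands a directory with git ls-files first; the `vendor/` and `src/` directories and the `demo` identity are made up for illustration.

```shell
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q .
mkdir -p vendor src
echo a > vendor/a.txt
echo b > src/b.txt
git add .
git -c user.name=demo -c user.email=demo@example.com commit -q -m init

# update-index takes file paths, so expand the directory via ls-files
git ls-files -z -- vendor/ | xargs -0 git update-index --assume-unchanged

# A lowercase tag letter in `git ls-files -v` marks assume-unchanged entries
marked=$(git ls-files -v | grep -c '^h')

# With the bit set, a modification is no longer reported
echo changed > vendor/a.txt
status_while_set=$(git status --porcelain)

# Drop the bit again and the change becomes visible
git ls-files -z -- vendor/ | xargs -0 git update-index --no-assume-unchanged
status_after_unset=$(git status --porcelain)

echo "marked=$marked"
echo "while set: [$status_while_set]"
echo "after unset: [$status_after_unset]"
```

`git ls-files -v | grep '^h'` is also a handy way to find paths where the bit was set and forgotten.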


Now, clearly, this solution is only going to work if there are parts of the repo that you can conveniently ignore. I work on a project of similar size, and there are definitely large trees that I don't need to check on a regular basis. The semantics of git-status make it a generally O(n) problem (n in number of files). You need domain specific optimizations to do better than that.

Note that if you work in a stitching pattern, that is, if you integrate changes from upstream by merge instead of rebase, then this solution becomes less convenient, because a change to an --assume-unchanged object merging in from upstream becomes a merge conflict. You can avoid this problem with a rebasing workflow.


git status should be quicker in Git 2.13 (Q2 2017), because of:

  • an optimization around an array of strings (see "ways to improve git status performance")
  • a better "read cache" management.

On that last point, see commit a33fc72 (14 Apr 2017) by Jeff Hostetler (jeffhostetler).
(Merged by Junio C Hamano -- gitster -- in commit cdfe138, 24 Apr 2017)

read-cache: force_verify_index_checksum

Teach git to skip verification of the SHA-1 checksum at the end of the index file in verify_hdr() which is called from read_index() unless the "force_verify_index_checksum" global variable is set.

Teach fsck to force this verification.

The checksum verification is for detecting disk corruption, and for small projects, the time it takes to compute SHA-1 is not that significant, but for gigantic repositories this calculation adds significant time to every command.


Git 2.14 again improves git status performance by better taking into account the "untracked cache", which allows Git to skip reading the untracked directories if their stat data have not changed, using the mtime field of the stat structure.

See the Documentation/technical/index-format.txt for more on untracked cache.

See commit edf3b90 (08 May 2017) by David Turner (dturner-tw).
(Merged by Junio C Hamano -- gitster -- in commit fa0624f, 30 May 2017)

When "git checkout", "git merge", etc. manipulates the in-core index, various pieces of information in the index extensions are discarded from the original state, as it is usually not the case that they are kept up-to-date and in-sync with the operation on the main index.

The untracked cache extension is copied across these operations now, which would speed up "git status" (as long as the cache is properly invalidated).
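The untracked cache has to be enabled before any of this helps. A minimal sketch, assuming a throwaway repository and a filesystem whose directory mtime behaves as Git expects (git update-index --test-untracked-cache can check that, but it takes a while, so it is not run here):

```shell
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q .

# Ask Git to keep the untracked cache extension in this repository's index
git config core.untrackedCache true

# A first status populates the cache; later runs can skip unchanged directories
git status --porcelain >/dev/null

setting=$(git config --get core.untrackedCache)
echo "core.untrackedCache=$setting"
```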


More generally, writing to the cache will also be quicker with Git 2.14.x/2.15.

See commit ce012de, commit b50386c, commit 3921a0b (21 Aug 2017) by Kevin Willford.
(Merged by Junio C Hamano -- gitster -- in commit 030faf2, 27 Aug 2017)

We used to spend more than necessary cycles allocating and freeing piece of memory while writing each index entry out.
This has been optimized.

[That] would save anywhere between 3-7% when the index had over a million entries with no performance degradation on small repos.


Update Dec. 2017: Git 2.16 (Q1 2018) will propose an additional enhancement, this time for git log, since the code to iterate over loose object files just got optimized.

See commit 163ee5e (04 Dec 2017) by Derrick Stolee (derrickstolee).
(Merged by Junio C Hamano -- gitster -- in commit 97e1f85, 13 Dec 2017)

sha1_file: use strbuf_add() instead of strbuf_addf()

Replace use of strbuf_addf() with strbuf_add() when enumerating loose objects in for_each_file_in_obj_subdir(). Since we already check the length and hex-values of the string before consuming the path, we can prevent extra computation by using the lower-level method.

One consumer of for_each_file_in_obj_subdir() is the abbreviation code. OID (object identifiers) abbreviations use a cached list of loose objects (per object subdirectory) to make repeated queries fast, but there is significant cache load time when there are many loose objects.

Most repositories do not have many loose objects before repacking, but in the GVFS case (see "Announcing GVFS (Git Virtual File System)") the repos can grow to have millions of loose objects.
Profiling 'git log' performance in Git For Windows on a GVFS-enabled repo with ~2.5 million loose objects revealed 12% of the CPU time was spent in strbuf_addf().

Add a new performance test to p4211-line-log.sh that is more sensitive to this cache-loading.
By limiting to 1000 commits, we more closely resemble user wait time when reading history into a pager.

For a copy of the Linux repo with two ~512 MB packfiles and ~572K loose objects, running 'git log --oneline --parents --raw -1000' had the following performance:

HEAD~1            HEAD
----------------------------------------
7.70(7.15+0.54)   7.44(7.09+0.29) -3.4%

Update March 2018: Git 2.17 will improve git status some more: see this answer.


Update: Git 2.20 (Q4 2018) adds Index Entry Offset Table (IEOT), which allows for git status to load the index faster.

See commit 77ff112, commit 3255089, commit abb4bb8, commit c780b9c, commit 3b1d9e0, commit 371ed0d (10 Oct 2018) by Ben Peart (benpeart).
See commit 252d079 (26 Sep 2018) by Nguyễn Thái Ngọc Duy (pclouds).
(Merged by Junio C Hamano -- gitster -- in commit e27bfaa, 19 Oct 2018)

read-cache: load cache entries on worker threads

This patch helps address the CPU cost of loading the index by utilizing the Index Entry Offset Table (IEOT) to divide loading and conversion of the cache entries across multiple threads in parallel.

I used p0002-read-cache.sh to generate some performance data:

Test w/100,000 files reduced the time by 32.24%
Test w/1,000,000 files reduced the time by -4.77%

Note that on the 1,000,000 files case, multi-threading the cache entry parsing does not yield a performance win. This is because the cost to parse the index extensions in this repo, far outweigh the cost of loading the cache entries.

That allows for:

config: add new index.threads config setting

Add support for a new index.threads config setting which will be used to control the threading code in do_read_index().

  • A value of 0 will tell the index code to automatically determine the correct number of threads to use.
  • A value of 1 will make the code single threaded.
  • A value greater than 1 will set the maximum number of threads to use.

For testing purposes, this setting can be overwritten by setting the GIT_TEST_INDEX_THREADS=<n> environment variable to a value greater than 0.
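As a minimal sketch, the setting can be stored per repository or passed for a single command. The throwaway repository is only there to keep the example self-contained; the threading itself needs Git 2.20 or later.

```shell
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q .

# 0 lets Git choose the thread count; 1 forces single-threaded loading
git config index.threads 0
threads=$(git config --get index.threads)

# A one-off override for a single command, without touching the config
git -c index.threads=1 status --porcelain >/dev/null

echo "index.threads=$threads"
```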


Git 2.21 (Q1 2019) introduces a new improvement: the loose object cache, used to optimize existence look-up, has been updated.

See commit 8be88db (07 Jan 2019), and commit 4cea1ce, commit d4e19e5, commit 0000d65 (06 Jan 2019) by René Scharfe (rscharfe).
(Merged by Junio C Hamano -- gitster -- in commit eb8638a, 18 Jan 2019)

object-store: use one oid_array per subdirectory for loose cache

The loose objects cache is filled one subdirectory at a time as needed.
It is stored in an oid_array, which has to be resorted after each add operation.
So when querying a wide range of objects, the partially filled array needs to be resorted up to 255 times, which takes over 100 times longer than sorting once.

Use one oid_array for each subdirectory.
This ensures that entries have to only be sorted a single time.
It also avoids eight binary search steps for each cache lookup as a small bonus.

The cache is used for collision checks for the log placeholders %h, %t and %p, and we can see the change speeding them up in a repository with ca. 100 objects per subdirectory:

$ git count-objects
26733 objects, 68808 kilobytes

Test                        HEAD^             HEAD
--------------------------------------------------------------------
4205.1: log with %H         0.51(0.47+0.04)   0.51(0.49+0.02) +0.0%
4205.2: log with %h         0.84(0.82+0.02)   0.60(0.57+0.03) -28.6%
4205.3: log with %T         0.53(0.49+0.04)   0.52(0.48+0.03) -1.9%
4205.4: log with %t         0.84(0.80+0.04)   0.60(0.59+0.01) -28.6%
4205.5: log with %P         0.52(0.48+0.03)   0.51(0.50+0.01) -1.9%
4205.6: log with %p         0.85(0.78+0.06)   0.61(0.56+0.05) -28.2%
4205.7: log with %h-%h-%h   0.96(0.92+0.03)   0.69(0.64+0.04) -28.1%
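
The difference between the fast and slow rows is that the lowercase placeholders print abbreviated IDs, which Git has to check for ambiguity against the object store. A minimal sketch in a throwaway repository (the `demo` identity and file name are placeholders):

```shell
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q .
echo hello > file.txt
git add file.txt
git -c user.name=demo -c user.email=demo@example.com commit -q -m "first"

# %H prints the full commit ID; no uniqueness check is needed
full=$(git log -1 --format=%H)

# %h prints an abbreviation, so Git must check that it is unambiguous
short=$(git log -1 --format=%h)

echo "full:  $full"
echo "short: $short"
```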

Before Git 2.26 (Q1 2020), the object reachability bitmap machinery and the partial cloning machinery were not prepared to work well together, because some object-filtering criteria that partial clones use inherently rely on object traversal, but the bitmap machinery is an optimization to bypass that object traversal.

There however are some cases where they can work together, and they were taught about them.

See commit 20a5fd8 (18 Feb 2020) by Junio C Hamano (gitster).
See commit 3ab3185, commit 84243da, commit 4f3bd56, commit cc4aa28, commit 2aaeb9a, commit 6663ae0, commit 4eb707e, commit ea047a8, commit 608d9c9, commit 55cb10f, commit 792f811, commit d90fe06 (14 Feb 2020), and commit e03f928, commit acac50d, commit 551cf8b (13 Feb 2020) by Jeff King (peff).
(Merged by Junio C Hamano -- gitster -- in commit 0df82d9, 02 Mar 2020)

pack-bitmap: implement BLOB_NONE filtering

Signed-off-by: Jeff King

We can easily support BLOB_NONE filters with bitmaps.
Since we know the types of all of the objects, we just need to clear the result bits of any blobs.

Note two subtleties in the implementation (which I also called out in comments):

  • we have to include any blobs that were specifically asked for (and not reached through graph traversal) to match the non-bitmap version
  • we have to handle in-pack and "ext_index" objects separately.
    Arguably prepare_bitmap_walk() could be adding these ext_index objects to the type bitmaps.
    But it doesn't for now, so let's match the rest of the bitmap code here (it probably wouldn't be an efficiency improvement to do so since the cost of extending those bitmaps is about the same as our loop here, but it might make the code a bit simpler).

Here are perf results for the new test on git.git:

Test                                    HEAD^             HEAD
--------------------------------------------------------------------------------
5310.9: rev-list count with blob:none   1.67(1.62+0.05)   0.22(0.21+0.02) -86.8%
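
A minimal sketch of the kind of query that benefits, assuming Git 2.26 or later (older versions may refuse to combine --use-bitmap-index with --filter). The tiny throwaway repository is only for illustration; the speedup shows on large histories.

```shell
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q .
echo one > a.txt
echo two > b.txt
git add a.txt b.txt
git -c user.name=demo -c user.email=demo@example.com commit -q -m "two blobs"

# Pack everything and write a reachability bitmap next to the pack (-b)
git repack -a -d -b -q

# All reachable objects: 1 commit + 1 tree + 2 blobs
all=$(git rev-list --use-bitmap-index --objects --all | wc -l | tr -d ' ')

# Same walk with blobs filtered out: only the commit and the tree remain
no_blobs=$(git rev-list --use-bitmap-index --objects --all --filter=blob:none | wc -l | tr -d ' ')

echo "all objects: $all, without blobs: $no_blobs"
```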

To know more about oid_array, consider Git 2.27 (Q2 2020).

See commit 0740d0a, commit c79eddf, commit 7383b25, commit ed4b804, commit fe299ec, commit eccce52, commit 600bee4 (30 Mar 2020) by Jeff King (peff).
(Merged by Junio C Hamano -- gitster -- in commit a768f86, 22 Apr 2020)

oid_array: use size_t for count and allocation

Signed-off-by: Jeff King

The oid_array object uses an "int" to store the number of items and the allocated size.

It's rather unlikely for somebody to have more than 2^31 objects in a repository (the sha1's alone would be 40GB!), but if they do, we'd overflow our alloc variable.

You can reproduce this case with something like:

git init repo
cd repo

# make a pack with 2^24 objects
perl -e '
  my $nr = 2**24;

  for (my $i = 0; $i < $nr; $i++) {
    print "blob\n";
    print "data 4\n";
    print pack("N", $i);
  }
' | git fast-import

# now make 256 copies of it; most of these objects will be duplicates,
# but oid_array doesn't de-dup until all values are read and it can
# sort the result.
cd .git/objects/pack/
pack=$(echo *.pack)
idx=$(echo *.idx)
for i in $(seq 0 255); do
  # no need to waste disk space
  ln "$pack" "pack-extra-$i.pack"
  ln "$idx" "pack-extra-$i.idx"
done

# and now force an oid_array to store all of it
git cat-file --batch-all-objects --batch-check

which results in:

fatal: size_t overflow: 32 * 18446744071562067968

So the good news is that st_mult() sees the problem (the large number is because our int wraps negative, and then that gets cast to a size_t), doing the job it was meant to: bailing in crazy situations rather than causing an undersized buffer.

But we should avoid hitting this case at all, and instead limit ourselves based on what malloc() is willing to give us.
We can easily do that by switching to size_t.

The cat-file process above made it to ~120GB virtual set size before the integer overflow (our internal hash storage is 32-bytes now in preparation for sha256, so we'd expect ~128GB total needed, plus potentially more to copy from one realloc'd block to another).
After this patch (and about 130GB of RAM+swap), it does eventually read in the whole set. No test for obvious reasons.
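
The numbers in that message can be checked with a little shell arithmetic. This is only a back-of-the-envelope sketch: it assumes the array ends up holding the original pack plus its 256 hard-linked copies, and 32 bytes per entry as stated above.

```shell
# 2^24 objects per pack, counted once for the original and once per copy
entries=$(( (1 << 24) * 257 ))

# Largest value a signed 32-bit int can hold
int_max=$(( (1 << 31) - 1 ))

# 32 bytes of hash storage per entry, converted to GiB
bytes=$(( entries * 32 ))
gib=$(( bytes >> 30 ))

echo "entries: $entries (INT_MAX is $int_max)"
echo "storage: about $gib GiB"
```

The entry count is roughly twice what an int can hold, and the storage estimate lines up with the ~128GB quoted above.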

Note that this object was defined in sha1-array.c, which has been renamed oid-array.c: a more neutral name, considering Git will eventually transition from SHA-1 to SHA-2.


Another optimization:

With Git 2.31 (Q1 2021), the code around the cache-tree extension in the index has been optimized.

See commit a4b6d20, commit 4bdde33, commit 22ad860, commit 845d15d (07 Jan 2021), and commit 0e5c950, commit 4c3e187, commit fa7ca5d, commit c338898, commit da8be8c (04 Jan 2021) by Derrick Stolee (derrickstolee).
See commit 0b72536 (07 Jan 2021) by René Scharfe (rscharfe).
(Merged by Junio C Hamano -- gitster -- in commit a0a2d75, 05 Feb 2021)

cache-tree: speed up consecutive path comparisons

Signed-off-by: Derrick Stolee

The previous change reduced time spent in strlen() while comparing consecutive paths in verify_cache(), but we can do better.
The conditional checks the existence of a directory separator at the correct location, but only after doing a string comparison.
Swap the order to be logically equivalent but perform fewer string comparisons.

To test the effect on performance, I used a repository with over three million paths in the index.
I then ran the following command on repeat:

git -c index.threads=1 commit --amend --allow-empty --no-edit

Here are the measurements over 10 runs after a 5-run warmup:

Benchmark #1: v2.30.0
  Time (mean ± σ):     854.5 ms ±  18.2 ms
  Range (min … max):   825.0 ms … 892.8 ms

Benchmark #2: Previous change
  Time (mean ± σ):     833.2 ms ±  10.3 ms
  Range (min … max):   815.8 ms … 849.7 ms

Benchmark #3: This change
  Time (mean ± σ):     815.5 ms ±  18.1 ms
  Range (min … max):   795.4 ms … 849.5 ms

This change is 2% faster than the previous change and 5% faster than v2.30.0.
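
To try the same command on your own repository, a rough sketch of the loop looks like this. It runs in a throwaway repository with a placeholder `demo` identity and does no timing itself; wrap the loop in whatever timer or benchmarking tool you prefer.

```shell
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q .
echo hello > file.txt
git add file.txt
git -c user.name=demo -c user.email=demo@example.com commit -q -m "first"

# Rewrite the tip commit a few times; each run exercises the index and cache-tree code
runs=0
for i in 1 2 3 4 5; do
  git -c index.threads=1 -c user.name=demo -c user.email=demo@example.com \
    commit -q --amend --allow-empty --no-edit
  runs=$((runs + 1))
done

# Amending replaces the tip, so history still holds a single commit
commits=$(git rev-list --count HEAD)
echo "runs: $runs, commits: $commits"
```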