Size of a Git repository vs Time

The size of a commit is very hard to define. First of all, most commits reuse a lot of existing Git objects. If you don’t change a file between revision A and B, should the size of B include the size of that file? The repository size itself is not easy to determine either. Git compresses its object store and repacks objects from time to time. How it packs them depends on several factors, so repacking again might not give the same result, and the total size can differ.

What you could do is check the size of the checked-out tree of every revision. But of course the result you get there will be far from the size of the repository itself.
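As a rough sketch of that idea, the function below walks the history and sums the uncompressed blob sizes in each commit's tree, without checking anything out. The name tree_sizes is made up for illustration; it relies on git ls-tree -r -l printing each blob's size in the fourth column. It runs one git ls-tree per commit, so it will be slow on long histories.

```shell
#!/bin/sh
# tree_sizes [REV]: hypothetical helper, not a Git command.
# Prints "<commit date> <short hash> <total bytes>" for every commit
# reachable from REV (default HEAD), oldest first.
tree_sizes() {
  rev="${1:-HEAD}"
  git rev-list --reverse "$rev" | while read -r commit; do
    # -r recurses into subdirectories, -l adds the blob size as column 4.
    # Submodule entries show "-" as size, which awk counts as 0.
    bytes=$(git ls-tree -r -l "$commit" |
      awk '{ total += $4 } END { print total + 0 }')
    printf '%s %s\n' \
      "$(git log -1 --date=short --format='%cd %h' "$commit")" "$bytes"
  done
}
```

Calling tree_sizes inside a repository gives one line per commit, which can be fed to any plotting tool. The numbers are sizes of the files as they would appear in a checkout, not the on-disk size of the repository.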


Instead of trying to measure "size" (which does not make sense with a Git repo, as explained by poke), you could visualize "code frequency" (i.e. the "size" of contributions in terms of lines added or removed over time).
The idea comes from "Introducing the New GitHub Graphs":

https://github-images.s3.amazonaws.com/blog/2012/graphs.code-frequency.png

See "Stupid Git Trick - getting contributor stats", except you wouldn't necessarily use the --author option with git log --numstat; instead, you can combine git log with the --since and --until options.

Something like:

git log --since "OCT 4 2011" --until "OCT 11 2011" --pretty=tformat: --numstat | \
  gawk '{ add += $1 ; subs += $2 ; loc += $1 - $2 } END \
  { printf "added lines: %s removed lines: %s total lines: %s\n",add,subs,loc }' -
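To get a series over time rather than a single window, one option is to run git log once and let awk group the --numstat lines by month. This is only a sketch: the function name code_frequency and the "@" marker used to tag each commit's date line are arbitrary choices, not anything Git defines.

```shell
#!/bin/sh
# code_frequency [git log options]: hypothetical helper, not a Git command.
# Prints "<YYYY-MM> +<lines added> -<lines removed>" per month, oldest first.
code_frequency() {
  git log --numstat --date=short --pretty=format:'@%ad' "$@" |
    awk '
      # Commit header line, e.g. "@2011-10-04": remember its YYYY-MM part.
      /^@/ { month = substr($0, 2, 7); next }
      # numstat line: "<added> <removed> <path>". Binary files show "-",
      # which awk treats as 0 in arithmetic.
      NF >= 3 { added[month] += $1; removed[month] += $2 }
      END {
        for (m in added) printf "%s +%d -%d\n", m, added[m], removed[m]
      }
    ' | sort
}
```

Any extra arguments are passed straight to git log, so code_frequency --since "OCT 4 2011" restricts the range in the same way as the command above. Months without any changed lines are simply absent from the output.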

Git doesn't provide such a feature yet.

The best solution would be to iterate over the log, grep the file sizes, and add them together.

There is a solution written in Perl by one of the makers of Bitbucket (Daniel Rohan):

https://confluence.atlassian.com/plugins/servlet/mobile#content/view/292651328
