From: Joshua Redstone <joshua.redstone@fb•com>
To: "Carlos Martín Nieto" <cmn@elego•de>,
"Tomas Carnecky" <tom@dbservice•com>,
"Junio C Hamano" <gitster@pobox•com>
Cc: "git@vger•kernel.org" <git@vger•kernel.org>
Subject: Re: Debugging git-commit slowness on a large repo
Date: Wed, 7 Dec 2011 01:48:46 +0000
Message-ID: <CB04005C.2C669%joshua.redstone@fb.com>
In-Reply-To: <20111203002347.GB2950@centaur.lab.cmartin.tk>

Hi Carlos and Tomas and Junio,

@Tomas, I tried adding the '--no-status' flag to 'git commit' and it sped
things up by maybe 15%, but commits still take a second.

@Carlos, by "same size", I mean roughly the same number of files and
number of bytes modified in each file. In all experiments, it's less than
5 files modified per commit with changes totaling fewer than 10 KB, often
more like 1 KB. I actually wrote a test script to generate commits,
customized for the stats on the repo I'm using. It repeatedly generates
some changes, does 'git add [ list of files changed ]' followed by 'git
commit --no-status -m [ msg ]'. It generates changes by picking fewer
than 5 files at random, modifying two 100-byte regions in each file, and
occasionally creating a new file of about 1 KB. If it helps, I can
probably post the test script I've been using.
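
For reference, a minimal sketch of the kind of generator described above.
This is not the actual script. The function names (mutate_file,
make_changes, commit_changes), the 10% new-file probability, and the
generated file naming are assumptions made for illustration. Only the
'git add' and 'git commit --no-status -m' invocations come from the
description.

```python
import os
import random
import subprocess


def mutate_file(path, rng, regions=2, region_size=100):
    """Overwrite `regions` random spans of `region_size` bytes in place.

    Files shorter than region_size are overwritten from offset 0 and grow.
    """
    with open(path, "r+b") as f:
        size = f.seek(0, os.SEEK_END)
        for _ in range(regions):
            offset = rng.randrange(max(1, size - region_size + 1))
            f.seek(offset)
            f.write(bytes(rng.randrange(32, 127) for _ in range(region_size)))


def make_changes(repo, tracked, rng, max_files=4, new_file_prob=0.1):
    """Modify 1..max_files tracked files; sometimes add a ~1 KB file.

    `tracked` is a list of repo-relative paths and is extended in place
    when a new file is created. Returns the paths that were touched.
    """
    count = min(len(tracked), rng.randint(1, max_files))
    changed = rng.sample(tracked, count)
    for rel in changed:
        mutate_file(os.path.join(repo, rel), rng)
    if rng.random() < new_file_prob:  # assumed rate, not from the thread
        rel = "generated_%08x.txt" % rng.getrandbits(32)
        with open(os.path.join(repo, rel), "wb") as f:
            f.write(bytes(rng.randrange(32, 127) for _ in range(1024)))
        changed.append(rel)
        tracked.append(rel)
    return changed


def commit_changes(repo, changed, msg):
    """Stage exactly the touched paths, then commit without the status scan."""
    subprocess.check_call(["git", "add", "--"] + changed, cwd=repo)
    subprocess.check_call(
        ["git", "commit", "--no-status", "-q", "-m", msg], cwd=repo
    )
```

A driver would call make_changes() and commit_changes() in a loop, passing
a seeded random.Random instance so that runs are repeatable.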

I tried doing a 'git read-tree HEAD' before each 'git add ; git commit'
iteration, and the time for git-commit jumped from about 1 second to about
8 seconds. That is a pretty dramatic slowdown. Any idea why? I wonder
if that's related to the overall commit slowness.
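
The comparison above could be reproduced with a small timing harness along
these lines. It is only a sketch: the helper names (timed, timed_commit)
and the reset_index parameter are hypothetical, and it times just the
'git commit' step, which is an assumption about what was measured.

```python
import subprocess
import time


def timed(cmd, cwd=None):
    """Run cmd to completion and return the wall-clock seconds it took."""
    start = time.perf_counter()
    subprocess.check_call(cmd, cwd=cwd)
    return time.perf_counter() - start


def timed_commit(repo, files, msg, reset_index=False):
    """Stage `files` and return how long 'git commit' takes.

    With reset_index=True, 'git read-tree HEAD' runs first, matching the
    experiment Carlos suggested.
    """
    if reset_index:
        subprocess.check_call(["git", "read-tree", "HEAD"], cwd=repo)
    subprocess.check_call(["git", "add", "--"] + list(files), cwd=repo)
    return timed(["git", "commit", "--no-status", "-q", "-m", msg], cwd=repo)
```

Calling timed_commit() with and without reset_index=True over the same
sequence of generated changes gives the two numbers to compare.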

@Carlos and/or @Junio, can you point me at any docs/code to understand
what a tree-cache is and how it differs from the index? I did a google
search for [git tree-cache index], but nothing popped out.

Cheers,
Josh

On 12/2/11 4:23 PM, "Carlos Martín Nieto" <cmn@elego•de> wrote:
>On Fri, Dec 02, 2011 at 11:17:10PM +0000, Joshua Redstone wrote:
>> Hi,
>> I have a git repo with about 300k commits, 150k files totaling maybe 7GB.
>> Locally committing a small change - say touching fewer than 300 bytes
>> across 4 files - consistently takes over one second, which seems kinda
>> slow. This is using git 1.7.7.4 on a linux 2.6 box. The time does not
>> improve after doing a git-gc (my .git dir has maybe 250 files after a
>> git gc). The same size commit on a brand new repo takes < 10ms. Any
>> thoughts on why committing a small change seems to take a long time on
>> larger repos?
>
>By "same size commit" do you mean the same amount of changes, or the
>same amount of files? Committing doesn't depend on the size of the
>repo (by itself), but on the size of the index, which depends on the
>amount of files to be committed (as git is snapshot-based). At one
>point, commit forgot how to write the tree cache to the index (a
>performance optimisation). Do the times improve if you run 'git
>read-tree HEAD' between one commit and another? Note that this will
>reset the index to the last commit, though for the tests I imagine you
>use some variation of 'git commit -a'.
>
>Thomas Rast wrote a patch to re-teach commit to store the tree cache,
>but there were some issues and it never got applied.
>
>>
>> Fwiw, I also tried doing the same test using libgit2 (via the pygit2
>> wrapper), and it was even slower (about 6 seconds to commit the same
>> small change).
>
>I don't know about the python bindings, but on the (somewhat
>unscientific) tests for libgit2's write-tree (the slow part of
>creating a commit), it performs slightly faster than git's (though I
>think git's write-tree does update the tree cache, which libgit2
>doesn't currently). The speed could just be a side-effect of the small
>test repo. From your domain, I assume the data is not for public
>consumption, but it'd be great if you could post your code to pygit2's
>issue tracker so we can see how much of the slowdown comes from the
>bindings or the library.
>
> cmn
>