From: Joshua Redstone <joshua.redstone@fb•com>
To: "Carlos Martín Nieto" <cmn@elego•de>,
	"Tomas Carnecky" <tom@dbservice•com>,
	"Junio C Hamano" <gitster@pobox•com>
Cc: "git@vger•kernel.org" <git@vger•kernel.org>
Subject: Re: Debugging git-commit slowness on a large repo
Date: Wed, 7 Dec 2011 01:48:46 +0000
Message-ID: <CB04005C.2C669%joshua.redstone@fb.com>
In-Reply-To: <20111203002347.GB2950@centaur.lab.cmartin.tk>

Hi Carlos and Tomas and Junio,

@Tomas, I tried adding the '--no-status' flag to 'git commit' and it sped
things up by maybe 15%, but commits still take a second.

@Carlos, by "same size", I mean roughly the same number of files and
the same number of bytes modified in each file.  In all experiments, each
commit modifies fewer than 5 files, with changes totaling less than 10 KB
and often closer to 1 KB.  I actually wrote a test script to generate
commits, customized for the stats on the repo I'm using.  It repeatedly
generates some changes, then runs 'git add [ list of files changed ]'
followed by 'git commit --no-status -m [ msg ]'.  It generates changes by
picking fewer than 5 files at random, modifying two 100-byte regions in
each file, and occasionally creating a new file of about 1 KB.  If it
helps, I can probably post the test script I've been using.
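
For illustration only, a generator along the lines described above could
look roughly like the Python sketch below.  This is not the script used for
the timings in this thread: the function names, the fixed seed, the
printable-ASCII filler, the 10% chance of a new file, and the
'generated_*.txt' naming are all assumptions.  It expects an existing work
tree and 'git' on the PATH.

```python
import os
import random
import subprocess

REGION = 100          # bytes overwritten per modified region
NEW_FILE_SIZE = 1024  # "about 1 KB" for occasional new files


def filler(rng, n):
    """Return n bytes of random printable ASCII (an arbitrary choice)."""
    return bytes(rng.randrange(32, 127) for _ in range(n))


def mutate_file(path, rng):
    """Overwrite two 100-byte regions of the file in place.

    Files shorter than 100 bytes grow to 100 bytes as a side effect.
    """
    size = os.path.getsize(path)
    with open(path, "r+b") as f:
        for _ in range(2):
            f.seek(rng.randrange(max(1, size - REGION + 1)))
            f.write(filler(rng, REGION))


def make_changes(root, tracked, rng, new_file_chance=0.1):
    """Modify fewer than 5 random tracked files; sometimes add a new one.

    'tracked' must be a non-empty list of paths relative to 'root'.
    Returns the relative paths that were touched.
    """
    changed = rng.sample(tracked, rng.randint(1, min(4, len(tracked))))
    for rel in changed:
        mutate_file(os.path.join(root, rel), rng)
    if rng.random() < new_file_chance:
        rel = "generated_%08x.txt" % rng.getrandbits(32)  # hypothetical name
        with open(os.path.join(root, rel), "wb") as f:
            f.write(filler(rng, NEW_FILE_SIZE))
        changed.append(rel)
    return changed


def run(iterations, root="."):
    """Generate 'iterations' small commits in the work tree at 'root'."""
    rng = random.Random(0)  # fixed seed so runs are repeatable
    out = subprocess.check_output(["git", "ls-files", "-z"], cwd=root)
    tracked = [p for p in os.fsdecode(out).split("\0") if p]
    for n in range(iterations):
        changed = make_changes(root, tracked, rng)
        subprocess.check_call(["git", "add", "--"] + changed, cwd=root)
        subprocess.check_call(
            ["git", "commit", "--no-status", "-m", "test commit %d" % n],
            cwd=root)
        tracked.extend(p for p in changed if p not in tracked)
```

Calling run(100) from the top of a scratch clone would produce 100 such
commits; it rewrites tracked files, so it should not be pointed at a work
tree that matters.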

I tried doing a 'git read-tree HEAD' before each 'git add ; git commit'
iteration, and the time for git-commit jumped from about 1 second to about
8 seconds.  That is a pretty dramatic slowdown.  Any idea why?  I wonder
if that's related to the overall commit slowness.
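
One way to see where the extra seconds land is to time each command of an
iteration separately, with and without the 'git read-tree HEAD' step.  The
Python sketch below does that under stated assumptions: the helper names are
hypothetical, and it measures wall-clock time only.

```python
import subprocess
import time


def timed(cmd, cwd="."):
    """Run cmd to completion and return its wall-clock time in seconds."""
    start = time.perf_counter()
    subprocess.check_call(cmd, cwd=cwd)
    return time.perf_counter() - start


def commit_steps(paths, message, read_tree_first):
    """Build the command sequence for one iteration of the experiment."""
    steps = []
    if read_tree_first:
        steps.append(["git", "read-tree", "HEAD"])
    steps.append(["git", "add", "--"] + list(paths))
    steps.append(["git", "commit", "--no-status", "-m", message])
    return steps


def time_iteration(paths, message, read_tree_first, cwd="."):
    """Return a dict mapping each git subcommand to its elapsed seconds."""
    return {cmd[1]: timed(cmd, cwd)
            for cmd in commit_steps(paths, message, read_tree_first)}
```

Comparing time_iteration(paths, msg, False) against
time_iteration(paths, msg, True) over the same kind of change keeps the cost
of read-tree itself out of the figure reported for commit.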

@Carlos and/or @Junio, can you point me at any docs/code to understand
what a tree-cache is and how it differs from the index?  I did a google
search for [git tree-cache index], but nothing popped out.

Cheers,
Josh


On 12/2/11 4:23 PM, "Carlos Martín Nieto" <cmn@elego•de> wrote:

>On Fri, Dec 02, 2011 at 11:17:10PM +0000, Joshua Redstone wrote:
>> Hi,
>> I have a git repo with about 300k commits, 150k files totaling maybe
>> 7GB.  Locally committing a small change - say touching fewer than 300
>> bytes across 4 files - consistently takes over one second, which seems
>> kinda slow.  This is using git 1.7.7.4 on a linux 2.6 box.  The time
>> does not improve after doing a git-gc (my .git dir has maybe 250 files
>> after a git gc).  The same size commit on a brand new repo takes < 10ms.
>> Any thoughts on why committing a small change seems to take a long time
>> on larger repos?
>
>By "same size commit" do you mean the same amount of changes, or the
>same number of files? Committing doesn't depend on the size of the
>repo (by itself), but on the size of the index, which depends on the
>number of files to be committed (as git is snapshot-based). At one
>point, commit forgot how to write the tree cache to the index (a
>performance optimisation). Do the times improve if you run 'git
>read-tree HEAD' between one commit and another? Note that this will
>reset the index to the last commit, though for the tests I imagine you
>use some variation of 'git commit -a'.
>
>Thomas Rast wrote a patch to re-teach commit to store the tree cache,
>but there were some issues and it never got applied.
>
>> 
>> Fwiw, I also tried doing the same test using libgit2 (via the pygit2
>> wrapper), and it was even slower (about 6 seconds to commit the same
>> small change).
>
>I don't know about the python bindings, but on the (somewhat
>unscientific) tests for libgit2's write-tree (the slow part of
>creating a commit), it performs slightly faster than git's (though I
>think git's write-tree does update the tree cache, which libgit2
>doesn't currently). The speed could just be a side-effect of the small
>test repo. From your domain, I assume the data is not for public
>consumption, but it'd be great if you could post your code to pygit2's
>issue tracker so we can see how much of the slowdown comes from the
>bindings or the library.
>
>   cmn
>

