public inbox for git@vger.kernel.org 
From: David Kastrup <dak@gnu•org>
To: Duy Nguyen <pclouds@gmail•com>
Cc: Git Mailing List <git@vger•kernel.org>
Subject: Re: [PATCH] Bump core.deltaBaseCacheLimit to 96m
Date: Mon, 05 May 2014 13:20:09 +0200
Message-ID: <874n14tqty.fsf@fencepost.gnu.org>
In-Reply-To: <CACsJy8BG8fRPk74R_-YABCGMn-YwbDcLHtjUNX7KE66jX1mR4A@mail.gmail.com> (Duy Nguyen's message of "Mon, 5 May 2014 17:26:56 +0700")

Duy Nguyen <pclouds@gmail•com> writes:

> On Mon, May 5, 2014 at 12:13 AM, David Kastrup <dak@gnu•org> wrote:
>> The default of 16m causes serious thrashing for large delta chains
>> combined with large files.
>>
>> Here are some benchmarks (pu variant of git blame):
>>
>> time git blame -C src/xdisp.c >/dev/null
>
> ...
>
>> diff --git a/Documentation/config.txt b/Documentation/config.txt
>> index 1932e9b..21a3c86 100644
>> --- a/Documentation/config.txt
>> +++ b/Documentation/config.txt
>> @@ -489,7 +489,7 @@ core.deltaBaseCacheLimit::
>>         to avoid unpacking and decompressing frequently used base
>>         objects multiple times.
>>  +
>> -Default is 16 MiB on all platforms.  This should be reasonable
>> +Default is 96 MiB on all platforms.  This should be reasonable
>>  for all users/operating systems, except on the largest projects.
>>  You probably do not need to adjust this value.
>
> So emacs.git falls exactly into the "except on the largest projects"
> part.

git gc --aggressive has been used/recommended for _all_ projects
regularly, leading to delta chains with a length of 250.  So this delta
chain size is not exceptional but will eventually occur in any archive
that has been created and maintained according to the recommendations of
Git's documentation (which recommends gc --aggressive every few hundred
revisions).  I was illustrating the effect on a file of size 1MB.
That's not an egregiously large file either.

96MB is the point of diminishing returns for this case, which is _6_
times larger than the current default and _small_ in comparison with the
memory installed on developer machines nowadays.  Similar slowdowns
occur with other examples.  With the current defaults, Git will accept
files of up to 512MB into its compression scheme (and thus its core
memory) before punting.

The current deltaBaseCacheLimit of 16MB is rather ridiculous, in
particular with the pre-2.0 settings for gc --aggressive, and causes
serious performance degradation.  It was actually ridiculous even 10
years ago.
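
For anyone who wants to try the larger limit without waiting for a
changed default, here is a minimal sketch of setting it for a single
repository and reading it back.  The scratch repository created with
mktemp is purely an assumption for illustration; in practice one would
run the config command inside the repository being measured.

```shell
# Hypothetical scratch repository so the sketch does not touch real data.
repo=$(mktemp -d)
git init -q "$repo"

# Raise the delta base cache limit for this repository only.
git -C "$repo" config core.deltaBaseCacheLimit 96m

# Read the value back; --int expands the "m" suffix into bytes.
limit_bytes=$(git -C "$repo" config --int core.deltaBaseCacheLimit)
echo "$limit_bytes"
```

For a one-off comparison, the setting can also be passed on the command
line, as in "git -c core.deltaBaseCacheLimit=96m blame -C src/xdisp.c",
which leaves the repository configuration untouched.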

> Would it make more sense to advise git devs to set this per repo
> instead? The majority of (open source) repositories out there are
> small if I'm not mistaken. Of those few big repos, we could have a
> section listing all the tips and tricks to tune git. This is one of
> them. Index v4 and sparse checkout are some other. In future, maybe
> watchman support, split index and untracked cache as well.

Shrug.  The last version of the patch was refused because of wanting
more evidence.  I added the evidence.

And I have it on record in the mailing list and can point to it when
people ask me why Git is so slow for "git blame" in comparison to other
version control systems in spite of my purporting to have improved it.

I'm definitely not going to jump through any more hoops here.  I don't
see a point in this kind of spectacle.

-- 
David Kastrup
