public inbox for git@vger.kernel.org 
From: David Kastrup <dak@gnu•org>
To: Jeff King <peff@peff•net>
Cc: Duy Nguyen <pclouds@gmail•com>, Git Mailing List <git@vger•kernel.org>
Subject: Re: [PATCH 4/4] gc --aggressive: three phase repacking
Date: Tue, 18 Mar 2014 07:16:21 +0100
Message-ID: <87pplkvxoq.fsf@fencepost.gnu.org>
In-Reply-To: <20140318051342.GA17200@sigill.intra.peff.net> (Jeff King's message of "Tue, 18 Mar 2014 01:13:43 -0400")

Jeff King <peff@peff•net> writes:

> On Tue, Mar 18, 2014 at 12:00:48PM +0700, Duy Nguyen wrote:
>
>> On Tue, Mar 18, 2014 at 11:50 AM, Jeff King <peff@peff•net> wrote:
>> > On Sun, Mar 16, 2014 at 08:35:04PM +0700, Nguyễn Thái Ngọc Duy wrote:
>> >
>> >> As explained in the previous commit, current aggressive settings
>> >> --depth=250 --window=250 could slow down repository access
>> >> significantly. Notice that people usually work on recent history only,
>> >> we could keep recent history more loosely packed, so that repo access
>> >> is fast most of the time while the pack file remains small.
>> >
>> > One thing I have not seen is real-world timings showing the slowdown
>> > based on --depth. Did I miss them, or are we just making assumptions
>> > based on one old case from 2009 (that, AFAIK does not have real numbers,
>> > just speculation)? Has anyone measured the effect of bumping the delta
>> > cache size (and its hash implementation)?
>> 
>> David tested it with git-blame [1]. I should probably run some tests
>> too (I don't remember if I tested some operations last time).
>> 
>> http://thread.gmane.org/gmane.comp.version-control.git/242277/focus=242435
>
> Ah, thanks. I do remember that thread now.
>
> It looks like David's last word is that he gets a significant
> performance from bumping the delta base cache size (and number of
> buckets).

Increasing the number of buckets had comparatively minor effects
(that was the suggestion I started with) and actually started
_degrading_ performance rather soon.  The effect of the delta base
cache size was much more noticeable.  I had prepared a patch seriously
increasing it.  The reason I have not submitted it yet is that I have
not found a compelling real-world test case _apart_ from the fast
git-blame, which is still missing an implementation of the -M and -C
options.

There should be other commands digging through large amounts of old
history, but I did not really find anything that benchmarks convincingly.
Either most stuff is inefficient anyway, or the access order is
better-behaved, causing fewer unwanted cache flushes.

Access order in the optimized git-blame case is basically determined by
a reverse commit-time based priority queue, leading to a breadth-first
strategy.  It still beats unsorted access solidly in its timing.  I don't
think I compared depth-first results (inverting the priority queue's
sorting condition) with regard to cache behavior, but depth-first is bad
for interactive use as it tends to leave some recent history unblamed for
a long time while digging up stuff in the remote past.
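
For readers who want to see the access order concretely, here is a
minimal Python sketch of a newest-first walk driven by a commit-time
priority queue.  It is an illustration only, not the git-blame code:
the function name, the `parents` and `commit_time` dictionaries, and
the toy history are all hypothetical.

```python
import heapq

def walk_newest_first(parents, commit_time, start):
    """Yield commits reachable from `start`, newest commit time first."""
    # heapq is a min-heap, so negated times make the newest commit pop first.
    heap = [(-commit_time[start], start)]
    seen = {start}
    while heap:
        _, commit = heapq.heappop(heap)
        yield commit
        for parent in parents.get(commit, ()):
            if parent not in seen:
                seen.add(parent)
                heapq.heappush(heap, (-commit_time[parent], parent))

# Hypothetical toy history: merge "M" joins two branches that fork from "A".
parents = {"M": ["B2", "C1"], "B2": ["B1"], "B1": ["A"], "C1": ["A"], "A": []}
commit_time = {"M": 50, "B2": 40, "C1": 30, "B1": 20, "A": 10}

order = list(walk_newest_first(parents, commit_time, "M"))
print(order)
```

The walk interleaves both branches by age instead of exhausting one
branch first, which is the breadth-first behaviour described above.
Dropping the negation (pushing `commit_time[...]` as-is) inverts the
sorting condition, so the walk chases the oldest queued commit first.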

Moderate cache size increases seem like a better strategy, and the
default size of 16M does not make a lot of sense with modern computers,
in particular since the history digging is rarely competing with other
memory-intensive operations at the same time.
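
To illustrate why the byte limit matters more than the bucket count,
here is a toy Python sketch of a cache bounded by total bytes with
least-recently-used eviction.  It is not git's delta base cache
implementation; the class, the helper, and the sizes are hypothetical.
In git itself the limit is, to my knowledge, the one set by the
`core.deltaBaseCacheLimit` configuration variable.

```python
from collections import OrderedDict

class ByteLimitedCache:
    """Toy cache bounded by total payload bytes, evicting the LRU entry."""

    def __init__(self, limit_bytes):
        self.limit = limit_bytes
        self.used = 0
        self.entries = OrderedDict()
        self.misses = 0

    def get(self, key, load):
        if key in self.entries:
            self.entries.move_to_end(key)  # mark as most recently used
            return self.entries[key]
        self.misses += 1
        data = load(key)  # stands in for re-inflating a delta base
        self.entries[key] = data
        self.used += len(data)
        # Evict least recently used entries until we fit the byte limit.
        while self.used > self.limit and self.entries:
            _, old = self.entries.popitem(last=False)
            self.used -= len(old)
        return data

def count_misses(limit_bytes, accesses, size=4):
    """Replay `accesses` against a fresh cache; every object is `size` bytes."""
    cache = ByteLimitedCache(limit_bytes)
    for key in accesses:
        cache.get(key, lambda _key: bytes(size))
    return cache.misses

# Hypothetical access pattern that revisits three bases in a cycle.
accesses = ["a", "b", "c", "a", "b", "c"]
small = count_misses(8, accesses)    # room for two objects
large = count_misses(16, accesses)   # room for all three
print(small, large)
```

With room for only two objects, the cyclic pattern evicts each base
just before it is needed again, so every access misses.  Once the
limit covers the working set, only the first access to each base
misses.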

> And that matches the timings I just did. I suspect there are still
> pathological cases that could behave worse, but it really sounds like
> we should be looking into improving that cache as a first step.

I can put up a patch.  My git-blame experiments used 128M, and the patch
proposes a more conservative 64M.  I have not actually run experiments
with the 64M setting, though.  The current default is 16M.

-- 
David Kastrup
