public inbox for git@vger.kernel.org 
From: David Kastrup <dak@gnu•org>
To: Jeff King <peff@peff•net>
Cc: "Nguyễn Thái Ngọc Duy" <pclouds@gmail•com>, git@vger•kernel.org
Subject: Re: [PATCH 4/4] gc --aggressive: three phase repacking
Date: Tue, 18 Mar 2014 07:19:43 +0100
Message-ID: <87lhw8vxj4.fsf@fencepost.gnu.org>
In-Reply-To: <20140318050727.GA14769@sigill.intra.peff.net> (Jeff King's message of "Tue, 18 Mar 2014 01:07:27 -0400")

Jeff King <peff@peff•net> writes:

> On Tue, Mar 18, 2014 at 12:50:50AM -0400, Jeff King wrote:
>
>> On Sun, Mar 16, 2014 at 08:35:04PM +0700, Nguyễn Thái Ngọc Duy wrote:
>> 
>> > As explained in the previous commit, current aggressive settings
>> > --depth=250 --window=250 could slow down repository access
>> > significantly. Notice that people usually work on recent history only,
>> > we could keep recent history more loosely packed, so that repo access
>> > is fast most of the time while the pack file remains small.
>> 
>> One thing I have not seen is real-world timings showing the slowdown
>> based on --depth. Did I miss them, or are we just making assumptions
>> based on one old case from 2009 (that, AFAIK does not have real numbers,
>> just speculation)? Has anyone measured the effect of bumping the delta
>> cache size (and its hash implementation)?
>
> Just as a very quick, rough data point, here are before-and-after
> timings for the patch below doing "git rev-list --objects --all" on my
> linux.git, which is a mix of "--aggressive" and normal packing (I didn't
> do a "repack -f", but it's partially what I've downloaded from k.org and
> what I've repacked in various experiments over the past few months).
>
>   [before]
>   real    0m28.824s
>   user    0m28.620s
>   sys     0m0.232s
>
>   [after]
>   real    0m21.694s
>   user    0m21.544s
>   sys     0m0.172s
>
> The numbers below are completely pulled out of a hat, so we can perhaps
> do even better. But I think it shows that there is room for improvement
> in the delta base cache.
>
> ---
> diff --git a/environment.c b/environment.c
> index c3c8606..73ed670 100644
> --- a/environment.c
> +++ b/environment.c
> @@ -37,7 +37,7 @@ int core_compression_seen;
>  int fsync_object_files;
>  size_t packed_git_window_size = DEFAULT_PACKED_GIT_WINDOW_SIZE;
>  size_t packed_git_limit = DEFAULT_PACKED_GIT_LIMIT;
> -size_t delta_base_cache_limit = 16 * 1024 * 1024;
> +size_t delta_base_cache_limit = 128 * 1024 * 1024;

The default is also stated in Documentation (under
core.deltaBaseCacheLimit), so that file needs to change as well.  I can
offer a patch.

>  unsigned long big_file_threshold = 512 * 1024 * 1024;
>  const char *pager_program;
>  int pager_use_color = 1;
> diff --git a/sha1_file.c b/sha1_file.c
> index b37c6f6..a9ab8e3 100644
> --- a/sha1_file.c
> +++ b/sha1_file.c
> @@ -1944,7 +1944,7 @@ static void *unpack_compressed_entry(struct packed_git *p,
>  	return buffer;
>  }
>  
> -#define MAX_DELTA_CACHE (256)
> +#define MAX_DELTA_CACHE (1024)

This one really needs experimentation.  I found that increases here lead
to performance degradation rather soon, probably because of decreased
memory locality without significant reduction in cache collisions.  Not
sure whether it's worth touching at all.
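
As a rough illustration of the collision half of that trade-off, here is
a minimal sketch of a direct-mapped table that can be replayed with
different slot counts.  Everything in it is an assumption made for
illustration: slot_hash() and count_misses() are hypothetical names, the
hash is a simple additive mix rather than the one in sha1_file.c, and
the access pattern is synthetic.  It counts misses only; it says nothing
about memory locality, so it is no substitute for timing a real
repository.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical slot hash (not copied from git): mix, then reduce. */
static unsigned long slot_hash(unsigned long pack_id,
			       unsigned long base_offset,
			       unsigned long slots)
{
	unsigned long hash = pack_id + base_offset;

	hash += (hash >> 8) + (hash >> 16);
	return hash % slots;
}

/*
 * Replay `accesses` pseudo-random lookups drawn from `working_set`
 * distinct base offsets against a direct-mapped table of `slots`
 * entries.  Returns the number of lookups that missed, i.e. found
 * the slot empty or holding a different offset.
 */
static unsigned long count_misses(unsigned long slots,
				  unsigned long accesses,
				  unsigned long working_set)
{
	unsigned long *table;
	unsigned long misses = 0, state = 1, i;

	assert(slots > 0 && working_set > 0);
	table = calloc(slots, sizeof(*table));
	if (!table)
		return accesses; /* allocation failed: nothing is cached */

	for (i = 0; i < accesses; i++) {
		unsigned long offset, slot;

		/* Small LCG so runs are reproducible without srand(). */
		state = state * 1103515245UL + 12345UL;
		/* The +1 keeps offsets nonzero; 0 marks an empty slot. */
		offset = ((state >> 16) % working_set) * 4096UL + 1;
		slot = slot_hash(0, offset, slots);

		if (table[slot] != offset) {
			misses++;
			table[slot] = offset; /* evict the previous entry */
		}
	}
	free(table);
	return misses;
}
```

Calling count_misses(256, n, w) and count_misses(1024, n, w) with the
same n and w gives a first impression of how much a larger table reduces
collisions for that pattern; whether that pays for the larger memory
footprint still has to be measured.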

-- 
David Kastrup
