public inbox for git@vger.kernel.org 
 help / color / mirror / Atom feed
From: Taylor Blau <me@ttaylorr•com>
To: Jeff King <peff@peff•net>
Cc: git@vger•kernel.org, Junio C Hamano <gitster@pobox•com>,
	Elijah Newren <newren@gmail•com>,
	Derrick Stolee <stolee@gmail•com>
Subject: Re: [PATCH 8/8] pack-bitmap: build pseudo-merge bitmaps after regular bitmaps
Date: Wed, 27 May 2026 15:24:40 -0400	[thread overview]
Message-ID: <ahdE+Je5YK9JoE7B@nand.local> (raw)
In-Reply-To: <20260527102534.GH981444@coredump.intra.peff.net>

On Wed, May 27, 2026 at 06:25:34AM -0400, Jeff King wrote:
> > It struggles, however, to efficiently generate pseudo-merge bitmaps.
> > Unlike ordinary commits for which the above algorithm is designed,
> > pseudo-merges don't represent any "real" commit in history, just a
> > grouping of non-bitmapped reference tips. In that sense, their first
> > parent is just a part of a larger set, and treating them like ordinary
> > selected commits imposes a significant slow-down when generating bitmaps
> > with pseudo-merges enabled.
>
> This is a great explanation of the problem, and especially this:
>
> > In other words, we pay a nearly ~5 minute penalty to generate
> > pseudo-merge bitmaps, but only save ~50 seconds during traversal.
>
> makes it clear that we're doing something sub-optimal. And it points us
> in the right direction, since that traversal should be able to generate
> the pseudo-merge bitmap we need in the first place! So that should be
> our goal to work towards.
>
> > Instead, build the regular selected commit bitmaps first, considering
> > only non-pseudo-merge commits in `bitmap_builder_init()`. Once those
> > bitmaps have been stored, build each pseudo-merge bitmap separately and
> > attach its parent and object bitmaps to the corresponding pseudo-merge
> > entry before writing the extension.
>
> And then this solution follows naturally from the earlier explanations.
> Good.

Thanks. For as clear as this sounds now, finding this approach took me
longer than I'd like to admit. I'm satisfied, however, with the result.

> In some ways this goes back to the pre-v2.31 way of generating bitmaps,
> which is to just traverse for each bitmap independently. But as you
> note, the whole idea of pseudo-merge bitmaps is that they aren't
> overlapping in any meaningful way. So doing one fill-in traversal per
> pseudo-merge makes sense, and hopefully we hit enough real bitmaps that
> it's not too costly.

Exactly!

> > As a result, the overhead cost for generating pseudo-merges in the above
> > configuration is much smaller:
> >
> >     +------------------+-----------------+---------------+-------------------+
> >     |                  | no pseudo-merge | pseudo-merges | Delta             |
> >     |                  |                 | (HEAD)        |                   |
> >     +------------------+-----------------+---------------+-------------------+
> >     | elapsed          |   294.1 s       |   328.4 s     |  +34.3 s (+11.7%) |
> >     | cycles           | 1,365.5 B       | 1,529.3 B     | +163.7 B (+12.0%) |
> >     | instructions     | 1,389.8 B       | 1,552.8 B     | +163.0 B (+11.7%) |
> >     | CPI              |     0.983       |     0.985     |  +0.002   (+0.2%) |
> >     +------------------+-----------------+---------------+-------------------+
>
> Nice. The time savings are going to depend on how many pseudo-merges we
> generate, I think. And I'd guess that the numbers above come from making
> one big pseudo-merge bitmap, per the config you showed earlier. But you
> probably only want a handful of them in any repo, so hopefully it
> doesn't scale _too_ badly.

That's right, though see below for more thoughts on scaling...

> > Recall that at the start of this series, generating reachability bitmaps
> > took 612.5 seconds *without* pseudo-merges. With this commit, it is
> > still ~46.38% *faster* to generate reachability bitmaps *with*
> > pseudo-merges than it was to generate bitmaps wihtout them at the
> > beginning of this series.
>
> Sure, though 612.5 seconds is all in the distant past. We only care
> about 294.1 seconds now. ;)

Heh ;-). Naturally, I agree here, but wanted to include it for context.
I wanted to point out that the accumulated changes in this series make
it cheaper to generate bitmaps with pseudo-merges now than it was to
generate bitmaps without them before.

> More seriously, I do think the interesting question here is how the time
> scales for various pseudo-merge configurations. I don't know if we have
> any real operational experience with them yet. The original idea is that
> you might slice up the ref space into a few chunks. I'd guess that the
> old code performed badly-ish overall, but the time did not grow all that
> much as you increased the number of chunks. But with the new code, I
> suspect that the cost grows more linearly with number of chunks. That's
> just a guess, though.

I'm not aware of any large-scale deployments of pseudo-merge bitmaps.
This series is written (in part) of the hopes of making one ;-). I think
your intuition on the old code matches my own.

Below are some numbers that give you a sense of how the runtime scales
with the number of pseudo-merges. I'm relying exclusively on "stable"
pseudo-merges here since they have more predictable bucketing behavior,
though note that there isn't an exact way to dial in the number of these
so-called "stable" pseudo-merge groups. We can only control their *size*
(in terms of number of parents), so I ran the harness which produced the
above code with powers of 10 between [10^3, 10^6].

Results are as follows:

    +------------+-------+----------+
    | stableSize | count | time (s) |
    +------------+-------+----------+
    |    1000000 |     1 |   34.963 |
    |     100000 |     3 |   36.954 |
    |      10000 |    26 |  221.963 |
    |       1000 |   252 | 2779.373 |
    +------------+-------+----------+

Which scales roughly like O(x^1.165) (the best fit function I could find
was t(n) = 25.18 + 4.386 * n^1.165, where 'n' is the number of
pseudo-merges, and t(n) is the time it took to generate them).

So it does grow faster than linearly, but it's not too bad. The jump
from 26 to 252 pseudo-merges is pretty significant, though, but having
that many pseudo-merges is probably not something that we would want to
do in practice.

> The other thing we hope for with pseudo-merges is that the chunks are
> selected such that most of the chunks don't change (because they are
> composed of old, stable refs). So in subsequent bitmap generations, we
> can either reuse them either verbatim or as a starting point (if there
> were only additions). But all of that is going to be heuristic and
> depend on your config, the changes the repo sees over time, and so on.
>
> So I don't know if we'd really have good numbers on that.

We don't, and it is somewhat of a pain to simulate. I think the proof
will be in the pudding, so to speak.

> > Now that we have decoupled how we generate pseudo-merges from their
> > representation, the following commits will improve the API around
> > specifying pseudo-merge groupings during bitmap generation.
>
> I think we're at patch 8/8 here. I guess you have more to come
> eventually, but for now this part is just misleading. ;)

Yeah, I cleaved this off of a larger series to make the pseudo-merge API
a little easier to reason about and less clunky to use. But I ended up
hoarding some of those patches, and apparently forgot to adjust the
message here. Thanks for spotting.

Thanks,
Taylor

  reply	other threads:[~2026-05-27 19:24 UTC|newest]

Thread overview: 38+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-05-19 16:12 [PATCH 0/8] pack-bitmap-write: speed up bitmap generation Taylor Blau
2026-05-19 16:12 ` [PATCH 1/8] pack-bitmap: pass object position to `fill_bitmap_tree()` Taylor Blau
2026-05-27  8:57   ` Jeff King
2026-05-27 14:36     ` Taylor Blau
2026-05-19 16:12 ` [PATCH 2/8] pack-bitmap: check subtree bits before recursing Taylor Blau
2026-05-27  9:03   ` Jeff King
2026-05-27 14:36     ` Taylor Blau
2026-05-19 16:12 ` [PATCH 3/8] pack-bitmap: reuse stored selected bitmaps Taylor Blau
2026-05-27  9:24   ` Jeff King
2026-05-27 14:40     ` Taylor Blau
2026-05-29  6:00       ` Jeff King
2026-05-19 16:12 ` [PATCH 4/8] pack-bitmap: consolidate `find_object_pos()` success path Taylor Blau
2026-05-20 14:42   ` SZEDER Gábor
2026-05-20 17:12     ` Taylor Blau
2026-05-27  9:27   ` Jeff King
2026-05-19 16:12 ` [PATCH 5/8] pack-bitmap: cache object positions during fill Taylor Blau
2026-05-27  9:45   ` Jeff King
2026-05-27 14:46     ` Taylor Blau
2026-05-19 16:12 ` [PATCH 6/8] pack-bitmap: sort bitmaps before XORing Taylor Blau
2026-05-27 10:04   ` Jeff King
2026-05-27 16:56     ` Taylor Blau
2026-05-29  8:26       ` Jeff King
2026-05-19 16:12 ` [PATCH 7/8] pack-bitmap: remember pseudo-merge parents Taylor Blau
2026-05-19 16:12 ` [PATCH 8/8] pack-bitmap: build pseudo-merge bitmaps after regular bitmaps Taylor Blau
2026-05-27 10:25   ` Jeff King
2026-05-27 19:24     ` Taylor Blau [this message]
2026-05-29  8:33       ` Jeff King
2026-05-27 10:27 ` [PATCH 0/8] pack-bitmap-write: speed up bitmap generation Jeff King
2026-05-27 19:55 ` [PATCH v2 " Taylor Blau
2026-05-27 19:55   ` [PATCH v2 1/8] pack-bitmap: pass object position to `fill_bitmap_tree()` Taylor Blau
2026-05-27 19:55   ` [PATCH v2 2/8] pack-bitmap: check subtree bits before recursing Taylor Blau
2026-05-27 19:55   ` [PATCH v2 3/8] pack-bitmap: reuse stored selected bitmaps Taylor Blau
2026-05-27 19:55   ` [PATCH v2 4/8] pack-bitmap: consolidate `find_object_pos()` success path Taylor Blau
2026-05-27 19:56   ` [PATCH v2 5/8] pack-bitmap: cache object positions during fill Taylor Blau
2026-05-27 19:56   ` [PATCH v2 6/8] pack-bitmap: sort bitmaps before XORing Taylor Blau
2026-05-27 19:56   ` [PATCH v2 7/8] pack-bitmap: remember pseudo-merge parents Taylor Blau
2026-05-27 19:56   ` [PATCH v2 8/8] pack-bitmap: build pseudo-merge bitmaps after regular bitmaps Taylor Blau
2026-05-29  8:34   ` [PATCH v2 0/8] pack-bitmap-write: speed up bitmap generation Jeff King

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ahdE+Je5YK9JoE7B@nand.local \
    --to=me@ttaylorr$(echo .)com \
    --cc=git@vger$(echo .)kernel.org \
    --cc=gitster@pobox$(echo .)com \
    --cc=newren@gmail$(echo .)com \
    --cc=peff@peff$(echo .)net \
    --cc=stolee@gmail$(echo .)com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox