From: Taylor Blau <me@ttaylorr•com>
To: Jeff King <peff@peff•net>
Cc: git@vger•kernel.org, Junio C Hamano <gitster@pobox•com>,
Elijah Newren <newren@gmail•com>,
Derrick Stolee <stolee@gmail•com>
Subject: Re: [PATCH 8/8] pack-bitmap: build pseudo-merge bitmaps after regular bitmaps
Date: Wed, 27 May 2026 15:24:40 -0400
Message-ID: <ahdE+Je5YK9JoE7B@nand.local>
In-Reply-To: <20260527102534.GH981444@coredump.intra.peff.net>
On Wed, May 27, 2026 at 06:25:34AM -0400, Jeff King wrote:
> > It struggles, however, to efficiently generate pseudo-merge bitmaps.
> > Unlike ordinary commits for which the above algorithm is designed,
> > pseudo-merges don't represent any "real" commit in history, just a
> > grouping of non-bitmapped reference tips. In that sense, their first
> > parent is just a part of a larger set, and treating them like ordinary
> > selected commits imposes a significant slow-down when generating bitmaps
> > with pseudo-merges enabled.
>
> This is a great explanation of the problem, and especially this:
>
> > In other words, we pay a nearly ~5 minute penalty to generate
> > pseudo-merge bitmaps, but only save ~50 seconds during traversal.
>
> makes it clear that we're doing something sub-optimal. And it points us
> in the right direction, since that traversal should be able to generate
> the pseudo-merge bitmap we need in the first place! So that should be
> our goal to work towards.
>
> > Instead, build the regular selected commit bitmaps first, considering
> > only non-pseudo-merge commits in `bitmap_builder_init()`. Once those
> > bitmaps have been stored, build each pseudo-merge bitmap separately and
> > attach its parent and object bitmaps to the corresponding pseudo-merge
> > entry before writing the extension.
>
> And then this solution follows naturally from the earlier explanations.
> Good.
Thanks. For as clear as this sounds now, finding this approach took me
longer than I'd like to admit. I'm satisfied, however, with the result.
> In some ways this goes back to the pre-v2.31 way of generating bitmaps,
> which is to just traverse for each bitmap independently. But as you
> note, the whole idea of pseudo-merge bitmaps is that they aren't
> overlapping in any meaningful way. So doing one fill-in traversal per
> pseudo-merge makes sense, and hopefully we hit enough real bitmaps that
> it's not too costly.
Exactly!
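
To make the "one fill-in traversal per pseudo-merge" idea concrete, here
is a toy sketch in Python. It is not the code from the patch: commits are
plain strings, a "bitmap" is a Python set, only commits are tracked (no
trees or blobs), and the function name `fill_pseudo_merge` is made up for
illustration. It only shows how a walk from the pseudo-merge's parents can
stop as soon as it reaches a commit whose bitmap was already stored by the
regular pass.

```python
# Toy model, not Git's implementation: commits are strings, a "bitmap" is a
# Python set of commit names, and only commits (no trees/blobs) are tracked.

def fill_pseudo_merge(parents_of, stored, tips):
    """Union of everything reachable from `tips`.

    parents_of: dict mapping commit -> list of parent commits
    stored:     dict mapping selected commit -> its already-built bitmap (set)
    tips:       the pseudo-merge's parents (reference tips)
    Returns (bitmap, number of commits walked by hand).
    """
    bitmap = set()
    walked = 0
    stack = list(tips)
    while stack:
        commit = stack.pop()
        if commit in bitmap:
            continue  # already covered by an earlier walk or stored bitmap
        if commit in stored:
            bitmap |= stored[commit]  # reuse the bitmap; do not walk below it
            continue
        bitmap.add(commit)
        walked += 1
        stack.extend(parents_of.get(commit, []))
    return bitmap, walked

# History: a <- b <- c <- d and c <- e, with reference tips at "d" and "e".
parents_of = {"a": [], "b": ["a"], "c": ["b"], "d": ["c"], "e": ["c"]}
# Pretend the regular pass selected "c" and stored its bitmap.
stored = {"c": {"a", "b", "c"}}

bitmap, walked = fill_pseudo_merge(parents_of, stored, tips=["d", "e"])
print(sorted(bitmap), walked)
```

In this toy history only "d" and "e" are walked by hand; everything at or
below "c" comes from the stored bitmap. With an empty `stored`, the same
call would walk all five commits, which is the "hopefully we hit enough
real bitmaps" part.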
> > As a result, the overhead cost for generating pseudo-merges in the above
> > configuration is much smaller:
> >
> > +------------------+-----------------+---------------+-------------------+
> > | | no pseudo-merge | pseudo-merges | Delta |
> > | | | (HEAD) | |
> > +------------------+-----------------+---------------+-------------------+
> > | elapsed | 294.1 s | 328.4 s | +34.3 s (+11.7%) |
> > | cycles | 1,365.5 B | 1,529.3 B | +163.7 B (+12.0%) |
> > | instructions | 1,389.8 B | 1,552.8 B | +163.0 B (+11.7%) |
> > | CPI | 0.983 | 0.985 | +0.002 (+0.2%) |
> > +------------------+-----------------+---------------+-------------------+
>
> Nice. The time savings are going to depend on how many pseudo-merges we
> generate, I think. And I'd guess that the numbers above come from making
> one big pseudo-merge bitmap, per the config you showed earlier. But you
> probably only want a handful of them in any repo, so hopefully it
> doesn't scale _too_ badly.
That's right, though see below for more thoughts on scaling...
> > Recall that at the start of this series, generating reachability bitmaps
> > took 612.5 seconds *without* pseudo-merges. With this commit, it is
> > still ~46.38% *faster* to generate reachability bitmaps *with*
> > pseudo-merges than it was to generate bitmaps without them at the
> > beginning of this series.
>
> Sure, though 612.5 seconds is all in the distant past. We only care
> about 294.1 seconds now. ;)
Heh ;-). Naturally, I agree here, but wanted to include it for context.
I wanted to point out that the accumulated changes in this series make
it cheaper to generate bitmaps with pseudo-merges now than it was to
generate bitmaps without them before.
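
For anyone who wants to double-check the percentages, the arithmetic can
be reproduced with a few lines of Python. The three elapsed times are the
ones quoted in this thread; the helper name `percent_change` is just for
illustration.

```python
# Elapsed times (seconds) quoted in this thread.
BEFORE_SERIES_NO_PM = 612.5   # start of series, without pseudo-merges
AFTER_SERIES_NO_PM = 294.1    # this commit, without pseudo-merges
AFTER_SERIES_WITH_PM = 328.4  # this commit, with pseudo-merges

def percent_change(old, new):
    """Relative change from old to new, in percent (negative means faster)."""
    return (new - old) / old * 100.0

# Cost of enabling pseudo-merges with this commit applied.
overhead = percent_change(AFTER_SERIES_NO_PM, AFTER_SERIES_WITH_PM)
# Pseudo-merges enabled now, compared to no pseudo-merges at the start.
vs_start = percent_change(BEFORE_SERIES_NO_PM, AFTER_SERIES_WITH_PM)

print(f"pseudo-merge overhead now: {overhead:+.2f}%")
print(f"versus start of series:    {vs_start:+.2f}%")
```

The first figure rounds to the +11.7% in the table, and the second is the
~46.38% improvement mentioned in the commit message.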
> More seriously, I do think the interesting question here is how the time
> scales for various pseudo-merge configurations. I don't know if we have
> any real operational experience with them yet. The original idea is that
> you might slice up the ref space into a few chunks. I'd guess that the
> old code performed badly-ish overall, but the time did not grow all that
> much as you increased the number of chunks. But with the new code, I
> suspect that the cost grows more linearly with number of chunks. That's
> just a guess, though.
I'm not aware of any large-scale deployments of pseudo-merge bitmaps.
This series is written (in part) in the hopes of making one ;-). I think
your intuition on the old code matches my own.
Below are some numbers that give you a sense of how the runtime scales
with the number of pseudo-merges. I'm relying exclusively on "stable"
pseudo-merges here since they have more predictable bucketing behavior,
though note that there isn't an exact way to dial in the number of these
so-called "stable" pseudo-merge groups. We can only control their *size*
(in terms of number of parents), so I ran the harness which produced the
above numbers with sizes set to powers of 10 in [10^3, 10^6].
Results are as follows:
+------------+-------+----------+
| stableSize | count | time (s) |
+------------+-------+----------+
| 1000000 | 1 | 34.963 |
| 100000 | 3 | 36.954 |
| 10000 | 26 | 221.963 |
| 1000 | 252 | 2779.373 |
+------------+-------+----------+
Which scales roughly like O(n^1.165) (the best-fit function I could find
was t(n) = 25.18 + 4.386 * n^1.165, where 'n' is the number of
pseudo-merges, and t(n) is the time in seconds it took to generate them).
So it does grow faster than linearly, but it's not too bad. The jump
from 26 to 252 pseudo-merges is pretty significant, but having that many
pseudo-merges is probably not something that we would want to do in
practice.
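
As a sanity check on the fit, the snippet below evaluates the quoted
t(n) against the four measurements from the table. Nothing here is new
data; the function name `fitted_time` is only for illustration.

```python
# Measurements from the table above: (number of pseudo-merges, seconds).
measured = [(1, 34.963), (3, 36.954), (26, 221.963), (252, 2779.373)]

def fitted_time(n):
    """Fitted model quoted above: t(n) = 25.18 + 4.386 * n^1.165."""
    return 25.18 + 4.386 * n ** 1.165

for n, actual in measured:
    predicted = fitted_time(n)
    error = (predicted - actual) / actual * 100.0
    print(f"n={n:>3}  measured={actual:9.3f}  fitted={predicted:9.3f}  ({error:+.1f}%)")
```

With only four points, the curve is effectively pinned by the two largest
runs, so the exponent should be read as a rough indication and not as a
precise scaling law.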
> The other thing we hope for with pseudo-merges is that the chunks are
> selected such that most of the chunks don't change (because they are
> composed of old, stable refs). So in subsequent bitmap generations, we
> can reuse them either verbatim or as a starting point (if there
> were only additions). But all of that is going to be heuristic and
> depend on your config, the changes the repo sees over time, and so on.
>
> So I don't know if we'd really have good numbers on that.
We don't, and it is somewhat of a pain to simulate. I think the proof
will be in the pudding, so to speak.
> > Now that we have decoupled how we generate pseudo-merges from their
> > representation, the following commits will improve the API around
> > specifying pseudo-merge groupings during bitmap generation.
>
> I think we're at patch 8/8 here. I guess you have more to come
> eventually, but for now this part is just misleading. ;)
Yeah, I cleaved this off of a larger series to make the pseudo-merge API
a little easier to reason about and less clunky to use. But I ended up
hoarding some of those patches, and apparently forgot to adjust the
message here. Thanks for spotting.
Thanks,
Taylor