From: Jeff King <peff@peff•net>
To: Taylor Blau <me@ttaylorr•com>
Cc: git@vger•kernel.org, Junio C Hamano <gitster@pobox•com>,
Elijah Newren <newren@gmail•com>,
Derrick Stolee <stolee@gmail•com>
Subject: Re: [PATCH 6/8] pack-bitmap: sort bitmaps before XORing
Date: Wed, 27 May 2026 06:04:06 -0400 [thread overview]
Message-ID: <20260527100406.GG981444@coredump.intra.peff.net> (raw)
In-Reply-To: <b0a4f31353a7053ab37b6d8c8f22c69bcfadfe50.1779207127.git.me@ttaylorr.com>
On Tue, May 19, 2026 at 12:12:50PM -0400, Taylor Blau wrote:
> Reachability bitmaps may be stored as XORs against nearby bitmaps, up to
> 10 away. However, when callers provide selected commits in an arbitrary
> order, the writer may miss good ancestor/descendant pairs and produce
> much larger bitmap files without changing query coverage.
>
> Sort the selected bitmaps in date order (from oldest to newest) before
> computing XOR offsets, leaving pseudo-merge bitmaps alone (which we will
> deal with separately in following commits).
That order certainly makes the most sense. I'd have thought we ended up
there incidentally because of the order in which we consider the
commits, but perhaps not. I wonder if this got much worse when we
re-wrote the bitmap generation code a few years ago.
That was in v2.31.0, I think. Repacking linux.git with bitmaps, though,
I couldn't find any difference in size between v2.30 and v2.31. They're
both ~67M. But that also didn't shrink with this patch, either.
If you have some spare CPU cycles to burn, I would be interested in a
comparison of the bitmap size of your test repo using v2.30.0, v2.31.1,
and this patch.
> On our same testing repository from previous commits, this change shrunk
> our selection of 1,261 bitmaps from ~635.46 MiB to 176.4 MiB for a
> ~72.24% reduction in the on-disk size of our *.bitmap file. The time to
> generate the smaller bitmap file decreased by ~3.69 seconds, though this
> is likely mostly noise.
Certainly good numbers. The obvious follow-up question is: how does the
reading side fare? I'd expect it to be a little better, if only because
there are fewer bytes to consider when XOR-ing. But if there's some
hidden assumption we're missing, then it could get wildly worse. It
would be good to confirm that that didn't happen. ;)
> static void compute_xor_offsets(struct bitmap_writer *writer)
> {
> static const int MAX_XOR_OFFSET_SEARCH = 10;
>
> int i, next = 0;
> + int nr = bitmap_writer_nr_selected_commits(writer);
> +
> + if (nr > 1) {
> + QSORT(writer->selected, nr, bitmapped_commit_date_cmp);
> +
> + for (i = 0; i < nr; i++) {
> + struct bitmapped_commit *stored = &writer->selected[i];
> + khiter_t hash_pos = kh_get_oid_map(writer->bitmaps,
> + stored->commit->object.oid);
> +
> + if (hash_pos == kh_end(writer->bitmaps))
> + BUG("selected commit missing from bitmap map: %s",
> + oid_to_hex(&stored->commit->object.oid));
> +
> + kh_value(writer->bitmaps, hash_pos) = stored;
> + }
> + }
OK. It took me a minute to wrap my head around this. The real work is
done by QSORT(). But because we maintain a hash pointing into that
array, we have to go through each hash entry and fix up its pointer.
Looks correct.
-Peff
next prev parent reply other threads:[~2026-05-27 10:04 UTC|newest]
Thread overview: 38+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-05-19 16:12 [PATCH 0/8] pack-bitmap-write: speed up bitmap generation Taylor Blau
2026-05-19 16:12 ` [PATCH 1/8] pack-bitmap: pass object position to `fill_bitmap_tree()` Taylor Blau
2026-05-27 8:57 ` Jeff King
2026-05-27 14:36 ` Taylor Blau
2026-05-19 16:12 ` [PATCH 2/8] pack-bitmap: check subtree bits before recursing Taylor Blau
2026-05-27 9:03 ` Jeff King
2026-05-27 14:36 ` Taylor Blau
2026-05-19 16:12 ` [PATCH 3/8] pack-bitmap: reuse stored selected bitmaps Taylor Blau
2026-05-27 9:24 ` Jeff King
2026-05-27 14:40 ` Taylor Blau
2026-05-29 6:00 ` Jeff King
2026-05-19 16:12 ` [PATCH 4/8] pack-bitmap: consolidate `find_object_pos()` success path Taylor Blau
2026-05-20 14:42 ` SZEDER Gábor
2026-05-20 17:12 ` Taylor Blau
2026-05-27 9:27 ` Jeff King
2026-05-19 16:12 ` [PATCH 5/8] pack-bitmap: cache object positions during fill Taylor Blau
2026-05-27 9:45 ` Jeff King
2026-05-27 14:46 ` Taylor Blau
2026-05-19 16:12 ` [PATCH 6/8] pack-bitmap: sort bitmaps before XORing Taylor Blau
2026-05-27 10:04 ` Jeff King [this message]
2026-05-27 16:56 ` Taylor Blau
2026-05-29 8:26 ` Jeff King
2026-05-19 16:12 ` [PATCH 7/8] pack-bitmap: remember pseudo-merge parents Taylor Blau
2026-05-19 16:12 ` [PATCH 8/8] pack-bitmap: build pseudo-merge bitmaps after regular bitmaps Taylor Blau
2026-05-27 10:25 ` Jeff King
2026-05-27 19:24 ` Taylor Blau
2026-05-29 8:33 ` Jeff King
2026-05-27 10:27 ` [PATCH 0/8] pack-bitmap-write: speed up bitmap generation Jeff King
2026-05-27 19:55 ` [PATCH v2 " Taylor Blau
2026-05-27 19:55 ` [PATCH v2 1/8] pack-bitmap: pass object position to `fill_bitmap_tree()` Taylor Blau
2026-05-27 19:55 ` [PATCH v2 2/8] pack-bitmap: check subtree bits before recursing Taylor Blau
2026-05-27 19:55 ` [PATCH v2 3/8] pack-bitmap: reuse stored selected bitmaps Taylor Blau
2026-05-27 19:55 ` [PATCH v2 4/8] pack-bitmap: consolidate `find_object_pos()` success path Taylor Blau
2026-05-27 19:56 ` [PATCH v2 5/8] pack-bitmap: cache object positions during fill Taylor Blau
2026-05-27 19:56 ` [PATCH v2 6/8] pack-bitmap: sort bitmaps before XORing Taylor Blau
2026-05-27 19:56 ` [PATCH v2 7/8] pack-bitmap: remember pseudo-merge parents Taylor Blau
2026-05-27 19:56 ` [PATCH v2 8/8] pack-bitmap: build pseudo-merge bitmaps after regular bitmaps Taylor Blau
2026-05-29 8:34 ` [PATCH v2 0/8] pack-bitmap-write: speed up bitmap generation Jeff King
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20260527100406.GG981444@coredump.intra.peff.net \
--to=peff@peff$(echo .)net \
--cc=git@vger$(echo .)kernel.org \
--cc=gitster@pobox$(echo .)com \
--cc=me@ttaylorr$(echo .)com \
--cc=newren@gmail$(echo .)com \
--cc=stolee@gmail$(echo .)com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox