From: Taylor Blau <me@ttaylorr•com>
To: Junio C Hamano <gitster@pobox•com>
Cc: git@vger•kernel.org, Elijah Newren <newren@gmail•com>,
Jeff King <peff@peff•net>, Patrick Steinhardt <ps@pks•im>
Subject: Re: [PATCH 00/11] pack-bitmap: convert offset to ref deltas where possible
Date: Thu, 10 Oct 2024 13:07:03 -0400 [thread overview]
Message-ID: <ZwgJt19kWVRTQhld@nand.local> (raw)
In-Reply-To: <xmqqzfnblxdj.fsf@gitster.g>
On Thu, Oct 10, 2024 at 09:46:00AM -0700, Junio C Hamano wrote:
> Taylor Blau <me@ttaylorr•com> writes:
>
> > This patch series enables more objects to be candidates for verbatim
> > reuse when generating a pack with the aide of reachability bitmaps.
> >
> > By the end of this series, two new classes of objects are now reuse
> > candidates which were not before. They are:
> >
> > - Cross-pack deltas. In multi-pack bitmaps, if the delta and base
> > were selected from different packs, the delta was not reusable.
>
> Hmph. Suppose that you need to send object X, you happen to have X
> as a ofs-delta against Y, but Y may appear in multiple packs.
>
> Even if the copy of Y you are going to send together with X is from
> the same packfile, you may not be sending all the objects between X
> and Y in the original local packfile, so you would need to recompute
> the offset to give ofs-delta X to the distance between X and Y in
> the resulting packstream, no?
That's right. When doing verbatim pack reuse, we mark reused "chunks"
(c.f. 'struct reused_chunk'), where each chunk records the offset of the
beginning of the chunk in the source pack, and the difference between
that value and the corresponding byte in the assembled pack.
So suppose like in your example we have X and Y from the same pack, and
we are sending both objects, but not necessarily every object in between
the two from the source pack. In that case, X and Y will end up in
different chunks.
When we go to actually rewrite the OFS_DELTA for X, we'll compute:
off_t fixup = find_reused_offset(X) - find_reused_offset(Y);
(in the code, X and Y are actually the offsets for each object in the
source pack, but I'm using the object names here for clarity).
When there is a non-zero fixup value, we'll patch the OFS_DELTA to
account for the gap between it and its base object. The details of how
that is done are in builtin/pack-objects.c::write_reused_pack_one(), in
the 'if (fixup) { ... }' block.
> So when you pick the copy of Y out of another pack, what's so
> different? After emitting Y to the resulting pack stream (and
> remembering where in the packstream you did so), when it is X's turn
> to be emitted, shouldn't you be able to compute the distance in the
> resulting packstream to represent X as an ofs-delta against Y, which
> should already be happening when you had both X and Y in the same
> original pack?
Good question. The difference is that if you're reusing X and Y from
same pack, you know that Y occurs some number of bytes *before* X in the
resulting pack.
But if Y comes from a different pack, it may get pushed further back in
the MIDX pseudo-pack order. So in that case the assembled pack may list
X before Y, in which case X cannot be an OFS_DELTA of Y, since offset
deltas require that the base object appears first.
> > - Thin deltas. In both single- and multi-pack bitmaps, we did not
> > consider reusing deltas whose base object appears in the 'haves'
> > bitmap.
>
> I hope this optimization does not kick in unless the receiving end
> is prepared to do "index-pack --fix-thin".
It does. We only allow computing "thin deltas" when pack-objects was
invoked with `--thin`.
> I've never thought about this specifically, but it is interesting to
> realize that by definition "thin" deltas cannot be ofs-deltas.
Yep.
> > Of course, REF_DELTAs have a less compact representation than
> > OFS_DELTAs, so the resulting packs will trade off some CPU time for a
> > slightly larger pack.
>
> Is comparing ref- vs ofs- delta a meaningful thing to do in the
> context of this series?
>
> What does the current code without these patches do in the same
> situation? Give up on reusing the existing delta and then? If we
> send the base representation instead, the comparison is "we
> currently do not use delta, but with this change we can reuse delta
> (even though we do not bother recompute the offset and instead use
> ref-delta)".
The current code does not reuse deltas when we're either (a) not sending
the base, (b) we are sending the base, but it's in a different pack, or
(c) both. When any of those conditions are met, we do not reuse the
existing on-disk representation of the delta object verbatim, and
instead kick it back to the normal pack generation routine.
That might result in us finding a new base, or applying sending the
object's contents as a standalone object.
> Do we recompute the delta on the fly and show it as an ofs-delta
> with the current code? Then the comparison would be "we spend time
> doing diff-delta once right now but instead reuse an existing one
> (even though we do not bother recompute the offset and instead use
> ref-delta)".
Fair.
Thanks,
Taylor
next prev parent reply other threads:[~2024-10-10 17:07 UTC|newest]
Thread overview: 31+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-10-09 20:30 [PATCH 00/11] pack-bitmap: convert offset to ref deltas where possible Taylor Blau
2024-10-09 20:31 ` [PATCH 01/11] pack-bitmap.c: do not pass `pack_pos` to `try_partial_reuse()` Taylor Blau
2024-10-09 20:31 ` [PATCH 02/11] pack-bitmap.c: avoid unnecessary `offset_to_pack_pos()` Taylor Blau
2024-10-09 20:31 ` [PATCH 03/11] pack-bitmap.c: delay calling 'offset_to_pack_pos()' Taylor Blau
2024-10-09 20:31 ` [PATCH 04/11] pack-bitmap.c: compare `base_offset` to `delta_obj_offset` Taylor Blau
2024-10-09 20:31 ` [PATCH 05/11] pack-bitmap.c: extract `find_base_bitmap_pos()` Taylor Blau
2024-10-09 20:31 ` [PATCH 06/11] pack-bitmap: drop `from_midx` field from `bitmapped_pack` Taylor Blau
2024-10-09 20:31 ` [PATCH 07/11] write_reused_pack_one(): translate bit positions directly Taylor Blau
2024-10-11 8:16 ` Jeff King
2024-11-04 20:36 ` Taylor Blau
2024-10-09 20:31 ` [PATCH 08/11] t5332: enable OFS_DELTAs via test_pack_objects_reused Taylor Blau
2024-10-11 8:19 ` Jeff King
2024-11-04 20:50 ` Taylor Blau
2024-10-09 20:31 ` [PATCH 09/11] pack-bitmap: enable cross-pack delta reuse Taylor Blau
2024-10-11 8:31 ` Jeff King
2024-11-04 21:00 ` Taylor Blau
2024-10-09 20:31 ` [PATCH 10/11] pack-bitmap.c: record whether the result was filtered Taylor Blau
2024-10-11 8:35 ` Jeff King
2024-11-04 21:01 ` Taylor Blau
2024-10-09 20:31 ` [PATCH 11/11] pack-bitmap: enable reusing deltas with base objects in 'haves' bitmap Taylor Blau
2024-10-10 16:46 ` [PATCH 00/11] pack-bitmap: convert offset to ref deltas where possible Junio C Hamano
2024-10-10 17:07 ` Taylor Blau [this message]
2024-10-10 20:20 ` Junio C Hamano
2024-10-10 20:32 ` Taylor Blau
2024-10-11 7:54 ` Jeff King
2024-10-11 8:01 ` Jeff King
2024-10-11 16:23 ` Junio C Hamano
2024-10-11 8:38 ` Jeff King
2024-11-19 23:08 ` Taylor Blau
2024-11-19 23:34 ` Taylor Blau
2024-12-18 12:57 ` Jeff King
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ZwgJt19kWVRTQhld@nand.local \
--to=me@ttaylorr$(echo .)com \
--cc=git@vger$(echo .)kernel.org \
--cc=gitster@pobox$(echo .)com \
--cc=newren@gmail$(echo .)com \
--cc=peff@peff$(echo .)net \
--cc=ps@pks$(echo .)im \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox