From: Jeff King <peff@peff•net>
To: Kristofer Karlsson <krka@spotify•com>
Cc: Kristofer Karlsson via GitGitGadget <gitgitgadget@gmail•com>,
git@vger•kernel.org
Subject: Re: [PATCH] fetch: pass transport to post-fetch connectivity check
Date: Wed, 27 May 2026 06:39:30 -0400 [thread overview]
Message-ID: <20260527103930.GJ981444@coredump.intra.peff.net> (raw)
In-Reply-To: <CAL71e4MrVqC1=AR6x0_8S=8kVqPdDkhgCZRb4etFsxTzd6s_8Q@mail.gmail.com>
On Wed, May 27, 2026 at 12:04:19PM +0200, Kristofer Karlsson wrote:
> You're right. I dug into this further and realized the problem is deeper
> than just the flag not being set in builtin/fetch.c.
>
> Even if we add:
> transport->smart_options->check_self_contained_and_connected = 1;
> to prepare_transport(), the optimization still won't work for fetches.
>
> The optimization is fundamentally clone-only.
I have wondered if the transport could do the same thing for:
git init
git fetch ...
When we do not send any "want", then we'd expect the pack we receive to
be self-contained.
But in practice that is not that exciting, as it is a special case that
does not come up that often.
I suspect there's some hybrid mode where we could save some work. It is
easy for index-pack to come up with a list of "edges" from the pack it
got that point outside of the pack. We just need to know that those
edges are reachable from existing refs. So really we could be checking
the connectivity of those edges, rather than the actual ref tips.
Would that be less work? I'm not sure. It saves walking over the
newly-fetched history, but in practice that is probably not that
expensive. It potentially saves a lot when the edge is a ref tip; for a
true fast-forward we'd see a ref going from A..B, and if index-pack
tells us that it just needs A, we can skip the traversal entirely.
But index-pack isn't really thinking in terms of commits, but rather the
whole object graph. So you're going to find that commit A is needed, but
also all of the tree entries in the existing history that weren't
touched by the new history (e.g., B touched path "foo" but not "bar", so
it gets a new top-level tree, a new blob for "foo", but still references
the existing blob for "bar"). I guess you could speculatively load A^{tree}
to cull the list.
So I dunno. I think there is some room for speedup here, and in many
common cases you could skip the rev-list invocation entirely. But it's
not trivial, and I think is far afield from what your patch was
originally trying to do. ;)
> I was unable to reproduce the benchmark numbers from my original commit
> message.
Yeah, I wondered where the numbers came from. It is very easy to fool
yourself with fetch benchmarks, because even "fetch --dry-run" will
transfer objects, and under the hood we try to optimize out as much
object transfer as possible. So you really have to start from the exact
same on-disk state for each trial.
> The patch as submitted is indeed inert for non-clone fetches.
> It looked like a simple improvement, but it's clear that it was incorrect.
> I'll drop it, and I apologize for the noise here.
No problem. You've been generating some interesting optimization work
lately, so I can't complain. :)
I'll probably have more comments on your other topics, but I'm out of
time for tonight.
-Peff
prev parent reply other threads:[~2026-05-27 10:39 UTC|newest]
Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-05-24 12:28 [PATCH] fetch: pass transport to post-fetch connectivity check Kristofer Karlsson via GitGitGadget
2026-05-24 12:53 ` Junio C Hamano
2026-05-24 13:04 ` Kristofer Karlsson
2026-05-27 8:32 ` Jeff King
2026-05-27 10:04 ` Kristofer Karlsson
2026-05-27 10:39 ` Jeff King [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20260527103930.GJ981444@coredump.intra.peff.net \
--to=peff@peff$(echo .)net \
--cc=git@vger$(echo .)kernel.org \
--cc=gitgitgadget@gmail$(echo .)com \
--cc=krka@spotify$(echo .)com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox