From: Jeff King <peff@peff•net>
To: Aaron Plattner <aplattner@nvidia•com>
Cc: git@vger•kernel.org
Subject: Re: [PATCH] packfile: skip decompressing and hashing blobs in add_promisor_object()
Date: Fri, 5 Dec 2025 20:58:30 -0500
Message-ID: <20251206015830.GA1714099@coredump.intra.peff.net>
In-Reply-To: <4bd18399-26b3-44cd-93a7-8d2d32bef709@nvidia.com>
On Fri, Dec 05, 2025 at 01:56:23PM -0800, Aaron Plattner wrote:
> > I do wonder how you end up with OBJ_NONE, though. That implies somebody
> > created the "struct object" but without knowing which type it was
> > supposed to be, and then did not follow up by actually parsing it.
>
> If I'm understanding correctly, this loop creates a dummy struct object for
> every object in the promisor packs:
>
>     if (revs->exclude_promisor_objects) {
>         for_each_packed_object(revs->repo, mark_uninteresting, revs,
>                                FOR_EACH_OBJECT_PROMISOR_ONLY);
>     }
>
> Backtrace for one such object:
>
> #0 create_object
> #1 lookup_unknown_object
> #2 mark_uninteresting
> #3 for_each_object_in_pack
> #4 for_each_packed_object
> #5 prepare_revision_walk
> #6 cmd_rev_list
> #7 run_builtin
> #8 handle_builtin
> #9 cmd_main
> #10 main
>
> Then the is_promisor_object() loop finds these dummy objects when it loops
> over all the objects again.
Ah, of course. That makes sense (and I don't think there's any other way
to do it, as we need the object struct to store the flags).
And that also explains this bit:
> > That's probably immaterial to what parse_object() should be doing, but
> > it is certainly a curiosity. And I'm also not sure why I got good
> > results from my rev-list invocation, but you did not. Weird.
>
> Yeah, that's still a mystery.
It's because in the command I used:
  git rev-list --objects --exclude-promisor-objects $(perl -e 'print "1" x 40')
we call into is_promisor_object() _before_ we hit that part of
prepare_revision_walk() that marks everything uninteresting. In my
invocation above, we'd notice the missing object in get_reference() as
we try to load the initial tips for the walk, and then check it against
is_promisor_object() immediately.
And when I tried something more like your command:
  git rev-list --objects --all --exclude-promisor-objects
it did mark them all uninteresting, but because I had no objects that
were missing (and not simply marked uninteresting), it never needed to
call into is_promisor_object().
So good, mystery resolved.
> > 2. You didn't have a commit-graph built.
>
> This repository came from "scalar clone" and then I created a worktree and
> disabled sparse checkout. I didn't do anything special to enable or disable
> commit-graph.
>
> What I do notice is that usually, a `git pull` from the server this
> repository is hosted on is fast, but occasionally it hits this pathological
> case. I was using git-rev-list as a proxy for what git-pull was getting
> stuck on. Is it possible that having a working commit-graph is what avoids
> the problem in the first place? I'll admit to not having a great
> understanding of how the commit graph is used during a normal pull.
I'd expect scalar to create commit-graphs. We can leave it be, but if
you're curious you can double-check that .git/objects/info has either a
commit-graph file or a commit-graphs/ directory. If not, then running
"git commit-graph write --reachable" should generate one, and you can see
if that changes the timings at all.
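Something like this would do the check. It's only a rough sketch: the has_commit_graph name is made up for illustration, and it assumes you pass it the repository's git directory (defaulting to ".git" in the current directory):

```shell
# Hypothetical helper (not part of git): report which form of commit-graph
# data, if any, exists under <git-dir>/objects/info.
has_commit_graph () {
	info_dir="${1:-.git}/objects/info"
	if test -f "$info_dir/commit-graph"
	then
		# single-file commit-graph
		echo "commit-graph file"
	elif test -d "$info_dir/commit-graphs"
	then
		# split (incremental) commit-graph chain
		echo "commit-graphs directory"
	else
		echo "none"
		return 1
	fi
}
```

Since you're in a linked worktree, ".git" there is a file rather than a directory, so you'd want to call it as has_commit_graph "$(git rev-parse --git-common-dir)".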
-Peff