From: Jeff King <peff@peff.net>
To: Aaron Plattner <aplattner@nvidia.com>
Cc: git@vger.kernel.org
Subject: Re: [PATCH] packfile: skip decompressing and hashing blobs in add_promisor_object()
Date: Fri, 5 Dec 2025 20:58:30 -0500
Message-ID: <20251206015830.GA1714099@coredump.intra.peff.net>
In-Reply-To: <4bd18399-26b3-44cd-93a7-8d2d32bef709@nvidia.com>

On Fri, Dec 05, 2025 at 01:56:23PM -0800, Aaron Plattner wrote:

> > I do wonder how you end up with OBJ_NONE, though. That implies somebody
> > created the "struct object" but without knowing which type it was
> > supposed to be, and then did not follow up by actually parsing it.
> 
> If I'm understanding correctly, this loop creates a dummy struct object for
> every object in the promisor packs:
> 
> 	if (revs->exclude_promisor_objects) {
> 		for_each_packed_object(revs->repo, mark_uninteresting, revs,
> 				       FOR_EACH_OBJECT_PROMISOR_ONLY);
> 	}
> 
> Backtrace for one such object:
> 
> #0   create_object
> #1   lookup_unknown_object
> #2   mark_uninteresting
> #3   for_each_object_in_pack
> #4   for_each_packed_object
> #5   prepare_revision_walk
> #6   cmd_rev_list
> #7   run_builtin
> #8   handle_builtin
> #9   cmd_main
> #10  main
> 
> Then the is_promisor_object() loop finds these dummy objects when it loops
> over all the objects again.

Ah, of course. That makes sense (and I don't think there's any other way
to do it, as we need the object struct to store the flags).

And that also explains this bit:

> > That's probably immaterial to what parse_object() should be doing, but
> > it is certainly a curiosity. And I'm also not sure why I got good
> > results from my rev-list invocation, but you did not. Weird.
> 
> Yeah, that's still a mystery.

It's because in the command I used:

  git rev-list --objects --exclude-promisor-objects $(perl -e 'print "1" x 40')

we call into is_promisor_object() _before_ we hit that part of
prepare_revision_walk() that marks everything uninteresting. In my
invocation above, we'd notice the missing object in get_reference() as
we try to load the initial tips for the walk, and then check it against
is_promisor_object() immediately.

And when I tried something more like your command:

  git rev-list --objects --all --exclude-promisor-objects

it did mark them all uninteresting, but because I had no objects that
were missing (and not simply marked uninteresting), it never needed to
call into is_promisor_object().

So good, mystery resolved.

> >    2. You didn't have a commit-graph built.
> 
> This repository came from "scalar clone" and then I created a worktree and
> disabled sparse checkout. I didn't do anything special to enable or disable
> commit-graph.
> 
> What I do notice is that usually, a `git pull` from the server this
> repository is hosted on is fast, but occasionally it hits this pathological
> case. I was using git-rev-list as a proxy for what git-pull was getting
> stuck on. Is it possible that having a working commit-graph is what avoids
> the problem in the first place? I'll admit to not having a great
> understanding of how the commit graph is used during a normal pull.

I'd expect scalar to create commit-graphs. We can leave it be, but if
you're curious you can double-check that .git/objects/info has either a
commit-graph file or a commit-graphs/ directory. If not, then running
"git commit-graph write --reachable" should generate one, and you can see
if that changes the timings at all.
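
Something like the following could do that check. It is only a sketch:
has_commit_graph is a name made up for illustration, not a git command,
and it goes through "git rev-parse --git-path" on the assumption that
you may be running it from a linked worktree, where .git is a file
rather than a directory.

```shell
#!/bin/sh
# Sketch: report whether the current repository has a commit-graph.
# has_commit_graph is a hypothetical helper, not part of git.
has_commit_graph () {
	# Resolve objects/info relative to the real object directory,
	# which also works from inside a linked worktree.
	info_dir=$(git rev-parse --git-path objects/info) || return 2
	# Either a single commit-graph file or a commit-graphs/
	# directory counts as "present".
	test -f "$info_dir/commit-graph" || test -d "$info_dir/commit-graphs"
}

if has_commit_graph
then
	echo "commit-graph present"
else
	echo "no commit-graph; try: git commit-graph write --reachable"
fi
```

If it reports that none is present, write one and then re-run your
rev-list command to compare the timings.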

-Peff
