public inbox for git@vger.kernel.org 
 help / color / mirror / Atom feed
From: Jeff King <peff@peff•net>
To: Aaron Plattner <aplattner@nvidia•com>
Cc: git@vger•kernel.org
Subject: Re: [PATCH] packfile: skip decompressing and hashing blobs in add_promisor_object()
Date: Fri, 5 Dec 2025 16:28:39 -0500	[thread overview]
Message-ID: <20251205212839.GA35153@coredump.intra.peff.net> (raw)
In-Reply-To: <235d80bd-2516-47f9-958f-0e5a16892758@nvidia.com>

On Fri, Dec 05, 2025 at 10:50:02AM -0800, Aaron Plattner wrote:

> Unfortunately, setting that flag doesn't seem to improve performance for me
> because in parse_object_with_flags(), lookup_object() returns an obj pointer
> with obj->parsed == 0 and obj->type == OBJ_NONE. So it skips this block and
> ends up inflating the object anyway:
> 
> 	if ((!obj || obj->type == OBJ_BLOB) &&
> 	    odb_read_object_info(r->objects, oid, NULL) == OBJ_BLOB) {
> 		if (!skip_hash && stream_object_signature(r, repl) < 0) {
> 			error(_("hash mismatch %s"), oid_to_hex(oid));
> 			return NULL;
> 		}
> 		parse_blob_buffer(lookup_blob(r, oid));
> 		return lookup_object(r, oid);
> 	}
> 
> I was confused about why the check was structured that way, but reading the
> description of commit 8db2dad7a045e376b9c4f51ddd33da43c962e3a4 cleared that
> up. Thank you for thoroughly documenting that!
> 
> Are OBJ_NONE objects expected here? Should the check be
> 
> 	if ((!obj || obj->type == OBJ_NONE || obj->type == OBJ_BLOB) &&
> 	    odb_read_object_info(r->objects, oid, NULL) == OBJ_BLOB) {
> 	    [...]
> 	}
> 
> ?

Yeah, that feels like a bug to me. The idea of that conditional is
"could it be a blob?" and obviously OBJ_NONE does not rule that out.

I do wonder how you end up with OBJ_NONE, though. That implies somebody
created the "struct object" but without knowing which type it was
supposed to be, and then did not follow up by actually parsing it.

That's probably immaterial to what parse_object() should be doing, but
it is certainly a curiosity. And I'm also not sure why I got good
results from my rev-list invocation, but you did not. Weird.

I think we could probably proceed without satisfying our curiosity here,
but if you felt like it, it would be interesting to find such an object
that is fed with OBJ_NONE to parse_object(), then run the command in a
debugger trying to break on the original create_object() call that
matches that oid. (Or if you want to be fancy use a reverse debugger
like rr). I might play around with it and see if I can stimulate it.

> If I make that change combined with your PARSE_OBJECT_SKIP_HASH_CHECK change
> then the time drops to 1:58, so that's great!

Cool, though I think that's about the same that you got with your patch?
I was hoping for a little bit more from skipping the hash checks and
commits, but maybe:

  1. Your commit/tree structure is dominated much more by the blobs than
     the linux.git I used for testing. So there's not much extra gain to
     be had.

  2. You didn't have a commit-graph built.

-Peff

  reply	other threads:[~2025-12-05 21:28 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-12-04 17:21 [PATCH] packfile: skip decompressing and hashing blobs in add_promisor_object() Aaron Plattner
2025-12-05 12:36 ` Patrick Steinhardt
2025-12-05 16:55   ` Aaron Plattner
2025-12-05 17:59     ` Jeff King
2025-12-05 17:48 ` Jeff King
2025-12-05 18:01   ` Jeff King
2025-12-05 18:50     ` Aaron Plattner
2025-12-05 21:28       ` Jeff King [this message]
2025-12-05 21:56         ` Aaron Plattner
2025-12-06  1:58           ` Jeff King

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20251205212839.GA35153@coredump.intra.peff.net \
    --to=peff@peff$(echo .)net \
    --cc=aplattner@nvidia$(echo .)com \
    --cc=git@vger$(echo .)kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox