From: Jeff King <peff@peff•net>
To: Aaron Plattner <aplattner@nvidia•com>
Cc: git@vger•kernel.org
Subject: Re: [PATCH v2] packfile: skip decompressing and hashing blobs in add_promisor_object()
Date: Mon, 8 Dec 2025 15:28:12 -0500 [thread overview]
Message-ID: <20251208202812.GC216526@coredump.intra.peff.net> (raw)
In-Reply-To: <51c866cb-9a7a-4c59-834a-2f710f34f3a1@nvidia.com>
On Sat, Dec 06, 2025 at 11:40:01AM -0800, Aaron Plattner wrote:
> I was rewriting the commit message for that part to justify why it's safe to
> use PARSE_OBJECT_SKIP_HASH_CHECK, and now I'm questioning it. :)
>
> It definitely seems fine for blobs but if what we're trying to check for is
> on-disk corruption, maybe it's not a good idea to skip it for other objects
> since we're actually using their contents here.
>
> I still think the OBJ_NONE fix is worthwhile and I'll send that out
> separately, but maybe it would be a good idea to split
> PARSE_OBJECT_SKIP_HASH_CHECK into separate flags for blobs vs. all objects?
> Or just go back to v1 of the add_promisor_object() patch? Or do you think
> this version is okay despite that concern?
I think it's OK to skip the hashes here. In some code paths we really
care about checking the consistency of the objects (like "rev-list
--verify", or fsck). But if the caller knows that we are not trying to
do that, then that opens the doors to other optimizations like avoiding
object loads entirely.
You could argue that if we _do_ load an object, we might as well hash it
to check its consistency. IMHO there is not much value in that, as the
cost is not totally trivial, it's not a thorough validation of what's on
disk (because we are skipping some objects), and this sort of
bit-flipping corruption is pretty rare in the first place.
So to my mind, there are really two types of callers that want to parse
an object: ones that are verifying database consistency and want to be
thorough, and ones that want things to be as fast as possible. And I
think the caller here (traversing the promisor objects) is in the latter
camp. E.g., if we ever added a secondary index of "these are all the
promised objects we don't have" we would just use that!
So really, I think all I was suggesting for the commit message is to say
"this is the kind of caller that wants things to be as fast as possible". ;)
The ideal series to me is probably:
- patch 1 fixes the OBJ_NONE issues. It would be great if there was
some way to show the impact of this bug on an existing case, but I
imagine it would be quite hard. We'd never produce the wrong answer
but just do things slowly. So you'd need a case where we end up with
OBJ_NONE, and then a SKIP_HASH code path that loads a big blob. It's
easy to find the latter (just hand rev-list a blob on the command
line), but the former is harder. I doubt it's worth the effort to
dig for one, since we know your case in patch 2 will demonstrate it.
- patch 2 uses SKIP_HASH
-Peff
prev parent reply other threads:[~2025-12-08 20:28 UTC|newest]
Thread overview: 4+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-12-06 0:20 [PATCH v2] packfile: skip decompressing and hashing blobs in add_promisor_object() Aaron Plattner
2025-12-06 2:06 ` Jeff King
2025-12-06 19:40 ` Aaron Plattner
2025-12-08 20:28 ` Jeff King [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20251208202812.GC216526@coredump.intra.peff.net \
--to=peff@peff$(echo .)net \
--cc=aplattner@nvidia$(echo .)com \
--cc=git@vger$(echo .)kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox