From: Aaron Plattner <aplattner@nvidia•com>
To: Jeff King <peff@peff•net>
Cc: git@vger•kernel.org
Subject: Re: [PATCH v2] packfile: skip decompressing and hashing blobs in add_promisor_object()
Date: Sat, 6 Dec 2025 11:40:01 -0800 [thread overview]
Message-ID: <51c866cb-9a7a-4c59-834a-2f710f34f3a1@nvidia.com> (raw)
In-Reply-To: <20251206020648.GB1714099@coredump.intra.peff.net>
On 12/5/25 6:06 PM, Jeff King wrote:
> On Fri, Dec 05, 2025 at 04:20:12PM -0800, Aaron Plattner wrote:
>
>> When is_promisor_object() is called for the first time, it lazily
>> initializes a set of all promisor objects by iterating through all
>> objects in promisor packs. For each object, add_promisor_object() calls
>> parse_object(), which decompresses and hashes the entire object.
>>
>> For repositories with large pack files, this can take an extremely long
>> time. For example, on a production repository with a 176 GB promisor
>> pack:
>>
>> $ time ~/git/git/git-rev-list --objects --all --exclude-promisor-objects --quiet
>> ________________________________________________________
>> Executed in 76.10 mins fish external
>> usr time 72.10 mins 1.83 millis 72.10 mins
>> sys time 3.56 mins 0.17 millis 3.56 mins
>>
>> add_promisor_object() needs the full object for trees, commits, and
>> tags. But blobs contain no references to other objects, so the function
>> can just insert their oids into the set and move on.
>>
>> parse_object_with_flags() has code to skip decompressing blobs, but it
>> unfortunately doesn't work with the objects created by
>> mark_uninteresting() because they have obj->type == OBJ_NONE. Update
>> parse_object_with_flags() to handle blobs and trees that are in this
>> state, and then update add_promisor_object() to use
>> PARSE_OBJECT_SKIP_HASH_CHECK.
>
> Good catch on the matching tree code. It doesn't trigger for your use
> case (the caller has to pass in the DISCARD_TREE flag), but it's a
> lurking bug nonetheless.
>
> I'm tempted to say that those changes in parse_object_with_flags()
> should happen as a separate patch, since they really are fixing an
> existing bug. But I can live with it all as one, too.
>
> One other thing it might be worth thinking about or mentioning in the
> commit message: we are skipping the hash check on all objects now (not
> just blobs). I think this is OK to do along the lines of discussion in
> c868d8e91f (parse_object(): allow skipping hash check, 2022-09-06). I
> dunno. Maybe it is kind of self-evident that not every operation needs
> to do a consistency check of every object.
I was rewriting the commit message for that part to justify why it's
safe to use PARSE_OBJECT_SKIP_HASH_CHECK, and now I'm questioning it. :)
It definitely seems fine for blobs but if what we're trying to check for
is on-disk corruption, maybe it's not a good idea to skip it for other
objects since we're actually using their contents here.
I still think the OBJ_NONE fix is worthwhile and I'll send that out
separately, but maybe it would be a good idea to split
PARSE_OBJECT_SKIP_HASH_CHECK into separate flags for blobs vs. all
objects? Or just go back to v1 of the add_promisor_object() patch? Or do
you think this version is okay despite that concern?
-- Aaron
>> object.c | 4 ++--
>> packfile.c | 3 ++-
>> 2 files changed, 4 insertions(+), 3 deletions(-)
>
> The patch itself looks great to me.
>
> -Peff
next prev parent reply other threads:[~2025-12-06 19:40 UTC|newest]
Thread overview: 4+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-12-06 0:20 [PATCH v2] packfile: skip decompressing and hashing blobs in add_promisor_object() Aaron Plattner
2025-12-06 2:06 ` Jeff King
2025-12-06 19:40 ` Aaron Plattner [this message]
2025-12-08 20:28 ` Jeff King
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=51c866cb-9a7a-4c59-834a-2f710f34f3a1@nvidia.com \
--to=aplattner@nvidia$(echo .)com \
--cc=git@vger$(echo .)kernel.org \
--cc=peff@peff$(echo .)net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox