* [PATCH v3 0/2] improve --exclude-promisor-objects performance
@ 2025-12-09 1:48 Aaron Plattner
2025-12-09 1:48 ` [PATCH v3 1/2] object: apply skip_hash and discard_tree optimizations to unknown blobs too Aaron Plattner
2025-12-09 1:48 ` [PATCH v3 2/2] packfile: skip hash checks in add_promisor_object() Aaron Plattner
0 siblings, 2 replies; 3+ messages in thread
From: Aaron Plattner @ 2025-12-09 1:48 UTC (permalink / raw)
To: git; +Cc: Aaron Plattner, Jeff King
This series fixes the PARSE_OBJECT_SKIP_HASH_CHECK optimization in
parse_object_with_flags() so that it also applies to objects whose type is
still OBJ_NONE, and then uses that behavior to significantly improve the
performance of add_promisor_object().
Aaron Plattner (2):
object: apply skip_hash and discard_tree optimizations to unknown
blobs too
packfile: skip hash checks in add_promisor_object()
object.c | 4 ++--
packfile.c | 3 ++-
2 files changed, 4 insertions(+), 3 deletions(-)
--
2.52.0
* [PATCH v3 1/2] object: apply skip_hash and discard_tree optimizations to unknown blobs too
From: Aaron Plattner @ 2025-12-09 1:48 UTC (permalink / raw)
To: git; +Cc: Aaron Plattner, Jeff King
parse_object_with_flags() has an optimization that skips parsing blobs
when PARSE_OBJECT_SKIP_HASH_CHECK is set and the object either hasn't
been seen before or is suspected to be a blob but hasn't been parsed
yet. The latter can happen, for example, if add_tree_entries() walks a
path that references a blob object that hasn't been seen before:
lookup_blob() marks the referenced oid as being a blob, but does not
provide any additional information about it until it is parsed.
It's possible for an object to be created without even a type, such as
when prepare_revision_walk() uses mark_uninteresting() to mark all
promisor objects as uninteresting. These objects have obj->parsed ==
false and obj->type == OBJ_NONE.
The skip_hash optimization does not consider this kind of object, so
parse_object_with_flags() proceeds to fully parse the object to
determine its type.
Improve the optimization by applying it to OBJ_NONE objects as well as
OBJ_BLOB ones. Apply a similar fix for trees.
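The effect of the widened condition can be sketched in isolation. The
following is a minimal toy model, not Git's code: the gate_* names, the
enum, and the struct are hypothetical stand-ins for struct object and
the OBJ_* constants, and the on-disk type is passed in directly rather
than looked up in an object database.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for Git's object types; not the real definitions. */
enum gate_type { GATE_NONE = 0, GATE_COMMIT, GATE_TREE, GATE_BLOB };

struct gate_object {
	bool parsed;
	enum gate_type type;
};

/* Condition before the patch: only unseen objects or suspected blobs qualify. */
static bool blob_gate_old(const struct gate_object *obj, enum gate_type on_disk)
{
	return (!obj || obj->type == GATE_BLOB) && on_disk == GATE_BLOB;
}

/* Condition after the patch: typeless (GATE_NONE) objects qualify as well. */
static bool blob_gate_new(const struct gate_object *obj, enum gate_type on_disk)
{
	return (!obj || obj->type == GATE_NONE || obj->type == GATE_BLOB) &&
	       on_disk == GATE_BLOB;
}

/* Walks through the case described above: a typeless, unparsed object. */
static void gate_demo(void)
{
	struct gate_object typeless = { false, GATE_NONE };

	assert(!blob_gate_old(&typeless, GATE_BLOB)); /* falls through to a full parse */
	assert(blob_gate_new(&typeless, GATE_BLOB));  /* takes the fast path */
	assert(blob_gate_old(NULL, GATE_BLOB));       /* unseen objects already qualified */
}
```

In this model, an object that exists in memory with no type is the only
case whose outcome changes; unseen objects and suspected blobs behave
the same under both conditions.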
Fixes: 8db2dad7a045 ("parse_object(): check on-disk type of suspected blob")
Signed-off-by: Aaron Plattner <aplattner@nvidia•com>
---
object.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/object.c b/object.c
index b08fc7a163..4669b8d65e 100644
--- a/object.c
+++ b/object.c
@@ -328,7 +328,7 @@ struct object *parse_object_with_flags(struct repository *r,
 			return &commit->object;
 	}
 
-	if ((!obj || obj->type == OBJ_BLOB) &&
+	if ((!obj || obj->type == OBJ_NONE || obj->type == OBJ_BLOB) &&
 	    odb_read_object_info(r->objects, oid, NULL) == OBJ_BLOB) {
 		if (!skip_hash && stream_object_signature(r, repl) < 0) {
 			error(_("hash mismatch %s"), oid_to_hex(oid));
@@ -344,7 +344,7 @@ struct object *parse_object_with_flags(struct repository *r,
 	 * have the on-disk object with the correct type.
 	 */
 	if (skip_hash && discard_tree &&
-	    (!obj || obj->type == OBJ_TREE) &&
+	    (!obj || obj->type == OBJ_NONE || obj->type == OBJ_TREE) &&
 	    odb_read_object_info(r->objects, oid, NULL) == OBJ_TREE) {
 		return &lookup_tree(r, oid)->object;
 	}
--
2.52.0
* [PATCH v3 2/2] packfile: skip hash checks in add_promisor_object()
From: Aaron Plattner @ 2025-12-09 1:48 UTC (permalink / raw)
To: git; +Cc: Aaron Plattner, Jeff King
When is_promisor_object() is called for the first time, it lazily
initializes a set of all promisor objects by iterating through all
objects in promisor packs. For each object, add_promisor_object() calls
parse_object(), which decompresses and hashes the entire object.
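The lazy initialization described above follows a common
build-on-first-query pattern. A minimal sketch, assuming a toy
membership table in place of Git's oidset and small integer ids in
place of object ids (all lazy_* names are hypothetical):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define LAZY_MAX_ID 16

static bool lazy_members[LAZY_MAX_ID]; /* toy replacement for the promisor set */
static bool lazy_ready;                /* has the set been built yet? */
static int lazy_scan_count;            /* how many times the full scan ran */

/* Stand-in for iterating over every object in every promisor pack. */
static void lazy_build_set(void)
{
	static const int packed_ids[] = { 3, 7, 11 };
	size_t i;

	for (i = 0; i < sizeof(packed_ids) / sizeof(packed_ids[0]); i++)
		lazy_members[packed_ids[i]] = true;
	lazy_scan_count++;
}

/* The first call pays for the whole scan; later calls are table lookups. */
static bool lazy_is_promisor(int id)
{
	assert(id >= 0 && id < LAZY_MAX_ID);
	if (!lazy_ready) {
		lazy_build_set();
		lazy_ready = true;
	}
	return lazy_members[id];
}
```

Because the whole scan runs inside the first query, any per-object cost
in the scan is multiplied by the number of objects in the promisor
packs.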
For repositories with large pack files, this can take an extremely long
time. For example, on a production repository with a 176 GB promisor
pack:
    $ time ~/git/git/git-rev-list --objects --all --exclude-promisor-objects --quiet
    ________________________________________________________
    Executed in   76.10 mins    fish           external
       usr time   72.10 mins    1.83 millis   72.10 mins
       sys time    3.56 mins    0.17 millis    3.56 mins
add_promisor_object() just wants to construct the set of all promisor
objects, so it doesn't really need to verify the hash of every object.
Set PARSE_OBJECT_SKIP_HASH_CHECK to skip the hash check. This has the
side effect of skipping decompression of blob objects completely, saving
a significant amount of time:
    $ time ~/git/git/git-rev-list --objects --all --exclude-promisor-objects --quiet
    ________________________________________________________
    Executed in  124.70 secs    fish           external
       usr time   46.94 secs    0.00 millis   46.94 secs
       sys time   43.11 secs    1.03 millis   43.11 secs
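Why skipping the hash check also avoids decompressing blobs can be
illustrated with a rough cost model. This is a sketch under stated
assumptions, not Git's implementation: the cost_* names and the flag
value are hypothetical, and the model simply assumes that verifying a
hash requires inflating the whole object while learning the type does
not.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; not Git's real types or flag values. */
enum cost_type { COST_COMMIT = 1, COST_TREE, COST_BLOB };
#define COST_SKIP_HASH_CHECK (1u << 0)

struct cost_packed_object {
	enum cost_type type; /* assumed cheap to learn without inflating */
	size_t size;         /* bytes that a full inflate would produce */
};

/*
 * Returns how many bytes the toy parser would have to inflate.
 * Blobs have nothing to parse once hashing is skipped, so they cost
 * nothing; other types are still inflated because their contents are
 * needed.
 */
static size_t cost_bytes_inflated(const struct cost_packed_object *o,
				  unsigned flags)
{
	assert(o != NULL);
	if ((flags & COST_SKIP_HASH_CHECK) && o->type == COST_BLOB)
		return 0;
	return o->size;
}
```

In this model a 1000-byte blob costs 1000 bytes without the flag and 0
with it, while a tree costs the same either way. For scale, the two
timings above correspond to 76.10 mins (4566 secs) against 124.70 secs,
roughly a 36x reduction in wall-clock time.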
Signed-off-by: Aaron Plattner <aplattner@nvidia•com>
---
packfile.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/packfile.c b/packfile.c
index 3d8b994a61..d3014b6746 100644
--- a/packfile.c
+++ b/packfile.c
@@ -2295,7 +2295,8 @@ static int add_promisor_object(const struct object_id *oid,
 		we_parsed_object = 0;
 	} else {
 		we_parsed_object = 1;
-		obj = parse_object(pack->repo, oid);
+		obj = parse_object_with_flags(pack->repo, oid,
+					      PARSE_OBJECT_SKIP_HASH_CHECK);
 	}
 
 	if (!obj)
--
2.52.0