public inbox for git@vger.kernel.org 
* [PATCH v3 0/2] improve --exclude-promisor-objects performance
From: Aaron Plattner @ 2025-12-09  1:48 UTC
  To: git; +Cc: Aaron Plattner, Jeff King

This series fixes the PARSE_OBJECT_SKIP_HASH_CHECK optimization in
parse_object_with_flags() so that it also applies to objects whose type is
set to OBJ_NONE, and then uses that behavior to significantly improve the
performance of add_promisor_object().

Aaron Plattner (2):
  object: apply skip_hash and discard_tree optimizations to unknown
    blobs too
  packfile: skip hash checks in add_promisor_object()

 object.c   | 4 ++--
 packfile.c | 3 ++-
 2 files changed, 4 insertions(+), 3 deletions(-)

-- 
2.52.0



* [PATCH v3 1/2] object: apply skip_hash and discard_tree optimizations to unknown blobs too
From: Aaron Plattner @ 2025-12-09  1:48 UTC
  To: git; +Cc: Aaron Plattner, Jeff King

parse_object_with_flags() has an optimization to skip parsing blobs if
PARSE_OBJECT_SKIP_HASH_CHECK is set and the object hasn't been seen
before or might be a blob but hasn't been parsed yet. The latter can
happen, for example, if add_tree_entries() walks a path that references
a blob object that hasn't been seen before: lookup_blob() marks the
referenced oid as being a blob, but does not provide any additional
information about it until it is parsed.

It's possible for an object to be created without even a type, such as
when prepare_revision_walk() uses mark_uninteresting() to mark all
promisor objects as uninteresting. These objects have obj->parsed ==
false and obj->type == OBJ_NONE.

The skip_hash optimization does not consider this kind of object, so
parse_object_with_flags() proceeds to fully parse the object to
determine its type.

Improve the optimization by applying it to OBJ_NONE objects as well as
OBJ_BLOB ones. Apply a similar fix for trees.

Fixes: 8db2dad7a045 ("parse_object(): check on-disk type of suspected blob")
Signed-off-by: Aaron Plattner <aplattner@nvidia.com>
---
 object.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/object.c b/object.c
index b08fc7a163..4669b8d65e 100644
--- a/object.c
+++ b/object.c
@@ -328,7 +328,7 @@ struct object *parse_object_with_flags(struct repository *r,
 			return &commit->object;
 	}
 
-	if ((!obj || obj->type == OBJ_BLOB) &&
+	if ((!obj || obj->type == OBJ_NONE || obj->type == OBJ_BLOB) &&
 	    odb_read_object_info(r->objects, oid, NULL) == OBJ_BLOB) {
 		if (!skip_hash && stream_object_signature(r, repl) < 0) {
 			error(_("hash mismatch %s"), oid_to_hex(oid));
@@ -344,7 +344,7 @@ struct object *parse_object_with_flags(struct repository *r,
 	 * have the on-disk object with the correct type.
 	 */
 	if (skip_hash && discard_tree &&
-	    (!obj || obj->type == OBJ_TREE) &&
+	    (!obj || obj->type == OBJ_NONE || obj->type == OBJ_TREE) &&
 	    odb_read_object_info(r->objects, oid, NULL) == OBJ_TREE) {
 		return &lookup_tree(r, oid)->object;
 	}
-- 
2.52.0



* [PATCH v3 2/2] packfile: skip hash checks in add_promisor_object()
From: Aaron Plattner @ 2025-12-09  1:48 UTC
  To: git; +Cc: Aaron Plattner, Jeff King

When is_promisor_object() is called for the first time, it lazily
initializes a set of all promisor objects by iterating through all
objects in promisor packs. For each object, add_promisor_object() calls
parse_object(), which decompresses and hashes the entire object.

For repositories with large pack files, this can take an extremely long
time. For example, on a production repository with a 176 GB promisor
pack:

 $ time ~/git/git/git-rev-list --objects --all --exclude-promisor-objects --quiet
 ________________________________________________________
 Executed in   76.10 mins    fish           external
    usr time   72.10 mins    1.83 millis   72.10 mins
    sys time    3.56 mins    0.17 millis    3.56 mins

add_promisor_object() just wants to construct the set of all promisor
objects, so it doesn't really need to verify the hash of every object.
Set PARSE_OBJECT_SKIP_HASH_CHECK to skip the hash check. This has the
side effect of skipping decompression of blob objects completely, saving
a significant amount of time:

 $ time ~/git/git/git-rev-list --objects --all --exclude-promisor-objects --quiet
 ________________________________________________________
 Executed in  124.70 secs    fish           external
    usr time   46.94 secs    0.00 millis   46.94 secs
    sys time   43.11 secs    1.03 millis   43.11 secs

Signed-off-by: Aaron Plattner <aplattner@nvidia.com>
---
 packfile.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/packfile.c b/packfile.c
index 3d8b994a61..d3014b6746 100644
--- a/packfile.c
+++ b/packfile.c
@@ -2295,7 +2295,8 @@ static int add_promisor_object(const struct object_id *oid,
 		we_parsed_object = 0;
 	} else {
 		we_parsed_object = 1;
-		obj = parse_object(pack->repo, oid);
+		obj = parse_object_with_flags(pack->repo, oid,
+					      PARSE_OBJECT_SKIP_HASH_CHECK);
 	}
 
 	if (!obj)
-- 
2.52.0

