public inbox for git@vger.kernel.org 
* [PATCH v2] packfile: skip decompressing and hashing blobs in add_promisor_object()
From: Aaron Plattner @ 2025-12-06  0:20 UTC
  To: git; +Cc: Aaron Plattner, Jeff King

When is_promisor_object() is called for the first time, it lazily
initializes a set of all promisor objects by iterating through all
objects in promisor packs. For each object, add_promisor_object() calls
parse_object(), which decompresses and hashes the entire object.

For repositories with large pack files, this can take an extremely long
time. For example, on a production repository with a 176 GB promisor
pack:

 $ time ~/git/git/git-rev-list --objects --all --exclude-promisor-objects --quiet
 ________________________________________________________
 Executed in   76.10 mins    fish           external
    usr time   72.10 mins    1.83 millis   72.10 mins
    sys time    3.56 mins    0.17 millis    3.56 mins

add_promisor_object() needs the full object for trees, commits, and
tags. But blobs contain no references to other objects, so the function
can just insert their oids into the set and move on.
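
As a rough illustration of that reasoning, the sketch below classifies
object types by whether their contents must be read to find further
objects. The enum is a simplified stand-in for Git's object types, and
needs_full_parse() is a hypothetical helper that does not exist in Git.

```c
#include <stdbool.h>

/* Simplified stand-in for Git's object type enum (illustrative only). */
enum object_type {
	OBJ_NONE = 0,
	OBJ_COMMIT = 1,
	OBJ_TREE = 2,
	OBJ_BLOB = 3,
	OBJ_TAG = 4
};

/*
 * Hypothetical helper: returns true when an object's contents must be
 * read in order to discover the objects it refers to.
 */
static bool needs_full_parse(enum object_type type)
{
	switch (type) {
	case OBJ_COMMIT: /* refers to a tree and to parent commits */
	case OBJ_TREE:   /* refers to entries such as blobs and subtrees */
	case OBJ_TAG:    /* refers to the tagged object */
		return true;
	default:         /* a blob refers to nothing else */
		return false;
	}
}
```

Under this model, a blob only needs its oid recorded in the set, which
is the case the patch makes cheap.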

parse_object_with_flags() has code to skip decompressing blobs, but it
unfortunately doesn't work with the objects created by
mark_uninteresting() because they have obj->type == OBJ_NONE. Update
parse_object_with_flags() to handle blobs and trees that are in this
state, and then update add_promisor_object() to use
PARSE_OBJECT_SKIP_HASH_CHECK.
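
The effect of the condition change can be modeled in isolation. The
following is a minimal sketch, not the real code: struct object is
reduced to its type field, the on-disk type is passed in directly
instead of being looked up with odb_read_object_info(), and the names
fast_path_before() and fast_path_after() are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for Git's types (illustrative only). */
enum object_type {
	OBJ_NONE = 0,
	OBJ_COMMIT = 1,
	OBJ_TREE = 2,
	OBJ_BLOB = 3,
	OBJ_TAG = 4
};

struct object {
	enum object_type type;
};

/*
 * Old condition: the fast path is taken only when there is no
 * in-memory object yet, or when it already has the wanted type.
 */
static bool fast_path_before(const struct object *obj,
			     enum object_type want,
			     enum object_type on_disk)
{
	assert(want == OBJ_BLOB || want == OBJ_TREE);
	return (!obj || obj->type == want) && on_disk == want;
}

/*
 * New condition: an allocated but still untyped object (OBJ_NONE)
 * also qualifies. The real tree path additionally requires the
 * skip_hash and discard_tree flags, which are omitted here.
 */
static bool fast_path_after(const struct object *obj,
			    enum object_type want,
			    enum object_type on_disk)
{
	assert(want == OBJ_BLOB || want == OBJ_TREE);
	return (!obj || obj->type == OBJ_NONE || obj->type == want) &&
	       on_disk == want;
}
```

With an untyped in-memory object and a blob on disk, the old condition
rejects the fast path and the new one accepts it; both still reject it
when the on-disk type differs from the wanted type.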

This significantly improves performance for very large pack files,
cutting the wall-clock time of the same command from about 76 minutes
to under 2 minutes:

 $ time ~/git/git/git-rev-list --objects --all --exclude-promisor-objects --quiet
 ________________________________________________________
 Executed in  117.63 secs    fish           external
    usr time   45.56 secs    1.09 millis   45.56 secs
    sys time   37.91 secs    1.05 millis   37.91 secs

Signed-off-by: Aaron Plattner <aplattner@nvidia•com>
---
v2: Fix PARSE_OBJECT_SKIP_HASH_CHECK with UNINTERESTING objects, use it
in parse_object_with_flags.

 object.c   | 4 ++--
 packfile.c | 3 ++-
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/object.c b/object.c
index b08fc7a163..4669b8d65e 100644
--- a/object.c
+++ b/object.c
@@ -328,7 +328,7 @@ struct object *parse_object_with_flags(struct repository *r,
 			return &commit->object;
 	}
 
-	if ((!obj || obj->type == OBJ_BLOB) &&
+	if ((!obj || obj->type == OBJ_NONE || obj->type == OBJ_BLOB) &&
 	    odb_read_object_info(r->objects, oid, NULL) == OBJ_BLOB) {
 		if (!skip_hash && stream_object_signature(r, repl) < 0) {
 			error(_("hash mismatch %s"), oid_to_hex(oid));
@@ -344,7 +344,7 @@ struct object *parse_object_with_flags(struct repository *r,
 	 * have the on-disk object with the correct type.
 	 */
 	if (skip_hash && discard_tree &&
-	    (!obj || obj->type == OBJ_TREE) &&
+	    (!obj || obj->type == OBJ_NONE || obj->type == OBJ_TREE) &&
 	    odb_read_object_info(r->objects, oid, NULL) == OBJ_TREE) {
 		return &lookup_tree(r, oid)->object;
 	}
diff --git a/packfile.c b/packfile.c
index 9cc11b6dc5..01b992a4e1 100644
--- a/packfile.c
+++ b/packfile.c
@@ -2310,7 +2310,8 @@ static int add_promisor_object(const struct object_id *oid,
 		we_parsed_object = 0;
 	} else {
 		we_parsed_object = 1;
-		obj = parse_object(pack->repo, oid);
+		obj = parse_object_with_flags(pack->repo, oid,
+					      PARSE_OBJECT_SKIP_HASH_CHECK);
 	}
 
 	if (!obj)
-- 
2.52.0

