public inbox for git@vger.kernel.org 
From: Eric Wong <e@80x24•org>
To: git@vger•kernel.org
Cc: Jeff King <peff@peff•net>
Subject: [PATCH v1 02/10] packfile: allow content-limit for cat-file
Date: Mon, 15 Jul 2024 00:35:11 +0000
Message-ID: <20240715003519.2671385-3-e@80x24.org>
In-Reply-To: <20240715003519.2671385-1-e@80x24.org>

From: Jeff King <peff@peff•net>

This avoids unnecessary round trips to the object store to speed
up cat-file content retrieval.  The majority of packed objects
don't benefit from the streaming interface at all, and we end up
having to load them in core anyway to satisfy our streaming
API.

This drops the runtime of
`git cat-file --batch-all-objects --unordered --batch' from
~7.1s to ~6.1s on Jeff's machine.

[ew: commit message]

Signed-off-by: Jeff King <peff@peff•net>
Signed-off-by: Eric Wong <e@80x24•org>
---
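Not part of the commit: a minimal, self-contained sketch of the
content_limit idea, for readers skimming the diff.  The names
toy_object, toy_info and toy_lookup are hypothetical and only model
the shape of the object_info fields touched here; real git inflates
data from packs or loose files instead of copying from memory.  The
boundary check follows the loose-object hunk (an object exactly at
the limit is still loaded), which is an assumption of this sketch.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a stored object. */
struct toy_object {
	const char *data;
	size_t size;
};

/* Hypothetical request struct modeled on the fields used by this patch. */
struct toy_info {
	size_t *sizep;        /* out: object size */
	void **contentp;      /* out: malloc'd content, or NULL if over limit */
	size_t content_limit; /* 0 means "no limit" */
};

/*
 * Report the size and, in the same call, hand back the content when
 * the caller asked for it and the object is within the limit.
 * Returns 0 on success, -1 on allocation failure.
 */
static int toy_lookup(const struct toy_object *obj, struct toy_info *oi)
{
	if (oi->sizep)
		*oi->sizep = obj->size;
	if (!oi->contentp)
		return 0; /* caller only wanted metadata */
	if (oi->content_limit && obj->size > oi->content_limit) {
		*oi->contentp = NULL; /* too big: caller should stream instead */
		return 0;
	}
	*oi->contentp = malloc(obj->size ? obj->size : 1);
	if (!*oi->contentp)
		return -1;
	memcpy(*oi->contentp, obj->data, obj->size);
	return 0;
}
```

A caller sets contentp and content_limit once, then checks after each
lookup whether the content pointer came back non-NULL; only a NULL
result requires a second trip through the streaming path.
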
 builtin/cat-file.c | 17 +++++++++++++++--
 object-file.c      |  6 ++++++
 object-store-ll.h  |  1 +
 packfile.c         | 13 ++++++++++++-
 4 files changed, 34 insertions(+), 3 deletions(-)

diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index 18fe58d6b8..bc4bb89610 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -280,6 +280,7 @@ struct expand_data {
 	off_t disk_size;
 	const char *rest;
 	struct object_id delta_base_oid;
+	void *content;
 
 	/*
 	 * If mark_query is true, we do not expand anything, but rather
@@ -383,7 +384,10 @@ static void print_object_or_die(struct batch_options *opt, struct expand_data *d
 
 	assert(data->info.typep);
 
-	if (data->type == OBJ_BLOB) {
+	if (data->content) {
+		batch_write(opt, data->content, data->size);
+		FREE_AND_NULL(data->content);
+	} else if (data->type == OBJ_BLOB) {
 		if (opt->buffer_output)
 			fflush(stdout);
 		if (opt->transform_mode) {
@@ -801,9 +805,18 @@ static int batch_objects(struct batch_options *opt)
 	/*
 	 * If we are printing out the object, then always fill in the type,
 	 * since we will want to decide whether or not to stream.
+	 *
+	 * Likewise, grab the content in the initial request if it's small
+	 * and we're not planning to filter it.
 	 */
-	if (opt->batch_mode == BATCH_MODE_CONTENTS)
+	if (opt->batch_mode == BATCH_MODE_CONTENTS) {
 		data.info.typep = &data.type;
+		if (!opt->transform_mode) {
+			data.info.sizep = &data.size;
+			data.info.contentp = &data.content;
+			data.info.content_limit = big_file_threshold;
+		}
+	}
 
 	if (opt->all_objects) {
 		struct object_cb_data cb;
diff --git a/object-file.c b/object-file.c
index 065103be3e..1cc29c3c58 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1492,6 +1492,12 @@ static int loose_object_info(struct repository *r,
 
 		if (!oi->contentp)
 			break;
+		if (oi->content_limit && *oi->sizep > oi->content_limit) {
+			git_inflate_end(&stream);
+			*oi->contentp = NULL;
+			goto cleanup;
+		}
+
 		*oi->contentp = unpack_loose_rest(&stream, hdr, *oi->sizep, oid);
 		if (*oi->contentp)
 			goto cleanup;
diff --git a/object-store-ll.h b/object-store-ll.h
index c5f2bb2fc2..b71a15f590 100644
--- a/object-store-ll.h
+++ b/object-store-ll.h
@@ -289,6 +289,7 @@ struct object_info {
 	struct object_id *delta_base_oid;
 	struct strbuf *type_name;
 	void **contentp;
+	size_t content_limit;
 
 	/* Response */
 	enum {
diff --git a/packfile.c b/packfile.c
index e547522e3d..54b9d46928 100644
--- a/packfile.c
+++ b/packfile.c
@@ -1530,7 +1530,7 @@ int packed_object_info(struct repository *r, struct packed_git *p,
 	 * a "real" type later if the caller is interested. Likewise...
 	 * tbd.
 	 */
-	if (oi->contentp) {
+	if (oi->contentp && !oi->content_limit) {
 		*oi->contentp = cache_or_unpack_entry(r, p, obj_offset, oi->sizep,
 						      &type);
 		if (!*oi->contentp)
@@ -1556,6 +1556,17 @@ int packed_object_info(struct repository *r, struct packed_git *p,
 				*oi->sizep = size;
 			}
 		}
+
+		if (oi->contentp) {
+			if (oi->sizep && *oi->sizep < oi->content_limit) {
+				*oi->contentp = cache_or_unpack_entry(r, p, obj_offset,
+								      oi->sizep, &type);
+				if (!*oi->contentp)
+					type = OBJ_BAD;
+			} else {
+				*oi->contentp = NULL;
+			}
+		}
 	}
 
 	if (oi->disk_sizep) {

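For illustration only, the consumer side of the change in
print_object_or_die() can be pictured roughly as below.  This is a
sketch under stated assumptions, not the cat-file code: toy_data,
toy_path and toy_print are hypothetical names, and plain stdio stands
in for batch_write() and the streaming interface.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical per-object state, loosely modeled on struct expand_data. */
struct toy_data {
	size_t size;
	void *content; /* preloaded by the lookup, or NULL */
};

/* Which path the (hypothetical) printer took. */
enum toy_path {
	TOY_WROTE_BUFFER, /* content was preloaded and written out */
	TOY_NEEDS_STREAM, /* nothing preloaded: caller must stream */
	TOY_WRITE_ERROR
};

/*
 * Write preloaded content if there is any, then release it so the
 * struct can be reused for the next object.  With no preloaded
 * content, tell the caller to fall back to its streaming path.
 */
static enum toy_path toy_print(FILE *out, struct toy_data *d)
{
	size_t n;

	if (!d->content)
		return TOY_NEEDS_STREAM;
	n = fwrite(d->content, 1, d->size, out);
	free(d->content);
	d->content = NULL; /* mirrors FREE_AND_NULL() in the patch */
	return n == d->size ? TOY_WROTE_BUFFER : TOY_WRITE_ERROR;
}
```

Resetting the pointer after each write matters because the same state
struct is reused across objects; a stale pointer would be mistaken
for preloaded content of the next object.
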