public inbox for git@vger.kernel.org 
From: Jeff King <peff@peff•net>
To: Patrick Steinhardt <ps@pks•im>
Cc: git@vger•kernel.org, Matt Smiley <msmiley@gitlab•com>,
	"brian m. carlson" <sandals@crustytoothpaste•net>,
	Junio C Hamano <gitster@pobox•com>
Subject: Re: [PATCH 2/2] upload-pack: reduce lock contention when writing packfile data
Date: Fri, 27 Feb 2026 14:37:58 -0500
Message-ID: <20260227193758.GA2931515@coredump.intra.peff.net>
In-Reply-To: <20260227-pks-upload-pack-write-contention-v1-2-7166fe255704@pks.im>

On Fri, Feb 27, 2026 at 12:23:01PM +0100, Patrick Steinhardt wrote:

> Extend our use of the buffering infrastructure so that we soak up bytes
> until the buffer is filled up at least 2/3rds of its capacity. The
> change is relatively simple to implement as we already know to flush the
> buffer in `create_pack_file()` after git-pack-objects(1) has finished.

We are relaying write() calls from pack-objects here, which writes to
us in 8k chunks (due to csum-file.c buffering). So most of our writes
will be 8k.

Rather than buffering in upload-pack, would it not be simpler to just
increase the write size from pack-objects? Then we do not have to worry
about disrupting upload-pack's keepalive timeouts. And as a bonus, if
you are worried about the system-wide number of calls, you will likewise
be reducing the number of read() and write() calls over the pipe between
pack-objects and upload-pack.

Something like this:

diff --git a/csum-file.c b/csum-file.c
index 6e21e3cac8..94798fa429 100644
--- a/csum-file.c
+++ b/csum-file.c
@@ -206,7 +206,7 @@ struct hashfile *hashfd_throughput(const struct git_hash_algo *algop,
 	 * size so the progress indicators arrive at a more
 	 * frequent rate.
 	 */
-	return hashfd_internal(algop, fd, name, tp, 8 * 1024);
+	return hashfd_internal(algop, fd, name, tp, 32 * 1024);
 }
 
 void hashfile_checkpoint_init(struct hashfile *f,

reduces the number of write calls reported by:

  git clone \
    --upload-pack='perf stat -e syscalls:sys_enter_write git-upload-pack' \
    --bare --no-local linux.git foo.git

from ~420k to ~160k. In theory we expect an ~8x reduction in our target
area, 4x for each of pack-objects and upload-pack, but of course there
are other writes going on, too, including the extra sideband ones. And
obviously we could push it further towards LARGE_PACKET_MAX to save even
more.

> Now git-upload-pack(1) already has the infrastructure in place to buffer
> some of the data it reads from git-pack-objects(1) before actually
> sending it out. We only use this infrastructure in very limited ways
> though, so we generally end up matching one read(3p) call with one
> write(3p) call. Even worse, when the sideband is enabled we end up
> matching one read with _two_ writes: one for the pkt-line length, and
> one for the packfile data.

Using writev() would be an easy-ish fix here, modulo portability
concerns (though of course it is easy to implement a fallback writev()
in terms of write()). Doing this:

diff --git a/sideband.c b/sideband.c
index ea7c25211e..b5509fbaa2 100644
--- a/sideband.c
+++ b/sideband.c
@@ -266,19 +266,25 @@ void send_sideband(int fd, int band, const char *data, ssize_t sz, int packet_ma
 	while (sz) {
 		unsigned n;
 		char hdr[5];
+		struct iovec iov[2];
 
 		n = sz;
 		if (packet_max - 5 < n)
 			n = packet_max - 5;
 		if (0 <= band) {
 			xsnprintf(hdr, sizeof(hdr), "%04x", n + 5);
 			hdr[4] = band;
-			write_or_die(fd, hdr, 5);
+			iov[0].iov_base = hdr;
+			iov[0].iov_len = 5;
 		} else {
 			xsnprintf(hdr, sizeof(hdr), "%04x", n + 4);
-			write_or_die(fd, hdr, 4);
+			iov[0].iov_base = hdr;
+			iov[0].iov_len = 4;
 		}
-		write_or_die(fd, p, n);
+		iov[1].iov_base = p;
+		iov[1].iov_len = n;
+		/* obviously needs looping and error detection */
+		writev(fd, iov, 2);
 		p += n;
 		sz -= n;
 	}

drops my 160k write calls down to 82k.
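
The "looping and error detection" that the writev() call needs could
look roughly like the sketch below. It is an illustration only: the
names writev_all() and send_sideband_frame() are made up for this
example and are not functions in git, and treating a zero-byte write as
ENOSPC is an assumption about what the caller wants. The loop retries
on EINTR and, after a short write, advances the iovec array past
whatever was already written.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Hypothetical helper (not git's API): write everything described by
 * iov[0..iovcnt), coping with short writes and EINTR. Note that it
 * modifies the caller's iovec entries while advancing.
 */
static int writev_all(int fd, struct iovec *iov, int iovcnt)
{
	assert(iovcnt >= 0);

	while (iovcnt > 0) {
		ssize_t written;

		/* Skip empty entries so a 0 return is never ambiguous. */
		if (!iov->iov_len) {
			iov++;
			iovcnt--;
			continue;
		}

		written = writev(fd, iov, iovcnt);
		if (written < 0) {
			if (errno == EINTR)
				continue;
			return -1;
		}
		if (!written) {
			/* Assumption: report "no progress" as ENOSPC. */
			errno = ENOSPC;
			return -1;
		}

		/* Drop the entries that were written completely. */
		while (iovcnt > 0 && (size_t)written >= iov->iov_len) {
			written -= (ssize_t)iov->iov_len;
			iov++;
			iovcnt--;
		}

		/* Advance into an entry that was only partially written. */
		if (iovcnt > 0 && written > 0) {
			iov->iov_base = (char *)iov->iov_base + written;
			iov->iov_len -= (size_t)written;
		}
	}
	return 0;
}

/* Hypothetical: send one sideband pkt-line (header + payload) at once. */
static int send_sideband_frame(int fd, int band, const char *data, size_t n)
{
	char hdr[5];
	struct iovec iov[2];

	/* Four hex digits cannot encode a length above 0xffff. */
	if (n + 5 > 0xffff)
		return -1;

	snprintf(hdr, sizeof(hdr), "%04x", (unsigned)(n + 5));
	hdr[4] = (char)band; /* overwrites the NUL, as send_sideband() does */

	iov[0].iov_base = hdr;
	iov[0].iov_len = 5;
	iov[1].iov_base = (void *)data;
	iov[1].iov_len = n;
	return writev_all(fd, iov, 2);
}
```

In the common case the whole packet goes out in a single writev() call,
so the loop body only runs once; the advancing logic only matters when
the kernel accepts part of the data.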

Another option here is teaching the packet-forming code to reserve a few
bytes at the front of the packet. There's a little discussion here:

  https://lore.kernel.org/git/YBkeYSA5UfQP1m%2Fx@coredump.intra.peff.net/

In theory it's easy and elegant to do, but I'm not sure what the
refactoring fallout would be like.
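
To make the "reserve a few bytes at the front" idea concrete, here is a
minimal sketch. It is not taken from the linked thread or from git's
code: struct packet_buf, packet_payload(), packet_finish() and the
sizes are all hypothetical. The payload is formed directly after a
5-byte gap, and the header is filled in afterwards, so header and data
are contiguous and can go out in one plain write().

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define HDR_RESERVE 5    /* 4 hex length digits + 1 sideband byte */
#define PAYLOAD_MAX 1024 /* arbitrary size for this illustration */

/* Hypothetical buffer that leaves room for the header up front. */
struct packet_buf {
	char bytes[HDR_RESERVE + PAYLOAD_MAX];
	size_t len; /* number of payload bytes in use */
};

/* Where the caller should place payload data. */
static char *packet_payload(struct packet_buf *p)
{
	return p->bytes + HDR_RESERVE;
}

/*
 * Fill in the reserved header once the payload length is known and
 * return the total number of bytes to hand to a single write().
 */
static size_t packet_finish(struct packet_buf *p, int band)
{
	char hex[5];
	size_t total;

	assert(p->len <= PAYLOAD_MAX);
	total = p->len + HDR_RESERVE;

	snprintf(hex, sizeof(hex), "%04x", (unsigned)total);
	memcpy(p->bytes, hex, 4); /* copy the digits without the NUL */
	p->bytes[4] = (char)band;
	return total;
}
```

A caller would copy or generate data at packet_payload(), set len, and
then write the first packet_finish() bytes of the buffer. The cost is
that every producer of packet data has to know about the reserved gap,
which is presumably where the refactoring fallout would come from.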

> This significantly reduces the number of write(3p) syscalls we need to
> do. Before this change, cloning the Linux repository resulted in around
> 400,000 write(3p) syscalls. With the buffering in place we only do
> around 130,000 syscalls.

Out of curiosity, how did you end up measuring? I first tried with
strace (without "-f") on the upload-pack process, but strace slowed it
enough that it ended up collecting several of pack-objects' 8k write()
calls in a single read() call. ;) The "perf stat" above seemed to work
OK, though of course it's counting child processes, too.

-Peff
