public inbox for git@vger.kernel.org 
From: Jeff King <peff@peff•net>
To: Martin Fick <mfick@nvidia•com>
Cc: Patrick Steinhardt <ps@pks•im>,
	"brian m. carlson" <sandals@crustytoothpaste•net>,
	"git@vger•kernel.org" <git@vger•kernel.org>
Subject: Re: Slow git pack-refs --all
Date: Thu, 15 Jan 2026 16:09:08 -0500
Message-ID: <20260115210908.GE1053259@coredump.intra.peff.net>
In-Reply-To: <CH3PR12MB9026C8C940270F02CEF83C4FC284A@CH3PR12MB9026.namprd12.prod.outlook.com>

On Wed, Jan 07, 2026 at 10:58:36PM +0000, Martin Fick wrote:

> I ran perf, and got a flame graph, I am not sure what the best way to share that
> is, but I will try to summarize what looked important:
> 
> About one third of the time is in this section:
> 
> libc-2.17.so 32.5%
>  _memcmp_sse4_1 29.8%
>  page_fault 7.23%
>  ...
> 
> I am not really sure what that is doing?

Probably this is the call to strcmp(iter->ref.name, update->refname) in
packed-backend.c:write_with_updates().

We have to write out the new packed-refs file with our updates in sorted
order. So it's a big O(n) merge between the existing ones (from the
"iter" side) and the new ones (from the "update" side).

It could also be caused by sorting of the packed-refs entries. We
generally shouldn't need to do that, but I think I may have found
something useful. See below.
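
Before getting to that, here is the merge described above as a rough
Python sketch. It is not the C code in packed-backend.c; the function
name and the tuple shapes are made up for illustration, and Python
compares strings by code point rather than bytewise like strcmp() (the
two agree for ASCII refnames). Both inputs are assumed to be sorted by
refname already:

```python
def merge_refs(existing, updates):
    """Merge two refname-sorted lists in one linear pass.

    existing: list of (refname, oid) from the current packed-refs file.
    updates:  list of (refname, oid_or_None); None means "delete this ref".
    Both shapes are assumptions made for this sketch.
    """
    out = []
    i = j = 0
    while i < len(existing) and j < len(updates):
        name, oid = existing[i]
        uname, uoid = updates[j]
        # This refname comparison runs once per output entry, so with
        # many refs it is where string-compare time would accumulate.
        if name < uname:
            # Existing ref untouched by any update: copy it through.
            out.append((name, oid))
            i += 1
        else:
            if name == uname:
                i += 1  # the update replaces (or deletes) this entry
            if uoid is not None:
                out.append((uname, uoid))
            j += 1
    # At most one of the two inputs still has entries left.
    out.extend(existing[i:])
    out.extend((n, o) for n, o in updates[j:] if o is not None)
    return out
```

Each step does one refname comparison and advances at least one of the
two cursors, so the cost is linear in the total number of refs.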

> unpack_object_header_buffer 30%
>  page_fault 26.9%
>  ...
>  nfs_read_page 10%
> 
> Which could very well be looking at the headers of objects to see if they are 
> tags needing to be peeled?

Yeah, that's what I'd expect here.

> And the remaining third was a bit all over the place with small sections,
> the largest two of those sections being:
> 
> packed_refs_store_create ~8.7%
>  unknown 4.4%
>  memchr 4.4%
>  page_fault 4.4%

Hmm, I don't think we have a function "packed_refs_store_create". Did
you typo while transferring the name over?

At any rate, we can assume this is poking through the packed-refs file
itself, looking for trailing newlines via memchr.

But why would we do that immediately when creating the packed-refs store
in memory? In modern versions of Git, we try to avoid reading the
packed-refs file as much as possible, binary-searching when we can. Of
course that means it has to be sorted, which was not something promised
by the original format. So we have a "sorted" tag that we write. E.g.,
this is from my clone of git, packed with git itself:

  $ head -n 1 .git/packed-refs
  # pack-refs with: peeled fully-peeled sorted

Now let's try something with jgit:

  git init
  git commit --allow-empty -m foo
  git branch foo
  git branch bar

  jgit pack-refs --all
  cat .git/packed-refs

That gives me this:

  # pack-refs with: peeled
  86054aaedc64c24aec8aaad988f6979a3cb82ee0 refs/heads/bar
  86054aaedc64c24aec8aaad988f6979a3cb82ee0 refs/heads/foo
  86054aaedc64c24aec8aaad988f6979a3cb82ee0 refs/heads/main

Aha! So jgit is not writing out the "sorted" tag. As a result, when git
reads the file, its logic is:

  1. Check for the sorted tag. It's not here, so...

  2. Check if the file is sorted by reading each entry linearly. If it's
     not, then...

  3. Read it all into memory and sort the result. We can then
     binary-search that (and iterate it in sorted order, which is
     important for pack-refs).

So when git reads the packed-refs file, we always end up doing at least
step 2 (an extra pass through the whole file), and possibly step 3 as
well (depending on whether jgit actually sorts the file).
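
The three steps can be sketched roughly like this in Python. This is a
toy, not Git's implementation: the function name and return shape are
invented for the example, peeled "^" lines are ignored, and it parses
every entry up front, whereas Git can leave a sorted file mmap'd and
binary-search it. Only the decision logic is the point:

```python
def load_packed_refs(text):
    """Return (entries, steps): entries sorted by refname, steps taken."""
    lines = text.splitlines()
    traits = set()
    if lines and lines[0].startswith("# pack-refs with:"):
        traits = set(lines[0].split(":", 1)[1].split())
        lines = lines[1:]

    # Peeled lines ("^<oid>") are skipped to keep the sketch short.
    entries = []
    for line in lines:
        if line and not line.startswith("^"):
            oid, refname = line.split(" ", 1)
            entries.append((refname, oid))

    # Step 1: trust the "sorted" trait if the writer promised it.
    if "sorted" in traits:
        return entries, ["sorted-tag"]

    # Step 2: one linear pass comparing each entry with its predecessor.
    steps = ["linear-check"]
    in_order = all(a[0] < b[0] for a, b in zip(entries, entries[1:]))

    # Step 3: only if that pass failed, sort everything in memory.
    if not in_order:
        entries.sort()
        steps.append("sort")
    return entries, steps
```

Feeding it the jgit output from above (header with only "peeled")
takes the "linear-check" path even though the entries happen to be in
order; the same entries under a header that includes "sorted" skip the
check entirely.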

You mentioned that Gerrit writes the packed-refs file directly itself,
presumably using jgit. So it sounds like it is constantly undoing Git's
"sorted" marker, which causes git-pack-refs to spend extra effort
checking the sortedness, and rewrite the marker, which then gets hosed
again by jgit, and so on.

And that may explain why jgit is faster, if it is not doing the extra
sort check. If it is not even trying to maintain the sorted property,
then it would be faster still (it only needs one linear pass while
writing out the file, omitting entries that match our updates, and then
appends our updates at the end).

If jgit _is_ sorting the file but not writing out the sorted marker,
then it should start doing so. ;)

If it's not sorting the file, then probably it should start doing so
(and writing the marker). This will make subsequent reads much faster
(mmap + binary-search). It shouldn't even be slower to write (assuming
jgit's writes are doing the usual "rewrite the whole thing to a tempfile
and atomic-rename into place", and not taking some shortcut by appending
to the file).
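
For illustration, a writer along those lines might look like the
following Python sketch. It is not jgit's or Git's code: the function
name and the dict input are made up, it does not take the
packed-refs.lock that a real writer would need, and it does not peel
tags, so it only claims the "sorted" trait:

```python
import os
import tempfile


def write_packed_refs(path, refs):
    """Write refs (dict of refname -> oid) sorted, with the "sorted" trait."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the tempfile next to the target so the rename stays on one
    # filesystem.
    fd, tmp = tempfile.mkstemp(dir=directory, prefix="packed-refs.")
    try:
        with os.fdopen(fd, "w", encoding="utf-8", newline="\n") as f:
            # Nothing here peels tags, so the "peeled" and
            # "fully-peeled" traits are deliberately left out.
            f.write("# pack-refs with: sorted\n")
            # Sort bytewise, which is the order strcmp() would give.
            for refname in sorted(refs, key=lambda r: r.encode("utf-8")):
                f.write(f"{refs[refname]} {refname}\n")
        os.replace(tmp, path)  # atomic rename over the old file
    except BaseException:
        os.unlink(tmp)
        raise
```

The sort happens in memory before anything is written, so the file
itself is still produced in a single sequential pass.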

Unrelated to your problem, but also jgit should support the fully-peeled
tag, another thing that makes readers faster. ;)

The jgit version I'm using is:

  $ jgit version
  jgit version 7.5.0.202512021534-r

One way you could test this theory is to sort and mark the file
yourself before running "git pack-refs". An easy way to do that is to
convince git to rewrite it by removing an entry. I.e., find some ref
mentioned in the packed-refs file, and then "git update-ref -d $ref".
And check the first line of .git/packed-refs before and after. If it
goes faster (and similarly fast to jgit) only when the "sorted" tag
appears, then that would be our culprit.

-Peff

