From: Jeff King <peff@peff•net>
To: "René Scharfe" <l.s.r@web•de>
Cc: Git List <git@vger•kernel.org>
Subject: Re: [PATCH 1/3] commit: convert pop_most_recent_commit() to prio_queue
Date: Sat, 19 Jul 2025 02:55:58 -0400
Message-ID: <20250719065558.GD705356@coredump.intra.peff.net>
In-Reply-To: <b0950e32-b4fa-4aff-8b5c-58c734b880b2@web.de>
On Wed, Jul 16, 2025 at 11:39:49AM +0200, René Scharfe wrote:
> On 7/16/25 7:05 AM, Jeff King wrote:
> > On Tue, Jul 15, 2025 at 04:51:07PM +0200, René Scharfe wrote:
> >
> >> pop_most_recent_commit() calls commit_list_insert_by_date() and is
> >> itself called in a loop, which can lead to quadratic complexity.
> >> Replace the commit_list with a prio_queue to ensure logarithmic worst
> >> case complexity and convert all three users.
> >
> > I guess I'm cc'd because of my frequent complaints about the quadratic
> > nature of our commit lists? :)
>
> And because you introduced prio_queue.
I think that was Junio, but I can certainly be counted as a cheerleader
for the topic. :)
> > Mostly I asked because I had to look at pop_most_recent_commit() to see
> > what operation would be made faster here. Looks like it's mostly ":/",
> > but maybe also fetch's mark_recent_complete_commits()? I guess we might
> > hit that if you have a huge number of refs?
>
> The :/ handling was the easiest to test for me. fetch_pack() and
> walker_fetch() require some server side to set up, which seems not worth
> it just to demonstrate quadratic behavior. Having thousands of refs
> would make the list long enough to notice, as would having lots of
> merges.
OK, that makes sense. Just making sure I understand the benefits.
> My general idea is to get rid of commit_list_insert_by_date() eventually
> to avoid quadratic complexity.
Yeah, it's certainly at the root of many such problems we've seen over
the years.
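
To make the complexity argument concrete, here is a minimal sketch that counts the work done by a date-sorted list insert versus a binary-heap insert. It is not git's code: struct item, list_insert_by_date(), heap_put() and count_steps() are hypothetical stand-ins for commit, commit_list_insert_by_date() and prio_queue_put(), and only the insertion cost is modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a commit; only the date matters here. */
struct item {
	unsigned long date;
};

/* Singly linked list node, in the spirit of a commit list. */
struct node {
	struct item *item;
	struct node *next;
};

/*
 * Insert so the list stays sorted newest-first. The walk makes one call
 * O(n), so n calls in a loop are O(n^2) in the worst case.
 * Returns the number of nodes stepped over.
 */
static size_t list_insert_by_date(struct item *item, struct node **list)
{
	struct node **pp = list;
	struct node *n;
	size_t steps = 0;

	while (*pp && (*pp)->item->date > item->date) {
		pp = &(*pp)->next;
		steps++;
	}
	n = malloc(sizeof(*n));
	assert(n);
	n->item = item;
	n->next = *pp;
	*pp = n;
	return steps;
}

/* Array-backed binary max-heap keyed on date. */
struct heap {
	struct item **array;
	size_t nr, alloc;
};

/* Append and sift up; returns the number of swaps (at most log2(nr)). */
static size_t heap_put(struct heap *h, struct item *item)
{
	size_t steps = 0;
	size_t i;

	if (h->nr == h->alloc) {
		h->alloc = h->alloc ? 2 * h->alloc : 16;
		h->array = realloc(h->array, h->alloc * sizeof(*h->array));
		assert(h->array);
	}
	i = h->nr++;
	h->array[i] = item;
	while (i) {
		size_t parent = (i - 1) / 2;
		struct item *tmp;

		if (h->array[parent]->date >= h->array[i]->date)
			break;
		tmp = h->array[parent];
		h->array[parent] = h->array[i];
		h->array[i] = tmp;
		i = parent;
		steps++;
	}
	return steps;
}

/* Insert n items (n > 0) into both structures and total the steps. */
static void count_steps(size_t n, int ascending,
			size_t *list_steps, size_t *heap_steps)
{
	struct item *items = calloc(n, sizeof(*items));
	struct node *list = NULL;
	struct heap heap = { NULL, 0, 0 };
	size_t i;

	assert(items);
	*list_steps = *heap_steps = 0;
	for (i = 0; i < n; i++) {
		items[i].date = ascending ? i : n - i;
		*list_steps += list_insert_by_date(&items[i], &list);
		*heap_steps += heap_put(&heap, &items[i]);
	}
	while (list) {
		struct node *next = list->next;
		free(list);
		list = next;
	}
	free(heap.array);
	free(items);
}
```

With 1000 items inserted oldest-last (ascending == 0), every list insert walks the whole list, for 1000 * 999 / 2 = 499500 steps in total. The heap's own worst case is the opposite order (ascending == 1), where each insert sifts up to the root, and even then the total stays below 1000 * 10 swaps.
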
> > I actually have a series turning rev_info.commits into a prio_queue
> > which I need to polish up (mostly just writing commit messages; I've
> > been running with it for almost 2 years without trouble). Ironically it
> > does not touch this spot, as these commit lists are formed on their own.
>
> That is not a coincidence. I had a look at that series and tried to
> reach its goals while keeping rev_info.commits a commit_list. Why?
> Mostly being vaguely uncomfortable with prio_queue's memory overhead,
> lack of type safety and dual use as a stack. I still used it, but only
> as local variable, not in the central struct rev_info.
Hmm, I would have thought prio_queue had less memory overhead. You're
spending one pointer per entry in a packed array, versus list nodes. But
it's true that it doesn't shrink as items are removed (though that is
something we _could_ implement).
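
The trade-off can be sketched with a toy model, assuming a list node holds an item pointer plus a next pointer and the array holds one pointer per slot as described above. struct list_node, struct ptr_array and the helpers are hypothetical names, not git's definitions; the real prio_queue may carry extra per-entry bookkeeping, and per-node allocator overhead is ignored.

```c
#include <stddef.h>
#include <stdlib.h>

/* Toy list node: one payload pointer plus one link pointer. */
struct list_node {
	void *item;
	struct list_node *next;
};

/* Toy packed array: one pointer per slot, grown by doubling. */
struct ptr_array {
	void **slots;
	size_t nr, alloc;
};

static void ptr_array_push(struct ptr_array *a, void *p)
{
	if (a->nr == a->alloc) {
		size_t alloc = a->alloc ? 2 * a->alloc : 16;
		void **slots = realloc(a->slots, alloc * sizeof(*slots));

		if (!slots)
			abort();
		a->slots = slots;
		a->alloc = alloc;
	}
	a->slots[a->nr++] = p;
}

/* Removing an entry lowers nr but never gives capacity back. */
static void *ptr_array_pop(struct ptr_array *a)
{
	return a->nr ? a->slots[--a->nr] : NULL;
}

/* Bytes held by the array, based on capacity rather than live entries. */
static size_t ptr_array_bytes(const struct ptr_array *a)
{
	return a->alloc * sizeof(*a->slots);
}

/* Bytes held by a list of n nodes, excluding allocator overhead. */
static size_t list_bytes(size_t n)
{
	return n * sizeof(struct list_node);
}
```

In this model the array is smaller while it is full, since each slot costs one pointer against two for a list node. After everything is popped, the array still holds its peak capacity while the list would have freed every node.
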
The dual use as a stack actually came in handy for my series, IIRC.
There are spots which use a commit_list but care about a specific order,
and my list/prio_queue conversion helpers use that to create a non-heap
prio_queue that just returns the items in the original order (it's
actually FIFO, but we can get that by reversing).
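
As a rough illustration of that trick, here is a toy queue without a comparison function. It hands items back in last-in-first-out order, and reversing it once before popping restores the original order. struct lifo, lifo_put(), lifo_get() and lifo_reverse() are hypothetical names for this sketch, not the prio_queue API or the conversion helpers from the series.

```c
#include <stddef.h>

#define LIFO_CAP 8

/* Toy queue with no comparison function: it behaves like a stack. */
struct lifo {
	const char *slots[LIFO_CAP];
	size_t nr;
};

/* Append at the end; silently ignores items beyond the fixed capacity. */
static void lifo_put(struct lifo *q, const char *s)
{
	if (q->nr < LIFO_CAP)
		q->slots[q->nr++] = s;
}

/* Pop from the end, so the most recently added item comes out first. */
static const char *lifo_get(struct lifo *q)
{
	return q->nr ? q->slots[--q->nr] : NULL;
}

/* Reverse in place so later pops return items in insertion order. */
static void lifo_reverse(struct lifo *q)
{
	size_t i, j;

	if (!q->nr)
		return;
	for (i = 0, j = q->nr - 1; i < j; i++, j--) {
		const char *tmp = q->slots[i];

		q->slots[i] = q->slots[j];
		q->slots[j] = tmp;
	}
}
```

Putting "a", "b", "c" and popping yields "c", "b", "a". Calling lifo_reverse() after the puts makes the pops come out as "a", "b", "c" instead.
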
I dunno. That's kind of horrible when I say it out loud, but it did make
things work. I'm surprised that your attempt ended up with a performance
hit when mine did not. Mine tried not to be clever, and even leaves in
place a few spots where we convert between the two representations to
satisfy various interfaces (with the goal that we'd probably eventually
switch to prio_queue everywhere).
-Peff