public inbox for git@vger.kernel.org 
 help / color / mirror / Atom feed
From: Patrick Steinhardt <ps@pks•im>
To: Karthik Nayak <karthik.188@gmail•com>
Cc: git@vger•kernel.org, Justin Tobler <jltobler@gmail•com>,
	Eric Sunshine <sunshine@sunshineco•com>
Subject: Re: [PATCH v2 1/2] builtin/gc: fix condition for whether to write commit graphs
Date: Tue, 6 Jan 2026 12:55:07 +0100	[thread overview]
Message-ID: <aVz4G-gRf-mA2N56@pks.im> (raw)
In-Reply-To: <CAOLa=ZSZ9PKCi=vQY8WKhwAHVZT-keA5XOXBMVrB4ZW+u2uNhg@mail.gmail.com>

On Tue, Jan 06, 2026 at 03:27:29AM -0800, Karthik Nayak wrote:
> Patrick Steinhardt <ps@pks•im> writes:
> 
> > When performing auto-maintenance we check whether commit graphs need to
> > be generated by counting the number of commits that are reachable by any
> > reference, but not covered by a commit graph. This search is performed
> > by iterating through all references and then doing a depth-first search
> > until we have found enough commits that are not present in the commit
> > graph.
> >
> > This logic has a memory leak though:
> >
> >   Direct leak of 16 byte(s) in 1 object(s) allocated from:
> >       #0 0x55555562e433 in malloc (git+0xda433)
> >       #1 0x555555964322 in do_xmalloc ../wrapper.c:55:8
> >       #2 0x5555559642e6 in xmalloc ../wrapper.c:76:9
> >       #3 0x55555579bf29 in commit_list_append ../commit.c:1872:35
> >       #4 0x55555569f160 in dfs_on_ref ../builtin/gc.c:1165:4
> >       #5 0x5555558c33fd in do_for_each_ref_iterator ../refs/iterator.c:431:12
> >       #6 0x5555558af520 in do_for_each_ref ../refs.c:1828:9
> >       #7 0x5555558ac317 in refs_for_each_ref ../refs.c:1833:9
> >       #8 0x55555569e207 in should_write_commit_graph ../builtin/gc.c:1188:11
> >       #9 0x55555569c915 in maintenance_is_needed ../builtin/gc.c:3492:8
> >       #10 0x55555569b76a in cmd_maintenance ../builtin/gc.c:3542:9
> >       #11 0x55555575166a in run_builtin ../git.c:506:11
> >       #12 0x5555557502f0 in handle_builtin ../git.c:779:9
> >       #13 0x555555751127 in run_argv ../git.c:862:4
> >       #14 0x55555575007b in cmd_main ../git.c:984:19
> >       #15 0x5555557523aa in main ../common-main.c:9:11
> >       #16 0x7ffff7a2a4d7 in __libc_start_call_main (/nix/store/xx7cm72qy2c0643cm1ipngd87aqwkcdp-glibc-2.40-66/lib/libc.so.6+0x2a4d7) (BuildId: cddea92d6cba8333be952b5a02fd47d61054c5ab)
> >       #17 0x7ffff7a2a59a in __libc_start_main@GLIBC_2•2.5 (/nix/store/xx7cm72qy2c0643cm1ipngd87aqwkcdp-glibc-2.40-66/lib/libc.so.6+0x2a59a) (BuildId: cddea92d6cba8333be952b5a02fd47d61054c5ab)
> >       #18 0x5555555f0934 in _start (git+0x9c934)
> >
> > The root cause of this memory leak is our use of `commit_list_append()`.
> > This function expects as parameters the item to append and the _tail_ of
> > the list to append. This tail will then be overwritten with the new tail
> > of the list so that it can be used in subsequent calls. But we call it
> > with `commit_list_append(parent->item, &stack)`, so we end up losing
> > everything but the new item.
> >
> > This issue only surfaces when counting merge commits. Next to being a
> > memory leak, it also shows that we're in fact miscounting as we only
> > respect children of the last parent. All previous parents are discarded,
> > so their children will be disregarded unless they are hit via another
> > reference.
> 
> Yikes. So we never go down the path of the first N-1 parents? Does that
> inversely mean, commit-graph generation would be slower now in
> repositories with lots of merges, since it is fixed to follow all paths
> correctly?

Yeah, this was quite broken indeed. I don't think that the fixed walk
should result in a significant slowdown:

  - We stop walking the parent chain whenever we see a commit that is
    covered by the commit-graph.

  - And our walk of commits that are not covered by the commit-graph is
    bounded by "maintenance.commit-graph.auto", which defaults to 100
    commits.

So in practice, the cost should be negligible.

There's going to be some exceptions thoulgh. Most importantly, the
complexity of the computation scales directly with the number of refs as
we use `refs_for_each_ref()`. So if you have a gazillion refs I'd expect
the performance impact to become noticeable. But that's already been the
case before this commit.

We may want to revisit this in the future if we ever notice that this
does become an issue.

> > While crafting a test case for the issue I was puzzled that I couldn't
> > establish the proper border at which the auto-condition would be
> > fulfilled. As it turns out, there's another bug: if an object is at the
> > tip of any reference we don't mark it as seen. Consequently, if it is
> > reachable via any other reference, we'd count that object twice.
> >
> 
> So if an object is at the tip of N references, we'd count it N times
> right?

Yeah, exactly. Let me rephrase that slightly.

> > diff --git a/t/t7900-maintenance.sh b/t/t7900-maintenance.sh
> > index 6b36f52df7..a2b4403595 100755
> > --- a/t/t7900-maintenance.sh
> > +++ b/t/t7900-maintenance.sh
> > @@ -206,6 +206,31 @@ test_expect_success 'commit-graph auto condition' '
> >  	test_subcommand $COMMIT_GRAPH_WRITE <cg-two-satisfied.txt
> >  '
> >
> > +test_expect_success 'commit-graph auto condition with merges' '
> > +	test_when_finished "rm -rf repo" &&
> > +	git init repo &&
> > +	(
> > +		cd repo &&
> > +		git config set maintenance.auto false &&
> > +		git commit --allow-empty -m initial &&
> > +		git switch --create feature &&
> > +		git commit --allow-empty -m feature-1 &&
> > +		git commit --allow-empty -m feature-2 &&
> > +		git switch - &&
> > +		git commit --allow-empty -m main-1 &&
> > +		git commit --allow-empty -m main-2 &&
> > +		git merge feature &&
> > +
> > +		# We have 6 commit, none of which are covered by a commit
> > +		# graph. So this must be the boundary at which we start to
> > +		# perform maintenance.
> > +		test_must_fail git -c maintenance.commit-graph.auto=7 \
> > +			maintenance is-needed --auto --task=commit-graph &&
> > +		git -c maintenance.commit-graph.auto=6 \
> > +			maintenance is-needed --auto --task=commit-graph
> > +	)
> > +'
> > +
> 
> This doesn't test the fix around double-counting tip objects, no?

Hm, true. I initially used `test_commit()` here, which uncovered the
double-counting as it creates lightweight tags by default. Let me revert
back to use that function so that we exercise this again.

Thanks!

Patrick

  reply	other threads:[~2026-01-06 11:55 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-12-05  8:19 [PATCH 0/3] Some random object database related fixes Patrick Steinhardt
2025-12-05  8:19 ` [PATCH 1/3] builtin/repack: fix geometric repacks with promisor remotes Patrick Steinhardt
2025-12-10 19:31   ` Justin Tobler
2025-12-11  5:46     ` Patrick Steinhardt
2025-12-05  8:19 ` [PATCH 2/3] builtin/gc: fix condition for whether to write commit graphs Patrick Steinhardt
2025-12-10 19:49   ` Justin Tobler
2025-12-11  5:48     ` Patrick Steinhardt
2025-12-05  8:20 ` [PATCH 3/3] odb: properly close sources before freeing them Patrick Steinhardt
2025-12-05 23:14   ` Eric Sunshine
2025-12-06 11:38     ` Patrick Steinhardt
2025-12-06 11:43       ` Eric Sunshine
2025-12-06 12:04         ` Patrick Steinhardt
2025-12-11  7:19 ` [PATCH v2 0/2] Some random object database related fixes Patrick Steinhardt
2025-12-11  7:19   ` [PATCH v2 1/2] builtin/gc: fix condition for whether to write commit graphs Patrick Steinhardt
2025-12-11 14:16     ` Toon Claes
2026-01-06 11:27     ` Karthik Nayak
2026-01-06 11:55       ` Patrick Steinhardt [this message]
2026-01-06 16:33         ` Karthik Nayak
2025-12-11  7:19   ` [PATCH v2 2/2] odb: properly close sources before freeing them Patrick Steinhardt
2025-12-12 15:01   ` [PATCH v2 0/2] Some random object database related fixes Justin Tobler
2026-01-06 11:29   ` Karthik Nayak
2026-01-06 12:58 ` [PATCH v3 " Patrick Steinhardt
2026-01-06 12:58   ` [PATCH v3 1/2] builtin/gc: fix condition for whether to write commit graphs Patrick Steinhardt
2026-01-06 12:58   ` [PATCH v3 2/2] odb: properly close sources before freeing them Patrick Steinhardt
2026-01-06 16:30   ` [PATCH v3 0/2] Some random object database related fixes Karthik Nayak
2026-01-07  3:51     ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aVz4G-gRf-mA2N56@pks.im \
    --to=ps@pks$(echo .)im \
    --cc=git@vger$(echo .)kernel.org \
    --cc=jltobler@gmail$(echo .)com \
    --cc=karthik.188@gmail$(echo .)com \
    --cc=sunshine@sunshineco$(echo .)com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox