public inbox for git@vger.kernel.org 
 help / color / mirror / Atom feed
From: Patrick Steinhardt <ps@pks•im>
To: git@vger•kernel.org
Cc: Justin Tobler <jltobler@gmail•com>,
	 Eric Sunshine <sunshine@sunshineco•com>
Subject: [PATCH v2 1/2] builtin/gc: fix condition for whether to write commit graphs
Date: Thu, 11 Dec 2025 08:19:58 +0100	[thread overview]
Message-ID: <20251211-odb-related-fixes-v2-1-bdf875ce51fc@pks.im> (raw)
In-Reply-To: <20251211-odb-related-fixes-v2-0-bdf875ce51fc@pks.im>

When performing auto-maintenance we check whether commit graphs need to
be generated by counting the number of commits that are reachable by any
reference, but not covered by a commit graph. This search is performed
by iterating through all references and then doing a depth-first search
until we have found enough commits that are not present in the commit
graph.

This logic has a memory leak though:

  Direct leak of 16 byte(s) in 1 object(s) allocated from:
      #0 0x55555562e433 in malloc (git+0xda433)
      #1 0x555555964322 in do_xmalloc ../wrapper.c:55:8
      #2 0x5555559642e6 in xmalloc ../wrapper.c:76:9
      #3 0x55555579bf29 in commit_list_append ../commit.c:1872:35
      #4 0x55555569f160 in dfs_on_ref ../builtin/gc.c:1165:4
      #5 0x5555558c33fd in do_for_each_ref_iterator ../refs/iterator.c:431:12
      #6 0x5555558af520 in do_for_each_ref ../refs.c:1828:9
      #7 0x5555558ac317 in refs_for_each_ref ../refs.c:1833:9
      #8 0x55555569e207 in should_write_commit_graph ../builtin/gc.c:1188:11
      #9 0x55555569c915 in maintenance_is_needed ../builtin/gc.c:3492:8
      #10 0x55555569b76a in cmd_maintenance ../builtin/gc.c:3542:9
      #11 0x55555575166a in run_builtin ../git.c:506:11
      #12 0x5555557502f0 in handle_builtin ../git.c:779:9
      #13 0x555555751127 in run_argv ../git.c:862:4
      #14 0x55555575007b in cmd_main ../git.c:984:19
      #15 0x5555557523aa in main ../common-main.c:9:11
      #16 0x7ffff7a2a4d7 in __libc_start_call_main (/nix/store/xx7cm72qy2c0643cm1ipngd87aqwkcdp-glibc-2.40-66/lib/libc.so.6+0x2a4d7) (BuildId: cddea92d6cba8333be952b5a02fd47d61054c5ab)
      #17 0x7ffff7a2a59a in __libc_start_main@GLIBC_2•2.5 (/nix/store/xx7cm72qy2c0643cm1ipngd87aqwkcdp-glibc-2.40-66/lib/libc.so.6+0x2a59a) (BuildId: cddea92d6cba8333be952b5a02fd47d61054c5ab)
      #18 0x5555555f0934 in _start (git+0x9c934)

The root cause of this memory leak is our use of `commit_list_append()`.
This function expects as parameters the item to append and the _tail_ of
the list to append. This tail will then be overwritten with the new tail
of the list so that it can be used in subsequent calls. But we call it
with `commit_list_append(parent->item, &stack)`, so we end up losing
everything but the new item.

This issue only surfaces when counting merge commits. Next to being a
memory leak, it also shows that we're in fact miscounting as we only
respect children of the last parent. All previous parents are discarded,
so their children will be disregarded unless they are hit via another
reference.

While crafting a test case for the issue I was puzzled that I couldn't
establish the proper border at which the auto-condition would be
fulfilled. As it turns out, there's another bug: if an object is at the
tip of any reference we don't mark it as seen. Consequently, if it is
reachable via any other reference, we'd count that object twice.

Fix both of these bugs so that we properly count objects without leaking
any memory.

Signed-off-by: Patrick Steinhardt <ps@pks•im>
---
 builtin/gc.c           |  8 +++++---
 t/t7900-maintenance.sh | 25 +++++++++++++++++++++++++
 2 files changed, 30 insertions(+), 3 deletions(-)

diff --git a/builtin/gc.c b/builtin/gc.c
index 92c6e7b954..17ff68cbd9 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -1130,8 +1130,10 @@ static int dfs_on_ref(const struct reference *ref, void *cb_data)
 		return 0;
 
 	commit = lookup_commit(the_repository, maybe_peeled);
-	if (!commit)
+	if (!commit || commit->object.flags & SEEN)
 		return 0;
+	commit->object.flags |= SEEN;
+
 	if (repo_parse_commit(the_repository, commit) ||
 	    commit_graph_position(commit) != COMMIT_NOT_FROM_GRAPH)
 		return 0;
@@ -1141,7 +1143,7 @@ static int dfs_on_ref(const struct reference *ref, void *cb_data)
 	if (data->num_not_in_graph >= data->limit)
 		return 1;
 
-	commit_list_append(commit, &stack);
+	commit_list_insert(commit, &stack);
 
 	while (!result && stack) {
 		struct commit_list *parent;
@@ -1162,7 +1164,7 @@ static int dfs_on_ref(const struct reference *ref, void *cb_data)
 				break;
 			}
 
-			commit_list_append(parent->item, &stack);
+			commit_list_insert(parent->item, &stack);
 		}
 	}
 
diff --git a/t/t7900-maintenance.sh b/t/t7900-maintenance.sh
index 6b36f52df7..a2b4403595 100755
--- a/t/t7900-maintenance.sh
+++ b/t/t7900-maintenance.sh
@@ -206,6 +206,31 @@ test_expect_success 'commit-graph auto condition' '
 	test_subcommand $COMMIT_GRAPH_WRITE <cg-two-satisfied.txt
 '
 
+test_expect_success 'commit-graph auto condition with merges' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	(
+		cd repo &&
+		git config set maintenance.auto false &&
+		git commit --allow-empty -m initial &&
+		git switch --create feature &&
+		git commit --allow-empty -m feature-1 &&
+		git commit --allow-empty -m feature-2 &&
+		git switch - &&
+		git commit --allow-empty -m main-1 &&
+		git commit --allow-empty -m main-2 &&
+		git merge feature &&
+
+		# We have 6 commit, none of which are covered by a commit
+		# graph. So this must be the boundary at which we start to
+		# perform maintenance.
+		test_must_fail git -c maintenance.commit-graph.auto=7 \
+			maintenance is-needed --auto --task=commit-graph &&
+		git -c maintenance.commit-graph.auto=6 \
+			maintenance is-needed --auto --task=commit-graph
+	)
+'
+
 test_expect_success 'run --task=bogus' '
 	test_must_fail git maintenance run --task=bogus 2>err &&
 	test_grep "is not a valid task" err

-- 
2.52.0.270.g3f4935d65f.dirty


  reply	other threads:[~2025-12-11  7:20 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-12-05  8:19 [PATCH 0/3] Some random object database related fixes Patrick Steinhardt
2025-12-05  8:19 ` [PATCH 1/3] builtin/repack: fix geometric repacks with promisor remotes Patrick Steinhardt
2025-12-10 19:31   ` Justin Tobler
2025-12-11  5:46     ` Patrick Steinhardt
2025-12-05  8:19 ` [PATCH 2/3] builtin/gc: fix condition for whether to write commit graphs Patrick Steinhardt
2025-12-10 19:49   ` Justin Tobler
2025-12-11  5:48     ` Patrick Steinhardt
2025-12-05  8:20 ` [PATCH 3/3] odb: properly close sources before freeing them Patrick Steinhardt
2025-12-05 23:14   ` Eric Sunshine
2025-12-06 11:38     ` Patrick Steinhardt
2025-12-06 11:43       ` Eric Sunshine
2025-12-06 12:04         ` Patrick Steinhardt
2025-12-11  7:19 ` [PATCH v2 0/2] Some random object database related fixes Patrick Steinhardt
2025-12-11  7:19   ` Patrick Steinhardt [this message]
2025-12-11 14:16     ` [PATCH v2 1/2] builtin/gc: fix condition for whether to write commit graphs Toon Claes
2026-01-06 11:27     ` Karthik Nayak
2026-01-06 11:55       ` Patrick Steinhardt
2026-01-06 16:33         ` Karthik Nayak
2025-12-11  7:19   ` [PATCH v2 2/2] odb: properly close sources before freeing them Patrick Steinhardt
2025-12-12 15:01   ` [PATCH v2 0/2] Some random object database related fixes Justin Tobler
2026-01-06 11:29   ` Karthik Nayak
2026-01-06 12:58 ` [PATCH v3 " Patrick Steinhardt
2026-01-06 12:58   ` [PATCH v3 1/2] builtin/gc: fix condition for whether to write commit graphs Patrick Steinhardt
2026-01-06 12:58   ` [PATCH v3 2/2] odb: properly close sources before freeing them Patrick Steinhardt
2026-01-06 16:30   ` [PATCH v3 0/2] Some random object database related fixes Karthik Nayak
2026-01-07  3:51     ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20251211-odb-related-fixes-v2-1-bdf875ce51fc@pks.im \
    --to=ps@pks$(echo .)im \
    --cc=git@vger$(echo .)kernel.org \
    --cc=jltobler@gmail$(echo .)com \
    --cc=sunshine@sunshineco$(echo .)com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox