From: Jeff King <peff@peff•net>
To: "brian m. carlson" <sandals@crustytoothpaste•net>
Cc: git@vger•kernel.org, Junio C Hamano <gitster@pobox•com>,
Martin Wilck <mwilck@suse•com>,
Adrian Schroeter <adrian@suse•com>
Subject: Re: [PATCH] object-file: disallow adding submodules of different hash algo
Date: Wed, 12 Nov 2025 22:26:19 -0500 [thread overview]
Message-ID: <20251113032619.GA1739649@coredump.intra.peff.net> (raw)
In-Reply-To: <20251112235434.1499699-1-sandals@crustytoothpaste.net>
On Wed, Nov 12, 2025 at 11:54:34PM +0000, brian m. carlson wrote:
> Since this cannot work in the general case, restrict adding a submodule
> of a different algorithm to the index. Add tests for git add and git
> submodule add that these are rejected.
This makes sense. I had meant to follow up on our conversation and patch
from last month, but it was still on my todo list. Fortunately that
earlier attempt gives me something concrete to compare to. ;)
> diff --git a/object-file.c b/object-file.c
> index 4675c8ed6b..8c43c52ed0 100644
> --- a/object-file.c
> +++ b/object-file.c
> @@ -1661,7 +1661,11 @@ int index_path(struct index_state *istate, struct object_id *oid,
> strbuf_release(&sb);
> break;
> case S_IFDIR:
> - return repo_resolve_gitlink_ref(istate->repo, path, "HEAD", oid);
> + if (repo_resolve_gitlink_ref(istate->repo, path, "HEAD", oid))
> + return -1;
> + if (&hash_algos[oid->algo] != istate->repo->hash_algo)
> + return error(_("cannot add a submodule of a different hash algorithm"));
> + break;
> default:
> return error(_("%s: unsupported file type"), path);
> }
OK, you're checking for it here in index_path(), whereas my earlier
attempt did it in add_to_index(). For the most part, I think your spot
makes more sense, as it is at a lower level. add_to_index() eventually
calls into index_path(), and so do some other code paths.
That does leave two interesting oddities:
1. In add_to_index(), we have this code:
if (S_ISDIR(st_mode)) {
if (repo_resolve_gitlink_ref(the_repository, path, "HEAD", &oid) < 0)
return error(_("'%s' does not have a commit checked out"), path);
while (namelen && path[namelen-1] == '/')
namelen--;
}
which is run before we hit index_path(). So it may get an oid
result with an unexpected hash. I think that's OK, because nobody
ever looks at it (which would be a lot more obvious if we declared
the variable inside the conditional block here).
This whole lookup does feel a little funny and redundant. It comes
from f937bc2f86 (add: error appropriately on repository with no
commits, 2019-04-09), and the main goal is making the error message
better. But should we just improve the error message from
index_path() for this case (in which case the resolve call above go
away)?
I think this is mostly orthogonal to your patch and we can ignore
it for now. I only bring it up because now it's weird that we are
trying to catch the hash mismatch, but have this unchecked extra
resolve.
2. There are paths in add_to_index() that _don't_ hit index_path(). In
particular, intent-to-add entries. So with your patch, even though
a regular "git add" is forbidden:
$ git add repo
error: cannot add a submodule of a different hash algorithm
error: unable to index file 'repo'
fatal: updating files failed
I can still do this:
$ git add -N repo
warning: adding embedded git repository: repo
$ git ls-files -s
160000 e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 0 repo
which skips the hash check entirely. Which kind of makes sense,
because the resulting index entry does not have a real oid in it at
all (it gets the empty blob oid). But it does have a real 160000
mode.
Can we make things worse from there? If we try to update it, for
example, that will fail:
$ git add -u
error: cannot add a submodule of a different hash algorithm
error: unable to index file 'repo'
fatal: updating files failed
So...maybe this is OK?
> +test_expect_success 'cannot add a submodule of a different algorithm' '
> + git init --object-format=sha256 sha256 &&
> + (
> + cd sha256 &&
> + test_commit abc &&
> + git init --object-format=sha1 submodule &&
> + (
> + cd submodule &&
> + test_commit def
> + ) &&
> + test_must_fail git add submodule &&
> + test $(git ls-files --stage | grep ^160000 | wc -l) -eq 0
> + ) &&
Makes sense. Purists might complain about "git ls-files" on the left
hand side of a pipe, but I think it is OK here. Though you can golf away
a few subprocesses at the same time with:
diff --git a/t/t3700-add.sh b/t/t3700-add.sh
index b075eb9b11..6a1d4e2659 100755
--- a/t/t3700-add.sh
+++ b/t/t3700-add.sh
@@ -547,12 +547,10 @@ test_expect_success 'cannot add a submodule of a different algorithm' '
cd sha256 &&
test_commit abc &&
git init --object-format=sha1 submodule &&
- (
- cd submodule &&
- test_commit def
- ) &&
+ test_commit -C submodule def &&
test_must_fail git add submodule &&
- test $(git ls-files --stage | grep ^160000 | wc -l) -eq 0
+ git ls-files --stage >entries &&
+ test_grep ! ^160000 entries
) &&
git init --object-format=sha1 sha1 &&
(
but we are getting into nits there.
Is it worth checking the stderr of the failing "git add submodule" call?
Adding a repo directly via "git add" is already something we generate a
warning for, and it's possible we might eventually make it an error. In
which case the command would fail without even hitting your new code,
but we'd have no idea. Adding in a test_grep for "cannot add a submodule
of a different hash algorithm" would at least make sure we're hitting
the error we expect.
-Peff
next prev parent reply other threads:[~2025-11-13 3:26 UTC|newest]
Thread overview: 21+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-11-12 12:58 git fails to checkout SHA1 submodule in SHA256 repo with --depth=1 Martin Wilck
2025-11-12 16:32 ` Junio C Hamano
2025-11-12 23:37 ` brian m. carlson
2025-11-13 10:15 ` Martin Wilck
2025-11-13 22:51 ` brian m. carlson
2025-11-13 22:57 ` Martin Wilck
2025-11-14 22:55 ` Marc Branchaud
2025-11-15 20:14 ` brian m. carlson
2025-11-12 23:54 ` [PATCH] object-file: disallow adding submodules of different hash algo brian m. carlson
2025-11-13 3:26 ` Jeff King [this message]
2025-11-13 3:56 ` Jeff King
2025-11-13 16:29 ` Junio C Hamano
2025-11-14 23:26 ` brian m. carlson
2025-11-15 1:53 ` Jeff King
2025-11-13 23:15 ` brian m. carlson
2025-11-15 0:58 ` [PATCH v2 1/2] " brian m. carlson
2025-11-15 0:58 ` [PATCH v2 2/2] read-cache: drop submodule check from add_to_cache() brian m. carlson
2025-11-15 19:57 ` Junio C Hamano
2025-11-15 20:06 ` brian m. carlson
2025-11-15 19:53 ` [PATCH v2 1/2] object-file: disallow adding submodules of different hash algo Junio C Hamano
2025-11-17 8:53 ` Martin Wilck
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20251113032619.GA1739649@coredump.intra.peff.net \
--to=peff@peff$(echo .)net \
--cc=adrian@suse$(echo .)com \
--cc=git@vger$(echo .)kernel.org \
--cc=gitster@pobox$(echo .)com \
--cc=mwilck@suse$(echo .)com \
--cc=sandals@crustytoothpaste$(echo .)net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox