On 2025-11-13 at 03:26:19, Jeff King wrote:
> This makes sense. I had meant to follow up on our conversation and patch
> from last month, but it was still on my todo list. Fortunately that
> earlier attempt gives me something concrete to compare to. ;)

That's why I CC'd you.  I hadn't seen your patch and wasn't sure if I'd
missed it or not.  I wanted to avoid us doing duplicate work.

> OK, you're checking for it here in index_path(), whereas my earlier
> attempt did it in add_to_index(). For the most part, I think your spot
> makes more sense, as it is at a lower level. add_to_index() eventually
> calls into index_path(), and so do some other code paths.

Yeah, I was hoping to catch any code path someone might use that updated
the index by path and this seemed like the most likely candidate.  I
think we'd be okay for `git update-index` or other things that update by
hash because they always interpret the object ID as the main algorithm.

> That does leave two interesting oddities:
> 
>   1. In add_to_index(), we have this code:
> 
>           if (S_ISDIR(st_mode)) {
>                   if (repo_resolve_gitlink_ref(the_repository, path, "HEAD", &oid) < 0)
>                           return error(_("'%s' does not have a commit checked out"), path);
>                   while (namelen && path[namelen-1] == '/')
>                           namelen--;
>           }
> 
>      which is run before we hit index_path(). So it may get an oid
>      result with an unexpected hash. I think that's OK, because nobody
>      ever looks at it (which would be a lot more obvious if we declared
>      the variable inside the conditional block here).
> 
>      This whole lookup does feel a little funny and redundant. It comes
>      from f937bc2f86 (add: error appropriately on repository with no
>      commits, 2019-04-09), and the main goal is making the error message
>      better. But should we just improve the error message from
>      index_path() for this case (in which case the resolve call above go
>      away)?
> 
>      I think this is mostly orthogonal to your patch and we can ignore
>      it for now. I only bring it up because now it's weird that we are
>      trying to catch the hash mismatch, but have this unchecked extra
>      resolve.

Yeah, I think we can.  I may have touched this case in my interop
series, but I think it's just to add another parameter to the function.
I don't recall anything super interesting about this in terms of hash
interoperability here.

>   2. There are paths in add_to_index() that _don't_ hit index_path(). In
>      particular, intent-to-add entries. So with your patch, even though
>      a regular "git add" is forbidden:
> 
>        $ git add repo
>        error: cannot add a submodule of a different hash algorithm
>        error: unable to index file 'repo'
>        fatal: updating files failed
> 
>      I can still do this:
> 
>        $ git add -N repo
>        warning: adding embedded git repository: repo
>        $ git ls-files -s
>        160000 e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 0	repo
> 
>      which skips the hash check entirely. Which kind of makes sense,
>      because the resulting index entry does not have a real oid in it at
>      all (it gets the empty blob oid). But it does have a real 160000
>      mode.

Right.  I at first tried to add the check there and the I realized it
was actually storing the empty blob OID and thus the hash algorithm
would never mismatch.

>      Can we make things worse from there? If we try to update it, for
>      example, that will fail:
> 
>        $ git add -u
>        error: cannot add a submodule of a different hash algorithm
>        error: unable to index file 'repo'
>        fatal: updating files failed
> 
>      So...maybe this is OK?

I think it is, or at least it's the best we can do.  If you do a commit
with the empty blob OID as your submodule, you already have a corrupt
repository, so catching it later on when you do `git add` or `git
update-index` should be okay.

> Makes sense. Purists might complain about "git ls-files" on the left
> hand side of a pipe, but I think it is OK here. Though you can golf away
> a few subprocesses at the same time with:

I stole that line from elsewhere in t3700 and modified it.  I agree that
it is maybe not the ideal style for our codebase, but I did appreciate
the elegance of doing the operation in a single line.

> Is it worth checking the stderr of the failing "git add submodule" call?
> Adding a repo directly via "git add" is already something we generate a
> warning for, and it's possible we might eventually make it an error. In
> which case the command would fail without even hitting your new code,
> but we'd have no idea. Adding in a test_grep for "cannot add a submodule
> of a different hash algorithm" would at least make sure we're hitting
> the error we expect.

I can do a v2 with those changes.  I will say that I also added changes
for `git submodule add`, which would catch the problem you mentioned,
because I thought that was the code most likely to have a different
implementation at some point in the future and I wanted to catch that
and also anybody doing the sensible thing in terms of adding submodules.
-- 
brian m. carlson (they/them)
Toronto, Ontario, CA