On 2025-11-13 at 03:26:19, Jeff King wrote: > This makes sense. I had meant to follow up on our conversation and patch > from last month, but it was still on my todo list. Fortunately that > earlier attempt gives me something concrete to compare to. ;) That's why I CC'd you. I hadn't seen your patch and wasn't sure if I'd missed it or not. I wanted to avoid us doing duplicate work. > OK, you're checking for it here in index_path(), whereas my earlier > attempt did it in add_to_index(). For the most part, I think your spot > makes more sense, as it is at a lower level. add_to_index() eventually > calls into index_path(), and so do some other code paths. Yeah, I was hoping to catch any code path someone might use that updated the index by path and this seemed like the most likely candidate. I think we'd be okay for `git update-index` or other things that update by hash because they always interpret the object ID as the main algorithm. > That does leave two interesting oddities: > > 1. In add_to_index(), we have this code: > > if (S_ISDIR(st_mode)) { > if (repo_resolve_gitlink_ref(the_repository, path, "HEAD", &oid) < 0) > return error(_("'%s' does not have a commit checked out"), path); > while (namelen && path[namelen-1] == '/') > namelen--; > } > > which is run before we hit index_path(). So it may get an oid > result with an unexpected hash. I think that's OK, because nobody > ever looks at it (which would be a lot more obvious if we declared > the variable inside the conditional block here). > > This whole lookup does feel a little funny and redundant. It comes > from f937bc2f86 (add: error appropriately on repository with no > commits, 2019-04-09), and the main goal is making the error message > better. But should we just improve the error message from > index_path() for this case (in which case the resolve call above go > away)? > > I think this is mostly orthogonal to your patch and we can ignore > it for now. I only bring it up because now it's weird that we are > trying to catch the hash mismatch, but have this unchecked extra > resolve. Yeah, I think we can. I may have touched this case in my interop series, but I think it's just to add another parameter to the function. I don't recall anything super interesting about this in terms of hash interoperability here. > 2. There are paths in add_to_index() that _don't_ hit index_path(). In > particular, intent-to-add entries. So with your patch, even though > a regular "git add" is forbidden: > > $ git add repo > error: cannot add a submodule of a different hash algorithm > error: unable to index file 'repo' > fatal: updating files failed > > I can still do this: > > $ git add -N repo > warning: adding embedded git repository: repo > $ git ls-files -s > 160000 e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 0 repo > > which skips the hash check entirely. Which kind of makes sense, > because the resulting index entry does not have a real oid in it at > all (it gets the empty blob oid). But it does have a real 160000 > mode. Right. I at first tried to add the check there and the I realized it was actually storing the empty blob OID and thus the hash algorithm would never mismatch. > Can we make things worse from there? If we try to update it, for > example, that will fail: > > $ git add -u > error: cannot add a submodule of a different hash algorithm > error: unable to index file 'repo' > fatal: updating files failed > > So...maybe this is OK? I think it is, or at least it's the best we can do. If you do a commit with the empty blob OID as your submodule, you already have a corrupt repository, so catching it later on when you do `git add` or `git update-index` should be okay. > Makes sense. Purists might complain about "git ls-files" on the left > hand side of a pipe, but I think it is OK here. Though you can golf away > a few subprocesses at the same time with: I stole that line from elsewhere in t3700 and modified it. I agree that it is maybe not the ideal style for our codebase, but I did appreciate the elegance of doing the operation in a single line. > Is it worth checking the stderr of the failing "git add submodule" call? > Adding a repo directly via "git add" is already something we generate a > warning for, and it's possible we might eventually make it an error. In > which case the command would fail without even hitting your new code, > but we'd have no idea. Adding in a test_grep for "cannot add a submodule > of a different hash algorithm" would at least make sure we're hitting > the error we expect. I can do a v2 with those changes. I will say that I also added changes for `git submodule add`, which would catch the problem you mentioned, because I thought that was the code most likely to have a different implementation at some point in the future and I wanted to catch that and also anybody doing the sensible thing in terms of adding submodules. -- brian m. carlson (they/them) Toronto, Ontario, CA