* Mirror repositories for submodules
From: Benson Muite @ 2026-06-01 6:11 UTC
To: git

Hi,

Would a contribution to add mirror repositories as alternate submodule
sources be considered for inclusion? Some projects have mirror
repositories on other hosting services, and may have bandwidth limits on
their primary hosting service. Being able to indicate mirror
repositories for where to check for updates and sources for submodules
when doing `git clone --recurse-submodules https://my.repo` or `git
submodule update --init --recursive` would be helpful when there is a
timeout.

Regards,
Benson

* Re: Mirror repositories for submodules
From: Junio C Hamano @ 2026-06-04 1:09 UTC
To: Benson Muite; +Cc: git

Benson Muite <benson_muite@emailplus•org> writes:

> Would a contribution to add mirror repositories as alternate submodule
> sources be considered for inclusion? Some projects have mirror
> repositories on other hosting services, and may have bandwidth limits on
> their primary hosting service. Being able to indicate mirror
> repositories for where to check for updates and sources for submodules
> when doing `git clone --recurse-submodules https://my.repo` or `git
> submodule update --init --recursive` would be helpful when there is a
> timeout.

I do not see why such a "oh, the repository at $URL1 seems to be
down, but we know $URL2 serves the equivalent information, so let's
go there instead" feature has to be limited to submodule use case.

So, no, I do not think a contribution to add mirror repositories as
alternate submodule sources should be considered for inclusion, as
it artificially limits usefulness of the feature. A feature to add
mirror repositories as alternate sources might be worth considering,
though.

* Re: Mirror repositories for submodules
From: Simon Richter @ 2026-06-04 5:11 UTC
To: Junio C Hamano, Benson Muite; +Cc: git

Hi,

On 6/4/26 10:09 AM, Junio C Hamano wrote:

> So, no, I do not think a contribution to add mirror repositories as
> alternate submodule sources should be considered for inclusion, as
> it artificially limits usefulness of the feature. A feature to add
> mirror repositories as alternate sources might be worth considering,
> though.

This is relevant to the Debian use case: we run a git server that
archives git trees for Debian packages, and ideally the objects on this
server should be identical to what you get from upstream projects.

This is a big problem for archiving projects that use submodules,
because we cannot alter the reference URLs.

Cloning from our server will, depending on what upstream uses, use
either a relative URL (which will go to our server, but we have little
control over what the name part of the repository base URL is going to
be), or an absolute URL that instructs clients to pull from another
place, which conflicts with our goal to have a self-contained archive.

The idea posited earlier, to have a "repository identity" that remains
the same across forks and clones, is somewhat appealing, but the best
idea I can come up with is generating some kind of repository UUID, and
adding a symlink -- not a great design because it pollutes outside the
repo:

$ mkdir myproject
$ cd myproject
$ git init
$ ls -l ..
lrwxrwxrwx 1 simon simon  9 Jun  4 14:05 12345678-9abc-def0-1234-56789abcdef0.git -> myproject
drwxrwxr-x 2 simon simon 40 Jun  4 14:04 myproject

On the other hand, this can be used to construct a stable relative
submodule URL.

Making the symlinks optional would require keeping a list of local
clones and their UUIDs, and resolving them.

I don't like that design, but as I said it's the best idea I have for
now.

I also fully expect that Debian's servers will be used by a lot of
people outside the project as soon as it becomes a convenient fallback,
in the same way people are pulling .orig.tar.gz archives from Debian
mirrors, so we need to make it easy to set up a mirror, to allow this to
scale.

   Simon

* Re: Mirror repositories for submodules
From: Jeff King @ 2026-06-04 6:16 UTC
To: Simon Richter; +Cc: Junio C Hamano, Benson Muite, git

On Thu, Jun 04, 2026 at 02:11:38PM +0900, Simon Richter wrote:

> Cloning from our server will, depending on what upstream uses, use either a
> relative URL (which will go to our server, but we have little control over
> what the name part of the repository base URL is going to be), or an
> absolute URL that instructs clients to pull from another place, which
> conflicts with our goal to have a self-contained archive.
>
> The idea posited earlier, to have a "repository identity" that remains the
> same across forks and clones, is somewhat appealing, but the best idea I can
> come up with is generating some kind of repository UUID, and adding a
> symlink -- not a great design because it pollutes outside the repo:
>
> $ mkdir myproject
> $ cd myproject
> $ git init
> $ ls -l ..
> lrwxrwxrwx 1 simon simon 9 Jun 4 14:05
> 12345678-9abc-def0-1234-56789abcdef0.git -> myproject
> drwxrwxr-x 2 simon simon 40 Jun 4 14:04 myproject
>
> On the other hand, this can be used to construct a stable relative submodule
> URL.

Here's a thought experiment. What if you put the UUID into a URL, like:

  repoid://123456789.git

Then your in-repo .gitmodules would point to that repo id and be
consistent. Of course you need some way to tell Git how to retrieve
repoid:// URLs. You could do so with a custom remote helper
(git-remote-repoid), but presumably that helper is eventually going to
end up going over one of the normal Git protocols.

So we just need to tell Git how to resolve repo id URLs into concrete
URLs. And indeed, we have url.*.insteadOf to do rewriting already.

So for example, you can add a submodule but convert it into a uuid like
this:

  $ git submodule add https://github.com/git/git.git
  $ git config -f .gitmodules submodule.git.url repoid://123456789.git
  $ git commit -am 'add submodule with magic repoid'

Now if somebody else comes along and clones it naively, the repo uuid is
not useful to git by itself:

  $ git clone --recurse-submodules repo
  Submodule 'git' (repoid://123456789.git) registered for path 'git'
  Cloning into '/home/peff/tmp/repo/git'...
  fatal: transport 'repoid' not allowed
  fatal: clone of 'repoid://123456789.git' into submodule path '/home/peff/tmp/repo/git' failed

But imagine that "somehow" they have learned that 123456789.git can be
found at some URL. You can do this:

  git -c url.https://github.com/git/git.git.insteadOf=repoid://123456789.git \
      clone --recurse-submodules repo.git

which would clone from the original URL. Or you could even imagine that
they have a cache of repositories named by uuid, and then:

  git -c url.https://my/cache/.insteadOf=repoid:// ...

would rewrite all repoid://'s automatically.

The use of "-c" here is mostly for illustration. It is a per-command
config, so when you later try to update the submodule, you'd run into
the same problem. Probably you'd want to stuff your mapping into on-disk
config (either ~/.gitconfig, or if you have a lot of them, perhaps some
file included from there).

It would be nice if you could use "git clone -c" (note "-c" as an option
to "clone", not to "git") to set a permanent per-repo config variable.
But sadly the URL rewriting happens in the submodule repository, not the
parent. So it has to be a per-user setting.

Now, all of that said, do we still need uuids at all? If the canonical
submodule name is https://github.com/git/git.git, then anybody can just
rewrite that locally in the same way using url.*.insteadOf config. And I
think this is a pretty standard way of using submodules. E.g., you might
rewrite https:// into ssh:// if you prefer that protocol. Or point to a
local server if it's faster for you.

Which makes me wonder if I am missing something about the original
request that started this thread. But it sounds to me like it is just
asking for the existing URL-rewriting feature.

-Peff

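The on-disk variant of the mapping mentioned in the message above (a rewrite rule kept in a file that is included from the user's config) might look roughly like the following. This is only a sketch: the scratch directory, the file names `gitconfig` and `repoid-map.gitconfig`, and the use of a temporary directory in place of the real home directory are assumptions made so the example can run without touching any real configuration. The repo id and target URL are the placeholder values from the thread.

```shell
#!/bin/sh
set -eu

# Assumption: a scratch directory stands in for $HOME so that no real
# user configuration is modified by this sketch.
demo_home=$(mktemp -d)
main_cfg="$demo_home/gitconfig"
map_cfg="$demo_home/repoid-map.gitconfig"

# One rewrite rule per known repository id, stored in a separate file.
git config -f "$map_cfg" \
    url."https://github.com/git/git.git".insteadOf "repoid://123456789.git"

# Pull the mapping file into the main config through include.path.
git config -f "$main_cfg" include.path "$map_cfg"

# Read the rule back through the include to confirm it is visible.
# --includes is needed because includes are off by default with -f.
resolved=$(git config -f "$main_cfg" --includes \
    --get url.https://github.com/git/git.git.insteadOf)
printf '%s\n' "$resolved"
```

For real use, the include would presumably be registered with `git config --global include.path <file>` instead of a scratch file, so that the rule is also seen when Git runs inside the submodule repository.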
* Re: Mirror repositories for submodules
From: Simon Richter @ 2026-06-04 9:27 UTC
To: Jeff King; +Cc: Junio C Hamano, Benson Muite, git

Hi,

On 6/4/26 3:16 PM, Jeff King wrote:

> Here's a thought experiment. What if you put the UUID into a URL, like:

>    repoid://123456789.git

Yes, that's the idea, except I would want to use a relative URL, like

     ../123456789.git

This could solve the "naive cloning" problem, because it creates an
expectation that the submodules can be found on the same server, or in a
nearby path.

I'm aware that this is *also* bad for decentralization, because it makes
it easier to use one of the big forges where the repositories for
often-used submodules are already likely to be present, but it plays
into our use case, where we want to share the repositories for
often-used subprojects.

> Now, all of that said, do we still need uuids at all? If the canonical
> submodule name is https://github.com/git/git.git, then anybody can just
> rewrite that locally in the same way using url.*.insteadOf config.

Yes, but we'd then need a mechanism for a server to indicate "for
cloning, you should use these 'insteadOf' settings", which is a massive
can of worms from a security standpoint.

I also don't think these canonical URLs can ever be stable if they refer
to infrastructure that is not under the control of the maintainer -- it
would tie the project identity to the hosting provider, and increase the
inertia to overcome for moves (such as the current exodus from github
and gitlab towards codeberg).

> Which makes me wonder if I am missing something about the original
> request that started this thread. But it sounds to me like it is just
> asking for the existing URL-rewriting feature.

The original mail describes a problem similar to the one we have in
Debian, and that my employer has: CI jobs should exclusively talk to
in-house infrastructure, because continuously cloning repositories for
each build is bad for the environment.

The common goal is that a naive clone should get submodules from a local
server, ideally without us having to write some tool to make an initial
checkout, enumerate submodules, create insteadOf settings, clone first
layer of submodules, enumerate second layer, ...

   Simon

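To make concrete what one layer of the tool described in the message above would involve, here is a rough sketch that enumerates submodule URLs from a `.gitmodules` file and writes one `insteadOf` rule per URL. Everything specific in it is an assumption for illustration: the mirror prefix `https://git.example.internal/mirror`, the output file name `mirror-rules.gitconfig`, the stand-in `.gitmodules` content, and the choice to name mirror repositories after the last path component of the upstream URL. It is not a proposal from the thread.

```shell
#!/bin/sh
set -eu

# Hypothetical in-house mirror prefix; not taken from the thread.
mirror_base="https://git.example.internal/mirror"

work=$(mktemp -d)
modules="$work/.gitmodules"
rules="$work/mirror-rules.gitconfig"

# Stand-in for the .gitmodules of a freshly checked-out superproject.
git config -f "$modules" submodule.git.path git
git config -f "$modules" submodule.git.url https://github.com/git/git.git

# Enumerate one layer of submodule URLs and write one rewrite rule each.
# Note: this simple read loop assumes submodule names without spaces.
git config -f "$modules" --get-regexp '^submodule\..*\.url$' |
while read -r key url; do
    # Assumption: the mirror stores each repository under the last path
    # component of its upstream URL. Name collisions are not handled.
    name=${url##*/}
    git config -f "$rules" url."$mirror_base/$name".insteadOf "$url"
done

cat "$rules"
```

Nested submodules would need the same pass repeated after each layer is cloned, which is the repeated enumerate-and-clone cycle the message would like to avoid.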