* Re: Mirror repositories for submodules
From: Jeff King @ 2026-06-04 6:16 UTC
To: Simon Richter; +Cc: Junio C Hamano, Benson Muite, git
On Thu, Jun 04, 2026 at 02:11:38PM +0900, Simon Richter wrote:
> Cloning from our server will, depending on what upstream uses, follow either a
> relative URL (which will go to our server, but we have little control over
> what the name part of the repository base URL is going to be), or an
> absolute URL that instructs clients to pull from another place, which
> conflicts with our goal to have a self-contained archive.
>
> The idea posited earlier, to have a "repository identity" that remains the
> same across forks and clones, is somewhat appealing, but the best idea I can
> come up with is generating some kind of repository UUID, and adding a
> symlink -- not a great design because it pollutes outside the repo:
>
> $ mkdir myproject
> $ cd myproject
> $ git init
> $ ls -l ..
> lrwxrwxrwx 1 simon simon 9 Jun 4 14:05
> 12345678-9abc-def0-1234-56789abcdef0.git -> myproject
> drwxrwxr-x 2 simon simon 40 Jun 4 14:04 myproject
>
> On the other hand, this can be used to construct a stable relative submodule
> URL.
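
Simon's symlink scheme can be sketched in Python as follows. This is an illustration under stated assumptions: the helper name `add_uuid_alias` is hypothetical, a random `uuid.uuid4()` stands in for the repository identity, and the example runs in a temporary directory rather than next to a real repository.

```python
import os
import tempfile
import uuid


def add_uuid_alias(repo_dir: str) -> str:
    """Create '<uuid>.git' next to repo_dir as a symlink pointing at it.

    Hypothetical helper illustrating the proposal; returns the alias name.
    """
    repo_dir = os.path.abspath(repo_dir)
    parent, name = os.path.split(repo_dir)
    alias = f"{uuid.uuid4()}.git"
    # Relative target, matching the listing above: '<uuid>.git -> myproject'.
    os.symlink(name, os.path.join(parent, alias))
    return alias


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as base:
        project = os.path.join(base, "myproject")
        os.mkdir(project)
        alias = add_uuid_alias(project)
        print(alias, "->", os.readlink(os.path.join(base, alias)))
```

The symlink is created in the parent directory, which is exactly the "pollutes outside the repo" drawback Simon mentions.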

Here's a thought experiment. What if you put the UUID into a URL, like:

repoid://123456789.git

Then your in-repo .gitmodules would point to that repo id and be
consistent. Of course you need some way to tell Git how to retrieve
repoid:// URLs. You could do so with a custom remote helper
(git-remote-repoid), but presumably that helper is eventually going to
end up going over one of the normal Git protocols.

So we just need to tell Git how to resolve repo id URLs into concrete
URLs. And indeed, we have url.*.insteadOf to do rewriting already. So,
for example, you can add a submodule but convert its URL into a repo id
like this:

$ git submodule add https://github.com/git/git.git
$ git config -f .gitmodules submodule.git.url
https://github.com/git/git.git
$ git config -f .gitmodules submodule.git.url repoid://123456789.git
$ git commit -am 'add submodule with magic repoid'

Now if somebody else comes along and clones it naively, the repo uuid is
not useful to git by itself:

$ git clone --recurse-submodules repo
Submodule 'git' (repoid://123456789.git) registered for path 'git'
Cloning into '/home/peff/tmp/repo/git'...
fatal: transport 'repoid' not allowed
fatal: clone of 'repoid://123456789.git' into submodule path '/home/peff/tmp/repo/git' failed
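
The "transport 'repoid' not allowed" error comes from Git's protocol allow-list. Below is a simplified Python model of the default policy as documented for `protocol.allow` in git-config(1): http, https, git and ssh default to `always`, ext to `never`, and everything else to `user`, which is permitted only when `GIT_PROTOCOL_FROM_USER` is unset or `1`. The documentation names recursive submodule initialization as the case `user` is meant to exclude. The function names are hypothetical, and the model ignores explicit `protocol.allow` / `protocol.<name>.allow` overrides.

```python
import os
from typing import Mapping, Optional

# Default policies as documented for protocol.allow in git-config(1).
KNOWN_SAFE = {"http", "https", "git", "ssh"}
KNOWN_DANGEROUS = {"ext"}


def default_policy(scheme: str) -> str:
    if scheme in KNOWN_SAFE:
        return "always"
    if scheme in KNOWN_DANGEROUS:
        return "never"
    return "user"


def transport_allowed(scheme: str, env: Optional[Mapping[str, str]] = None) -> bool:
    """Simplified model; ignores protocol.allow and protocol.<name>.allow."""
    env = os.environ if env is None else env
    policy = default_policy(scheme)
    if policy == "always":
        return True
    if policy == "never":
        return False
    # "user": allowed only if GIT_PROTOCOL_FROM_USER is unset or "1".
    return env.get("GIT_PROTOCOL_FROM_USER", "1") == "1"


# Model a recursive submodule clone, which is not treated as user-initiated:
print(transport_allowed("repoid", {"GIT_PROTOCOL_FROM_USER": "0"}))  # False
print(transport_allowed("https", {"GIT_PROTOCOL_FROM_USER": "0"}))   # True
```

Under this model, a git-remote-repoid helper would additionally need something like `protocol.repoid.allow=always`, whereas rewriting repoid:// to an https:// URL yields a scheme that is always allowed.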

But imagine that "somehow" they have learned that 123456789.git can be
found at some URL. You can do this:

git -c url.https://github.com/git/git.git.insteadOf=repoid://123456789.git \
  clone --recurse-submodules repo.git

which would clone from the original URL. Or you could even imagine that
they have a cache of repositories named by uuid, and then:

git -c url.https://my/cache/.insteadOf=repoid:// ...

would rewrite all repoid://'s automatically.
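
Both rules can coexist because git-config(1) documents that when more than one `insteadOf` value matches a URL, the longest match is used. A simplified Python model of that selection follows; the function name is hypothetical, and `https://my/cache/` is the placeholder from the message above.

```python
from typing import Dict


def rewrite_url(url: str, instead_of: Dict[str, str]) -> str:
    """Model url.<base>.insteadOf: instead_of maps each insteadOf prefix
    to the <base> that replaces it. The longest matching prefix wins."""
    best = None
    for prefix in instead_of:
        if url.startswith(prefix) and (best is None or len(prefix) > len(best)):
            best = prefix
    if best is None:
        return url  # no rule matches: URL is used unchanged
    return instead_of[best] + url[len(best):]


rules = {
    # Exact mapping for one repo id (the first -c example).
    "repoid://123456789.git": "https://github.com/git/git.git",
    # Catch-all cache mapping for every repo id (the second -c example).
    "repoid://": "https://my/cache/",
}

print(rewrite_url("repoid://123456789.git", rules))  # https://github.com/git/git.git
print(rewrite_url("repoid://abcdef.git", rules))     # https://my/cache/abcdef.git
```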

The use of "-c" here is mostly for illustration. It is a per-command
config, so when you later try to update the submodule, you'd run into
the same problem. Probably you'd want to stuff your mapping into on-disk
config (either ~/.gitconfig, or if you have a lot of them, perhaps some
file included from there).
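
A file of such mappings could be generated from an id-to-URL table and then referenced from ~/.gitconfig through an `[include]` section's `path` setting. The Python sketch below only renders the gitconfig text; the table contents and the function name are assumptions for illustration, and writing the file and adding the include are left to the user.

```python
from typing import Dict


def render_insteadof_config(mapping: Dict[str, str]) -> str:
    """Render gitconfig text with one [url "<base>"] section per repo id.

    mapping: repo id URL (e.g. "repoid://123456789.git") -> concrete URL.
    Assumes URLs contain no characters needing gitconfig escaping.
    """
    lines = []
    for repo_id, concrete in sorted(mapping.items()):
        lines.append(f'[url "{concrete}"]')
        lines.append(f"\tinsteadOf = {repo_id}")
    return "\n".join(lines) + "\n"


print(render_insteadof_config({
    "repoid://123456789.git": "https://github.com/git/git.git",
}), end="")
```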

It would be nice if you could use "git clone -c" (note "-c" as an option
to "clone", not to "git") to set a permanent per-repo config variable.
But sadly the URL rewriting happens in the submodule repository, not the
parent. So it has to be a per-user setting.

Now, all of that said, do we still need uuids at all? If the canonical
submodule name is https://github.com/git/git.git, then anybody can just
rewrite that locally in the same way using url.*.insteadOf config. And I
think this is a pretty standard way of using submodules. E.g., you might
rewrite https:// into ssh:// if you prefer that protocol. Or point to a
local server if it's faster for you.

Which makes me wonder if I am missing something about the original
request that started this thread. But it sounds to me like it is just
asking for the existing URL-rewriting feature.

-Peff

* Re: Mirror repositories for submodules
From: Simon Richter @ 2026-06-04 9:27 UTC
To: Jeff King; +Cc: Junio C Hamano, Benson Muite, git
Hi,
On 6/4/26 3:16 PM, Jeff King wrote:
> Here's a thought experiment. What if you put the UUID into a URL, like:
> repoid://123456789.git
Yes, that's the idea, except I would want to use a relative URL, like

../123456789.git

This could solve the "naive cloning" problem, because it creates an
expectation that the submodules can be found on the same server, or in a
nearby path.
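
A relative submodule URL is resolved against the superproject's default remote URL, so ../123456789.git lands next to the superproject on whichever server it was cloned from. The Python sketch below is a deliberately simplified model that handles only leading `../` components on a URL with a path; Git's real resolution also covers `./`, scp-style `host:path` remotes and local paths. The server URL and function name are made up for illustration.

```python
from urllib.parse import urlsplit, urlunsplit


def resolve_relative(superproject_url: str, rel: str) -> str:
    """Resolve a '../'-prefixed submodule URL against the superproject URL.

    Simplified model: each leading '../' drops one trailing path segment.
    """
    parts = urlsplit(superproject_url)
    segments = [s for s in parts.path.split("/") if s]
    while rel.startswith("../"):
        if not segments:
            raise ValueError("relative URL climbs above the server root")
        segments.pop()
        rel = rel[3:]
    path = "/" + "/".join(segments + [rel])
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))


print(resolve_relative("https://git.example.org/debian/myproject.git",
                       "../123456789.git"))
# https://git.example.org/debian/123456789.git
```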
I'm aware that this is *also* bad for decentralization, because it makes
it easier to use one of the big forges, where the repositories for
often-used submodules are already likely to be present, but it plays
into our use case, where we want to share the repositories for
often-used subprojects.
> Now, all of that said, do we still need uuids at all? If the canonical
> submodule name is https://github.com/git/git.git, then anybody can just
> rewrite that locally in the same way using url.*.insteadOf config.
Yes, but we'd then need a mechanism for a server to indicate "for
cloning, you should use these 'insteadOf' settings", which is a massive
can of worms from a security standpoint.

I also don't think these canonical URLs can ever be stable if they refer
to infrastructure that is not under the control of the maintainer -- it
would tie the project identity to the hosting provider, and increase the
inertia that has to be overcome to move (as in the current exodus from
GitHub and GitLab towards Codeberg).
> Which makes me wonder if I am missing something about the original
> request that started this thread. But it sounds to me like it is just
> asking for the existing URL-rewriting feature.
The original mail describes a problem similar to the ones we have in
Debian and at my employer: CI jobs should talk exclusively to in-house
infrastructure, because continuously cloning repositories for each build
is bad for the environment.

The common goal is that a naive clone should get submodules from a local
server, ideally without us having to write some tool to make an initial
checkout, enumerate submodules, create insteadOf settings, clone the
first layer of submodules, enumerate the second layer, ...
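
The per-layer tool Simon would rather not write amounts to a breadth-first walk like the Python sketch below. Everything here is an assumption for illustration: `submodules_of` stands in for "check out the repository and read its .gitmodules", the mirror base URL is invented, relative submodule URLs are ignored, and naming each mirror after the last path component of the upstream URL can collide in practice.

```python
from collections import deque
from typing import Callable, Dict, List


def plan_insteadof(root_url: str,
                   submodules_of: Callable[[str], List[str]],
                   mirror_base: str) -> Dict[str, str]:
    """Walk submodules breadth-first; map each upstream URL to a mirror URL.

    Each returned pair would become url.<mirror_url>.insteadOf = <upstream_url>.
    """
    base = mirror_base.rstrip("/")
    mapping: Dict[str, str] = {}
    seen = {root_url}
    queue = deque([root_url])
    while queue:
        url = queue.popleft()
        for sub in submodules_of(url):  # hypothetical: clone + read .gitmodules
            if sub in seen:
                continue  # shared or cyclic submodules are visited once
            seen.add(sub)
            name = sub.rstrip("/").rsplit("/", 1)[-1]
            mapping[sub] = f"{base}/{name}"
            queue.append(sub)
    return mapping


tree = {  # invented two-layer example
    "https://example.org/app.git": ["https://github.com/org/lib.git"],
    "https://github.com/org/lib.git": ["https://gitlab.com/other/dep.git"],
}
plan = plan_insteadof("https://example.org/app.git",
                      lambda u: tree.get(u, []), "https://mirror.internal/git/")
for upstream, mirror in plan.items():
    print(f"url.{mirror}.insteadOf={upstream}")
```

Note that each layer must be fetched before the next can be enumerated, which is what makes this awkward to run before every CI build.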
Simon

* Re: Mirror repositories for submodules
From: Benson Muite @ 2026-06-05 4:54 UTC
To: Jeff King, Simon Richter; +Cc: Junio C Hamano, git
Jeff King <peff@peff.net> writes:
> On Thu, Jun 04, 2026 at 02:11:38PM +0900, Simon Richter wrote:
>
>> Cloning from our server will, depending on what upstream uses, follow either a
>> relative URL (which will go to our server, but we have little control over
>> what the name part of the repository base URL is going to be), or an
>> absolute URL that instructs clients to pull from another place, which
>> conflicts with our goal to have a self-contained archive.
>>
>> The idea posited earlier, to have a "repository identity" that remains the
>> same across forks and clones, is somewhat appealing, but the best idea I can
>> come up with is generating some kind of repository UUID, and adding a
>> symlink -- not a great design because it pollutes outside the repo:
>>
>> $ mkdir myproject
>> $ cd myproject
>> $ git init
>> $ ls -l ..
>> lrwxrwxrwx 1 simon simon 9 Jun 4 14:05
>> 12345678-9abc-def0-1234-56789abcdef0.git -> myproject
>> drwxrwxr-x 2 simon simon 40 Jun 4 14:04 myproject
>>
>> On the other hand, this can be used to construct a stable relative submodule
>> URL.
>
> Here's a thought experiment. What if you put the UUID into a URL, like:
>
> repoid://123456789.git
>
> Then your in-repo .gitmodules would point to that repo id and be
> consistent. Of course you need some way to tell Git how to retrieve
> repoid:// URLs. You could do so with a custom remote helper
> (git-remote-repoid), but presumably that helper is eventually going to
> end up going over one of the normal Git protocols.
>
> So we just need to tell Git how to resolve repo id URLs into concrete
> URLs. And indeed, we have url.*.insteadOf to do rewriting already. So
> for example, you can add a submodule but convert it into a uuid like
> this:
>
> $ git submodule add https://github.com/git/git.git
> $ git config -f .gitmodules submodule.git.url
> https://github.com/git/git.git
> $ git config -f .gitmodules submodule.git.url repoid://123456789.git
> $ git commit -am 'add submodule with magic repoid'
>
> Now if somebody else comes along and clones it naively, the repo uuid is
> not useful to git by itself:
>
> $ git clone --recurse-submodules repo
> Submodule 'git' (repoid://123456789.git) registered for path 'git'
> Cloning into '/home/peff/tmp/repo/git'...
> fatal: transport 'repoid' not allowed
> fatal: clone of 'repoid://123456789.git' into submodule path '/home/peff/tmp/repo/git' failed
>
> But imagine that "somehow" they have learned that 123456789.git can be
> found at some URL. You can do this:
>
> git -c url.https://github.com/git/git.git.insteadOf=repoid://123456789.git \
> clone --recurse-submodules repo.git
>
> which would clone from the original URL. Or you could even imagine that
> they have a cache of repositories named by uuid, and then:
>
> git -c url.https://my/cache/.insteadOf=repoid:// ...
>
> would rewrite all repoid://'s automatically.
>
> The use of "-c" here is mostly for illustration. It is a per-command
> config, so when you later try to update the submodule, you'd run into
> the same problem. Probably you'd want to stuff your mapping into on-disk
> config (either ~/.gitconfig, or if you have a lot of them, perhaps some
> file included from there).
>
> It would be nice if you could use "git clone -c" (note "-c" as an option
> to "clone", not to "git") to set a permanent per-repo config variable.
> But sadly the URL rewriting happens in the submodule repository, not the
> parent. So it has to be a per-user setting.
>
>
> Now, all of that said, do we still need uuids at all? If the canonical
> submodule name is https://github.com/git/git.git, then anybody can just
> rewrite that locally in the same way using url.*.insteadOf config. And I
> think this is a pretty standard way of using submodules. E.g., you might
> rewrite https:// into ssh:// if you prefer that protocol. Or point to a
> local server if it's faster for you.
>
> Which makes me wonder if I am missing something about the original
> request that started this thread. But it sounds to me like it is just
> asking for the existing URL-rewriting feature.
>
The problem is that one might have multiple repositories, and submodules
may themselves have submodules. Typically a primary development
organization will have its own host, but may also have mirrors on other
services that may be more convenient for others to use. A recursive
clone could pull in up to 20 repositories, not all of which are
maintained by the same organization. Setting up URL rewriting for each
of them is inefficient, especially when upstream maintains the mirror
repositories and could indicate them in the source repositories.
> -Peff

* Re: Mirror repositories for submodules
From: Benson Muite @ 2026-06-05 4:47 UTC
To: Simon Richter, Junio C Hamano; +Cc: git
Simon Richter <Simon.Richter@hogyros.de> writes:
> Hi,
>
> On 6/4/26 10:09 AM, Junio C Hamano wrote:
>
>> So, no, I do not think a contribution to add mirror repositories as
>> alternate submodule sources should be considered for inclusion, as
>> it artificially limits usefulness of the feature. A feature to add
>> mirror repositories as alternate sources might be worth considering,
>> though.
>
> This is relevant to the Debian use case: we run a git server that
> archives git trees for Debian packages, and ideally the objects on this
> server should be identical to what you get from upstream projects.
>
> This is a big problem for archiving projects that use submodules,
> because we cannot alter the reference URLs.
>
> Cloning from our server will, depending on what upstream uses, follow either a
> relative URL (which will go to our server, but we have little control
> over what the name part of the repository base URL is going to be), or
> an absolute URL that instructs clients to pull from another place, which
> conflicts with our goal to have a self-contained archive.
>
> The idea posited earlier, to have a "repository identity" that remains
> the same across forks and clones, is somewhat appealing, but the best
> idea I can come up with is generating some kind of repository UUID, and
> adding a symlink -- not a great design because it pollutes outside the repo:
>
> $ mkdir myproject
> $ cd myproject
> $ git init
> $ ls -l ..
> lrwxrwxrwx 1 simon simon 9 Jun 4 14:05
> 12345678-9abc-def0-1234-56789abcdef0.git -> myproject
> drwxrwxr-x 2 simon simon 40 Jun 4 14:04 myproject
>
> On the other hand, this can be used to construct a stable relative
> submodule URL.
>
> Making the symlinks optional would require keeping a list of local
> clones and their UUIDs, and resolving them.
>
> I don't like that design, but as I said it's the best idea I have for now.
>
> I also fully expect that Debian's servers will be used by a lot of
> people outside the project as soon as it becomes a convenient fallback,
> in the same way people are pulling .orig.tar.gz archives from Debian
> mirrors, so we need to make it easy to set up a mirror, to allow this to
> scale.
>
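
Without symlinks, the lookup needs a registry from repository id to local clone path. A minimal in-memory Python sketch follows; the class name, the `../<uuid>.git` URL convention and the example paths are assumptions for illustration, and a real implementation would need persistent storage.

```python
import uuid
from typing import Dict, Optional


class RepoRegistry:
    """Hypothetical registry mapping repository UUIDs to local clone paths."""

    def __init__(self) -> None:
        self._paths: Dict[str, str] = {}

    def register(self, path: str, repo_id: Optional[str] = None) -> str:
        # Generate a new identity unless the repository already has one.
        repo_id = repo_id or str(uuid.uuid4())
        self._paths[repo_id] = path
        return repo_id

    def resolve(self, submodule_url: str) -> str:
        """Map a '../<uuid>.git' submodule URL to a registered local path."""
        name = submodule_url.rstrip("/").rsplit("/", 1)[-1]
        if not name.endswith(".git"):
            raise ValueError(f"not a repo id URL: {submodule_url}")
        repo_id = name[: -len(".git")]
        try:
            return self._paths[repo_id]
        except KeyError:
            raise LookupError(f"unknown repository id {repo_id}") from None


reg = RepoRegistry()
reg.register("/srv/git/myproject", "12345678-9abc-def0-1234-56789abcdef0")
print(reg.resolve("../12345678-9abc-def0-1234-56789abcdef0.git"))  # /srv/git/myproject
```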
For submodules, the metadata consists of the URL of the repository to
clone from. Instead, one could have a list of absolute URLs. By default,
the URLs would be tried in order, and if a URL times out, the next one
would be tried. One may want to change the default ordering with a user
setting, or do a ping test to obtain content from the closest
repository.
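
The proposed fallback order could be modelled as below. This sketches the proposal, not existing Git behaviour: `clone` is a hypothetical callable that raises `TimeoutError` or `ConnectionError` on failure, and the optional `latency` map stands in for a ping test.

```python
from typing import Callable, Dict, List, Optional


def clone_with_fallback(urls: List[str],
                        clone: Callable[[str], None],
                        latency: Optional[Dict[str, float]] = None) -> str:
    """Try each URL in turn and return the first one that succeeds.

    With latency measurements, lower latency is tried first; the stable
    sort keeps the listed order for ties, and unmeasured URLs go last.
    """
    order = list(urls)
    if latency is not None:
        order.sort(key=lambda u: latency.get(u, float("inf")))
    errors = []
    for url in order:
        try:
            clone(url)
            return url
        except (TimeoutError, ConnectionError) as exc:
            errors.append(f"{url}: {exc}")
    raise RuntimeError("all mirrors failed:\n" + "\n".join(errors))
```

Keeping the listed order as the default leaves upstream in control, while the latency option corresponds to the user-side reordering suggested above.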
As an example, for linphone-desktop, the first part of the .gitmodules
file contains:
[submodule "linphone-sdk"]
	path = external/linphone-sdk
	url = https://gitlab.linphone.org/BC/public/linphone-sdk.git
[submodule "external/google/gn"]

This could be updated to:

[submodule "linphone-sdk"]
	path = external/linphone-sdk
	url = https://gitlab.linphone.org/BC/public/linphone-sdk.git
	url = https://github.com/BelledonneCommunications/linphone-sdk.git
[submodule "external/google/gn"]
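
Git's config syntax already allows a key to repeat (`git config --get-all` lists every value, while `git config --get` returns the last one), so the file above is syntactically valid; but Git currently treats `submodule.<name>.url` as a single URL, so honouring the list would need new logic. The minimal Python parser below collects the URLs per submodule in file order. It is an illustrative sketch that handles only the simple subset shown here (no comments, quoting, escapes or continuation lines).

```python
import re
from typing import Dict, List, Optional

SECTION = re.compile(r'^\[submodule "([^"]+)"\]$')
KEYVAL = re.compile(r'^([A-Za-z][A-Za-z0-9-]*)\s*=\s*(.*)$')


def submodule_urls(text: str) -> Dict[str, List[str]]:
    """Collect every url value per submodule, preserving file order."""
    urls: Dict[str, List[str]] = {}
    current: Optional[str] = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("["):
            # Any section header ends the previous one; only submodules count.
            m = SECTION.match(line)
            current = m.group(1) if m else None
            if current is not None:
                urls.setdefault(current, [])
            continue
        m = KEYVAL.match(line)
        # Config key names are case-insensitive.
        if m and current is not None and m.group(1).lower() == "url":
            urls[current].append(m.group(2).strip())
    return urls
```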

* Re: Mirror repositories for submodules
From: Benson Muite @ 2026-06-05 5:05 UTC
To: Simon Richter, Junio C Hamano; +Cc: git
Simon Richter <Simon.Richter@hogyros.de> writes:
> Hi,
>
> On 6/4/26 10:09 AM, Junio C Hamano wrote:
>
>> So, no, I do not think a contribution to add mirror repositories as
>> alternate submodule sources should be considered for inclusion, as
>> it artificially limits usefulness of the feature. A feature to add
>> mirror repositories as alternate sources might be worth considering,
>> though.
>
> This is relevant to the Debian use case: we run a git server that
> archives git trees for Debian packages, and ideally the objects on this
> server should be identical to what you get from upstream projects.
>
> This is a big problem for archiving projects that use submodules,
> because we cannot alter the reference URLs.
>
> Cloning from our server will, depending on what upstream uses, follow either a
> relative URL (which will go to our server, but we have little control
> over what the name part of the repository base URL is going to be), or
> an absolute URL that instructs clients to pull from another place, which
> conflicts with our goal to have a self-contained archive.
>
> The idea posited earlier, to have a "repository identity" that remains
> the same across forks and clones, is somewhat appealing, but the best
> idea I can come up with is generating some kind of repository UUID, and
> adding a symlink -- not a great design because it pollutes outside the repo:
>
> $ mkdir myproject
> $ cd myproject
> $ git init
> $ ls -l ..
> lrwxrwxrwx 1 simon simon 9 Jun 4 14:05
> 12345678-9abc-def0-1234-56789abcdef0.git -> myproject
> drwxrwxr-x 2 simon simon 40 Jun 4 14:04 myproject
>
> On the other hand, this can be used to construct a stable relative
> submodule URL.
>
> Making the symlinks optional would require keeping a list of local
> clones and their UUIDs, and resolving them.
>
> I don't like that design, but as I said it's the best idea I have for now.
>
For submodules, the metadata consists of the URL of the repository to
clone from. Instead, one could have a list of absolute URLs. By default,
the URLs would be tried in order, and if a URL times out, the next one
would be tried. One may want to change the default ordering with a user
setting, or do a ping test to obtain content from the closest
repository.
As an example, for linphone-desktop, the first part of the .gitmodules
file contains:
[submodule "linphone-sdk"]
	path = external/linphone-sdk
	url = https://gitlab.linphone.org/BC/public/linphone-sdk.git
[submodule "external/google/gn"]

This could be updated to:

[submodule "linphone-sdk"]
	path = external/linphone-sdk
	url = https://gitlab.linphone.org/BC/public/linphone-sdk.git
	url = https://github.com/BelledonneCommunications/linphone-sdk.git
[submodule "external/google/gn"]
> I also fully expect that Debian's servers will be used by a lot of
> people outside the project as soon as it becomes a convenient fallback,
> in the same way people are pulling .orig.tar.gz archives from Debian
> mirrors, so we need to make it easy to set up a mirror, to allow this to
> scale.
>
> Simon