public inbox for git@vger.kernel.org 
* Mirror repositories for submodules
@ 2026-06-01  6:11 Benson Muite
  2026-06-04  1:09 ` Junio C Hamano
  0 siblings, 1 reply; 5+ messages in thread
From: Benson Muite @ 2026-06-01  6:11 UTC (permalink / raw)
  To: git

Hi,

Would a contribution to add mirror repositories as alternate submodule
sources be considered for inclusion?  Some projects have mirror
repositories on other hosting services, and may have bandwidth limits on
their primary hosting service.  Being able to indicate mirror
repositories for where to check for updates and sources for submodules
when doing `git clone --recurse-submodules https://my.repo` or `git
submodule update --init --recursive` would be helpful when there is a
timeout.

Regards,
Benson

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Mirror repositories for submodules
  2026-06-01  6:11 Mirror repositories for submodules Benson Muite
@ 2026-06-04  1:09 ` Junio C Hamano
  2026-06-04  5:11   ` Simon Richter
  0 siblings, 1 reply; 5+ messages in thread
From: Junio C Hamano @ 2026-06-04  1:09 UTC (permalink / raw)
  To: Benson Muite; +Cc: git

Benson Muite <benson_muite@emailplus•org> writes:

> Would a contribution to add mirror repositories as alternate submodule
> sources be considered for inclusion?  Some projects have mirror
> repositories on other hosting services, and may have bandwidth limits on
> their primary hosting service.  Being able to indicate mirror
> repositories for where to check for updates and sources for submodules
> when doing `git clone --recurse-submodules https://my.repo` or `git
> submodule update --init --recursive` would be helpful when there is a
> timeout.

I do not see why such a "oh, the repository at $URL1 seems to be
down, but we know $URL2 serves the equivalent information, so let's
go there instead" feature has to be limited to submodule use case.

So, no, I do not think a contribution to add mirror repositories as
alternate submodule sources should be considered for inclusion, as
it artificially limits usefulness of the feature.  A feature to add
mirror repositories as alternate sources might be worth considering,
though.


* Re: Mirror repositories for submodules
  2026-06-04  1:09 ` Junio C Hamano
@ 2026-06-04  5:11   ` Simon Richter
  2026-06-04  6:16     ` Jeff King
  0 siblings, 1 reply; 5+ messages in thread
From: Simon Richter @ 2026-06-04  5:11 UTC (permalink / raw)
  To: Junio C Hamano, Benson Muite; +Cc: git

Hi,

On 6/4/26 10:09 AM, Junio C Hamano wrote:

> So, no, I do not think a contribution to add mirror repositories as
> alternate submodule sources should be considered for inclusion, as
> it artificially limits usefulness of the feature.  A feature to add
> mirror repositories as alternate sources might be worth considering,
> though.

This is relevant to the Debian use case: we run a git server that 
archives git trees for Debian packages, and ideally the objects on this 
server should be identical to what you get from upstream projects.

This is a big problem for archiving projects that use submodules, 
because we cannot alter the reference URLs.

Cloning from our server will, depending on what upstream uses, either
follow a relative URL (which will go to our server, but we have little
control over what the name part of the repository base URL is going to
be), or follow an absolute URL that instructs clients to pull from
another place, which conflicts with our goal to have a self-contained
archive.

The idea posited earlier, to have a "repository identity" that remains 
the same across forks and clones, is somewhat appealing, but the best 
idea I can come up with is generating some kind of repository UUID, and 
adding a symlink -- not a great design because it pollutes outside the repo:

     $ mkdir myproject
     $ cd myproject
     $ git init
     $ ls -l ..
     lrwxrwxrwx 1 simon simon   9 Jun  4 14:05 12345678-9abc-def0-1234-56789abcdef0.git -> myproject
     drwxrwxr-x 2 simon simon  40 Jun  4 14:04 myproject

On the other hand, this can be used to construct a stable relative 
submodule URL.

Making the symlinks optional would require keeping a list of local 
clones and their UUIDs, and resolving them.

I don't like that design, but as I said it's the best idea I have for now.

I also fully expect that Debian's servers will be used by a lot of 
people outside the project as soon as it becomes a convenient fallback, 
in the same way people are pulling .orig.tar.gz archives from Debian 
mirrors, so we need to make it easy to set up a mirror, to allow this to 
scale.

    Simon


* Re: Mirror repositories for submodules
  2026-06-04  5:11   ` Simon Richter
@ 2026-06-04  6:16     ` Jeff King
  2026-06-04  9:27       ` Simon Richter
  0 siblings, 1 reply; 5+ messages in thread
From: Jeff King @ 2026-06-04  6:16 UTC (permalink / raw)
  To: Simon Richter; +Cc: Junio C Hamano, Benson Muite, git

On Thu, Jun 04, 2026 at 02:11:38PM +0900, Simon Richter wrote:

> Cloning from our server will, depending on what upstream uses, either
> follow a relative URL (which will go to our server, but we have little
> control over what the name part of the repository base URL is going to
> be), or follow an absolute URL that instructs clients to pull from
> another place, which conflicts with our goal to have a self-contained
> archive.
> 
> The idea posited earlier, to have a "repository identity" that remains the
> same across forks and clones, is somewhat appealing, but the best idea I can
> come up with is generating some kind of repository UUID, and adding a
> symlink -- not a great design because it pollutes outside the repo:
> 
>     $ mkdir myproject
>     $ cd myproject
>     $ git init
>     $ ls -l ..
>     lrwxrwxrwx 1 simon simon   9 Jun  4 14:05 12345678-9abc-def0-1234-56789abcdef0.git -> myproject
>     drwxrwxr-x 2 simon simon  40 Jun  4 14:04 myproject
> 
> On the other hand, this can be used to construct a stable relative submodule
> URL.

Here's a thought experiment. What if you put the UUID into a URL, like:

  repoid://123456789.git

Then your in-repo .gitmodules would point to that repo id and be
consistent. Of course you need some way to tell Git how to retrieve
repoid:// URLs. You could do so with a custom remote helper
(git-remote-repoid), but presumably that helper is eventually going to
end up going over one of the normal Git protocols.

So we just need to tell Git how to resolve repo id URLs into concrete
URLs. And indeed, we have url.*.insteadOf to do rewriting already. So
for example, you can add a submodule but convert it into a uuid like
this:

  $ git submodule add https://github.com/git/git.git
  $ git config -f .gitmodules submodule.git.url
  https://github.com/git/git.git
  $ git config -f .gitmodules submodule.git.url repoid://123456789.git
  $ git commit -am 'add submodule with magic repoid'

Now if somebody else comes along and clones it naively, the repo uuid is
not useful to git by itself:

  $ git clone --recurse-submodules repo
  Submodule 'git' (repoid://123456789.git) registered for path 'git'
  Cloning into '/home/peff/tmp/repo/git'...
  fatal: transport 'repoid' not allowed
  fatal: clone of 'repoid://123456789.git' into submodule path '/home/peff/tmp/repo/git' failed

But imagine that "somehow" they have learned that 123456789.git can be
found at some URL. You can do this:

  git -c url.https://github.com/git/git.git.insteadOf=repoid://123456789.git \
      clone --recurse-submodules repo.git

which would clone from the original URL. Or you could even imagine that
they have a cache of repositories named by uuid, and then:

  git -c url.https://my/cache/.insteadOf=repoid:// ...

would rewrite all repoid://'s automatically.
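
A rewrite like this can be checked without touching the network: `git ls-remote --get-url` prints a URL after url.*.insteadOf has been applied and then exits. A minimal sketch, reusing the placeholder cache location from the example above (https://my/cache/ is not a real server):

```shell
# Ask Git what a repoid:// URL expands to under a per-command rewrite.
# "https://my/cache/" is only the placeholder cache location from the text.
resolved=$(git -c url.https://my/cache/.insteadOf=repoid:// \
    ls-remote --get-url repoid://123456789.git)
echo "$resolved"
```

With that prefix rule, the printed URL should be https://my/cache/123456789.git, that is, the cache base with the repoid:// prefix replaced.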

The use of "-c" here is mostly for illustration. It is a per-command
config, so when you later try to update the submodule, you'd run into
the same problem. Probably you'd want to stuff your mapping into on-disk
config (either ~/.gitconfig, or if you have a lot of them, perhaps some
file included from there).
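
As a rough sketch of that on-disk setup: the mapping can live in a separate file that the per-user config pulls in through include.path. The file name `.gitconfig-repoid` is made up for illustration, and the sketch runs in a scratch HOME so that it does not modify a real ~/.gitconfig.

```shell
# Work in a scratch HOME so the sketch leaves any real ~/.gitconfig alone.
HOME=$(mktemp -d)
export HOME

# Hypothetical file holding only the repoid mappings.
mapping="$HOME/.gitconfig-repoid"
git config --file "$mapping" \
    url.https://github.com/git/git.git.insteadOf repoid://123456789.git

# Pull the mapping file into the per-user config.
git config --global include.path "$mapping"

# Later commands run by this user now see the rewrite without any "-c".
resolved=$(git ls-remote --get-url repoid://123456789.git)
echo "$resolved"
```

Because the mapping sits in per-user config rather than in any one repository, it is also visible when Git later operates inside the submodule repository.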

It would be nice if you could use "git clone -c" (note "-c" as an option
to "clone", not to "git") to set a permanent per-repo config variable.
But sadly the URL rewriting happens in the submodule repository, not the
parent. So it has to be a per-user setting.


Now, all of that said, do we still need uuids at all? If the canonical
submodule name is https://github.com/git/git.git, then anybody can just
rewrite that locally in the same way using url.*.insteadOf config. And I
think this is a pretty standard way of using submodules. E.g., you might
rewrite https:// into ssh:// if you prefer that protocol. Or point to a
local server if it's faster for you.

Which makes me wonder if I am missing something about the original
request that started this thread. But it sounds to me like it is just
asking for the existing URL-rewriting feature.

-Peff


* Re: Mirror repositories for submodules
  2026-06-04  6:16     ` Jeff King
@ 2026-06-04  9:27       ` Simon Richter
  0 siblings, 0 replies; 5+ messages in thread
From: Simon Richter @ 2026-06-04  9:27 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, Benson Muite, git

Hi,

On 6/4/26 3:16 PM, Jeff King wrote:

> Here's a thought experiment. What if you put the UUID into a URL, like:
>    repoid://123456789.git

Yes, that's the idea, except I would want to use a relative URL, like

     ../123456789.git

This could solve the "naive cloning" problem, because it creates an 
expectation that the submodules can be found on the same server, or in a 
nearby path.

I'm aware that this is *also* bad for decentralization, because it makes
it easier to use one of the big forges where the repositories for
often-used submodules are already likely to be present, but it plays
into our use case, where we want to share the repositories for
often-used subprojects.

> Now, all of that said, do we still need uuids at all? If the canonical
> submodule name is https://github.com/git/git.git, then anybody can just
> rewrite that locally in the same way using url.*.insteadOf config.

Yes, but we'd then need a mechanism for a server to indicate "for
cloning, you should use these 'insteadOf' settings", which is a massive
can of worms from a security standpoint.

I also don't think these canonical URLs can ever be stable if they refer 
to infrastructure that is not under the control of the maintainer -- it 
would tie the project identity to the hosting provider, and increase the 
inertia to overcome for moves (such as the current exodus from github 
and gitlab towards codeberg).

> Which makes me wonder if I am missing something about the original
> request that started this thread. But it sounds to me like it is just
> asking for the existing URL-rewriting feature.

The original mail has a similar problem as we do in Debian, and as my 
employer has: CI jobs should exclusively talk to in-house 
infrastructure, because continuously cloning repositories for each build 
is bad for the environment.

The common goal is that a naive clone should get submodules from a local 
server, ideally without us having to write some tool to make an initial 
checkout, enumerate submodules, create insteadOf settings, clone first 
layer of submodules, enumerate second layer, ...
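
To make the cost of such a tool concrete, here is a sketch of just one layer of it: reading the URLs out of a checked-out .gitmodules and printing one insteadOf setting per submodule. MIRROR_BASE, the host name in it, and the "<base>/<last path component>" layout are assumptions for illustration only, not an existing convention.

```shell
# Hypothetical in-house mirror; the layout <base>/<last path component> is assumed.
MIRROR_BASE=${MIRROR_BASE:-https://git.example.internal/mirror}

# Print one "url.<mirror>.insteadOf=<upstream>" setting per submodule URL.
emit_rewrites() {
    gitmodules=$1
    # --name-only keeps submodule names that contain spaces intact.
    git config --file "$gitmodules" --name-only \
        --get-regexp '^submodule\..*\.url$' |
    while IFS= read -r key; do
        url=$(git config --file "$gitmodules" --get "$key")
        case "$url" in
            # Relative URLs resolve against the parent's remote; skip them.
            ./*|../*) continue ;;
        esac
        printf 'url.%s/%s.insteadOf=%s\n' "$MIRROR_BASE" "${url##*/}" "$url"
    done
}

# Only meaningful inside a checkout that has a .gitmodules file.
if [ -f .gitmodules ]; then
    emit_rewrites .gitmodules
fi
```

Each printed line has the form accepted by `git -c`. The function would have to be run again inside every submodule after it is cloned to pick up nested submodules, which is exactly the layer-by-layer loop described above.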

    Simon
