public inbox for git@vger.kernel.org 
From: Benson Muite <benson_muite@emailplus•org>
To: Jeff King <peff@peff•net>, Simon Richter <Simon.Richter@hogyros•de>
Cc: Junio C Hamano <gitster@pobox•com>, git@vger•kernel.org
Subject: Re: Mirror repositories for submodules
Date: Fri, 05 Jun 2026 07:54:50 +0300
Message-ID: <87mrx9r3hh.fsf@emailplus.org>
In-Reply-To: <20260604061605.GA3194609@coredump.intra.peff.net>

Jeff King <peff@peff•net> writes:

> On Thu, Jun 04, 2026 at 02:11:38PM +0900, Simon Richter wrote:
>
>> Cloning from our server will, depending on what upstream uses, follow
>> either a relative URL (which will go to our server, but we have little
>> control over what the name part of the repository base URL is going to
>> be), or an absolute URL that instructs clients to pull from another place,
>> which conflicts with our goal of having a self-contained archive.
>> 
>> The idea posited earlier, to have a "repository identity" that remains the
>> same across forks and clones, is somewhat appealing, but the best idea I can
>> come up with is generating some kind of repository UUID, and adding a
>> symlink -- not a great design because it pollutes outside the repo:
>> 
>>     $ mkdir myproject
>>     $ cd myproject
>>     $ git init
>>     $ ls -l ..
>>     lrwxrwxrwx 1 simon simon   9 Jun  4 14:05 12345678-9abc-def0-1234-56789abcdef0.git -> myproject
>>     drwxrwxr-x 2 simon simon  40 Jun  4 14:04 myproject
>> 
>> On the other hand, this can be used to construct a stable relative submodule
>> URL.
>
> Here's a thought experiment. What if you put the UUID into a URL, like:
>
>   repoid://123456789.git
>
> Then your in-repo .gitmodules would point to that repo id and be
> consistent. Of course you need some way to tell Git how to retrieve
> repoid:// URLs. You could do so with a custom remote helper
> (git-remote-repoid), but presumably that helper is eventually going to
> end up going over one of the normal Git protocols.
>
> So we just need to tell Git how to resolve repo id URLs into concrete
> URLs. And indeed, we have url.*.insteadOf to do rewriting already. So
> for example, you can add a submodule but convert it into a uuid like
> this:
>
>   $ git submodule add https://github.com/git/git.git
>   $ git config -f .gitmodules submodule.git.url
>   https://github.com/git/git.git
>   $ git config -f .gitmodules submodule.git.url repoid://123456789.git
>   $ git commit -am 'add submodule with magic repoid'
>
> Now if somebody else comes along and clones it naively, the repo uuid is
> not useful to git by itself:
>
>   $ git clone --recurse-submodules repo
>   Submodule 'git' (repoid://123456789.git) registered for path 'git'
>   Cloning into '/home/peff/tmp/repo/git'...
>   fatal: transport 'repoid' not allowed
>   fatal: clone of 'repoid://123456789.git' into submodule path '/home/peff/tmp/repo/git' failed
>
> But imagine that "somehow" they have learned that 123456789.git can be
> found at some URL. You can do this:
>
>   git -c url.https://github.com/git/git.git.insteadOf=repoid://123456789.git \
>       clone --recurse-submodules repo.git
>
> which would clone from the original URL. Or you could even imagine that
> they have a cache of repositories named by uuid, and then:
>
>   git -c url.https://my/cache/.insteadOf=repoid:// ...
>
> would rewrite all repoid://'s automatically.
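The rewriting described above can be checked offline. `git ls-remote --get-url` prints the URL after applying url.*.insteadOf and exits without contacting the remote. What follows is a minimal sketch. It assumes the placeholder scheme repoid:// and the placeholder cache prefix https://my/cache/ from the example; neither is a real service. The script also swaps in a throwaway HOME so that only the -c rules take effect.

```shell
#!/bin/sh
# Sketch only: "repoid://" and "https://my/cache/" are hypothetical
# placeholders. "git ls-remote --get-url" prints the rewritten URL and
# never connects, so nothing here touches the network.
set -eu

# Isolate from the caller's global/system config so only -c rules apply.
HOME=$(mktemp -d)
export HOME
unset XDG_CONFIG_HOME GIT_CONFIG_GLOBAL
export GIT_CONFIG_NOSYSTEM=1

# One prefix rule maps every repoid:// URL into a uuid-named cache.
cache_url=$(git -c url.https://my/cache/.insteadOf=repoid:// \
    ls-remote --get-url repoid://123456789.git)
echo "$cache_url"    # https://my/cache/123456789.git

# When several insteadOf rules match, Git uses the longest match, so an
# exact per-repository mapping overrides the catch-all cache prefix.
exact_url=$(git -c url.https://my/cache/.insteadOf=repoid:// \
    -c url.https://github.com/git/git.git.insteadOf=repoid://123456789.git \
    ls-remote --get-url repoid://123456789.git)
echo "$exact_url"    # https://github.com/git/git.git
```

Because Git picks the longest matching prefix, a handful of exact entries can coexist with one general cache rule.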
>
> The use of "-c" here is mostly for illustration. It is a per-command
> config, so when you later try to update the submodule, you'd run into
> the same problem. Probably you'd want to stuff your mapping into on-disk
> config (either ~/.gitconfig, or if you have a lot of them, perhaps some
> file included from there).
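The on-disk variant might look roughly like the sketch below. It assumes the mappings live in a separate file that the per-user config pulls in through include.path. The file name ~/.gitconfig-repoids and the repoid:// scheme are purely illustrative. The script uses a throwaway HOME so it does not modify a real ~/.gitconfig.

```shell
#!/bin/sh
# Sketch only: the mapping file name and the repoid:// scheme are
# hypothetical. A temporary HOME keeps the real ~/.gitconfig untouched.
set -eu

HOME=$(mktemp -d)
export HOME
unset XDG_CONFIG_HOME GIT_CONFIG_GLOBAL
export GIT_CONFIG_NOSYSTEM=1

map="$HOME/.gitconfig-repoids"

# Record one mapping per repository id in the dedicated file.
git config --file "$map" \
    url.https://github.com/git/git.git.insteadOf repoid://123456789.git

# Include the mapping file from the per-user (global) configuration.
git config --global include.path "$map"

# Later commands (including submodule updates) resolve the id without -c.
resolved=$(git ls-remote --get-url repoid://123456789.git)
echo "$resolved"    # https://github.com/git/git.git
```

Keeping the mappings in their own included file makes it easy to regenerate or share that list without editing the rest of the user's configuration.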
>
> It would be nice if you could use "git clone -c" (note "-c" as an option
> to "clone", not to "git") to set a permanent per-repo config variable.
> But sadly the URL rewriting happens in the submodule repository, not the
> parent. So it has to be a per-user setting.
>
>
> Now, all of that said, do we still need UUIDs at all? If the canonical
> submodule URL is https://github.com/git/git.git, then anybody can just
> rewrite that locally in the same way using url.*.insteadOf config. And I
> think this is a pretty standard way of using submodules. E.g., you might
> rewrite https:// into ssh:// if you prefer that protocol. Or point to a
> local server if it's faster for you.
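With a canonical URL recorded in .gitmodules, the same mechanism works without any repository ids. Below is a minimal sketch of the two rewrites just mentioned. mirror.example.org is a hypothetical mirror host. As before, `git ls-remote --get-url` only prints the rewritten URL and does not connect.

```shell
#!/bin/sh
# Sketch only: mirror.example.org is a hypothetical host; no network
# access happens because --get-url just prints the rewritten URL.
set -eu

HOME=$(mktemp -d)
export HOME
unset XDG_CONFIG_HOME GIT_CONFIG_GLOBAL
export GIT_CONFIG_NOSYSTEM=1

# Prefer ssh:// for everything hosted under https://github.com/.
ssh_url=$(git -c url.ssh://git@github.com/.insteadOf=https://github.com/ \
    ls-remote --get-url https://github.com/git/git.git)
echo "$ssh_url"       # ssh://git@github.com/git/git.git

# Or send one organization's repositories to a closer mirror.
mirror_url=$(git -c url.https://mirror.example.org/git/.insteadOf=https://github.com/git/ \
    ls-remote --get-url https://github.com/git/git.git)
echo "$mirror_url"    # https://mirror.example.org/git/git.git
```

A single prefix rule covers every repository under that prefix, which is why one entry per host or organization is often enough.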
>
> Which makes me wonder if I am missing something about the original
> request that started this thread. But it sounds to me like it is just
> asking for the existing URL-rewriting feature.
>

The problem is that one might have multiple repositories, and submodules
may themselves have submodules. Typically a primary development
organization will have its own host, but may also have mirrors on other
services which may be more convenient for others to use. A recursive
clone could pull in up to 20 repositories, not all of which are
maintained by the same organization. Setting up URL rewriting for each
of them individually is inefficient, especially when the upstream
maintains the mirror repositories and could indicate that in the source
repositories.


> -Peff
