public inbox for git@vger.kernel.org 
From: Lukas Fleischer <lfleischer@lfos•de>
To: Junio C Hamano <gitster@pobox•com>
Cc: git@vger•kernel.org
Subject: Re: [PATCH] Allow hideRefs to match refs outside the namespace
Date: Sun, 01 Nov 2015 00:40:39 +0100
Message-ID: <20151031234039.3799.78352@typhoon.lan>
In-Reply-To: <xmqqsi4rhrmc.fsf@gitster.mtv.corp.google.com>

On Sat, 31 Oct 2015 at 18:31:23, Junio C Hamano wrote:
> [...]
> You earlier (re)discovered a good approach to introduce a new
> feature without breaking settings of existing users when we
> discussed a "whitelist".  Since setting the configuration to an
> empty string did not do anything in the old code, an empty string
> was an invalid and non-working setting.  By taking advantage of that
> fact, you safely can say "if you start with an empty that would
> match everything, we'll treat all the others differently from the
> way we did before" if you wanted to.  I think you can follow the
> same principle here.  For example, I can imagine that the rule for
> the "ref-is-hidden" can be updated to:
> 
>  * it now takes refname and also the fullname before stripping the
>    namespace;
> 
>  * a hide pattern that is prefixed with '!' is negative, just as
>    before;
> 
>  * (after a possible '!' is stripped), a hide pattern that is prefixed
>    with '^', which was invalid before, is checked against the fullname
>    with the namespace prefix, which is a new rule;
> 
>  * otherwise, check the refname after stripping the namespace.
> 
> Such an update would allow a new feature "we now allow you to write
> a pattern that determines the match before stripping the namespace
> prefix" without breaking the existing repositories, no?
> 

Yes. If I understood you correctly, this is exactly what I suggested in
the last paragraph of my previous email (the only difference being that
I suggested using "/" as the full name indicator instead of "^", but
that is just an implementation detail). I will look into implementing
this if that is the way we want to go.
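
To make the rule quoted above concrete, here is a rough sketch in C. It
is not the actual patch: the function name, the explicit pattern array
and the backwards scan (so that later entries override earlier ones)
are assumptions made for illustration only. It uses "^" as the full
name indicator, as in the quoted proposal, and matches a pattern as a
path prefix that must end at a "/" or at the end of the name.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical sketch, not Git's actual code.
 *
 * refname:  ref name after stripping the namespace, or NULL if the ref
 *           lies outside the current namespace (assumption).
 * fullname: ref name including the namespace prefix.
 * patterns: hide patterns in configuration order.
 *
 * Returns 1 if the ref should be hidden, 0 otherwise.
 */
static int sketch_ref_is_hidden(const char *refname, const char *fullname,
				const char *const *patterns, size_t nr)
{
	size_t i;

	/* Assumption: later patterns win, so scan from the end. */
	for (i = nr; i-- > 0; ) {
		const char *pat = patterns[i];
		const char *subject;
		size_t len;
		int negative = 0;

		/* A leading '!' marks a negative pattern, as before. */
		if (*pat == '!') {
			negative = 1;
			pat++;
		}

		/* New rule: '^' selects the name before stripping. */
		if (*pat == '^') {
			subject = fullname;
			pat++;
		} else {
			subject = refname;
		}

		if (!subject)
			continue;

		/* Match whole path components only. */
		len = strlen(pat);
		if (strncmp(subject, pat, len))
			continue;
		if (subject[len] == '\0' || subject[len] == '/')
			return !negative;
	}
	return 0;
}
```

Since neither an empty pattern nor one starting with "^" could match
any ref before, existing configurations should behave the same under
this scheme.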

> [...]
> Assuming other namespaces are forks of the same project as yours
> (and otherwise the repository management strategy needs to be
> rethought--using namespace for them is not gaining anything other
> than making your repack more costly), it is likely that all of them
> share a lot of refs that point at the same object (think "tags").
> Do we end up sending a lot of ".have" for exactly the same object
> number of times?  Even though we cannot dedup show_ref() lines that
> talk about concrete refs (because they talk about what refs exist at
> which value, and the sending side would use them to locally reject
> non-ff pushes, for example), ".have" lines that talk about the same
> object can be safely deduped.  This is not directly related to your
> topic of "what should be included in the advertisement", but a
> potentially good thing to fix, if it indeed turns out that we are
> sending a lot of duplicate ".have"s.  A fix in that would make
> things better for everybody (not just namespace users, but those who
> show the ".have" lines from the refs in the repository we borrow
> objects from).

Yes, I think we currently send a lot of duplicate lines. Would be nice
to have that fixed as well.
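
As a rough illustration of the deduplication idea, something along
these lines could filter the object names before the ".have" lines are
written. This is only a sketch: the helper name and the plain array of
hex strings are made up for the example, and the quadratic scan is
there for brevity. A real fix would likely use a proper set structure.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not Git's actual code.
 *
 * Copies each distinct object name from ids[] into out[] exactly once,
 * keeping the order of first occurrence. out[] must have room for nr
 * entries. Returns the number of distinct object names.
 */
static size_t sketch_dedup_have(const char *const *ids, size_t nr,
				const char **out)
{
	size_t i, j, n = 0;

	for (i = 0; i < nr; i++) {
		/* Look for an earlier ".have" candidate with the same name. */
		for (j = 0; j < n; j++)
			if (!strcmp(out[j], ids[i]))
				break;
		/* Not seen yet: keep it. */
		if (j == n)
			out[n++] = ids[i];
	}
	return n;
}
```

Only the ".have" candidates would go through such a filter. Lines for
concrete refs would still be sent as they are, for the reason given in
the quoted text.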

Note that we do use Git namespaces to store a lot of different but
similar pseudo repositories (i.e. they do not share any history but the
objects have huge similarities). Even though the pseudo repositories
themselves are tiny, having the objects in a shared object storage
reduces the size significantly. Other people probably use separate
repositories, combined with something like GIT_OBJECT_DIRECTORY and
preciousObjects for that. Using Git namespaces, however, allows running
`git gc`/`git repack` without needing to maintain back references to
the pseudo repositories and, more importantly, allows storing all the
refs in a single "packed-refs" file, which reduced the size by another
factor of 10 in our tests. That massive difference in size is probably
mostly due to the fact that the actual content of each repository is
just some 100 bytes. Not sure if saving that much space can currently
be achieved with any other approach.
