public inbox for git@vger.kernel.org 
From: Patrick Steinhardt <ps@pks•im>
To: Justin Tobler <jltobler@gmail•com>
Cc: git@vger•kernel.org,
	"brian m. carlson" <sandals@crustytoothpaste•net>,
	Karthik Nayak <karthik.188@gmail•com>,
	K Jayatheerth <jayatheerthkulkarni2005@gmail•com>,
	ryenus@gmail•com, Junio C Hamano <gitster@pobox•com>
Subject: Re: [PATCH 1/2] BreakingChanges: announce switch to "reftable" format
Date: Thu, 3 Jul 2025 07:00:21 +0200
Message-ID: <aGYOZVyaR_OYIhtl@pks.im>
In-Reply-To: <q6zyvqpyxobtp65ptrmkdg3kvc2plxmsltaurqf52hglitikir@5p5jpcqc577o>

On Wed, Jul 02, 2025 at 12:17:50PM -0500, Justin Tobler wrote:
> On 25/07/02 12:14PM, Patrick Steinhardt wrote:
> > diff --git a/Documentation/BreakingChanges.adoc b/Documentation/BreakingChanges.adoc
> > index c6bd94986c5..c96b5319cdd 100644
> > --- a/Documentation/BreakingChanges.adoc
> > +++ b/Documentation/BreakingChanges.adoc
> > @@ -118,6 +118,45 @@ Cf. <2f5de416-04ba-c23d-1e0b-83bb655829a7@zombino•com>,
> >  <20170223155046.e7nxivfwqqoprsqj@LykOS•localdomain>,
> >  <CA+EOSBncr=4a4d8n9xS4FNehyebpmX8JiUwCsXD47EQDE+DiUQ@mail•gmail.com>.
> >  
> > +* The default storage format for references in newly created repositories will
> > +  be changed from "files" to "reftable". The "reftable" format provides
> > +  multiple advantages over the "files" format:
> > ++
> > +  ** It is impossible to store two references that only differ in casing on
> > +     case-insensitive filesystems with the "files" format. This issue is
> > +     especially common on Windows, but also on older versions of macOS. As the
> > +     "reftable" backend does not use filesystem paths anymore to encode
> > +     reference names this problem goes away.
> 
> I believe even modern macOS uses a case-insensitive filesystem by
> default. Maybe we should instead say:
> 
>   This limitation is common on Windows and macOS platforms.

Okay, thanks for the clarification. I thought recent versions of macOS
were case-sensitive by default.
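
For illustration only, the limitation under discussion can be approximated
with a short Python sketch. It is not how Git or any filesystem is
implemented: `colliding_refs` is a hypothetical helper, and case folding
plus NFD normalization are assumed stand-ins for how a case-insensitive,
normalizing filesystem compares path names.

```python
import unicodedata


def colliding_refs(refnames):
    """Group ref names that a case-insensitive, normalizing filesystem
    would likely map to the same path (an approximation, not Git's logic)."""
    groups = {}
    for name in refnames:
        # Assumption: NFD + casefold approximates the filesystem's comparison.
        key = unicodedata.normalize("NFD", name).casefold()
        groups.setdefault(key, []).append(name)
    # Only groups with more than one name represent a collision.
    return [names for names in groups.values() if len(names) > 1]


refs = [
    "refs/heads/Feature",
    "refs/heads/feature",
    "refs/heads/caf\u00e9",   # precomposed "e with acute"
    "refs/heads/cafe\u0301",  # "e" followed by a combining acute accent
    "refs/heads/main",
]
for group in colliding_refs(refs):
    print(group)
```

With the "files" backend, each reported group would compete for a single
loose-reference path on such a filesystem. The "reftable" backend does not
encode reference names as filesystem paths, so it is unaffected.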

> > +  ** Similarly, macOS normalizes path names that contain unicode characters,
> > +     which has the consequence that you cannot store two names with unicode
> > +     characters that are encoded differently with the "files" backend. Again,
> > +     this is not an issue with the "reftable" backend.
> > +  ** Deleting references with the "files" backend requires Git to rewrite the
> > +     complete "packed-refs" file. In large repositories with many references
> > +     this file can easily be dozens of megabytes in size, in extreme cases it
> > +     may be gigabytes. The "reftable" backend uses tombstone markers for
> > +     deleted references and thus does not have to rewrite all of its data.
> > +  ** Repository housekeeping with the "files" backend typically performs
> > +     all-into-one repacks of references. This can be quite expensive, and
> > +     consequently housekeeping is a tradeoff between the number of loose
> > +     references that accumulate and slow down operations that read references,
> > +     and compressing those loose references into the "packed-refs" file. The
> > +     "reftable" backend uses geometric compaction after every write, which
> > +     amortizes costs and ensures that the backend is always in a
> > +     well-maintained state.
> > +  ** Operations that write multiple references at once are not atomic with the
> > +     "files" backend. Consequently, Git may see in-between states when it reads
> > +     references while a reference transaction is in the process of being
> > +     committed to disk.
> > +  ** Writing many references at once is slow with the "files" backend because
> > +     every reference is created as a separate file. The "reftable" backend
> > +     significantly outperforms the "files" backend by multiple orders of
> > +     magnitude.
> 
> The examples above do a good job of explaining the individual technical
> benefits. I do wonder if we should include a more general statement
> aimed at users as to why the change to reftable is beneficial. Maybe
> something like:
> 
>   The reftable backend addresses several performance concerns as the
>   number of references in a repository scales.

I think this would be a bit too handwavy. I'd rather point out
the specific cases where we know it to perform better.

Patrick
