public inbox for git@vger.kernel.org 
 help / color / mirror / Atom feed
From: Jeff King <peff@peff•net>
To: Elijah Newren <newren@gmail•com>
Cc: Elijah Newren via GitGitGadget <gitgitgadget@gmail•com>,
	git@vger•kernel.org,
	Matthew John Cheetham <mjcheetham@outlook•com>
Subject: Re: [PATCH] fsck: snapshot default refs before object walk
Date: Wed, 14 Jan 2026 17:23:51 -0500	[thread overview]
Message-ID: <20260114222351.GA1014423@coredump.intra.peff.net> (raw)
In-Reply-To: <CABPp-BGqiM8fmirgdqumRNfzWediC5v_uZ9qHjntTqPqABDhnA@mail.gmail.com>

On Tue, Jan 06, 2026 at 03:36:57PM -0800, Elijah Newren wrote:

> >   Side note: I do not think I have ever run fsck with refs on the
> >   command-line. It is not like it saves you any time! Most of the
> >   expense comes from opening up and verifying the objects in the first
> >   step, not from looking at ref reachability.
> 
> Not to mention it produces spurious "dangling" object warnings,
> because while the objects might be reachable, they aren't necessarily
> reachable from the particular subset you specified on the command
> line.  I wonder if no one ever noticed that because it's such a
> useless mode; I only noticed it because you pointed out how Matthew
> and I overlooked races with command-line arguments.

Yep. It would be nice to get rid of it, but of course I worry about
missing some obscure use case that somebody relies on. Anyway, way out
of scope for this patch.

> > That did make me wonder a bit about the other fields in "struct
> > reference" (which your snapshot just throws away). But it looks like
> > fsck_handle_ref() only cares about the name and oid, so it is OK.
> 
> Junio suggested the same thing, although he also suggested we might
> want to snapshot some reflog information at the same time, which then
> wouldn't make sense to be using a struct reference.  Even though I'm
> not implementing per-reflog snapshotting, I left a comment in the code
> about it so I think it made sense to just create my own data structure
> with just the name and oid.

I do wonder how expensive it would be to just snapshot all of the
reflogs. It is a shame, because 99% of the time entry N will be
reachable from N+1 (and thus does not need to be stored), but we don't
know that until we traverse.

So we are potentially storing the oid of every reachable object in the
repository in memory as a snapshot. But...isn't that exactly what we'll
do anyway when we traverse, in the form of "struct object"?

Anyway, I can live with your timestamp-based hack for snapshotting the
reflogs.

> > There are a few other related interesting cases, too:
> >
> >   - We may use the index file for connectivity, as well. It suffers from
> >     the same race, and would benefit from a snapshot.
> 
> I left a couple TODO comments about this, so that those who are
> interested/motivated can extend the snapshotting further.
> 
> >   - In get_default_heads() we also look at worktree HEADs. Those have
> >     the same race (their normal refs we don't consider here, because
> >     they were already handled by the overall ref iteration).
> 
> I handled these in my newer version, since handling them is pretty
> similar to handling command line arguments.

Both sound like reasonable directions to me. I'll take a look at your
latest version.

-Peff

  reply	other threads:[~2026-01-14 22:23 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-12-29 19:12 [PATCH] fsck: snapshot default refs before object walk Elijah Newren via GitGitGadget
2025-12-30  0:45 ` Junio C Hamano
2026-01-06 23:19   ` Elijah Newren
2026-01-09 13:11     ` Matthew John Cheetham
2026-01-02  5:49 ` Jeff King
2026-01-06 23:36   ` Elijah Newren
2026-01-14 22:23     ` Jeff King [this message]
2026-01-07  1:29 ` [PATCH v2] " Elijah Newren via GitGitGadget
2026-01-09 17:49   ` [PATCH v3] " Elijah Newren via GitGitGadget
2026-01-11  4:01     ` Junio C Hamano
2026-01-14 22:34     ` Jeff King

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260114222351.GA1014423@coredump.intra.peff.net \
    --to=peff@peff$(echo .)net \
    --cc=git@vger$(echo .)kernel.org \
    --cc=gitgitgadget@gmail$(echo .)com \
    --cc=mjcheetham@outlook$(echo .)com \
    --cc=newren@gmail$(echo .)com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox