From: Derrick Stolee <stolee@gmail•com>
To: Junio C Hamano <gitster@pobox•com>,
Derrick Stolee via GitGitGadget <gitgitgadget@gmail•com>
Cc: git@vger•kernel.org, newren@gmail•com
Subject: Re: [PATCH 2/3] sparse-checkout: add 'clean' command
Date: Wed, 9 Jul 2025 10:39:29 -0400
Message-ID: <2503c79c-68f3-4ed5-bbfd-3a7af07a89cc@gmail.com>
In-Reply-To: <xmqqa55etm5g.fsf@gitster.g>
On 7/8/2025 5:20 PM, Junio C Hamano wrote:
> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail•com> writes:
>
>> From: Derrick Stolee <stolee@gmail•com>
>>
>> When users change their sparse-checkout definitions to add new
>> directories and remove old ones, there may be a few reasons why
>> directories no longer in scope remain (ignored or excluded files still
>> exist, Windows handles are still open, etc.). When these files still
>> exist, the sparse index feature notices that a tracked, but sparse,
>> directory still exists on disk and thus the index expands. This causes a
>> performance hit _and_ the advice printed isn't very helpful. Using 'git
>> clean' isn't enough (generally '-dfx' may be needed) but also this may
>> not be sufficient.
>>
>> Add a new subcommand to 'git sparse-checkout' that removes these
>> tracked-but-sparse directories, including any excluded or ignored files
>
> Do excluded files and ignored files form two separate sets, or are
> they one and the same? Are files that users forgot to add (e.g. new
> source file that would not match any patterns listed in .gitignore)
> and object files left over from the previous compilation (most
> likely match *.o in .gitignore) treated the same way for the purpose
> of determining if the directory that is no longer in the cone can be
> removed?
I think of them as separate in my head because:

 * .gitignore is committed to the repo, and is common to all users of
   the repo.
 * .git/info/exclude is custom to each user, so users are choosing to
   ignore extra files that are atypical for most users.

In the monorepo I'm thinking about, .gitignore files are rather small
because all build output has already been redirected out of the
worktree for performance reasons. Thus, _most_ users don't have this
problem. However, some users add extra excludes for things like vim
files and those get left over, causing invisible (to 'git status') pain.
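
To see which of the two sources is responsible for hiding a given file, 'git check-ignore -v' reports the file and line that supplied the matching pattern. The following is only a minimal sketch in a throwaway repository; the file names and the '*.o' / '*.swp' patterns are hypothetical and not taken from the monorepo discussed here.

```shell
#!/bin/sh
# Throwaway repository; file names and patterns are hypothetical.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .

echo '*.o' >.gitignore            # shared: normally committed with the repo
mkdir -p .git/info
echo '*.swp' >>.git/info/exclude  # private: never leaves this clone
touch main.o .main.c.swp

# -v prints "<source>:<line>:<pattern><TAB><path>" for each ignored path.
out=$(git check-ignore -v main.o .main.c.swp)
printf '%s\n' "$out"
```

Both files are invisible to 'git status', but the source column shows that only one of them is hidden by a pattern every user of the repo shares.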
>> underneath. This is the most extreme method for doing this, but it works
>> when the sparse-checkout is in cone mode and is expected to rescope
>> based on directories, not files.
>>
>> Be sure to add a --dry-run option so users can predict what will be
>> deleted. In general, output the directories that are being removed so
>> users can know what was removed.
>
> Hmph. It would be safer to show not just the directories but which
> excluded files are about to be lost, wouldn't it, especially when
> the user is trying to play safe and see what potential damage they
> are looking at?
>
> Also even though ignored files are "ignored and expendable", nobody
> marks their temporary file as "ignored but precious" (yet), so "it
> is listed in .gitignore so we can safely remove it" may not be a
> safe assumption for us to be making (yet). Shouldn't we at least be
> listing these ignored files in --dry-run output, next to those files
> that the user may have forgotten to add?
I considered this, but mostly behind a potential --verbose option to
list the files that are left over. Much of the design here is that
these _directories_ are out of scope, skipping over any details about
the contained files, so I thought this directory-based output would
communicate enough information.

A curious user may want to know "why are these directories still
around?" and the more verbose output would assist.
>> Note that untracked directories remain. Further, directories that
>> contain staged changes are not deleted. This is a detail that is partly
>> hidden by the implementation which relies on collapsing the index to a
>> sparse index in-memory and only deleting directories that are listed as
>> sparse in the index. If a staged change exists, then that entry is not
>> stored as a sparse tree entry and thus remains on-disk until committed
>> or reset.
>
> Removing untracked directories is a job for "clean -d", so it makes
> sense for this new command not to touch them. Not losing changes
> that have already been added is just as bad as losing new files that
> the user forgot to add, so it does make sense not to remove them.
>
> I wonder if we need "-x" and/or "-X" options "clean" has (and
> perhaps "-d" that is a no-op, as the whole point of this subcommand
> is about removing directories from the working tree) to control its
> operation in a bit finer-grained way.
I'm of two minds here.

My first inclination is "we already have 'git clean' for fine-grained
control of removing ignored/excluded files".

My second inclination is "'git clean' would remove these ignored files
even when they are within the sparse-checkout, so that's too big of a
hammer".
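
The "too big of a hammer" concern can be reproduced with a dry run. This is a rough sketch under assumed names: the 'in-cone' and 'out-of-cone' directories, the '*.o' pattern, and the commit identity are all hypothetical. It only shows that 'git clean -x' makes no distinction for ignored files inside the sparse-checkout cone; it does not exercise the proposed 'clean' subcommand.

```shell
#!/bin/sh
# Throwaway repository; directory names and patterns are hypothetical.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .

mkdir -p in-cone out-of-cone
echo 'int a;' >in-cone/a.c
echo 'int b;' >out-of-cone/b.c
echo '*.o' >.gitignore
git add .
git -c user.name=Example -c user.email=example@example.com \
	commit -q -m 'initial'

# Restrict the worktree to in-cone/ using cone mode.
git sparse-checkout init --cone
git sparse-checkout set in-cone

# An ignored build product inside the cone, which the user wants to keep.
touch in-cone/a.o

# Dry run (-n): -d recurses into directories, -x also selects ignored files.
out=$(git clean -n -d -x)
printf '%s\n' "$out"
```

The dry run lists in-cone/a.o for removal even though that directory is still in scope, which is why 'git clean -dfx' alone is a blunt way to tidy up after a sparse-checkout change.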
There are a lot of ways to filter the files that would be removed,
but I think that in this case most users want a one-command way
to get their sparse-checkout into a better state.

I'm not making any final statements here. I appreciate all of the
thoughts around which options should be default and which should be
hidden behind options.

Thanks,
-Stolee