From: Derrick Stolee <stolee@gmail•com>
To: Junio C Hamano <gitster@pobox•com>,
	Derrick Stolee via GitGitGadget <gitgitgadget@gmail•com>
Cc: git@vger•kernel.org
Subject: Re: [PATCH 5/5] path-walk: support wildcard pathspecs for blob filtering
Date: Wed, 18 Mar 2026 09:16:14 -0400
Message-ID: <443e5b25-46a1-4990-96b3-122e66d81828@gmail.com>
In-Reply-To: <xmqqms06hzfv.fsf@gitster.g>

On 3/17/2026 6:19 PM, Junio C Hamano wrote:
> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail•com> writes:
> 
>> From: Derrick Stolee <stolee@gmail•com>
>>
>> Previously, walk_objects_by_path() silently ignored pathspecs containing
>> wildcards or magic by clearing them. This caused all blobs to be
>> downloaded regardless of the given pathspec. Wildcard pathspecs like
>> "d/file.*.txt" are useful for narrowing which blobs to process (e.g.,
>> during 'git backfill').
>>
>> Support wildcard pathspecs by making three changes:
>>
>>  1. Add an 'exact_pathspecs' flag to path_walk_context. When the
>>     pathspec has no wildcards or magic, set this flag and use the
>>     existing fast-path prefix matching in add_tree_entries(). When
>>     wildcards are present, skip that block since prefix matching
>>     cannot handle glob patterns.
>>
>>  2. Disable revision-level commit pruning (revs->prune = 0) for
>>     wildcard pathspecs. The revision walk uses the pathspec to filter
>>     commits via TREESAME detection. For exact prefix pathspecs this
>>     works well, but wildcard pathspecs may fail to match through
>>     TREESAME because fnmatch with WM_PATHNAME does not cross directory
>>     boundaries. Disabling pruning ensures all commits are visited and
>>     their trees are available for the path-walk to filter.
> 
> Hmph, I wonder how significant an impact does it have on the
> performance that we have to disable pruning here.  With the bog
> standard tree traversal, wouldn't tree_entry_interesting() already
> be capable of doing this, even with fnmatch / WM_PATHNAME ?

I will explore what's possible here and see what I can do.

>>  3. Add a match_pathspec() check in walk_path() to filter out blobs
>>     whose full path does not match the pathspec. This provides the
>>     actual blob-level filtering for wildcard pathspecs.
>>
>> Signed-off-by: Derrick Stolee <stolee@gmail•com>
>> Co-authored-by: Copilot <223556219+Copilot@users•noreply.github.com>
> 
> The latter person cannot sign DCO or vouch for the origin of what
> they have written in this patch, can they?

No, they cannot. Sorry for this error.

Thanks,
-Stolee



