public inbox for git@vger.kernel.org 
 help / color / mirror / Atom feed
From: Junio C Hamano <gitster@pobox•com>
To: Lidong Yan <yldhome2d2@gmail•com>
Cc: git@vger•kernel.org,  Lidong Yan <502024330056@smail•nju.edu.cn>,
	 Kai Koponen <kaikoponen@google•com>
Subject: Re: [PATCH 0/2] bloom: use bloom filter given multiple pathspec
Date: Wed, 25 Jun 2025 10:32:10 -0700	[thread overview]
Message-ID: <xmqq7c0zviat.fsf@gitster.g> (raw)
In-Reply-To: <20250625125541.3048632-1-502024330056@smail.nju.edu.cn> (Lidong Yan's message of "Wed, 25 Jun 2025 20:55:39 +0800")

Lidong Yan <yldhome2d2@gmail•com> writes:

> git won't use bloom filter for multiple pathspec, which makes the command

Let's get the terminology straight.  A pathspec consists of one or
more pathspec elements (or pathspec items).

Also, "git won't" is overly general.  The series title shares the
same issue ("given multiple pathspec" does not even hint that this
is about revision traversal---you are not making filter used with
pathspec with more than one element in other code paths).

Perhaps like:

    The revision traversal limited by pathspec has optimization when
    the pathspec has only one element, it does not use any pathspec
    magic (other than literal), and there is no wildcard.

    While it is much harder to lift the latter two limitations,
    supporting a pathspec with multiple elements is relatively easy.
    Just make sure we hash each of them separately and ask the bloom
    filter about them, and if we see none of them can possibly be
    affected by the commit, we can skip without tree comparison.

or something along that line?

>   git log -- file1 file2
> significantly slower than
>   git log -- file1 && git log -- file2
>
> This issue is raised by Kai Koponen at
>   https://lore.kernel.org/git/CADYQcGqaMC=4jgbmnF9Q11oC11jfrqyvH8EuiRRHytpMXd4wYA@mail.gmail.com/
>
> To fix this, revs->bloom_keys[] needs to become an array of bloom_keys[],
> one for each literal pathspec element. For convenience, first commit
> creates a new struct bloom_keyvec to hold all bloom keys for a single
> pathspec. The second commit add for loop to check if any pathspec's keyvec
> is contained in a commit's bloom filter, along with code that initialize
> destory and test multiple pathspec bloom keyvecs.

It is nice to outline an approach to the solution one day, and see
an almost exact implementation of it appear on the list a few days
later.  I wish all the development would go like this ;-)

>  bloom.c              |  47 +++++++++++++++++
>  bloom.h              |  14 +++++
>  revision.c           | 121 ++++++++++++++++++++++++-------------------
>  revision.h           |   5 +-
>  t/t4216-log-bloom.sh |  10 ++--

Can we have a set of real tests to make sure that the updated filter
code still identifies commits that touch the files without false
negatives?  False positives are OK as we will follow them with real
tree comparison to determine what exactly got changed, but false
negatives are absolute no-no.

Testing to see that the filter code path is activated is much less
interesting than the filter code path still functions correctly with
these changes presented here.  I have a feeling that with the
changes to the test in this series, you wouldn't even find a bug
where you simply added subpaths for all pathspec elements into a
single array and use the original "bloom has to say 'possibly yes'
to all array elements" logic (which would incorrectly require that
both file1 and file2 must be modified).

Thanks.

  parent reply	other threads:[~2025-06-25 17:32 UTC|newest]

Thread overview: 72+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-06-25 12:55 [PATCH 0/2] bloom: use bloom filter given multiple pathspec Lidong Yan
2025-06-25 12:55 ` [PATCH 1/2] bloom: replace struct bloom_key * with struct bloom_keyvec Lidong Yan
2025-06-25 17:43   ` Junio C Hamano
2025-06-26  3:44     ` Lidong Yan
2025-06-25 12:55 ` [PATCH 2/2] bloom: enable multiple pathspec bloom keys Lidong Yan
2025-06-27 13:50   ` Junio C Hamano
2025-06-27 14:24     ` Lidong Yan
2025-06-27 18:09       ` Junio C Hamano
2025-07-01  5:52     ` Lidong Yan
2025-07-01 15:19       ` Junio C Hamano
2025-07-02  7:14         ` Lidong Yan
2025-07-02 15:48           ` Junio C Hamano
2025-07-03  1:52             ` Lidong Yan
2025-07-04 12:09             ` Lidong Yan
2025-07-01  8:50     ` SZEDER Gábor
2025-07-01 11:40       ` Lidong Yan
2025-07-01 15:43       ` Junio C Hamano
2025-06-27 20:39   ` Junio C Hamano
2025-06-28  2:54     ` Lidong Yan
2025-06-25 17:32 ` Junio C Hamano [this message]
2025-06-26  3:34   ` [PATCH 0/2] bloom: use bloom filter given multiple pathspec Lidong Yan
2025-06-26 14:15     ` Junio C Hamano
2025-06-27  6:21 ` [PATCH v2 0/2] bloom: enable bloom filter optimization for multiple pathspec elements in revision traversal Lidong Yan
2025-06-28  4:21   ` [PATCH v3 " Lidong Yan
2025-07-04 11:14     ` [PATCH v4 0/4] " Lidong Yan
2025-07-04 11:14       ` [PATCH v4 1/4] bloom: add test helper to return murmur3 hash Lidong Yan
2025-07-04 11:14       ` [PATCH v4 2/4] bloom: rename function operates on bloom_key Lidong Yan
2025-07-04 11:14       ` [PATCH v4 3/4] bloom: replace struct bloom_key * with struct bloom_keyvec Lidong Yan
2025-07-07 11:35         ` Derrick Stolee
2025-07-07 14:14           ` Lidong Yan
2025-07-04 11:14       ` [PATCH v4 4/4] bloom: optimize multiple pathspec items in revision traversal Lidong Yan
2025-07-07 11:43         ` Derrick Stolee
2025-07-07 14:18           ` Lidong Yan
2025-07-07 15:14           ` Junio C Hamano
2025-07-10  8:48       ` [PATCH v5 0/4] bloom: enable bloom filter optimization for multiple pathspec elements " Lidong Yan
2025-07-10  8:48         ` [PATCH v5 1/4] bloom: add test helper to return murmur3 hash Lidong Yan
2025-07-10  8:48         ` [PATCH v5 2/4] bloom: rename function operates on bloom_key Lidong Yan
2025-07-10  8:48         ` [PATCH v5 3/4] bloom: replace struct bloom_key * with struct bloom_keyvec Lidong Yan
2025-07-10 16:17           ` Junio C Hamano
2025-07-11 12:46             ` Lidong Yan
2025-07-11 15:06               ` Junio C Hamano
2025-07-10  8:48         ` [PATCH v5 4/4] bloom: optimize multiple pathspec items in revision traversal Lidong Yan
2025-07-10 13:51           ` [PATCH v5.1 3.5/4] revision: make helper for pathspec to bloom key Derrick Stolee
2025-07-10 15:42             ` Lidong Yan
2025-07-10 13:55           ` [PATCH v5.1 4/4] bloom: optimize multiple pathspec items in revision Derrick Stolee
2025-07-10 15:49             ` Lidong Yan
2025-07-10 13:49         ` [PATCH v5 0/4] bloom: enable bloom filter optimization for multiple pathspec elements in revision traversal Derrick Stolee
2025-07-12  9:35         ` [PATCH v6 0/5] " Lidong Yan
2025-07-12  9:35           ` [PATCH v6 1/5] bloom: add test helper to return murmur3 hash Lidong Yan
2025-07-12  9:35           ` [PATCH v6 2/5] bloom: rename function operates on bloom_key Lidong Yan
2025-07-12  9:35           ` [PATCH v6 3/5] bloom: replace struct bloom_key * with struct bloom_keyvec Lidong Yan
2025-07-12  9:35           ` [PATCH v6 4/5] revision: make helper for pathspec to bloom keyvec Lidong Yan
2025-07-12  9:35           ` [PATCH v6 5/5] To enable optimize multiple pathspec items in revision traversal, return 0 if all pathspec item is literal in forbid_bloom_filters(). Add for loops to initialize and check each pathspec item's bloom_keyvec when optimization is possible Lidong Yan
2025-07-12  9:47             ` Lidong Yan
2025-07-12  9:51               ` [PATCH v6 5/5] bloom: optimize multiple pathspec items in revision Lidong Yan
2025-07-14 16:51                 ` Derrick Stolee
2025-07-14 17:01                   ` Junio C Hamano
2025-07-15  1:37                     ` Lidong Yan
2025-07-15  2:56                       ` [RESEND][PATCH " Lidong Yan
2025-07-14 16:53           ` [PATCH v6 0/5] bloom: enable bloom filter optimization for multiple pathspec elements in revision traversal Derrick Stolee
2025-07-14 17:02             ` Junio C Hamano
2025-07-15  1:34             ` Lidong Yan
2025-07-15  2:48               ` Derrick Stolee
2025-07-15 15:09                 ` Junio C Hamano
2025-06-28  4:21   ` [PATCH v3 1/2] bloom: replace struct bloom_key * with struct bloom_keyvec Lidong Yan
2025-07-02 15:08     ` Patrick Steinhardt
2025-07-02 15:49       ` Lidong Yan
2025-07-02 18:28       ` Junio C Hamano
2025-07-03  1:41         ` Lidong Yan
2025-06-28  4:21   ` [PATCH v3 2/2] bloom: optimize multiple pathspec items in revision traversal Lidong Yan
2025-06-27  6:21 ` [PATCH v2 1/2] bloom: replace struct bloom_key * with struct bloom_keyvec Lidong Yan
2025-06-27  6:21 ` [PATCH v2 2/2] bloom: optimize multiple pathspec items in revision traversal Lidong Yan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=xmqq7c0zviat.fsf@gitster.g \
    --to=gitster@pobox$(echo .)com \
    --cc=502024330056@smail$(echo .)nju.edu.cn \
    --cc=git@vger$(echo .)kernel.org \
    --cc=kaikoponen@google$(echo .)com \
    --cc=yldhome2d2@gmail$(echo .)com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox