From: Derrick Stolee <stolee@gmail•com>
To: Lidong Yan <yldhome2d2@gmail•com>
Cc: 502024330056@smail•nju.edu.cn, git@vger•kernel.org, gitster@pobox•com
Subject: Re: [PATCH v4 3/4] bloom: replace struct bloom_key * with struct bloom_keyvec
Date: Mon, 7 Jul 2025 07:35:42 -0400 [thread overview]
Message-ID: <65dc80f9-a91c-463b-9c6b-cb20d293432b@gmail.com> (raw)
In-Reply-To: <20250704111437.2660251-4-502024330056@smail.nju.edu.cn>
On 7/4/2025 7:14 AM, Lidong Yan wrote:
> The revision traversal limited by pathspec has optimization when
> the pathspec has only one element. To support optimization for
> multiple pathspec items, we need to modify the data structures
> in struct rev_info.
You are correct that the revision-walking abandons bloom filters
when there are multiple pathspecs, and fixing this is a valuable
effort.
The need for this change is subtle and could use some extra context
to be sure reviewers understand:
This change is writing over some code that was created in
c525ce95b46 (commit-graph: check all leading directories in changed
path Bloom filters, 2020-07-01) to allow storing multiple bloom
keys during the revision walk. The multiple keys are focusing on
multiple path components of the literal pathspec.
The point is that after the initialization, the bloom key array is
used directly as a filter for reporting TREESAME commits: a commit
is automatically reported as TREESAME to its first parent if any
bloom key results in a "No, definitely not changed" result with
that commit's bloom filter.
The reason we need a new data structure is that we need to adjust
the conditionals.
BEFORE: "NOT TREESAME if there EXISTS a bloom key that reports NO"
AFTER: "NOT TREESAME if FOR EVERY pathspec there EXISTS a bloom key
that reports NO."
This "FOR EVERY" condition makes it impossible to use a flat array
of bloom keys for multiple pathspecs, justifying this change.
What is further confusing here is that we already have logic that
deals with arrays of bloom keys, so I expected that the vector was
the single structure storing a list of those arrays. Instead, the
vector is replacing the array itself. This is made clear by using
the vector immediately in the existing implementation.
> +struct bloom_keyvec *bloom_keyvec_new(size_t count)
> +{
> + struct bloom_keyvec *vec;
> + size_t sz = sizeof(struct bloom_keyvec);
> + sz += count * sizeof(struct bloom_key);
> + vec = (struct bloom_keyvec *)xcalloc(1, sz);
You could use CALLOC_ARRAY() to simplify this and drop
the 'sz' variable.
> + vec->count = count;
> + return vec;
> +}
> +
> +void bloom_keyvec_free(struct bloom_keyvec *vec)
> +{
> + if (!vec)
> + return;
> + for (size_t nr = 0; nr < vec->count; nr++)
> + bloom_key_clear(&vec->key[nr]);
> + free(vec);
> +}
> +
> static int pathmap_cmp(const void *hashmap_cmp_fn_data UNUSED,
> const struct hashmap_entry *eptr,
> const struct hashmap_entry *entry_or_key,
> @@ -541,6 +560,18 @@ int bloom_filter_contains(const struct bloom_filter *filter,
> return 1;
> }
>
> +int bloom_filter_contains_vec(const struct bloom_filter *filter,
> + const struct bloom_keyvec *vec,
> + const struct bloom_filter_settings *settings)
> +{
> + int ret = 1;
> +
> + for (size_t nr = 0; ret > 0 && nr < vec->count; nr++)
> + ret = bloom_filter_contains(filter, &vec->key[nr], settings);
> +
> + return ret;
> +}
This implementation is where the subtle detail comes in. Might be worth
a comment to say "if any key in this list is not contained in the filter,
then the filter doesn't match this vector."
next prev parent reply other threads:[~2025-07-07 11:35 UTC|newest]
Thread overview: 72+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-06-25 12:55 [PATCH 0/2] bloom: use bloom filter given multiple pathspec Lidong Yan
2025-06-25 12:55 ` [PATCH 1/2] bloom: replace struct bloom_key * with struct bloom_keyvec Lidong Yan
2025-06-25 17:43 ` Junio C Hamano
2025-06-26 3:44 ` Lidong Yan
2025-06-25 12:55 ` [PATCH 2/2] bloom: enable multiple pathspec bloom keys Lidong Yan
2025-06-27 13:50 ` Junio C Hamano
2025-06-27 14:24 ` Lidong Yan
2025-06-27 18:09 ` Junio C Hamano
2025-07-01 5:52 ` Lidong Yan
2025-07-01 15:19 ` Junio C Hamano
2025-07-02 7:14 ` Lidong Yan
2025-07-02 15:48 ` Junio C Hamano
2025-07-03 1:52 ` Lidong Yan
2025-07-04 12:09 ` Lidong Yan
2025-07-01 8:50 ` SZEDER Gábor
2025-07-01 11:40 ` Lidong Yan
2025-07-01 15:43 ` Junio C Hamano
2025-06-27 20:39 ` Junio C Hamano
2025-06-28 2:54 ` Lidong Yan
2025-06-25 17:32 ` [PATCH 0/2] bloom: use bloom filter given multiple pathspec Junio C Hamano
2025-06-26 3:34 ` Lidong Yan
2025-06-26 14:15 ` Junio C Hamano
2025-06-27 6:21 ` [PATCH v2 0/2] bloom: enable bloom filter optimization for multiple pathspec elements in revision traversal Lidong Yan
2025-06-28 4:21 ` [PATCH v3 " Lidong Yan
2025-07-04 11:14 ` [PATCH v4 0/4] " Lidong Yan
2025-07-04 11:14 ` [PATCH v4 1/4] bloom: add test helper to return murmur3 hash Lidong Yan
2025-07-04 11:14 ` [PATCH v4 2/4] bloom: rename function operates on bloom_key Lidong Yan
2025-07-04 11:14 ` [PATCH v4 3/4] bloom: replace struct bloom_key * with struct bloom_keyvec Lidong Yan
2025-07-07 11:35 ` Derrick Stolee [this message]
2025-07-07 14:14 ` Lidong Yan
2025-07-04 11:14 ` [PATCH v4 4/4] bloom: optimize multiple pathspec items in revision traversal Lidong Yan
2025-07-07 11:43 ` Derrick Stolee
2025-07-07 14:18 ` Lidong Yan
2025-07-07 15:14 ` Junio C Hamano
2025-07-10 8:48 ` [PATCH v5 0/4] bloom: enable bloom filter optimization for multiple pathspec elements " Lidong Yan
2025-07-10 8:48 ` [PATCH v5 1/4] bloom: add test helper to return murmur3 hash Lidong Yan
2025-07-10 8:48 ` [PATCH v5 2/4] bloom: rename function operates on bloom_key Lidong Yan
2025-07-10 8:48 ` [PATCH v5 3/4] bloom: replace struct bloom_key * with struct bloom_keyvec Lidong Yan
2025-07-10 16:17 ` Junio C Hamano
2025-07-11 12:46 ` Lidong Yan
2025-07-11 15:06 ` Junio C Hamano
2025-07-10 8:48 ` [PATCH v5 4/4] bloom: optimize multiple pathspec items in revision traversal Lidong Yan
2025-07-10 13:51 ` [PATCH v5.1 3.5/4] revision: make helper for pathspec to bloom key Derrick Stolee
2025-07-10 15:42 ` Lidong Yan
2025-07-10 13:55 ` [PATCH v5.1 4/4] bloom: optimize multiple pathspec items in revision Derrick Stolee
2025-07-10 15:49 ` Lidong Yan
2025-07-10 13:49 ` [PATCH v5 0/4] bloom: enable bloom filter optimization for multiple pathspec elements in revision traversal Derrick Stolee
2025-07-12 9:35 ` [PATCH v6 0/5] " Lidong Yan
2025-07-12 9:35 ` [PATCH v6 1/5] bloom: add test helper to return murmur3 hash Lidong Yan
2025-07-12 9:35 ` [PATCH v6 2/5] bloom: rename function operates on bloom_key Lidong Yan
2025-07-12 9:35 ` [PATCH v6 3/5] bloom: replace struct bloom_key * with struct bloom_keyvec Lidong Yan
2025-07-12 9:35 ` [PATCH v6 4/5] revision: make helper for pathspec to bloom keyvec Lidong Yan
2025-07-12 9:35 ` [PATCH v6 5/5] To enable optimize multiple pathspec items in revision traversal, return 0 if all pathspec item is literal in forbid_bloom_filters(). Add for loops to initialize and check each pathspec item's bloom_keyvec when optimization is possible Lidong Yan
2025-07-12 9:47 ` Lidong Yan
2025-07-12 9:51 ` [PATCH v6 5/5] bloom: optimize multiple pathspec items in revision Lidong Yan
2025-07-14 16:51 ` Derrick Stolee
2025-07-14 17:01 ` Junio C Hamano
2025-07-15 1:37 ` Lidong Yan
2025-07-15 2:56 ` [RESEND][PATCH " Lidong Yan
2025-07-14 16:53 ` [PATCH v6 0/5] bloom: enable bloom filter optimization for multiple pathspec elements in revision traversal Derrick Stolee
2025-07-14 17:02 ` Junio C Hamano
2025-07-15 1:34 ` Lidong Yan
2025-07-15 2:48 ` Derrick Stolee
2025-07-15 15:09 ` Junio C Hamano
2025-06-28 4:21 ` [PATCH v3 1/2] bloom: replace struct bloom_key * with struct bloom_keyvec Lidong Yan
2025-07-02 15:08 ` Patrick Steinhardt
2025-07-02 15:49 ` Lidong Yan
2025-07-02 18:28 ` Junio C Hamano
2025-07-03 1:41 ` Lidong Yan
2025-06-28 4:21 ` [PATCH v3 2/2] bloom: optimize multiple pathspec items in revision traversal Lidong Yan
2025-06-27 6:21 ` [PATCH v2 1/2] bloom: replace struct bloom_key * with struct bloom_keyvec Lidong Yan
2025-06-27 6:21 ` [PATCH v2 2/2] bloom: optimize multiple pathspec items in revision traversal Lidong Yan
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=65dc80f9-a91c-463b-9c6b-cb20d293432b@gmail.com \
--to=stolee@gmail$(echo .)com \
--cc=502024330056@smail$(echo .)nju.edu.cn \
--cc=git@vger$(echo .)kernel.org \
--cc=gitster@pobox$(echo .)com \
--cc=yldhome2d2@gmail$(echo .)com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox