From: Jeff King <peff@peff•net>
To: Toon Claes <toon@iotcl•com>
Cc: git@vger•kernel.org, Karthik Nayak <karthik.188@gmail•com>,
Anders Kaseorg <andersk@mit•edu>
Subject: Re: [PATCH] last-modified: fix bug caused by inproper initialized memory
Date: Fri, 28 Nov 2025 15:55:14 -0500
Message-ID: <20251128205514.GA605489@coredump.intra.peff.net>
In-Reply-To: <20251128-toon-big-endian-ci-v1-1-80da0f629c1e@iotcl.com>
On Fri, Nov 28, 2025 at 05:37:13PM +0100, Toon Claes wrote:
> git-last-modified(1) uses a scratch bitmap to keep track of paths that
> have been changed between commits. To avoid reallocating a bitmap on
> each call of process_parent(), the scratch bitmap is kept and reused.
> However, an incorrect length is passed to memset(3) when clearing it.
>
> `struct bitmap` uses `eword_t` for internal storage. This type is
> typedef'd to uint64_t. To fully zero the memory used by the bitmap,
> multiply the length (saved in `struct bitmap::word_alloc`) by the size
> of `eword_t`.

Good catch! When I was looking for casts that could be the culprit, I
didn't think about the implicit one we get through the void pointer of
memset().

> diff --git a/builtin/last-modified.c b/builtin/last-modified.c
> index b0ecbdc540..cc5fd2e795 100644
> --- a/builtin/last-modified.c
> +++ b/builtin/last-modified.c
> @@ -327,7 +327,7 @@ static void process_parent(struct last_modified *lm,
>  	if (!(parent->object.flags & PARENT1))
>  		active_paths_free(lm, parent);
>  
> -	memset(lm->scratch->words, 0x0, lm->scratch->word_alloc);
> +	memset(lm->scratch->words, 0x0, lm->scratch->word_alloc * sizeof(eword_t));
>  	diff_queue_clear(&diff_queued_diff);
>  }
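A minimal standalone sketch of the two memset() calls in the hunk above shows how much of the buffer the old call misses. It assumes a simplified stand-in for git's struct bitmap (only the words and word_alloc fields), and the helper names scratch_clear_buggy(), scratch_clear_fixed() and nonzero_words() are hypothetical, made up for illustration:

```c
#include <stdint.h>
#include <string.h>

/* Simplified stand-in for git's bitmap types; not the real definitions. */
typedef uint64_t eword_t;

struct bitmap {
	eword_t *words;
	size_t word_alloc; /* number of eword_t elements, not bytes */
};

/*
 * Old call: passes word_alloc as if it were a byte count, so only the
 * first word_alloc bytes (1/8 of the buffer for 64-bit words) get zeroed.
 */
static void scratch_clear_buggy(struct bitmap *b)
{
	memset(b->words, 0x0, b->word_alloc);
}

/* Fixed call: converts the element count into a byte count. */
static void scratch_clear_fixed(struct bitmap *b)
{
	memset(b->words, 0x0, b->word_alloc * sizeof(eword_t));
}

/* Count how many words still have any bit set after a clear. */
static size_t nonzero_words(const struct bitmap *b)
{
	size_t i, n = 0;

	for (i = 0; i < b->word_alloc; i++)
		if (b->words[i])
			n++;
	return n;
}
```

With word_alloc = 8, for example, the buggy version clears 8 of the 64 bytes (only words[0]) and leaves whatever the previous use stored in the other seven words; the fixed version clears all 64 bytes.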

I think this patch makes sense as the most obvious and immediate fix.
But thinking about how we might have avoided this bug:

- We have macros like ALLOC_ARRAY() and COPY_ARRAY() that
  automatically multiply the array length by the size of each element
  (by looking at the type of the array). We could in theory have a
  helper like:

    MEMSET_ARRAY(lm->scratch->words, 0x0, lm->scratch->word_alloc);

  that would have made this hard to get wrong. But that's actually a
  bit of a funny interface, because memset() is inherently
  byte-oriented under the hood. So we are not setting each element to
  0x0, but rather each byte. For a value of 0x0, that is the same
  thing. But if you chose, say, 0x1, it is not: each eword_t would end
  up as 0x0101010101010101 rather than 1.

  So it would probably have to be limited to something like:

    CLEAR_ARRAY(lm->scratch->words, lm->scratch->word_alloc);

  which I'd guess would cover most memset() cases. But this is getting
  specific enough that maybe the macro is making things more confusing
  rather than less.

- It's a little gross that we are reaching inside a "struct bitmap" in
  the first place, as it's a mostly opaque type. And the code here has
  to know that the alloc field counts eword_t's, not bytes.

  It feels like there should be a bitmap_clear() function. Its
  implementation would also have to remember to multiply by
  sizeof(eword_t), but at least it would be encapsulated.

  I doubt the leaky abstraction matters that much, though. It seems
  unlikely that we would change it (and if we did, we'd perhaps give
  the field a new name).

  In the same vein, using "sizeof(*lm->scratch->words)" is probably
  better than "sizeof(eword_t)", since it follows the type of the array
  automatically. (Note the dereference: without it, sizeof gives the
  size of the pointer, which only happens to match on 64-bit
  platforms.) But again, I find it unlikely that this detail will
  change under the hood.

-Peff

Thread overview: 14+ messages
2025-11-28 16:37 [PATCH] last-modified: fix bug caused by inproper initialized memory Toon Claes
2025-11-28 20:55 ` Jeff King [this message]
2025-11-28 22:20 ` Anders Kaseorg
2025-11-29 10:50 ` Jeff King
2025-12-08 11:47 ` Toon Claes
2025-12-08 20:15 ` Jeff King
2025-12-08 22:42 ` Junio C Hamano
2025-11-29 2:01 ` Junio C Hamano
2025-11-29 2:11 ` Junio C Hamano
2025-11-29 9:38 ` Toon Claes
2025-12-08 11:46 ` [PATCH v2] last-modified: fix use of uninitialized memory Toon Claes
2025-12-08 13:26 ` Junio C Hamano
2025-12-09 8:43 ` Toon Claes
2025-12-09 12:18 ` Junio C Hamano