public inbox for git@vger.kernel.org 
From: Jeff King <peff@peff•net>
To: Toon Claes <toon@iotcl•com>
Cc: git@vger•kernel.org, Karthik Nayak <karthik.188@gmail•com>,
	Anders Kaseorg <andersk@mit•edu>
Subject: Re: [PATCH] last-modified: fix bug caused by inproper initialized memory
Date: Fri, 28 Nov 2025 15:55:14 -0500
Message-ID: <20251128205514.GA605489@coredump.intra.peff.net>
In-Reply-To: <20251128-toon-big-endian-ci-v1-1-80da0f629c1e@iotcl.com>

On Fri, Nov 28, 2025 at 05:37:13PM +0100, Toon Claes wrote:

> git-last-modified(1) uses a scratch bitmap to keep track of paths that
> have been changed between commits. To avoid reallocating a bitmap on
> each call of process_parent(), the scratch bitmap is kept and reused.
> Although, it seems an incorrect length is passed to memset(3).
> 
> `struct bitmap` uses `eword_t` for internal storage. This type is
> typedef'd to uint64_t. To fully zero the memory used by the bitmap,
> multiply the length (saved in `struct bitmap::word_alloc`) by the size
> of `eword_t`.

Good catch! When I was looking for casts that could be the culprit, I
didn't think about the implicit one we get through the void pointer of
memset().

> diff --git a/builtin/last-modified.c b/builtin/last-modified.c
> index b0ecbdc540..cc5fd2e795 100644
> --- a/builtin/last-modified.c
> +++ b/builtin/last-modified.c
> @@ -327,7 +327,7 @@ static void process_parent(struct last_modified *lm,
>  	if (!(parent->object.flags & PARENT1))
>  		active_paths_free(lm, parent);
>  
> -	memset(lm->scratch->words, 0x0, lm->scratch->word_alloc);
> +	memset(lm->scratch->words, 0x0, lm->scratch->word_alloc * sizeof(eword_t));
>  	diff_queue_clear(&diff_queued_diff);
>  }
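
To make the effect of the one-line change concrete, here is a minimal
standalone sketch (not taken from the patch). It assumes a hypothetical
8-word array standing in for the scratch bitmap's `words`; the names
DEMO_WORDS, count_dirty() and the two dirty_after_*() helpers are
invented for illustration. Because memset() takes a void pointer and a
byte count, passing the element count compiles without complaint but
clears only the first eighth of the storage:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint64_t eword_t; /* 8 bytes, as described in the commit message */

#define DEMO_WORDS 8 /* hypothetical bitmap size, counted in eword_t units */

/* Count how many words are still non-zero after a clear attempt. */
static size_t count_dirty(const eword_t *words, size_t nr)
{
	size_t i, dirty = 0;

	for (i = 0; i < nr; i++)
		if (words[i])
			dirty++;
	return dirty;
}

/* Buggy clear: passes the element count where memset() expects bytes. */
static size_t dirty_after_buggy_clear(void)
{
	eword_t words[DEMO_WORDS];

	memset(words, 0xff, sizeof(words)); /* simulate stale bits */
	memset(words, 0x0, DEMO_WORDS);     /* zeroes only 8 bytes, i.e. one word */
	return count_dirty(words, DEMO_WORDS);
}

/* Fixed clear: scales the element count by the element size. */
static size_t dirty_after_fixed_clear(void)
{
	eword_t words[DEMO_WORDS];

	memset(words, 0xff, sizeof(words)); /* simulate stale bits */
	memset(words, 0x0, DEMO_WORDS * sizeof(eword_t));
	return count_dirty(words, DEMO_WORDS);
}
```

With these assumptions, the buggy variant leaves seven of the eight
words untouched, while the fixed variant leaves none.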

I think this patch makes sense as the most obvious and immediate fix.
But thinking on how we might have avoided this bug:

  - We have macros like ALLOC_ARRAY() and COPY_ARRAY() that
    automatically multiply the array length by the size of each element
    (by looking at the type of the array). We could in theory have a
    helper like:

      MEMSET_ARRAY(lm->scratch->words, 0x0, lm->scratch->word_alloc);

    that would have made this hard to get wrong. But that's actually a
    bit of a funny interface, because memset is inherently byte-oriented
    under the hood. So we are not setting each element to 0x0, but
    rather each byte. For a value of 0x0, that is the same thing. But if
    you chose, say "0x1", it is not.

    So it would probably have to be limited to something like:

      CLEAR_ARRAY(lm->scratch->words, lm->scratch->word_alloc);

    which I'd guess would cover most memset cases. But this is getting
    specific enough that maybe the macro is making things more confusing
    rather than less.

  - It's a little gross that we are reaching inside a "struct bitmap" in
    the first place, as it's a mostly opaque type. And the code here has
    to know that the alloc field is sized in eword_t's, not in bytes.

    It feels like there should be a bitmap_clear() function. Its
    implementation would also have to remember to multiply by
    sizeof(eword_t), but at least it would be encapsulated.

    I doubt the leaky abstraction matters that much, though. It seems
    unlikely that we would change it (and if we did, we'd perhaps give
    the field a new name).

    In the same vein, probably using "sizeof(*lm->scratch->words)" is
    better than "sizeof(eword_t)". But again, I find it an unlikely
    detail for us to change under the hood.
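
As a rough sketch of the two ideas above (none of this exists in git as
written here): CLEAR_ARRAY() below is a hypothetical macro that derives
the element size from the pointee type, and demo_bitmap_clear() is a
hypothetical helper operating on a simplified stand-in for `struct
bitmap`. Note that sizeof(*(ptr)) gives the element size, whereas
sizeof(ptr) would give the size of the pointer itself. The last function
illustrates why a general MEMSET_ARRAY() would be a misleading
interface: memset() fills bytes, not elements.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint64_t eword_t;

/*
 * Hypothetical macro: zero `nr` elements of the array that `ptr`
 * points to. Unlike git's real array macros, this sketch does not
 * guard the multiplication against overflow.
 */
#define CLEAR_ARRAY(ptr, nr) memset((ptr), 0, (nr) * sizeof(*(ptr)))

/* Simplified stand-in for the relevant fields of `struct bitmap`. */
struct demo_bitmap {
	eword_t *words;
	size_t word_alloc; /* counted in eword_t units, not bytes */
};

/* Hypothetical helper: callers no longer need to know the unit. */
static void demo_bitmap_clear(struct demo_bitmap *b)
{
	CLEAR_ARRAY(b->words, b->word_alloc);
}

/*
 * memset() with 0x1 sets every byte to 0x01, so a 64-bit word ends up
 * as 0x0101010101010101 rather than 1.
 */
static eword_t word_after_memset_one(void)
{
	eword_t w;

	memset(&w, 0x1, sizeof(w));
	return w;
}
```

With a helper along these lines, the call site in process_parent() would
shrink to a single call on lm->scratch, and the multiplication by the
element size would live in exactly one place.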

-Peff
