From: "René Scharfe" <l.s.r@web•de>
To: Jeff King <peff@peff•net>
Cc: Git List <git@vger•kernel.org>, Junio C Hamano <gitster@pobox•com>
Subject: Re: [PATCH] evaluate the second argument of ALLOC_GROW only once
Date: Sat, 16 May 2026 13:10:09 +0200 [thread overview]
Message-ID: <23be5317-4f28-4871-8aab-5281f4da5f0e@web.de> (raw)
In-Reply-To: <20260516025119.GA832077@coredump.intra.peff.net>
On 5/16/26 4:51 AM, Jeff King wrote:
> On Sat, May 16, 2026 at 01:01:05AM +0200, René Scharfe wrote:
>
>>> I think as long as the behavior remains "slow, but we do not overflow
>>> any buffers" when you reach these limits, that's OK. Nobody is going to
>>> do it in practice, and we just want to make sure that malicious inputs
>>> cannot get out-of-bounds writes. It might be worth adding a comment,
>>> though, to make sure nobody ever swaps "alloc_grow_new_alloc_" for
>>> "alloc" in that macro.
>> There is no overflow check in either version (yet), so neither is safe
>> to operate close to the boundary. Close meaning the intermediate term
>> (alloc + 16) * 3 being bigger than the maximum value.
>
> Yes, but for some definition of safe. Both before and after your patch,
> as we get close to the boundary the allocation will grow slower than it
> should, but we'll never write out of bounds. The behavior for the "git
> foo" I showed earlier is slightly different:
>
> - before your patch, ~2GB we stop doubling and instead start growing
> the array by one at each ALLOC_GROW() call. This is because
> alloc_nr() overflows to a small value, but the:
>
> if (alloc_nr(alloc) < (nr))
> alloc = (nr);
>
> check kicks in.
>
> - after your patch we grow to ~4GB, and then things get super slow.
> This is because we correctly compute the new allocation as a size_t,
> but then truncate it while assigning to alloc. So on the next
> ALLOC_GROW() call, we'll think the buffer is way too small and try
> to realloc again. I don't know why this is so much slower than the
> grow-by-one above, but it is.
>
> Neither is really correct, but both are in the realm of OK: stupidly
> large input doesn't perform well, but there's no buffer overflow
> vulnerability.
>
> What I was worried about is what happens if you tweak your patch like
> this:
>
> diff --git a/git-compat-util.h b/git-compat-util.h
> index 2bc1f43f48..0730dd24ad 100644
> --- a/git-compat-util.h
> +++ b/git-compat-util.h
> @@ -870,7 +870,7 @@ static inline bool st_alloc_nr(size_t nr, size_t alloc, size_t *outp)
> size_t alloc_grow_new_alloc_; \
> if (st_alloc_nr((nr), (alloc), &alloc_grow_new_alloc_)) { \
> alloc = alloc_grow_new_alloc_; \
> - REALLOC_ARRAY(x, alloc_grow_new_alloc_); \
> + REALLOC_ARRAY(x, alloc); \
> } \
> } while (0)
>
>
> In that case we really do end up with too-small allocations and
> out-of-bounds writes.
>
> Maybe you saw that coming and that's why you wrote it as you did. But it
> is definitely subtle enough that I think it would merit a big warning
> comment that "alloc" and "alloc_grow_new_alloc_" are not necessarily the
> same type, and hence not necessarily the same value.
It was the economic thing to do: Fetching the value of user-supplied
alloc variable makes no sense when we have our calculated size_t value
at hand.
>> Here's a demo program exercising the arithmetic part of the macros:
>
> I think the difference isn't in the arithmetic values that come out, but
> in what is fed to realloc() itself. And in your harness, realloc is just
> "x = true". If you actually store the value that would be passed to
> realloc() like this:
>
> diff --git a/foo.c.orig b/foo.c
> index 2fbce8c..7498f36 100644
> --- a/foo.c.orig
> +++ b/foo.c
> @@ -11,7 +11,7 @@
> alloc = (nr); \
> else \
> alloc = alloc_nr(alloc); \
> - x = true; \
> + x = alloc; \
> } \
> } while (0)
>
> @@ -31,7 +31,7 @@ static inline bool st_alloc_nr(size_t nr, size_t alloc, size_t *outp)
> size_t alloc_grow_new_alloc_; \
> if (st_alloc_nr((nr), (alloc), &alloc_grow_new_alloc_)) { \
> alloc = alloc_grow_new_alloc_; \
> - x = true; \
> + x = alloc_grow_new_alloc_; \
> } \
> } while (0)
>
> @@ -44,7 +44,7 @@ int main(int argc, char **argv)
> for (T i = 0;; i++) {
> for (T j = MIN;; j++) {
> T alloc1 = j, alloc2 = j;
> - bool allocated1 = false, allocated2 = false;
> + size_t allocated1 = 0, allocated2 = 0;
> ALLOC_GROW1(allocated1, i, alloc1);
> ALLOC_GROW2(allocated2, i, alloc2);
> if (alloc1 != alloc2 || allocated1 != allocated2)
>
> then you see the differences. For negative values, yeah, you end up with
> big size_t values. But for an unsigned type you get different small
> allocations.
Good point. For unsigned char I get differences starting at 156
elements, here just the first few:
unsigned char nr=156 0 (155 -> 0) vs 256 (155 -> 0)
unsigned char nr=157 0 (155 -> 0) vs 256 (155 -> 0)
unsigned char nr=157 2 (156 -> 2) vs 258 (156 -> 2)
So the current code cuts the allocation size to 0 if you have an
array of 155 and ask for more entries. That would cause a buffer
overrun. With the patch ALLOC_GROW actually grows the buffer.
For signed char I see a different failure mode:
signed char nr=71 18446744073709551489 (70 -> -127) vs 129 (70 -> -127)
signed char nr=72 18446744073709551489 (70 -> -127) vs 129 (70 -> -127)
signed char nr=72 18446744073709551490 (71 -> -126) vs 130 (71 -> -126)
The current code tries to allocate (something close to) infinity,
which would terminate the program. With the patch ALLOC_GROW
actually grows the buffer.
I don't see this for int and unsigned, though. Weird C integer
promotion rules hit us here, I guess. Just this, as you mentioned:
ALLOC_GROW1 unsigned nr=1787844770 1787844770 (1787844769 -> 1787844770) step too small, abort
ALLOC_GROW1 int nr=1787844770 1787844770 (1787844769 -> 1787844770) step too small, abort
The demo code is now big and hairy enough to need its own
tests, though. *snicker*
René
#include <limits.h>
#include <stdbool.h>
#include <stdio.h>
#define alloc_nr(x) (((x)+16)*3/2)
#define ALLOC_GROW1(x, nr, alloc) \
do { \
if ((nr) > alloc) { \
if (alloc_nr(alloc) < (nr)) \
alloc = (nr); \
else \
alloc = alloc_nr(alloc); \
x = alloc; \
} \
} while (0)
static inline bool st_alloc_nr(size_t nr, size_t alloc, size_t *outp)
{
if (nr > alloc) {
size_t out = alloc_nr(alloc);
*outp = out < nr ? nr : out;
return true;
}
return false;
}
#define ALLOC_GROW2(x, nr, alloc) \
do { \
size_t alloc_grow_new_alloc_; \
if (st_alloc_nr((nr), (alloc), &alloc_grow_new_alloc_)) { \
alloc = alloc_grow_new_alloc_; \
x = alloc_grow_new_alloc_; \
} \
} while (0)
#define COMPARE(T, P, nr, alloc1, alloc_sz1, alloc2, alloc_sz2) do { \
T orig_alloc1 = alloc1, orig_alloc2 = alloc2; \
ALLOC_GROW1(alloc_sz1, nr, alloc1); \
ALLOC_GROW2(alloc_sz2, nr, alloc2); \
if (alloc_sz1 != alloc_sz2) \
printf(#T" nr="P" %zu ("P" -> "P") vs %zu ("P" -> "P")\n", \
nr, \
alloc_sz1, orig_alloc1, alloc1, \
alloc_sz2, orig_alloc2, alloc2); \
} while (0)
#define COMPARE_ALL(T, MIN, MAX, P) do { \
for (T nr = 0;; nr++) { \
for (T alloc = MIN;; alloc++) { \
T alloc1 = alloc, alloc2 = alloc; \
size_t alloc_sz1 = alloc1, alloc_sz2 = alloc2; \
COMPARE(T, P, nr, alloc1, alloc_sz1, alloc2, alloc_sz2); \
if (alloc == MAX) \
break; \
} \
if (nr == MAX) \
break; \
} \
} while (0)
#define COMPARE_GROWTH(T, MAX, P) do { \
T alloc1 = 0, alloc2 = 0; \
size_t alloc_sz1 = 0, alloc_sz2 = 0; \
for (T nr = 0;; nr++) { \
COMPARE(T, P, nr, alloc1, alloc_sz1, alloc2, alloc_sz2); \
if (nr == MAX) \
break; \
} \
} while (0)
#define CHECK_GROWTH_ONE(T, MAX, P, ALLOC_GROW) do { \
T alloc = 0; \
size_t alloc_sz = 0; \
for (T nr = 0;; nr++) { \
T orig_alloc = alloc; \
size_t orig_alloc_sz = alloc_sz; \
ALLOC_GROW(alloc_sz, nr, alloc); \
if (alloc_sz < (size_t)nr) \
printf(#ALLOC_GROW" "#T" nr="P" %zu ("P" -> "P")" \
" too small\n", \
nr, alloc_sz, orig_alloc, alloc); \
if (alloc_sz > alloc_nr((size_t)nr)) \
printf(#ALLOC_GROW" "#T" nr="P" %zu ("P" -> "P")" \
" too big\n", \
nr, alloc_sz, orig_alloc, alloc); \
if (alloc_sz > orig_alloc_sz && \
alloc_sz - alloc_sz / 3 < orig_alloc_sz) { \
printf(#ALLOC_GROW" "#T" nr="P" %zu ("P" -> "P")" \
" step too small, abort\n", \
nr, alloc_sz, orig_alloc, alloc); \
break; \
} \
if (nr == MAX) \
break; \
} \
} while (0)
#define CHECK_GROWTH(T, MAX, P) do { \
CHECK_GROWTH_ONE(T, MAX, P, ALLOC_GROW1); \
CHECK_GROWTH_ONE(T, MAX, P, ALLOC_GROW2); \
} while (0)
int main(int argc, char **argv)
{
COMPARE_ALL(unsigned char, 0, UCHAR_MAX, "%hhu");
COMPARE_ALL(signed char, 0, SCHAR_MAX, "%hhd");
COMPARE_GROWTH(short, SHRT_MAX, "%hd");
CHECK_GROWTH(short, SHRT_MAX, "%hd");
CHECK_GROWTH(unsigned short, USHRT_MAX, "%hu");
CHECK_GROWTH(unsigned, UINT_MAX, "%u");
CHECK_GROWTH(int, INT_MAX, "%d");
return 0;
}
next prev parent reply other threads:[~2026-05-16 11:10 UTC|newest]
Thread overview: 8+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-05-15 18:16 [PATCH] evaluate the second argument of ALLOC_GROW only once René Scharfe
2026-05-15 19:08 ` Jeff King
2026-05-15 19:50 ` Jeff King
2026-05-15 23:01 ` René Scharfe
2026-05-16 2:51 ` Jeff King
2026-05-16 11:10 ` René Scharfe [this message]
2026-05-16 6:55 ` Johannes Sixt
2026-05-19 0:41 ` Jeff King
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=23be5317-4f28-4871-8aab-5281f4da5f0e@web.de \
--to=l.s.r@web$(echo .)de \
--cc=git@vger$(echo .)kernel.org \
--cc=gitster@pobox$(echo .)com \
--cc=peff@peff$(echo .)net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox