Re: [PATCH] evaluate the second argument of ALLOC_GROW only once

public inbox for git@vger.kernel.org 
 help / color / mirror / Atom feed

From: "René Scharfe" <l.s.r@web•de>
To: Jeff King <peff@peff•net>
Cc: Git List <git@vger•kernel.org>, Junio C Hamano <gitster@pobox•com>
Subject: Re: [PATCH] evaluate the second argument of ALLOC_GROW only once
Date: Sat, 16 May 2026 13:10:09 +0200	[thread overview]
Message-ID: <23be5317-4f28-4871-8aab-5281f4da5f0e@web.de> (raw)
In-Reply-To: <20260516025119.GA832077@coredump.intra.peff.net>

On 5/16/26 4:51 AM, Jeff King wrote:
> On Sat, May 16, 2026 at 01:01:05AM +0200, René Scharfe wrote:
> 
>>> I think as long as the behavior remains "slow, but we do not overflow
>>> any buffers" when you reach these limits, that's OK. Nobody is going to
>>> do it in practice, and we just want to make sure that malicious inputs
>>> cannot get out-of-bounds writes. It might be worth adding a comment,
>>> though, to make sure nobody ever swaps "alloc_grow_new_alloc_" for
>>> "alloc" in that macro.
>> There is no overflow check in either version (yet), so neither is safe
>> to operate close to the boundary.  Close meaning the intermediate term
>> (alloc + 16) * 3 being bigger than the maximum value.
> 
> Yes, but for some definition of safe. Both before and after your patch,
> as we get close to the boundary the allocation will grow slower than it
> should, but we'll never write out of bounds. The behavior for the "git
> foo" I showed earlier is slightly different:
> 
>   - before your patch, ~2GB we stop doubling and instead start growing
>     the array by one at each ALLOC_GROW() call. This is because
>     alloc_nr() overflows to a small value, but the:
> 
>       if (alloc_nr(alloc) < (nr))
>               alloc = (nr);
> 
>     check kicks in.
> 
>   - after your patch we grow to ~4GB, and then things get super slow.
>     This is because we correctly compute the new allocation as a size_t,
>     but then truncate it while assigning to alloc. So on the next
>     ALLOC_GROW() call, we'll think the buffer is way too small and try
>     to realloc again. I don't know why this is so much slower than the
>     grow-by-one above, but it is.
> 
> Neither is really correct, but both are in the realm of OK: stupidly
> large input doesn't perform well, but there's no buffer overflow
> vulnerability.
> 
> What I was worried about is what happens if you tweak your patch like
> this:
> 
> diff --git a/git-compat-util.h b/git-compat-util.h
> index 2bc1f43f48..0730dd24ad 100644
> --- a/git-compat-util.h
> +++ b/git-compat-util.h
> @@ -870,7 +870,7 @@ static inline bool st_alloc_nr(size_t nr, size_t alloc, size_t *outp)
>  		size_t alloc_grow_new_alloc_; \
>  		if (st_alloc_nr((nr), (alloc), &alloc_grow_new_alloc_)) { \
>  			alloc = alloc_grow_new_alloc_; \
> -			REALLOC_ARRAY(x, alloc_grow_new_alloc_); \
> +			REALLOC_ARRAY(x, alloc); \
>  		} \
>  	} while (0)
>  
> 
> In that case we really do end up with too-small allocations and
> out-of-bounds writes.
> 
> Maybe you saw that coming and that's why you wrote it as you did. But it
> is definitely subtle enough that I think it would merit a big warning
> comment that "alloc" and "alloc_grow_new_alloc_" are not necessarily the
> same type, and hence not necessarily the same value.

It was the economic thing to do: Fetching the value of user-supplied
alloc variable makes no sense when we have our calculated size_t value
at hand.

>> Here's a demo program exercising the arithmetic part of the macros:
> 
> I think the difference isn't in the arithmetic values that come out, but
> in what is fed to realloc() itself. And in your harness, realloc is just
> "x = true". If you actually store the value that would be passed to
> realloc() like this:
> 
> diff --git a/foo.c.orig b/foo.c
> index 2fbce8c..7498f36 100644
> --- a/foo.c.orig
> +++ b/foo.c
> @@ -11,7 +11,7 @@
>  				alloc = (nr); \
>  			else \
>  				alloc = alloc_nr(alloc); \
> -			x = true; \
> +			x = alloc; \
>  		} \
>  	} while (0)
>  
> @@ -31,7 +31,7 @@ static inline bool st_alloc_nr(size_t nr, size_t alloc, size_t *outp)
>  		size_t alloc_grow_new_alloc_; \
>  		if (st_alloc_nr((nr), (alloc), &alloc_grow_new_alloc_)) { \
>  			alloc = alloc_grow_new_alloc_; \
> -			x = true; \
> +			x = alloc_grow_new_alloc_; \
>  		} \
>  	} while (0)
>  
> @@ -44,7 +44,7 @@ int main(int argc, char **argv)
>  	for (T i = 0;; i++) {
>  		for (T j = MIN;; j++) {
>  			T alloc1 = j, alloc2 = j;
> -			bool allocated1 = false, allocated2 = false;
> +			size_t allocated1 = 0, allocated2 = 0;
>  			ALLOC_GROW1(allocated1, i, alloc1);
>  			ALLOC_GROW2(allocated2, i, alloc2);
>  			if (alloc1 != alloc2 || allocated1 != allocated2)
> 
> then you see the differences. For negative values, yeah, you end up with
> big size_t values. But for an unsigned type you get different small
> allocations.
Good point.  For unsigned char I get differences starting at 156
elements, here just the first few:

unsigned char nr=156 0 (155 -> 0) vs 256 (155 -> 0)
unsigned char nr=157 0 (155 -> 0) vs 256 (155 -> 0)
unsigned char nr=157 2 (156 -> 2) vs 258 (156 -> 2)

So the current code cuts the allocation size to 0 if you have an
array of 155 and ask for more entries.  That would cause a buffer
overrun.  With the patch ALLOC_GROW actually grows the buffer.

For signed char I see a different failure mode:

signed char nr=71 18446744073709551489 (70 -> -127) vs 129 (70 -> -127)
signed char nr=72 18446744073709551489 (70 -> -127) vs 129 (70 -> -127)
signed char nr=72 18446744073709551490 (71 -> -126) vs 130 (71 -> -126)

The current code tries to allocate (something close to) infinity,
which would terminate the program.  With the patch ALLOC_GROW
actually grows the buffer.

I don't see this for int and unsigned, though.  Weird C integer
promotion rules hit us here, I guess.  Just this, as you mentioned:

ALLOC_GROW1 unsigned nr=1787844770 1787844770 (1787844769 -> 1787844770) step too small, abort
ALLOC_GROW1 int nr=1787844770 1787844770 (1787844769 -> 1787844770) step too small, abort

The demo code is now big and hairy enough to need its own
tests, though.  *snicker*

René


#include <limits.h>
#include <stdbool.h>
#include <stdio.h>

#define alloc_nr(x) (((x)+16)*3/2)

#define ALLOC_GROW1(x, nr, alloc) \
	do { \
		if ((nr) > alloc) { \
			if (alloc_nr(alloc) < (nr)) \
				alloc = (nr); \
			else \
				alloc = alloc_nr(alloc); \
			x = alloc; \
		} \
	} while (0)

static inline bool st_alloc_nr(size_t nr, size_t alloc, size_t *outp)
{
	if (nr > alloc) {
		size_t out = alloc_nr(alloc);
		*outp = out < nr ? nr : out;
		return true;
	}
	return false;
}


#define ALLOC_GROW2(x, nr, alloc) \
	do { \
		size_t alloc_grow_new_alloc_; \
		if (st_alloc_nr((nr), (alloc), &alloc_grow_new_alloc_)) { \
			alloc = alloc_grow_new_alloc_; \
			x = alloc_grow_new_alloc_; \
		} \
	} while (0)

#define COMPARE(T, P, nr, alloc1, alloc_sz1, alloc2, alloc_sz2) do { \
	T orig_alloc1 = alloc1, orig_alloc2 = alloc2; \
	ALLOC_GROW1(alloc_sz1, nr, alloc1); \
	ALLOC_GROW2(alloc_sz2, nr, alloc2); \
	if (alloc_sz1 != alloc_sz2) \
		printf(#T" nr="P" %zu ("P" -> "P") vs %zu ("P" -> "P")\n", \
		       nr,  \
		       alloc_sz1, orig_alloc1, alloc1, \
		       alloc_sz2, orig_alloc2, alloc2); \
} while (0)

#define COMPARE_ALL(T, MIN, MAX, P) do { \
	for (T nr = 0;; nr++) { \
		for (T alloc = MIN;; alloc++) { \
			T alloc1 = alloc, alloc2 = alloc; \
			size_t alloc_sz1 = alloc1, alloc_sz2 = alloc2; \
			COMPARE(T, P, nr, alloc1, alloc_sz1, alloc2, alloc_sz2); \
			if (alloc == MAX) \
				break; \
		} \
		if (nr == MAX) \
			break; \
	} \
} while (0)

#define COMPARE_GROWTH(T, MAX, P) do { \
	T alloc1 = 0, alloc2 = 0; \
	size_t alloc_sz1 = 0, alloc_sz2 = 0; \
	for (T nr = 0;; nr++) { \
		COMPARE(T, P, nr, alloc1, alloc_sz1, alloc2, alloc_sz2); \
		if (nr == MAX) \
			break; \
	} \
} while (0)

#define CHECK_GROWTH_ONE(T, MAX, P, ALLOC_GROW) do { \
	T alloc = 0; \
	size_t alloc_sz = 0; \
	for (T nr = 0;; nr++) { \
		T orig_alloc = alloc; \
		size_t orig_alloc_sz = alloc_sz; \
		ALLOC_GROW(alloc_sz, nr, alloc); \
		if (alloc_sz < (size_t)nr) \
			printf(#ALLOC_GROW" "#T" nr="P" %zu ("P" -> "P")" \
			       " too small\n", \
			       nr, alloc_sz, orig_alloc, alloc); \
		if (alloc_sz > alloc_nr((size_t)nr)) \
			printf(#ALLOC_GROW" "#T" nr="P" %zu ("P" -> "P")" \
			       " too big\n", \
			       nr, alloc_sz, orig_alloc, alloc); \
		if (alloc_sz > orig_alloc_sz && \
		    alloc_sz - alloc_sz / 3 < orig_alloc_sz) { \
			printf(#ALLOC_GROW" "#T" nr="P" %zu ("P" -> "P")" \
			       " step too small, abort\n", \
			       nr, alloc_sz, orig_alloc, alloc); \
			break; \
		} \
		if (nr == MAX) \
			break; \
	} \
} while (0)

#define CHECK_GROWTH(T, MAX, P) do { \
	CHECK_GROWTH_ONE(T, MAX, P, ALLOC_GROW1); \
	CHECK_GROWTH_ONE(T, MAX, P, ALLOC_GROW2); \
} while (0)

int main(int argc, char **argv)
{
	COMPARE_ALL(unsigned char, 0, UCHAR_MAX, "%hhu");
	COMPARE_ALL(signed char, 0, SCHAR_MAX, "%hhd");
	COMPARE_GROWTH(short, SHRT_MAX, "%hd");
	CHECK_GROWTH(short, SHRT_MAX, "%hd");
	CHECK_GROWTH(unsigned short, USHRT_MAX, "%hu");
	CHECK_GROWTH(unsigned, UINT_MAX, "%u");
	CHECK_GROWTH(int, INT_MAX, "%d");
	return 0;
}

next prev parent reply	other threads:[~2026-05-16 11:10 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-05-15 18:16 [PATCH] evaluate the second argument of ALLOC_GROW only once René Scharfe
2026-05-15 19:08 ` Jeff King
2026-05-15 19:50   ` Jeff King
2026-05-15 23:01     ` René Scharfe
2026-05-16  2:51       ` Jeff King
2026-05-16 11:10         ` René Scharfe [this message]
2026-05-16  6:55     ` Johannes Sixt
2026-05-19  0:41       ` Jeff King

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=23be5317-4f28-4871-8aab-5281f4da5f0e@web.de \
    --to=l.s.r@web$(echo .)de \
    --cc=git@vger$(echo .)kernel.org \
    --cc=gitster@pobox$(echo .)com \
    --cc=peff@peff$(echo .)net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox