From: Phillip Wood <phillip.wood123@gmail•com>
To: Jeff King <peff@peff•net>, Junio C Hamano <gitster@pobox•com>
Cc: git@vger•kernel.org, Patrick Steinhardt <ps@pks•im>,
correctmost <cmlists@sent•com>, Taylor Blau <me@ttaylorr•com>
Subject: Re: [PATCH 2/4] parse: add functions for parsing from non-string buffers
Date: Fri, 5 Dec 2025 16:11:15 +0000 [thread overview]
Message-ID: <4d83375b-76e2-4420-80dd-6a04d3201532@gmail.com> (raw)
In-Reply-To: <20251130131537.GB199335@coredump.intra.peff.net>
On 30/11/2025 13:15, Jeff King wrote:
> If you have a buffer that is not NUL-terminated but want to parse an
> integer, there aren't many good options. If you use strtol() and
> friends, you risk running off the end of the buffer if there is no
> non-digit terminating character. And even if you carefully make sure
> that there is such a character, ASan's strict-string-check mode will
> still complain.
>
> You can copy bytes into a temporary buffer, terminate it, and then call
> strtol(), but doing so adds some pitfalls (like making sure you soak up
> whitespace and leading +/- signs, and reporting overflow for overly long
> input). Or you can hand-parse the digits, but then you need to take some
> care to handle overflow (and again, whitespace and +/- signs).
>
> These things aren't impossible to do right, but it's error-prone to have
> to do them in every spot that wants to do such parsing. So let's add
> some functions which can be used across the code base.
>
> There are a few choices regarding the interface and the implementation.
>
> First, the implementation:
>
> - I went with with parsing the digits (rather than buffering and
> passing to libc functions). It ends up being a similar amount of
> code because we have to do some parsing either way. And likewise
> overflow detection depends on the exact type the caller wants, so we
> either have to do it by hand or write a separate wrapper for
> strtol(), strtoumax(), and so on.
>
> - Unsigned overflow detection is done using the same techniques as in
> unsigned_add_overflows(), etc. We can't use those macros directly
> because our core function is type-agnostic (so the caller passes in
> the max value, rather than us deriving it on the fly). This is
> similar to how git_parse_int(), etc, work.
>
> - Signed overflow detection assumes that we can express a negative
> value with magnitude one larger than our maximum positive value
> (e.g., -128..127 for a signed 8-bit value). I doubt this is
> guaranteed by the standard, but it should hold in practice, and we
> make the same assumption in git_parse_int(), etc. The nice thing
> about this is that we can derive the range from the number of bits
> in the type. For ints, you obviously could use INT_MIN..INT_MAX, but
> for an arbitrary type, we can use maximum_signed_value_of_type().
>
> - I didn't bother with handling bases other than 10. It would
> complicate the code, and I suspect it won't be needed. We could
> probably retro-fit it later without too much work, if need be.
This all sounds sensible to me and an does the interface description.
> +bool parse_unsigned_from_buf(const char *buf, size_t len, const char **ep,
> + uintmax_t *ret, uintmax_t max)
> +{
> + return parse_from_buf_internal(buf, len, ep, NULL, ret, max);
> +}
> +
> +bool parse_signed_from_buf(const char *buf, size_t len, const char **ep,
> + intmax_t *ret, intmax_t max)
> +{
> + uintmax_t u_ret;
> + bool negate;
> +
> + if (!parse_from_buf_internal(buf, len, ep, &negate, &u_ret, max))
> + return false;
> + /*
> + * Range already checked internally, but we must apply negation
> + * ourselves since only we have the signed integer type.
> + */
> + if (negate) {
> + *ret = u_ret;
> + *ret = -*ret;
If we're parsing INTMAX_MIN then this negation tries to calculate
-INTMAX_MIN which is undefined (I've added some tests for parsing
INTMAX_MAX and INTMAX_MIN at [1] and verified that UBSAN is triggered
when parsing INTMAX_MIN). We could do
*ret = u_ret;
if (*ret != INTMAX_MIN)
*ret = -*ret;
but I think it might be easier to alter parse_from_buf_internal() to
make "negate" a local variable, change the function argument to "bool
allow_negative" and do
*ret = negate ? 0u - val : val;
Then parse_signed_from_buf() can do "*ret = *u_ret;" to convert the
output of parse_from_buf_internal() to a signed value.
> diff --git a/t/unit-tests/u-parse-int.c b/t/unit-tests/u-parse-int.c
> new file mode 100644
> index 0000000000..a1601bb16b
> --- /dev/null
> +++ b/t/unit-tests/u-parse-int.c
> @@ -0,0 +1,98 @@
> +#include "unit-test.h"
> +#include "parse.h"
> +
> +static void check_int(const char *buf, size_t len,
> + size_t expect_ep_ofs, int expect_errno,
> + int expect_result)
> +{
> + const char *ep;
> + int result;
Do we want to set errno=0 here so that we can be sure it has been set by
parse_int_from_buf() when we check it below?
Thanks
Phillip
[1]
https://github.com/phillipwood/git/commit/e061e3e640db01d4fcf54d265d33352235151973
> + bool ok = parse_int_from_buf(buf, len, &ep, &result);
> +
> + if (expect_errno) {
> + cl_assert(!ok);
> + cl_assert_equal_i(expect_errno, errno);
> + return;
> + }
> +
> + cl_assert(ok);
> + cl_assert_equal_i(expect_result, result);
> + cl_assert_equal_i(expect_ep_ofs, ep - buf);
> +}
> +
> +static void check_int_str(const char *buf, size_t ofs, int err, int res)
> +{
> + check_int(buf, strlen(buf), ofs, err, res);
> +}
> +
> +static void check_int_full(const char *buf, int res)
> +{
> + check_int_str(buf, strlen(buf), 0, res);
> +}
> +
> +static void check_int_err(const char *buf, int err)
> +{
> + check_int(buf, strlen(buf), 0, err, 0);
> +}
> +
> +void test_parse_int__basic(void)
> +{
> + cl_invoke(check_int_full("0", 0));
> + cl_invoke(check_int_full("11", 11));
> + cl_invoke(check_int_full("-23", -23));
> + cl_invoke(check_int_full("+23", 23));
> +
> + cl_invoke(check_int_str(" 31337 ", 7, 0, 31337));
> +
> + cl_invoke(check_int_err(" garbage", EINVAL));
> + cl_invoke(check_int_err("", EINVAL));
> + cl_invoke(check_int_err("-", EINVAL));
> +
> + cl_invoke(check_int("123", 2, 2, 0, 12));
> +}
> +
> +void test_parse_int__range(void)
> +{
> + /*
> + * These assume a 32-bit int. We could avoid that with some
> + * conditionals, but it's probably better for the test to
> + * fail noisily and we can decide how to handle it then.
> + */
> + cl_invoke(check_int_full("2147483647", 2147483647));
> + cl_invoke(check_int_err("2147483648", ERANGE));
> + cl_invoke(check_int_full("-2147483647", -2147483647));
> + cl_invoke(check_int_full("-2147483648", -2147483648));
> + cl_invoke(check_int_err("-2147483649", ERANGE));
> +}
> +
> +static void check_unsigned(const char *buf, uintmax_t max,
> + int expect_errno, uintmax_t expect_result)
> +{
> + const char *ep;
> + uintmax_t result;
> + bool ok = parse_unsigned_from_buf(buf, strlen(buf), &ep, &result, max);
> +
> + if (expect_errno) {
> + cl_assert(!ok);
> + cl_assert_equal_i(expect_errno, errno);
> + return;
> + }
> +
> + cl_assert(ok);
> + cl_assert_equal_s(ep, "");
> + /*
> + * Do not use cl_assert_equal_i_fmt(..., PRIuMAX) here. The macro
> + * casts to int under the hood, corrupting the values.
> + */
> + clar__assert_equal(CLAR_CURRENT_FILE, CLAR_CURRENT_FUNC,
> + CLAR_CURRENT_LINE,
> + "expect_result != result", 1,
> + "%"PRIuMAX, expect_result, result);
> +}
> +
> +void test_parse_int__unsigned(void)
> +{
> + cl_invoke(check_unsigned("4294967295", UINT_MAX, 0, 4294967295U));
> + cl_invoke(check_unsigned("1053", 1000, ERANGE, 0));
> + cl_invoke(check_unsigned("-17", UINT_MAX, EINVAL, 0));
> +}
next prev parent reply other threads:[~2025-12-05 16:11 UTC|newest]
Thread overview: 64+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-11-12 7:55 [PATCH 0/9] asan bonanza Jeff King
2025-11-12 7:56 ` [PATCH 1/9] compat/mmap: mark unused argument in git_munmap() Jeff King
2025-11-12 8:01 ` [PATCH 2/9] pack-bitmap: handle name-hash lookups in incremental bitmaps Jeff King
2025-11-12 11:25 ` Patrick Steinhardt
2025-11-13 2:55 ` Taylor Blau
2025-11-18 8:59 ` Jeff King
2025-11-12 8:02 ` [PATCH 3/9] Makefile: turn on NO_MMAP when building with ASan Jeff King
2025-11-12 8:17 ` Collin Funk
2025-11-12 10:31 ` Jeff King
2025-11-12 20:06 ` Collin Funk
2025-11-12 11:26 ` Patrick Steinhardt
2025-11-13 3:12 ` Taylor Blau
2025-11-13 6:34 ` Patrick Steinhardt
2025-11-18 8:49 ` Jeff King
2025-11-13 16:30 ` Junio C Hamano
2025-11-14 7:00 ` Patrick Steinhardt
2025-11-15 2:13 ` Jeff King
2025-11-12 8:05 ` [PATCH 4/9] cache-tree: avoid strtol() on non-string buffer Jeff King
2025-11-12 11:26 ` Patrick Steinhardt
2025-11-13 3:09 ` Taylor Blau
2025-11-18 8:40 ` Jeff King
2025-11-18 8:38 ` Jeff King
2025-11-12 8:06 ` [PATCH 5/9] fsck: assert newline presence in fsck_ident() Jeff King
2025-11-12 8:06 ` [PATCH 6/9] fsck: avoid strcspn() " Jeff King
2025-11-12 8:06 ` [PATCH 7/9] fsck: remove redundant date timestamp check Jeff King
2025-11-12 8:10 ` [PATCH 8/9] fsck: avoid parse_timestamp() on buffer that isn't NUL-terminated Jeff King
2025-11-12 11:25 ` Patrick Steinhardt
2025-11-12 19:36 ` Junio C Hamano
2025-11-15 2:12 ` Jeff King
2025-11-12 8:10 ` [PATCH 9/9] t: enable ASan's strict_string_checks option Jeff King
2025-11-13 3:17 ` [PATCH 0/9] asan bonanza Taylor Blau
2025-11-18 9:11 ` [PATCH v2 " Jeff King
2025-11-18 9:11 ` [PATCH v2 1/9] compat/mmap: mark unused argument in git_munmap() Jeff King
2025-11-18 9:12 ` [PATCH v2 2/9] pack-bitmap: handle name-hash lookups in incremental bitmaps Jeff King
2025-11-18 9:12 ` [PATCH v2 3/9] Makefile: turn on NO_MMAP when building with ASan Jeff King
2025-11-18 9:12 ` [PATCH v2 4/9] cache-tree: avoid strtol() on non-string buffer Jeff King
2025-11-18 14:30 ` Phillip Wood
2025-11-23 6:19 ` Junio C Hamano
2025-11-23 15:51 ` Phillip Wood
2025-11-23 18:06 ` Junio C Hamano
2025-11-24 22:30 ` Jeff King
2025-11-24 23:09 ` Junio C Hamano
2025-11-26 15:09 ` Jeff King
2025-11-26 17:22 ` Junio C Hamano
2025-11-30 13:13 ` [PATCH 0/4] more robust functions for parsing int from buf Jeff King
2025-11-30 13:14 ` [PATCH 1/4] parse: prefer bool to int for boolean returns Jeff King
2025-12-04 11:23 ` Patrick Steinhardt
2025-11-30 13:15 ` [PATCH 2/4] parse: add functions for parsing from non-string buffers Jeff King
2025-11-30 13:46 ` my complaints with clar Jeff King
2025-12-01 14:16 ` Phillip Wood
2025-12-04 11:09 ` Patrick Steinhardt
2025-12-05 18:30 ` Jeff King
2025-12-04 11:23 ` [PATCH 2/4] parse: add functions for parsing from non-string buffers Patrick Steinhardt
2025-12-05 16:11 ` Phillip Wood [this message]
2026-01-20 20:54 ` Junio C Hamano
2026-01-21 5:27 ` Jeff King
2025-11-30 13:15 ` [PATCH 3/4] cache-tree: use parse_int_from_buf() Jeff King
2025-11-30 13:16 ` [PATCH 4/4] fsck: use parse_unsigned_from_buf() for parsing timestamp Jeff King
2025-11-18 9:12 ` [PATCH v2 5/9] fsck: assert newline presence in fsck_ident() Jeff King
2025-11-18 9:12 ` [PATCH v2 6/9] fsck: avoid strcspn() " Jeff King
2025-11-18 9:12 ` [PATCH v2 7/9] fsck: remove redundant date timestamp check Jeff King
2025-11-18 9:12 ` [PATCH v2 8/9] fsck: avoid parse_timestamp() on buffer that isn't NUL-terminated Jeff King
2025-11-18 9:12 ` [PATCH v2 9/9] t: enable ASan's strict_string_checks option Jeff King
2025-11-23 5:49 ` [PATCH v2 0/9] asan bonanza Junio C Hamano
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4d83375b-76e2-4420-80dd-6a04d3201532@gmail.com \
--to=phillip.wood123@gmail$(echo .)com \
--cc=cmlists@sent$(echo .)com \
--cc=git@vger$(echo .)kernel.org \
--cc=gitster@pobox$(echo .)com \
--cc=me@ttaylorr$(echo .)com \
--cc=peff@peff$(echo .)net \
--cc=phillip.wood@dunelm$(echo .)org.uk \
--cc=ps@pks$(echo .)im \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox