public inbox for git@vger.kernel.org 
From: Patrick Steinhardt <ps@pks•im>
To: Jeff King <peff@peff•net>
Cc: Junio C Hamano <gitster@pobox•com>,
	Phillip Wood <phillip.wood123@gmail•com>,
	git@vger•kernel.org, correctmost <cmlists@sent•com>,
	Taylor Blau <me@ttaylorr•com>
Subject: Re: [PATCH 2/4] parse: add functions for parsing from non-string buffers
Date: Thu, 4 Dec 2025 12:23:20 +0100
Message-ID: <aTFvKOHlm4zfT9dU@pks.im>
In-Reply-To: <20251130131537.GB199335@coredump.intra.peff.net>

On Sun, Nov 30, 2025 at 08:15:37AM -0500, Jeff King wrote:
[snip]
> For the interface:
> 
>   - What do we call it? We have git_parse_int() and friends, which aim
>     to make parsing less error-prone. And in some ways, these are just
>     buffer (rather than string) versions of those functions. But not
>     entirely. Those functions are aimed at parsing a single user-facing
>     value. So they accept a unit prefix (e.g., "10k"), which we won't
>     always want. And they insist that the whole string is consumed
>     (rather than passing back an "end" pointer).
> 
>     We also have strtol_i() and strtoul_ui() wrappers, which try to make
>     error handling simpler (especially around overflow), but mostly
>     behave like their libc counterparts. These also don't pass out an
>     end pointer, though.
> 
>     So I started a new namespace, "parse_<type>_from_buf".

I think it would be nice if we could eventually converge towards a
common namespace here. E.g. `strtol_i()` would then become
`parse_<type>()`, without the `_from_buf()` suffix. That would make
these functions a bit more discoverable.

Similarly, `git_parse_int()` could become `parse_<type>_with_units()`
eventually.

That certainly doesn't have to be part of this series though.

>   - Like those other functions above, we use an out-parameter to store
>     the result, which lets us return an error code directly. This avoids
>     the complicated errno dance for detecting overflow that you get with
>     strtol().
> 
>     What should the error code look like? git_parse_int() uses a bool
>     for success/failure. But strtol_ui() uses the syscall-like "0 is
>     success, -1 is error" convention.
> 
>     I went with the bool approach here. Since the names are closest to
>     those functions, I thought it would cause the least confusion.

I think that's a sensible choice.
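
To illustrate why the bool-plus-out-parameter shape reads well at call
sites, here is a rough sketch. The names `demo_parse_long()` and
`demo_parse_or_default()` are made up for this mail and are not part of
the series; the helper wraps plain `strtol()` on a NUL-terminated
string only to show the errno handling the caller no longer has to do.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/*
 * Hypothetical helper (not from the patch): hides the errno dance that
 * strtol() requires and reports success as a bool instead.
 */
static bool demo_parse_long(const char *s, long *out)
{
	char *end;
	long val;

	errno = 0; /* must be cleared, strtol() only sets it on error */
	val = strtol(s, &end, 10);
	if (errno == ERANGE)
		return false; /* overflow or underflow */
	if (end == s)
		return false; /* no digits were consumed */
	*out = val;
	return true;
}

/* Hypothetical caller: the bool return slots directly into a condition. */
static long demo_parse_or_default(const char *s, long fallback)
{
	long val;

	if (!demo_parse_long(s, &val))
		return fallback;
	return val;
}
```

With the "0 is success, -1 is error" convention the condition would be
`if (demo_parse_long(s, &val) < 0)` instead, which works just as well
but is easy to mix up when both styles live next to each other.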

> diff --git a/Makefile b/Makefile
> index 237b56fc9d..751bd40a9f 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -1510,6 +1510,7 @@ CLAR_TEST_SUITES += u-mem-pool
>  CLAR_TEST_SUITES += u-oid-array
>  CLAR_TEST_SUITES += u-oidmap
>  CLAR_TEST_SUITES += u-oidtree
> +CLAR_TEST_SUITES += u-parse-int
>  CLAR_TEST_SUITES += u-prio-queue
>  CLAR_TEST_SUITES += u-reftable-basics
>  CLAR_TEST_SUITES += u-reftable-block
> diff --git a/parse.c b/parse.c
> index f626846def..1dcbcf64a1 100644
> --- a/parse.c
> +++ b/parse.c
> @@ -209,3 +209,99 @@ unsigned long git_env_ulong(const char *k, unsigned long val)
>  		die(_("failed to parse %s"), k);
>  	return val;
>  }
> +
> +/*
> + * Helper that handles both signed/unsigned cases. If "negate" is NULL,
> + * negative values are disallowed. If not NULL and the input is negative,
> + * the value is range-checked but the caller is responsible for actually doing
> + * the negation. You probably don't want to use this! Use one of
> + * parse_signed_from_buf() or parse_unsigned_from_buf() below.
> + */
> +static bool parse_from_buf_internal(const char *buf, size_t len,
> +				    const char **ep, bool *negate,
> +				    uintmax_t *ret, uintmax_t max)
> +{
> +	const char *end = buf + len;
> +	uintmax_t val = 0;
> +
> +	while (buf < end && isspace(*buf))
> +		buf++;

Hm. Do we really want to retain the behaviour of skipping leading
spaces? I think it's a rather weird edge case of `strtol()` and friends,
and if we can avoid it I'd prefer to not replicate this behaviour.
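
Something along these lines is what I have in mind. This is only a
sketch under simplifying assumptions: `demo_parse_unsigned_strict()` is
a made-up name, it handles unsigned decimal digits only (no sign
handling), and it is not meant to mirror the rest of your helper.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical strict variant: parse decimal digits from buf[0..len)
 * without skipping leading whitespace. Never reads past buf + len.
 */
static bool demo_parse_unsigned_strict(const char *buf, size_t len,
				       const char **ep, uintmax_t *ret,
				       uintmax_t max)
{
	const char *end = buf + len;
	const char *p = buf;
	uintmax_t val = 0;

	while (p < end && *p >= '0' && *p <= '9') {
		uintmax_t digit = (uintmax_t)(*p - '0');

		/* reject if val * 10 + digit would exceed max */
		if (digit > max || val > (max - digit) / 10)
			return false;
		val = val * 10 + digit;
		p++;
	}
	if (p == buf)
		return false; /* no digits; covers leading whitespace, too */
	if (ep)
		*ep = p;
	*ret = val;
	return true;
}
```

With this, "  31337" fails outright, and a caller that really wants to
tolerate whitespace has to skip it explicitly before calling.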

> diff --git a/t/unit-tests/u-parse-int.c b/t/unit-tests/u-parse-int.c
> new file mode 100644
> index 0000000000..a1601bb16b
> --- /dev/null
> +++ b/t/unit-tests/u-parse-int.c
> @@ -0,0 +1,98 @@
[snip]
> +void test_parse_int__basic(void)
> +{
> +	cl_invoke(check_int_full("0", 0));
> +	cl_invoke(check_int_full("11", 11));
> +	cl_invoke(check_int_full("-23", -23));
> +	cl_invoke(check_int_full("+23", 23));
> +
> +	cl_invoke(check_int_str("  31337  ", 7, 0, 31337));
> +
> +	cl_invoke(check_int_err("  garbage", EINVAL));
> +	cl_invoke(check_int_err("", EINVAL));
> +	cl_invoke(check_int_err("-", EINVAL));
> +
> +	cl_invoke(check_int("123", 2, 2, 0, 12));
> +}

As Phillip suggested, it might make sense to wrap these `cl_invoke()`
calls into a macro.
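
Roughly like the following sketch. Everything here is illustrative: the
`cl_invoke()` definition is a stand-in so that the snippet compiles
without clar, `check_int_full_fn()` and `demo_failures` are made-up
names, and `strtol()` is only a placeholder for the parser under test.

```c
#include <stdlib.h>

/* Stand-in for clar's cl_invoke(); the real one records the call site. */
#ifndef cl_invoke
#define cl_invoke(expr) do { expr; } while (0)
#endif

static int demo_failures;

/* Hypothetical check function; counts mismatches instead of asserting. */
static void check_int_full_fn(const char *input, long expect)
{
	if (strtol(input, NULL, 10) != expect)
		demo_failures++;
}

/* The wrapper macro: call sites no longer spell out cl_invoke(). */
#define check_int_full(input, expect) \
	cl_invoke(check_int_full_fn((input), (expect)))

static void demo_basic(void)
{
	check_int_full("0", 0);
	check_int_full("11", 11);
	check_int_full("-23", -23);
	check_int_full("+23", 23);
}
```

The test bodies then read like plain function calls, while the
invocation point is still reported relative to the test function.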

Patrick
