public inbox for git@vger.kernel.org 
From: Jeff King <peff@peff•net>
To: Junio C Hamano <gitster@pobox•com>
Cc: Phillip Wood <phillip.wood123@gmail•com>,
	git@vger•kernel.org, Patrick Steinhardt <ps@pks•im>,
	correctmost <cmlists@sent•com>, Taylor Blau <me@ttaylorr•com>
Subject: [PATCH 0/4] more robust functions for parsing int from buf
Date: Sun, 30 Nov 2025 08:13:51 -0500
Message-ID: <20251130131351.GA198697@coredump.intra.peff.net>
In-Reply-To: <xmqqldjsogip.fsf@gitster.g>

On Wed, Nov 26, 2025 at 09:22:38AM -0800, Junio C Hamano wrote:

> Jeff King <peff@peff•net> writes:
> 
> > Hmm, I thought both of those things were reasonably clever. The other
> > obvious way to do it, AFAICT, is to use checked-operation intrinsics or
> > add unsigned_add_overflows() before every operation.
> 
> Yup, but the thing is, I didn't want something "clever".  I prefer
> "clean and obvious" if we add extra code for safety.

Yeah, that's fair. It turns out that one half of that is easy (checking
for overflow as we compute the number). And one half is hard. If you
don't assume a two's-complement style range where "min = -max - 1",
then you are stuck using INT_MIN. Which is OK for "int", but not for
arbitrary types. We already make the same assumption in git_parse_int(),
etc.

So I went with that approach here, but it is at least documented
clearly.
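
To make the two halves concrete, here is a minimal sketch of the idea. It
is not the code from the patches: sketch_parse_signed() and its signature
are made up for illustration. The overflow check happens per digit, and
the only thing the caller supplies is "max", with the minimum assumed to
be "-max - 1".

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (illustration only, not the function in the series).
 * Parse an optionally signed decimal number from buf[0..len), which does
 * not need to be NUL-terminated. The allowed range is assumed to be
 * [-max - 1, max], i.e. a two's-complement style range.
 * On success, store the value in *out and the number of bytes used in
 * *consumed, and return true.
 */
static bool sketch_parse_signed(const char *buf, size_t len, intmax_t max,
				intmax_t *out, size_t *consumed)
{
	size_t i = 0;
	bool negative = false;
	uintmax_t limit, val = 0;

	if (max < 0)
		return false;
	if (i < len && (buf[i] == '-' || buf[i] == '+')) {
		negative = (buf[i] == '-');
		i++;
	}
	if (i == len || buf[i] < '0' || buf[i] > '9')
		return false; /* no digits at all */

	/* Largest allowed magnitude: max, or max + 1 on the negative side. */
	limit = (uintmax_t)max + (negative ? 1 : 0);

	for (; i < len && buf[i] >= '0' && buf[i] <= '9'; i++) {
		uintmax_t digit = (uintmax_t)(buf[i] - '0');

		/* Would val * 10 + digit exceed limit? Check before computing. */
		if (digit > limit || val > (limit - digit) / 10)
			return false;
		val = val * 10 + digit;
	}

	/* Negate without ever forming a magnitude that does not fit. */
	if (negative && val > 0)
		*out = -(intmax_t)(val - 1) - 1;
	else
		*out = (intmax_t)val;
	*consumed = i;
	return true;
}
```

With max = 2147483647 this accepts "-2147483648" and rejects both
"2147483648" and "-2147483649", without ever looking at INT_MIN.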

> > It looks like you merged what I had into 'next'. Where do you want to go
> > from there? I am mostly content to let it be, but we can also try to
> > replace with something like your version.
> 
> That is my preference.  While the topic is still in 'next', or after
> the topic graduates to 'master'.  Either is fine.  And it is fine if
> such an update did not come, too.  After all, this is to deal with
> contents in a locally generated file (.git/index), so a maliciously
> corrupt string that lacks the expected whitespace character after the
> digit string is a sign that you are trying to burn yourself and you
> have only yourself to blame, isn't it?  An attacker that can put
> garbage in your .git/index has better ways to fool you by updating
> your .git/config file that sits next to it.  Or teach the sanitizer
> that this code path is already OK somehow?

Yeah, I agree the stakes are low here. Though they were somewhat low to
begin with for the same reason! But I was grossed out enough by the
whole thing that I tried to put together a decent helper for parsing
integers from buffers, and converted both sites here.

I suspect it could be used in other places, too, but I didn't convert
any.
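
For anyone wondering what a conversion might look like at a call site,
here is a rough sketch of the pattern. The names sketch_parse_unsigned()
and sketch_read_field() are hypothetical, and the real functions in the
series may well have different signatures. The point is only that the
parser never reads past buf + len, and that the caller checks for the
expected delimiter itself instead of relying on a NUL terminator.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical bounded parser (illustration only). Reads decimal digits
 * from buf[0..len), rejects values above max, and reports where it
 * stopped via *end. It never dereferences anything at or past buf + len.
 */
static bool sketch_parse_unsigned(const char *buf, size_t len, uintmax_t max,
				  uintmax_t *out, const char **end)
{
	const char *p = buf, *limit = buf + len;
	uintmax_t val = 0;

	if (p == limit || *p < '0' || *p > '9')
		return false; /* empty input or no leading digit */
	for (; p < limit && *p >= '0' && *p <= '9'; p++) {
		uintmax_t digit = (uintmax_t)(*p - '0');

		/* Check for overflow before multiplying and adding. */
		if (digit > max || val > (max - digit) / 10)
			return false;
		val = val * 10 + digit;
	}
	*out = val;
	*end = p;
	return true;
}

/*
 * Hypothetical caller: a decimal field that must be followed by a space.
 * A buffer that ends right after the digits is treated as corrupt.
 */
static bool sketch_read_field(const char *buf, size_t len, uintmax_t *out)
{
	const char *end;

	if (!sketch_parse_unsigned(buf, len, UINTMAX_MAX, out, &end))
		return false;
	return end < buf + len && *end == ' ';
}
```

A buffer holding exactly the two bytes "42" with nothing after them fails
in sketch_read_field() rather than reading off the end, which is the case
that strtol() cannot handle safely on a non-string buffer.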

> > Or even, I guess, work on a
> > global strntoi() that could be used everywhere, if we think it is robust
> > enough. (Though technically that name is reserved by the standard, which
> > is a shame, because that is really what this thing is).
> 
> Well, we already use plenty of names beginning with 'str' followed
> by a lowercase letter, like strbuf_foo() and string_list_init().

In the end it was sufficiently different from strtoi() that I decided
not to use that name. It was but one of many bike-sheddable decisions,
which I tried to document. So I guess let the flaming commence. ;)

This is built on top of jk/asan-bonanza.

  [1/4]: parse: prefer bool to int for boolean returns
  [2/4]: parse: add functions for parsing from non-string buffers
  [3/4]: cache-tree: use parse_int_from_buf()
  [4/4]: fsck: use parse_unsigned_from_buf() for parsing timestamp

 Makefile                   |   1 +
 cache-tree.c               |  28 ++-----
 compat/posix.h             |   2 +
 fsck.c                     |  20 +----
 parse.c                    | 162 +++++++++++++++++++++++++++++--------
 parse.h                    |  31 +++++--
 t/meson.build              |   1 +
 t/unit-tests/u-parse-int.c |  98 ++++++++++++++++++++++
 8 files changed, 263 insertions(+), 80 deletions(-)
 create mode 100644 t/unit-tests/u-parse-int.c

-Peff
