public inbox for git@vger.kernel.org 
 help / color / mirror / Atom feed
From: Jeff King <peff@peff•net>
To: Ben Knoble <ben.knoble@gmail•com>
Cc: git@vger•kernel.org, Ben Stav <benstav@miggo•io>
Subject: Re: [PATCH] attr: avoid recursion when expanding attribute macros
Date: Wed, 12 Nov 2025 02:09:07 -0500	[thread overview]
Message-ID: <20251112070907.GA431661@coredump.intra.peff.net> (raw)
In-Reply-To: <F6B66286-64B0-47AB-A31D-50A253F001D5@gmail.com>

On Tue, Nov 11, 2025 at 08:30:58PM -0500, Ben Knoble wrote:

> My knowledge on memory models is a bit weak and I didn’t check
> directly, but are we implicitly assuming that we are less likely to
> run out of heap memory in such an evil case? In effect I suppose we’re
> turning a stack overflow segfault into an OOM error?

Yes, I think you could think of it that way. But there are two reasons
to prefer heap:

  1. The heap limits are _way_ bigger. The stack size on Linux is
     usually 8MB, and that is considered large. It's much smaller on
     other platforms (and especially if you have multiple threads).

  2. In C, you don't have many options for detecting the case of running
     out of stack, let alone recovering from it. Whereas you can check
     for heap allocation failures. We don't tend to do anything besides
     die() in git, but it's still nicer to have a controlled die than a
     segfault.

So switching out stack recursion to spending heap memory essentially
makes the problem go away, or at least turns it into one of the zillion
other ways that you can convince Git to allocate a bunch of heap memory. ;)

> The memory use has to go somewhere ;) presuming there’s no good way to
> only keep the relevant entries in memory, since I can of course find a
> large example that also uses each intermediate macro, so the code
> would need to get a lot smarter to collapse equivalence classes, prune
> unused paths, etc., which seems like a poor investment for what AFAICT
> is a little-used feature*.

We have a hard limit of 100MB on attributes files, which is mostly a
made-up number (it was the size that GitHub had been limiting for all
blobs for years, so we knew nobody would complain about instituting it).
From the research in 3c50032ff5 (attr: ignore overly large gitattributes
files, 2022-12-01), it would probably be fine to drop it by a factor of
10 or more.

Though I think you might be able to chain macros across files (so
".gitattributes" introduces macro "foo", and the "sub/.gitattributes"
introduces "bar" which resolves to "foo", and so on). In which case your
total size is larger, and only eventually limited by how deep a tree
we'll accept (another place where we recurse, but there is a
configurable depth limit).

So for the most part Git's protection against these sort of resource
consumption attacks is: die if the process wants too many resources, and
people who try to tickle those limits are only hurting their own repos.
It does put people who host arbitrary Git repos on the hook for managing
resources at the OS level (so greedy and malicious processes are killed
rather than bringing down the rest of the system).

-Peff

  reply	other threads:[~2025-11-12  7:09 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-11-11 22:36 [PATCH] attr: avoid recursion when expanding attribute macros Jeff King
2025-11-12  1:30 ` Ben Knoble
2025-11-12  7:09   ` Jeff King [this message]
2025-11-12  7:17     ` Jeff King
2025-11-12  6:57 ` Patrick Steinhardt
2025-11-12  7:16   ` Jeff King
2025-11-12 10:21     ` Patrick Steinhardt
2025-11-12 17:40   ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20251112070907.GA431661@coredump.intra.peff.net \
    --to=peff@peff$(echo .)net \
    --cc=ben.knoble@gmail$(echo .)com \
    --cc=benstav@miggo$(echo .)io \
    --cc=git@vger$(echo .)kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox