public inbox for git@vger.kernel.org 
 help / color / mirror / Atom feed
From: Jeff King <peff@peff•net>
To: "René Scharfe" <l.s.r@web•de>
Cc: Git List <git@vger•kernel.org>
Subject: Re: [PATCH] describe: use khash in finish_depth_computation()
Date: Tue, 2 Sep 2025 08:38:34 -0400	[thread overview]
Message-ID: <20250902123834.GA711442@coredump.intra.peff.net> (raw)
In-Reply-To: <cb192b28-d85a-4866-a312-df4408cae93e@web.de>

On Sun, Aug 31, 2025 at 07:25:13PM +0200, René Scharfe wrote:

> > diff --git a/builtin/describe.c b/builtin/describe.c
> > index edb4dec79d..e024feb080 100644
> > --- a/builtin/describe.c
> > +++ b/builtin/describe.c
> > @@ -289,7 +289,7 @@ static void lazy_queue_clear(struct lazy_queue *queue)
> >  
> >  static inline unsigned int commit_index(const struct commit *commit)
> >  {
> > -	return commit->index;
> > +	return oidhash(&commit->object.oid);
> >  }
> >  
> >  static inline int ptr_eq(const void *a, const void *b)
> > 
> > I get similar results (actually faster, but I think within the noise).
> 
> Sure.  I'm not comfortable with oidhash() though, because it allows
> attackers to influence the hash value, cause collisions and thus
> increase the cost of lookups and inserts to O(N), leading to quadratic
> complexity overall.
>
> They "just" need to construct commits with a common hash prefix.  I
> guess that's easy for two bytes and hard for four bytes.  Not sure how
> what an attacker would get out of planting such performance traps, but
> I guess some people would do it just for the heck of it.

I doubt that commit->index is any better in that regard. If I can
influence the order in which Git loads the commits (e.g., by creating a
bunch of refs which get loaded when we walk over for_each_ref), I can
choose the index for each. They'll be unique, but I can still cause
collisions modulo the number of buckets.

Likewise for oidhash(), I'd guess that colliding 4 bytes is not even
necessary to cause trouble, since probing starts by throwing away
everything mod n_buckets. So really you just need to collide however
many low bits you need to make your desired N, and then get O(N^2)
behavior.

I'm not super worried about it, because in my experience Git is already
a perfectly tuned device for convincing other people to waste CPU. ;)
But if we want to address it, I'd rather do so by improving oidhash()
than avoiding it. Specifically...

> Letting oidhash() XOR in another word would close that line of attack
> for quite a while, I assume.

Yeah. We have at least 160 bits in an object hash, and we only bother
with the low 32. We could XOR up to 5 times, but I agree that even a
single extra word would probably be plenty. Might be an interesting
experiment to time something like the patch below on a hash-heavy
workload.

diff --git a/hash.h b/hash.h
index fae966b23c..c9d21f589e 100644
--- a/hash.h
+++ b/hash.h
@@ -457,7 +457,10 @@ static inline unsigned int oidhash(const struct object_id *oid)
 	 * platforms that don't support unaligned reads.
 	 */
 	unsigned int hash;
+	unsigned int entropy;
 	memcpy(&hash, oid->hash, sizeof(hash));
+	memcpy(&entropy, oid->hash + sizeof(hash), sizeof(entropy));
+	hash ^= entropy;
 	return hash;
 }
 

I suspect it won't make a big time difference. The old code should have
been optimized down to a single word load, and now we have two word
loads and an xor. That probably isn't important compared to the actual
5-word memcmp() we have to do in order to verify that we found the right
bucket anyway.

-Peff

  parent reply	other threads:[~2025-09-02 12:38 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-08-24  8:37 [PATCH] describe: use khash in finish_depth_computation() René Scharfe
2025-08-24 10:31 ` Jeff King
2025-08-24 16:32   ` René Scharfe
2025-08-25  7:34     ` Jeff King
2025-08-25  8:13       ` Jeff King
2025-08-25 18:48         ` Junio C Hamano
2025-08-26  3:39           ` Jeff King
2025-08-26  4:26             ` Jeff King
2025-08-26  5:52               ` Jeff King
2025-08-26 15:34               ` Junio C Hamano
2025-08-31 17:25       ` René Scharfe
2025-09-01 19:06         ` René Scharfe
2025-09-02 12:38         ` Jeff King [this message]
2025-09-02 18:51           ` René Scharfe
2025-09-03 14:31             ` Jeff King
2025-09-03 15:41               ` René Scharfe
2025-09-04 11:16                 ` Jeff King
2025-09-03 16:30       ` René Scharfe
2025-09-04 11:22         ` Jeff King
2025-09-02 18:24 ` [PATCH v2] describe: use oidset " René Scharfe
2025-09-03 14:36   ` Jeff King

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20250902123834.GA711442@coredump.intra.peff.net \
    --to=peff@peff$(echo .)net \
    --cc=git@vger$(echo .)kernel.org \
    --cc=l.s.r@web$(echo .)de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox