From: Jeff King <peff@peff•net>
To: "René Scharfe" <l.s.r@web•de>
Cc: Git List <git@vger•kernel.org>
Subject: Re: [PATCH] describe: use khash in finish_depth_computation()
Date: Tue, 2 Sep 2025 08:38:34 -0400 [thread overview]
Message-ID: <20250902123834.GA711442@coredump.intra.peff.net> (raw)
In-Reply-To: <cb192b28-d85a-4866-a312-df4408cae93e@web.de>
On Sun, Aug 31, 2025 at 07:25:13PM +0200, René Scharfe wrote:
> > diff --git a/builtin/describe.c b/builtin/describe.c
> > index edb4dec79d..e024feb080 100644
> > --- a/builtin/describe.c
> > +++ b/builtin/describe.c
> > @@ -289,7 +289,7 @@ static void lazy_queue_clear(struct lazy_queue *queue)
> >
> > static inline unsigned int commit_index(const struct commit *commit)
> > {
> > - return commit->index;
> > + return oidhash(&commit->object.oid);
> > }
> >
> > static inline int ptr_eq(const void *a, const void *b)
> >
> > I get similar results (actually faster, but I think within the noise).
>
> Sure. I'm not comfortable with oidhash() though, because it allows
> attackers to influence the hash value, cause collisions and thus
> increase the cost of lookups and inserts to O(N), leading to quadratic
> complexity overall.
>
> They "just" need to construct commits with a common hash prefix. I
> guess that's easy for two bytes and hard for four bytes. Not sure how
> what an attacker would get out of planting such performance traps, but
> I guess some people would do it just for the heck of it.
I doubt that commit->index is any better in that regard. If I can
influence the order in which Git loads the commits (e.g., by creating a
bunch of refs which get loaded when we walk over for_each_ref), I can
choose the index for each. They'll be unique, but I can still cause
collisions modulo the number of buckets.
Likewise for oidhash(), I'd guess that colliding 4 bytes is not even
necessary to cause trouble, since probing starts by throwing away
everything mod n_buckets. So really you just need to collide however
many low bits you need to make your desired N, and then get O(N^2)
behavior.
I'm not super worried about it, because in my experience Git is already
a perfectly tuned device for convincing other people to waste CPU. ;)
But if we want to address it, I'd rather do so by improving oidhash()
than avoiding it. Specifically...
> Letting oidhash() XOR in another word would close that line of attack
> for quite a while, I assume.
Yeah. We have at least 160 bits in an object hash, and we only bother
with the low 32. We could XOR up to 5 times, but I agree that even a
single extra word would probably be plenty. Might be an interesting
experiment to time something like the patch below on a hash-heavy
workload.
diff --git a/hash.h b/hash.h
index fae966b23c..c9d21f589e 100644
--- a/hash.h
+++ b/hash.h
@@ -457,7 +457,10 @@ static inline unsigned int oidhash(const struct object_id *oid)
* platforms that don't support unaligned reads.
*/
unsigned int hash;
+ unsigned int entropy;
memcpy(&hash, oid->hash, sizeof(hash));
+ memcpy(&entropy, oid->hash + sizeof(hash), sizeof(entropy));
+ hash ^= entropy;
return hash;
}
I suspect it won't make a big time difference. The old code should have
been optimized down to a single word load, and now we have two word
loads and an xor. That probably isn't important compared to the actual
5-word memcmp() we have to do in order to verify that we found the right
bucket anyway.
-Peff
next prev parent reply other threads:[~2025-09-02 12:38 UTC|newest]
Thread overview: 21+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-08-24 8:37 [PATCH] describe: use khash in finish_depth_computation() René Scharfe
2025-08-24 10:31 ` Jeff King
2025-08-24 16:32 ` René Scharfe
2025-08-25 7:34 ` Jeff King
2025-08-25 8:13 ` Jeff King
2025-08-25 18:48 ` Junio C Hamano
2025-08-26 3:39 ` Jeff King
2025-08-26 4:26 ` Jeff King
2025-08-26 5:52 ` Jeff King
2025-08-26 15:34 ` Junio C Hamano
2025-08-31 17:25 ` René Scharfe
2025-09-01 19:06 ` René Scharfe
2025-09-02 12:38 ` Jeff King [this message]
2025-09-02 18:51 ` René Scharfe
2025-09-03 14:31 ` Jeff King
2025-09-03 15:41 ` René Scharfe
2025-09-04 11:16 ` Jeff King
2025-09-03 16:30 ` René Scharfe
2025-09-04 11:22 ` Jeff King
2025-09-02 18:24 ` [PATCH v2] describe: use oidset " René Scharfe
2025-09-03 14:36 ` Jeff King
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20250902123834.GA711442@coredump.intra.peff.net \
--to=peff@peff$(echo .)net \
--cc=git@vger$(echo .)kernel.org \
--cc=l.s.r@web$(echo .)de \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox