From: Derrick Stolee <stolee@gmail•com>
To: Junio C Hamano <gitster@pobox•com>, git@vger•kernel.org
Cc: Jon Forrest <nobozo@gmail•com>
Subject: Re: [PATCH v2] abbrev: allow extending beyond 20 chars to disambiguate
Date: Tue, 12 Aug 2025 09:28:43 -0400 [thread overview]
Message-ID: <55cbc80e-2585-49dd-80e9-33f200a9fac1@gmail.com> (raw)
In-Reply-To: <xmqqzfc51xvk.fsf@gitster.g>
On 8/11/2025 3:06 PM, Junio C Hamano wrote:
> When you have two or more objects with object names that share more
> than half the length of the hash algorithm in use (e.g. 10 bytes for
> SHA-1 that produces 20-byte/160-bit hash), find_unique_abbrev()
> fails to show disambiguation.
>
> To see how many leading letters of a given full object name is
> sufficiently unambiguous, the algorithm starts from a initial
> length, guessed based on the estimated number of objects in the
> repository, and see if another object that shares the prefix, and
> keeps extending the abbreviation. The loop stops at GIT_MAX_RAWSZ,
> which is counted as the number of bytes, since 5b20ace6 (sha1_name:
> unroll len loop in find_unique_abbrev_r(), 2017-10-08);
Wow. My first Git contribution.
Some style issues that you're opportunistically cleaning up are
due to my newness, for sure.
> before that
> change, it extended up to GIT_SHA1_HEXSZ, which was the correct
> limit because the loop is adding one output letter per iteration
> and back then SHA256 was not in the picture.
>
> Pass the max length of the hash being in use in the current
> repository down the code path, and use it to compute the code to
> update the abbreviation length required to make it unique.
>
> Signed-off-by: Junio C Hamano <gitster@pobox•com>
> ---
> object-name.c | 6 ++++--
> 1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/object-name.c b/object-name.c
> index 11aa0e6afc..8f9af57c0a 100644
> --- a/object-name.c
> +++ b/object-name.c
> @@ -680,6 +680,7 @@ static unsigned msb(unsigned long val)
> struct min_abbrev_data {
> unsigned int init_len;
> unsigned int cur_len;
> + unsigned int max_len;
> char *hex;
> struct repository *repo;
> const struct object_id *oid;
> @@ -699,12 +700,12 @@ static inline char get_hex_char_from_oid(const struct object_id *oid,
> static int extend_abbrev_len(const struct object_id *oid, void *cb_data)
> {
> struct min_abbrev_data *mad = cb_data;
> -
> unsigned int i = mad->init_len;
> +
> while (mad->hex[i] && mad->hex[i] == get_hex_char_from_oid(oid, i))
> i++;
>
> - if (i < GIT_MAX_RAWSZ && i >= mad->cur_len)
> + if (mad->cur_len <= i && i < mad->max_len)
> mad->cur_len = i + 1;
This logic is all about not extending the abbreviation
length to beyond the length of the hex array, so your
limits make sense. Moving the comparisons here helps
with readability.
> return 0;
> @@ -864,6 +865,7 @@ int repo_find_unique_abbrev_r(struct repository *r, char *hex,
> mad.repo = r;
> mad.init_len = len;
> mad.cur_len = len;
> + mad.max_len = hexsz;
This new parameter required for allowing SHA256 is valuable.
I agree that we shouldn't need a test case to guarantee this
in the future. Good to clean up unnecessary uses of the
macro limits.
Thanks,
-Stolee
next prev parent reply other threads:[~2025-08-12 13:29 UTC|newest]
Thread overview: 14+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-08-11 15:26 [PATCH] abbrev: allow extending beyond 20 chars to disambiguate Junio C Hamano
2025-08-11 18:53 ` Junio C Hamano
2025-08-11 19:06 ` [PATCH v2] " Junio C Hamano
2025-08-11 22:23 ` brian m. carlson
2025-08-12 13:28 ` Derrick Stolee [this message]
2025-08-12 14:58 ` René Scharfe
2025-08-12 15:17 ` Junio C Hamano
2025-08-12 15:59 ` René Scharfe
2025-08-14 15:09 ` [PATCH v3] abbrev: allow extending beyond 32 " Junio C Hamano
2025-08-11 21:17 ` [PATCH] abbrev: allow extending beyond 20 " brian m. carlson
2025-08-11 21:25 ` Junio C Hamano
2025-08-11 21:28 ` Junio C Hamano
2025-08-12 15:26 ` Jon Forrest
2025-08-12 16:21 ` René Scharfe
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=55cbc80e-2585-49dd-80e9-33f200a9fac1@gmail.com \
--to=stolee@gmail$(echo .)com \
--cc=git@vger$(echo .)kernel.org \
--cc=gitster@pobox$(echo .)com \
--cc=nobozo@gmail$(echo .)com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox