public inbox for git@vger.kernel.org 
 help / color / mirror / Atom feed
From: Derrick Stolee <stolee@gmail•com>
To: Junio C Hamano <gitster@pobox•com>, git@vger•kernel.org
Cc: Jon Forrest <nobozo@gmail•com>
Subject: Re: [PATCH v2] abbrev: allow extending beyond 20 chars to disambiguate
Date: Tue, 12 Aug 2025 09:28:43 -0400	[thread overview]
Message-ID: <55cbc80e-2585-49dd-80e9-33f200a9fac1@gmail.com> (raw)
In-Reply-To: <xmqqzfc51xvk.fsf@gitster.g>

On 8/11/2025 3:06 PM, Junio C Hamano wrote:
> When you have two or more objects with object names that share more
> than half the length of the hash algorithm in use (e.g. 10 bytes for
> SHA-1 that produces 20-byte/160-bit hash), find_unique_abbrev()
> fails to show disambiguation.
> 
> To see how many leading letters of a given full object name is
> sufficiently unambiguous, the algorithm starts from a initial
> length, guessed based on the estimated number of objects in the
> repository, and see if another object that shares the prefix, and
> keeps extending the abbreviation.  The loop stops at GIT_MAX_RAWSZ,
> which is counted as the number of bytes, since 5b20ace6 (sha1_name:
> unroll len loop in find_unique_abbrev_r(), 2017-10-08);

Wow. My first Git contribution.

Some style issues that you're opportunistically cleaning up are
due to my newness, for sure.

> before that
> change, it extended up to GIT_SHA1_HEXSZ, which was the correct
> limit because the loop is adding one output letter per iteration
> and back then SHA256 was not in the picture.
> 
> Pass the max length of the hash being in use in the current
> repository down the code path, and use it to compute the code to
> update the abbreviation length required to make it unique.
> 
> Signed-off-by: Junio C Hamano <gitster@pobox•com>
> ---
>  object-name.c | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/object-name.c b/object-name.c
> index 11aa0e6afc..8f9af57c0a 100644
> --- a/object-name.c
> +++ b/object-name.c
> @@ -680,6 +680,7 @@ static unsigned msb(unsigned long val)
>  struct min_abbrev_data {
>  	unsigned int init_len;
>  	unsigned int cur_len;
> +	unsigned int max_len;
>  	char *hex;
>  	struct repository *repo;
>  	const struct object_id *oid;
> @@ -699,12 +700,12 @@ static inline char get_hex_char_from_oid(const struct object_id *oid,
>  static int extend_abbrev_len(const struct object_id *oid, void *cb_data)
>  {
>  	struct min_abbrev_data *mad = cb_data;
> -
>  	unsigned int i = mad->init_len;
> +
>  	while (mad->hex[i] && mad->hex[i] == get_hex_char_from_oid(oid, i))
>  		i++;
>  
> -	if (i < GIT_MAX_RAWSZ && i >= mad->cur_len)
> +	if (mad->cur_len <= i && i < mad->max_len)
>  		mad->cur_len = i + 1;

This logic is all about not extending the abbreviation
length to beyond the length of the hex array, so your
limits make sense. Moving the comparisons here helps
with readability.  
>  	return 0;
> @@ -864,6 +865,7 @@ int repo_find_unique_abbrev_r(struct repository *r, char *hex,
>  	mad.repo = r;
>  	mad.init_len = len;
>  	mad.cur_len = len;
> +	mad.max_len = hexsz;

This new parameter required for allowing SHA256 is valuable.

I agree that we shouldn't need a test case to guarantee this
in the future. Good to clean up unnecessary uses of the
macro limits.

Thanks,
-Stolee


  parent reply	other threads:[~2025-08-12 13:29 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-08-11 15:26 [PATCH] abbrev: allow extending beyond 20 chars to disambiguate Junio C Hamano
2025-08-11 18:53 ` Junio C Hamano
2025-08-11 19:06 ` [PATCH v2] " Junio C Hamano
2025-08-11 22:23   ` brian m. carlson
2025-08-12 13:28   ` Derrick Stolee [this message]
2025-08-12 14:58   ` René Scharfe
2025-08-12 15:17     ` Junio C Hamano
2025-08-12 15:59       ` René Scharfe
2025-08-14 15:09   ` [PATCH v3] abbrev: allow extending beyond 32 " Junio C Hamano
2025-08-11 21:17 ` [PATCH] abbrev: allow extending beyond 20 " brian m. carlson
2025-08-11 21:25   ` Junio C Hamano
2025-08-11 21:28     ` Junio C Hamano
2025-08-12 15:26   ` Jon Forrest
2025-08-12 16:21     ` René Scharfe

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=55cbc80e-2585-49dd-80e9-33f200a9fac1@gmail.com \
    --to=stolee@gmail$(echo .)com \
    --cc=git@vger$(echo .)kernel.org \
    --cc=gitster@pobox$(echo .)com \
    --cc=nobozo@gmail$(echo .)com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox