public inbox for git@vger.kernel.org 
 help / color / mirror / Atom feed
From: Junio C Hamano <gitster@pobox•com>
To: Jeff King <peff@peff•net>
Cc: Git Mailing List <git@vger•kernel.org>,
	Duy Nguyen <pclouds@gmail•com>,
	"Shawn O. Pearce" <spearce@spearce•org>,
	Nicolas Pitre <nico@fluxnic•net>
Subject: Re: [PATCH 2/6] sha1-lookup: handle duplicate keys with GIT_USE_LOOKUP
Date: Fri, 23 Aug 2013 09:41:57 -0700	[thread overview]
Message-ID: <xmqqob8opdey.fsf@gitster.dls.corp.google.com> (raw)
In-Reply-To: <20130822231404.GB17060@sigill.intra.peff.net> (Jeff King's message of "Thu, 22 Aug 2013 19:14:04 -0400")

Jeff King <peff@peff•net> writes:

> Furthermore, we know that one of our endpoints must be
> the edge of the run of duplicates. For example, given this
> sequence:
>
>  idx 0 1 2 3 4 5
>  key A C C C C D
>
> If we are searching for "B", we might hit the duplicate run
> at lo=1, hi=3 (e.g., by first mi=3, then mi=0). But we can
> never have lo > 1, because B < C. That is, if our key is
> less than the run, we know that "lo" is the edge, but we can
> say nothing of "hi". Similarly, if our key is greater than
> the run, we know that "hi" is the edge, but we can say
> nothing of "lo". But that is enough for us to return not
> only "not found", but show the position at which we would
> insert the new item.

This is somewhat tricky and may deserve an in-code comment.

> diff --git a/sha1-lookup.c b/sha1-lookup.c
> index c4dc55d..614cbb6 100644
> --- a/sha1-lookup.c
> +++ b/sha1-lookup.c
> @@ -204,7 +204,30 @@ int sha1_entry_pos(const void *table,
>  			 * byte 0 thru (ofs-1) are the same between
>  			 * lo and hi; ofs is the first byte that is
>  			 * different.
> +			 *
> +			 * If ofs==20, then no bytes are different,
> +			 * meaning we have entries with duplicate
> +			 * keys. We know that we are in a solid run
> +			 * of this entry (because the entries are
> +			 * sorted, and our lo and hi are the same,
> +			 * there can be nothing but this single key
> +			 * in between). So we can stop the search.
> +			 * Either one of these entries is it (and
> +			 * we do not care which), or we do not have
> +			 * it.
>  			 */
> +			if (ofs == 20) {
> +				mi = lo;
> +				mi_key = base + elem_size * mi + key_offset;
> +				cmp = memcmp(mi_key, key, 20);

It think we already know that mi_key[0:ofs_0] and key[0:ofs_0] are
the same at this point and we do not have to compare full 20 bytes
again, but this is done only once and a better readablity of the
above trumps micro-optimization possibility, I think.

> +				if (!cmp)
> +					return mi;
> +				if (cmp < 0)
> +					return -1 - hi;
> +				else
> +					return -1 - lo;
> +			}
> +
>  			hiv = hi_key[ofs_0];
>  			if (ofs_0 < 19)
>  				hiv = (hiv << 8) | hi_key[ofs_0+1];
> diff --git a/t/lib-pack.sh b/t/lib-pack.sh
> new file mode 100644
> index 0000000..61c5d19
> --- /dev/null
> +++ b/t/lib-pack.sh
> @@ -0,0 +1,78 @@
> +#!/bin/sh
> +#
> +# Support routines for hand-crafting weird or malicious packs.
> +#
> +# You can make a complete pack like:
> +#
> +#   pack_header 2 >foo.pack &&
> +#   pack_obj e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 >>foo.pack &&
> +#   pack_obj e68fe8129b546b101aee9510c5328e7f21ca1d18 >>foo.pack &&
> +#   pack_trailer foo.pack
> +
> +# Print the big-endian 4-byte octal representation of $1
> +uint32_octal() {

micronit (style):

	uint32_octal () {

> +	n=$1
> +	printf '\%o' $(($n / 16777216)); n=$((n % 16777216))
> +	printf '\%o' $(($n /    65536)); n=$((n %    65536))
> +	printf '\%o' $(($n /      256)); n=$((n %      256))
> +	printf '\%o' $(($n           ));
> +}
> +
> +# Print the big-endian 4-byte binary representation of $1
> +uint32_binary() {
> +	printf "$(uint32_octal "$1")"
> +}
> +
> +# Print a pack header, version 2, for a pack with $1 objects
> +pack_header() {
> +	printf 'PACK' &&
> +	printf '\0\0\0\2' &&
> +	uint32_binary "$1"
> +}
> +
> +# Print the pack data for object $1, as a delta against object $2 (or as a full
> +# object if $2 is missing or empty). The output is suitable for including
> +# directly in the packfile, and represents the entirety of the object entry.
> +# Doing this on the fly (especially picking your deltas) is quite tricky, so we
> +# have hardcoded some well-known objects. See the case statements below for the
> +# complete list.

Cute ;-) I like the idea of having this function with a right API in
place, and cheating by limiting its implementation to what is
necessary.

Thanks.

  reply	other threads:[~2013-08-23 16:42 UTC|newest]

Thread overview: 37+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-08-14 18:17 duplicate objects in packfile Jeff King
2013-08-14 18:29 ` Junio C Hamano
2013-08-14 18:39   ` Junio C Hamano
2013-08-14 18:54     ` Nicolas Pitre
2013-08-16 15:01     ` Jeff King
2013-08-21 20:49       ` [RFC/PATCH 0/4] duplicate objects in packfiles Jeff King
2013-08-21 20:51         ` [PATCH 1/4] sha1-lookup: handle duplicate keys with GIT_USE_LOOKUP Jeff King
2013-08-21 20:52         ` [PATCH 2/4] index-pack: optionally reject packs with duplicate objects Jeff King
2013-08-22 13:45           ` Duy Nguyen
2013-08-22 14:43             ` Jeff King
2013-08-22 23:12               ` [PATCHv2 0/6] duplicate objects and delta cycles, oh my! Jeff King
2013-08-22 23:12                 ` [PATCH 1/6] test-sha1: add a binary output mode Jeff King
2013-08-22 23:14                 ` [PATCH 2/6] sha1-lookup: handle duplicate keys with GIT_USE_LOOKUP Jeff King
2013-08-23 16:41                   ` Junio C Hamano [this message]
2013-08-23 18:24                     ` Jeff King
2013-08-23 18:54                       ` Nicolas Pitre
2013-08-23 18:56                         ` Jeff King
2013-08-24  0:01                       ` [PATCHv3 0/6] duplicate objects and delta cycles Jeff King
2013-08-24  0:01                         ` [PATCH 1/6] test-sha1: add a binary output mode Jeff King
2013-08-24  0:02                         ` [PATCH 2/6] sha1-lookup: handle duplicate keys with GIT_USE_LOOKUP Jeff King
2013-08-24  0:02                         ` [PATCH 3/6] add tests for indexing packs with delta cycles Jeff King
2013-08-24  0:02                         ` [PATCH 4/6] test index-pack on packs with recoverable " Jeff King
2013-08-24  0:02                         ` [PATCH 5/6] index-pack: optionally reject packs with duplicate objects Jeff King
2013-08-24  0:02                         ` [PATCH 6/6] default pack.indexDuplicates to false Jeff King
2013-08-23 19:41                   ` [PATCH 2/6] sha1-lookup: handle duplicate keys with GIT_USE_LOOKUP Johannes Sixt
2013-08-23 23:37                     ` Jeff King
2013-08-22 23:14                 ` [PATCH 3/6] add tests for indexing packs with delta cycles Jeff King
2013-08-22 23:15                 ` [PATCH 4/6] test index-pack on packs with recoverable " Jeff King
2013-08-22 23:15                 ` [PATCH 5/6] index-pack: optionally reject packs with duplicate objects Jeff King
2013-08-22 23:16                 ` [PATCH 6/6] default pack.indexDuplicates to false Jeff King
2013-08-22 23:35                 ` [PATCHv2 0/6] duplicate objects and delta cycles, oh my! Nicolas Pitre
2013-08-21 20:53         ` [PATCH 3/4] reject duplicates when indexing a packfile we created Jeff King
2013-08-21 20:55         ` [DO NOT APPLY PATCH 4/4] index-pack: optionally skip duplicate packfile entries Jeff King
2013-08-21 23:20           ` Junio C Hamano
2013-08-22  0:47             ` Jeff King
2013-08-21 22:17         ` [RFC/PATCH 0/4] duplicate objects in packfiles Junio C Hamano
2013-08-14 18:50 ` duplicate objects in packfile Nicolas Pitre

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=xmqqob8opdey.fsf@gitster.dls.corp.google.com \
    --to=gitster@pobox$(echo .)com \
    --cc=git@vger$(echo .)kernel.org \
    --cc=nico@fluxnic$(echo .)net \
    --cc=pclouds@gmail$(echo .)com \
    --cc=peff@peff$(echo .)net \
    --cc=spearce@spearce$(echo .)org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox