public inbox for git@vger.kernel.org 
 help / color / mirror / Atom feed
From: Junio C Hamano <gitster@pobox•com>
To: Nicolas Pitre <nico@fluxnic•net>
Cc: git@vger•kernel.org
Subject: Re: [PATCH 10/23] pack v4: tree object encoding
Date: Tue, 27 Aug 2013 08:44:09 -0700	[thread overview]
Message-ID: <xmqqtxibdtpy.fsf@gitster.dls.corp.google.com> (raw)
In-Reply-To: <1377577567-27655-11-git-send-email-nico@fluxnic.net> (Nicolas Pitre's message of "Tue, 27 Aug 2013 00:25:54 -0400")

Nicolas Pitre <nico@fluxnic•net> writes:

> This goes as follows:
>
> - Number of tree entries: variable length encoded.
>
> Then for each tree entry:
>
> - Path component reference: variable length encoded index into the path
>   string dictionary table which also covers the entry mode.  To make the
>   overall encoding efficient, the author table is already sorted by usage
>   frequency so the most used names are first and require the shortest
>   index encoding.

s/author table/tree path table/, I think. The reason why it is a
good idea to sort these tables by use count applies equally to both
the author table and the tree path table, though.

> - SHA1 reference: either variable length encoding of the index into the
>   SHA1 table or the literal SHA1 prefixed by 0 (see add_sha1_ref()).
>
> Rationale: all the tree object data is densely encoded while requiring
> no zlib inflate processing, and all SHA1 references are most likely to
> be direct indices into the pack index file requiring no SHA1 search.
> Path filtering can be accomplished on the path index directly without
> any string comparison during the tree traversal.
>
> Still lacking is some kind of delta encoding for multiple tree objects
> with only small differences between them.  But that'll come later.
>
> Signed-off-by: Nicolas Pitre <nico@fluxnic•net>
> ---
>  packv4-create.c | 63 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 63 insertions(+)
>
> diff --git a/packv4-create.c b/packv4-create.c
> index cedbbd9..f46fbd9 100644
> --- a/packv4-create.c
> +++ b/packv4-create.c
> @@ -408,6 +408,69 @@ bad:
>  	return NULL;
>  }
>  
> +/*
> + * This converts a canonical tree object buffer into its
> + * tightly packed representation using the already populated
> + * and sorted tree_path_table dictionary.  The parsing is
> + * strict so to ensure the canonical version may always be
> + * regenerated and produce the same hash.
> + */
> +void * conv_to_dict_tree(void *buffer, unsigned long *psize)
> +{
> +	unsigned long size = *psize;
> +	unsigned char *in, *out, *end;
> +	struct tree_desc desc;
> +	struct name_entry name_entry;
> +	int nb_entries;
> +
> +	if (!size)
> +		return NULL;
> +
> +	/*
> +	 * We can't make sure the result will always be smaller.
> +	 * The smallest possible entry is "0 x\0<40 byte SHA1>"
> +	 * or 44 bytes.  The output entry may have a realistic path index
> +	 * encoding using up to 3 bytes, and a non indexable SHA1 meaning
> +	 * 41 bytes.  And the output data already has the and nb_entries
> +	 * headers.  In practice the output size will be significantly
> +	 * smaller but for now let's make it simple.
> +	 */
> +	in = buffer;
> +	out = xmalloc(size);
> +	end = out + size;
> +	buffer = out;
> +
> +	/* let's count how many entries there are */
> +	init_tree_desc(&desc, in, size);
> +	nb_entries = 0;
> +	while (tree_entry(&desc, &name_entry))
> +		nb_entries++;
> +	out = add_number(out, nb_entries);
> +
> +	init_tree_desc(&desc, in, size);
> +	while (tree_entry(&desc, &name_entry)) {
> +		int pathlen = tree_entry_len(&name_entry);
> +		int index = dict_add_entry(tree_path_table, name_entry.mode,
> +					   name_entry.path, pathlen);
> +		if (index < 0) {
> +			error("missing tree dict entry");
> +			free(buffer);
> +			return NULL;
> +		}
> +		if (end - out < 45) {
> +			unsigned long sofar = out - (unsigned char *)buffer;
> +			buffer = xrealloc(buffer, sofar + 45);
> +			out = (unsigned char *)buffer + sofar;
> +			end = out + 45;
> +		}
> +		out = add_number(out, index);
> +		out = add_sha1_ref(out, name_entry.sha1);
> +	}
> +
> +	*psize = out - (unsigned char *)buffer;
> +	return buffer;
> +}
> +
>  static struct pack_idx_entry *get_packed_object_list(struct packed_git *p)
>  {
>  	unsigned i, nr_objects = p->num_objects;

  reply	other threads:[~2013-08-27 15:44 UTC|newest]

Thread overview: 83+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-08-27  4:25 [PATCH 00/23] Preliminary pack v4 support Nicolas Pitre
2013-08-27  4:25 ` [PATCH 01/23] pack v4: initial pack dictionary structure and code Nicolas Pitre
2013-08-27 15:08   ` Junio C Hamano
2013-08-27 16:13     ` Nicolas Pitre
2013-08-27  4:25 ` [PATCH 02/23] export packed_object_info() Nicolas Pitre
2013-08-27  4:25 ` [PATCH 03/23] pack v4: scan tree objects Nicolas Pitre
2013-08-27  4:25 ` [PATCH 04/23] pack v4: add tree entry mode support to dictionary entries Nicolas Pitre
2013-08-27  4:25 ` [PATCH 05/23] pack v4: add commit object parsing Nicolas Pitre
2013-08-27 15:26   ` Junio C Hamano
2013-08-27 16:47     ` Nicolas Pitre
2013-08-27 17:42       ` Junio C Hamano
2013-08-27  4:25 ` [PATCH 06/23] pack v4: split the object list and dictionary creation Nicolas Pitre
2013-08-27  4:25 ` [PATCH 07/23] pack v4: move to struct pack_idx_entry and get rid of our own struct idx_entry Nicolas Pitre
2013-08-27  4:25 ` [PATCH 08/23] pack v4: basic references encoding Nicolas Pitre
2013-08-27 15:29   ` Junio C Hamano
2013-08-27 15:53     ` Nicolas Pitre
2013-08-27  4:25 ` [PATCH 09/23] pack v4: commit object encoding Nicolas Pitre
2013-08-27 15:39   ` Junio C Hamano
2013-08-27 16:50     ` Nicolas Pitre
2013-08-27 19:59     ` Nicolas Pitre
2013-08-27 20:15       ` Junio C Hamano
2013-08-27 21:43         ` Nicolas Pitre
2013-09-02 20:48   ` Duy Nguyen
2013-09-03  6:30     ` Nicolas Pitre
2013-09-03  7:41       ` Duy Nguyen
2013-09-05  3:50         ` Nicolas Pitre
2013-08-27  4:25 ` [PATCH 10/23] pack v4: tree " Nicolas Pitre
2013-08-27 15:44   ` Junio C Hamano [this message]
2013-08-27 16:52     ` Nicolas Pitre
2013-08-27  4:25 ` [PATCH 11/23] pack v4: dictionary table output Nicolas Pitre
2013-08-27  4:25 ` [PATCH 12/23] pack v4: creation code Nicolas Pitre
2013-08-27 15:48   ` Junio C Hamano
2013-08-27 16:59     ` Nicolas Pitre
2013-08-27  4:25 ` [PATCH 13/23] pack v4: object headers Nicolas Pitre
2013-08-27  4:25 ` [PATCH 14/23] pack v4: object data copy Nicolas Pitre
2013-08-27 15:53   ` Junio C Hamano
2013-08-27 18:24     ` Nicolas Pitre
2013-08-27  4:25 ` [PATCH 15/23] pack v4: object writing Nicolas Pitre
2013-08-27  4:26 ` [PATCH 16/23] pack v4: tree object delta encoding Nicolas Pitre
2013-08-27  4:26 ` [PATCH 17/23] pack v4: load delta candidate for encoding tree objects Nicolas Pitre
2013-08-27  4:26 ` [PATCH 18/23] pack v4: honor pack.compression config option Nicolas Pitre
2013-08-27  4:26 ` [PATCH 19/23] pack v4: relax commit parsing a bit Nicolas Pitre
2013-08-27  4:26 ` [PATCH 20/23] pack index v3 Nicolas Pitre
2013-08-27  4:26 ` [PATCH 21/23] pack v4: normalize pack name to properly generate the pack index file name Nicolas Pitre
2013-08-27  4:26 ` [PATCH 22/23] pack v4: add progress display Nicolas Pitre
2013-08-27  4:26 ` [PATCH 23/23] initial pack index v3 support on the read side Nicolas Pitre
2013-08-31 11:45   ` Duy Nguyen
2013-09-03  6:09     ` Nicolas Pitre
2013-09-03  7:34       ` Duy Nguyen
2013-08-27 11:17 ` [PATCH] Document pack v4 format Nguyễn Thái Ngọc Duy
2013-08-27 18:25   ` Junio C Hamano
2013-08-27 18:53   ` Nicolas Pitre
2013-08-31  2:49   ` [PATCH v2] " Nguyễn Thái Ngọc Duy
2013-09-03  6:00     ` Nicolas Pitre
2013-09-03  6:46       ` Nicolas Pitre
2013-09-03 11:49         ` Duy Nguyen
2013-09-03 14:54           ` Duy Nguyen
2013-09-05  4:12             ` Nicolas Pitre
2013-09-05  4:19               ` Duy Nguyen
2013-09-05  4:40                 ` Nicolas Pitre
2013-09-05  5:04                   ` Duy Nguyen
2013-09-05  5:39                     ` Nicolas Pitre
2013-09-05 16:52                       ` Duy Nguyen
2013-09-05 17:14                         ` Nicolas Pitre
2013-09-05 20:26                           ` Junio C Hamano
2013-09-05 21:04                             ` Nicolas Pitre
2013-09-06  4:18                         ` Duy Nguyen
2013-09-06 13:19                           ` Nicolas Pitre
2013-09-06  2:14     ` [PATCH v3] " Nguyễn Thái Ngọc Duy
2013-09-06  3:23       ` Nicolas Pitre
2013-09-06  9:48         ` Duy Nguyen
2013-09-06 13:25           ` Nicolas Pitre
2013-09-06 13:44             ` Duy Nguyen
2013-09-06 16:44               ` Nicolas Pitre
2013-09-07  4:57                 ` Nicolas Pitre
2013-09-07  4:52       ` Nicolas Pitre
2013-09-07  8:05         ` Duy Nguyen
2013-08-27 15:03 ` [PATCH 00/23] Preliminary pack v4 support Junio C Hamano
2013-08-27 15:59   ` Nicolas Pitre
2013-08-27 16:44     ` Junio C Hamano
2013-08-28  2:30       ` Duy Nguyen
2013-08-28  2:58         ` Nicolas Pitre
2013-08-28  3:06           ` Duy Nguyen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=xmqqtxibdtpy.fsf@gitster.dls.corp.google.com \
    --to=gitster@pobox$(echo .)com \
    --cc=git@vger$(echo .)kernel.org \
    --cc=nico@fluxnic$(echo .)net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox