From: Junio C Hamano <gitster@pobox•com>
To: "Victoria Dye via GitGitGadget" <gitgitgadget@gmail•com>
Cc: git@vger•kernel.org, Eric Sunshine <sunshine@sunshineco•com>,
Patrick Steinhardt <ps@pks•im>, Victoria Dye <vdye@github•com>
Subject: Re: [PATCH v2 12/17] mktree: create tree using an in-core index
Date: Thu, 20 Jun 2024 15:26:09 -0700 [thread overview]
Message-ID: <xmqqtthnqmim.fsf@gitster.g> (raw)
In-Reply-To: <2333775ba5bd71766a6aece87e39a6d189aeaead.1718834285.git.gitgitgadget@gmail.com> (Victoria Dye via GitGitGadget's message of "Wed, 19 Jun 2024 21:58:00 +0000")
"Victoria Dye via GitGitGadget" <gitgitgadget@gmail•com> writes:
> @@ -60,17 +66,25 @@ static void append_to_tree(unsigned mode, struct object_id *oid, const char *pat
> if (literally) {
> FLEX_ALLOC_MEM(ent, name, path, len);
> } else {
> + size_t len_to_copy = len;
> +
> /* Normalize and validate entry path */
> if (S_ISDIR(mode)) {
> - while(len > 0 && is_dir_sep(path[len - 1]))
> - len--;
> + while(len_to_copy > 0 && is_dir_sep(path[len_to_copy - 1]))
Let's fix the style issue while at it, as we are doing other changes
in this step anyway. "while(" -> "while (".
> + len_to_copy--;
> + len = len_to_copy + 1; /* add space for trailing slash */
Do we need to do st_add() here? Perhaps not, but I just noticed the
careful use of st_add3() below, so...
> + ent = xcalloc(1, st_add3(sizeof(struct tree_entry), len, 1));
> + memcpy(ent->name, path, len_to_copy);
>
> if (!verify_path(ent->name, mode))
> die(_("invalid path '%s'"), path);
> if (strchr(ent->name, '/'))
> die("path %s contains slash", path);
> +
> + /* Add trailing slash to dir */
> + if (S_ISDIR(mode))
> + ent->name[len - 1] = '/';
OK.
> @@ -88,11 +102,14 @@ static int ent_compare(const void *a_, const void *b_, void *ctx)
> struct tree_entry *b = *(struct tree_entry **)b_;
> int ignore_mode = *((int *)ctx);
>
> - if (ignore_mode)
> - cmp = name_compare(a->name, a->len, b->name, b->len);
> - else
> - cmp = base_name_compare(a->name, a->len, a->mode,
> - b->name, b->len, b->mode);
> + size_t a_len = a->len, b_len = b->len;
> +
> + if (ignore_mode) {
> + a_len = df_path_len(a_len, a->mode);
> + b_len = df_path_len(b_len, b->mode);
> + }
> +
> + cmp = name_compare(a->name, a_len, b->name, b_len);
> return cmp ? cmp : b->order - a->order;
> }
OK, now the "mode" is sort of "encoded" already in the "name" by the
slash at the end, the way "ignore-mode" works needs to be redesigned.
If we are ignoring mode, we are dropping the trailing '/' and
otherwise we just feed the name with possible trailing '/', and the
same name_compare() can be used. OK.
> @@ -108,8 +125,8 @@ static void sort_and_dedup_tree_entry_array(struct tree_entry_array *arr)
> for (size_t i = 0; i < count; i++) {
> struct tree_entry *curr = arr->entries[i];
> if (prev &&
> - !name_compare(prev->name, prev->len,
> - curr->name, curr->len)) {
> + !name_compare(prev->name, df_path_len(prev->len, prev->mode),
> + curr->name, df_path_len(curr->len, curr->mode))) {
> FREE_AND_NULL(curr);
And here is the matching adjustment for the dedup comparison, which
makes sense.
> @@ -122,24 +139,43 @@ static void sort_and_dedup_tree_entry_array(struct tree_entry_array *arr)
> QSORT_S(arr->entries, arr->nr, ent_compare, &ignore_mode);
> }
>
> +static int add_tree_entry_to_index(struct index_state *istate,
> + struct tree_entry *ent)
> +{
> + struct cache_entry *ce;
> + struct strbuf ce_name = STRBUF_INIT;
> + strbuf_add(&ce_name, ent->name, ent->len);
> +
Perhaps swap the first statement (which is strbuf_add()) and the
blank line that ought to separate the decls and the first statement?
> + ce = make_cache_entry(istate, ent->mode, &ent->oid, ent->name, 0, 0);
> + if (!ce)
> + return error(_("make_cache_entry failed for path '%s'"), ent->name);
> +
> + add_index_entry(istate, ce, ADD_CACHE_JUST_APPEND);
> + strbuf_release(&ce_name);
> + return 0;
> +}
This is only to append; presumably the caller drives this function
out of a sorted list.
> static void write_tree(struct tree_entry_array *arr, struct object_id *oid)
> {
> + struct index_state istate = INDEX_STATE_INIT(the_repository);
> + istate.sparse_index = 1;
>
> sort_and_dedup_tree_entry_array(arr);
>
> + /* Construct an in-memory index from the provided entries */
> for (size_t i = 0; i < arr->nr; i++) {
> struct tree_entry *ent = arr->entries[i];
> +
> + if (add_tree_entry_to_index(&istate, ent))
> + die(_("failed to add tree entry '%s'"), ent->name);
> }
> + /* Write out new tree */
> + if (cache_tree_update(&istate, WRITE_TREE_SILENT | WRITE_TREE_MISSING_OK))
> + die(_("failed to write tree"));
Hmph. Are we doing any run-time verification of what we produce
(e.g., if sort_and_dedup_tree_entry_array() fails to dedup or sort
correctly due to a bug or two, would cache_tree_update() notice that
the in-core index array is fishy)? I am not suggesting to add an
unconditional "we appended to the index, so we should sort the
entries in it" step before cache_tree_update() call. It is the
opposite---if we have extra checks in cache_tree_udpate() to slow us
down and if we are confident that the loop that added tree entries
to the index is correct, if we can bypass such checks.
> + oidcpy(oid, &istate.cache_tree->oid);
> +
> + release_index(&istate);
> }
This is the gem of the whole series. Clever.
What is so satisfying is that it takes not that much of code to
replace the "here is a flat buffer of what the contents of a single
tree object ought to look like" with "let's build in-core index and
write it out just like write-tree would". Nice.
next prev parent reply other threads:[~2024-06-20 22:26 UTC|newest]
Thread overview: 65+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-06-11 18:24 [PATCH 00/16] mktree: support more flexible usage Victoria Dye via GitGitGadget
2024-06-11 18:24 ` [PATCH 01/16] mktree: use OPT_BOOL Victoria Dye via GitGitGadget
2024-06-11 18:24 ` [PATCH 02/16] mktree: rename treeent to tree_entry Victoria Dye via GitGitGadget
2024-06-12 9:40 ` Patrick Steinhardt
2024-06-11 18:24 ` [PATCH 03/16] mktree: use non-static tree_entry array Victoria Dye via GitGitGadget
2024-06-11 18:45 ` Eric Sunshine
2024-06-12 9:40 ` Patrick Steinhardt
2024-06-11 18:24 ` [PATCH 04/16] update-index: generalize 'read_index_info' Victoria Dye via GitGitGadget
2024-06-11 22:45 ` Junio C Hamano
2024-06-11 18:24 ` [PATCH 05/16] index-info.c: identify empty input lines in read_index_info Victoria Dye via GitGitGadget
2024-06-11 22:52 ` Junio C Hamano
2024-06-18 17:33 ` Victoria Dye
2024-06-11 18:24 ` [PATCH 06/16] index-info.c: parse object type in provided " Victoria Dye via GitGitGadget
2024-06-12 1:54 ` Junio C Hamano
2024-06-11 18:24 ` [PATCH 07/16] mktree: use read_index_info to read stdin lines Victoria Dye via GitGitGadget
2024-06-12 2:11 ` Junio C Hamano
2024-06-12 9:40 ` Patrick Steinhardt
2024-06-12 18:35 ` Junio C Hamano
2024-06-11 18:24 ` [PATCH 08/16] mktree: add a --literally option Victoria Dye via GitGitGadget
2024-06-12 2:18 ` Junio C Hamano
2024-06-11 18:24 ` [PATCH 09/16] mktree: validate paths more carefully Victoria Dye via GitGitGadget
2024-06-12 2:26 ` Junio C Hamano
2024-06-12 19:01 ` Victoria Dye
2024-06-12 19:45 ` Junio C Hamano
2024-06-11 18:24 ` [PATCH 10/16] mktree: overwrite duplicate entries Victoria Dye via GitGitGadget
2024-06-12 9:40 ` Patrick Steinhardt
2024-06-12 18:48 ` Victoria Dye
2024-06-11 18:24 ` [PATCH 11/16] mktree: create tree using an in-core index Victoria Dye via GitGitGadget
2024-06-12 9:40 ` Patrick Steinhardt
2024-06-11 18:24 ` [PATCH 12/16] mktree: use iterator struct to add tree entries to index Victoria Dye via GitGitGadget
2024-06-12 9:40 ` Patrick Steinhardt
2024-06-13 18:38 ` Victoria Dye
2024-06-11 18:24 ` [PATCH 13/16] mktree: add directory-file conflict hashmap Victoria Dye via GitGitGadget
2024-06-11 18:24 ` [PATCH 14/16] mktree: optionally add to an existing tree Victoria Dye via GitGitGadget
2024-06-12 9:40 ` Patrick Steinhardt
2024-06-12 19:50 ` Junio C Hamano
2024-06-17 19:23 ` Victoria Dye
2024-06-11 18:24 ` [PATCH 15/16] mktree: allow deeper paths in input Victoria Dye via GitGitGadget
2024-06-11 18:24 ` [PATCH 16/16] mktree: remove entries when mode is 0 Victoria Dye via GitGitGadget
2024-06-19 21:57 ` [PATCH v2 00/17] mktree: support more flexible usage Victoria Dye via GitGitGadget
2024-06-19 21:57 ` [PATCH v2 01/17] mktree: use OPT_BOOL Victoria Dye via GitGitGadget
2024-06-19 21:57 ` [PATCH v2 02/17] mktree: rename treeent to tree_entry Victoria Dye via GitGitGadget
2024-06-19 21:57 ` [PATCH v2 03/17] mktree: use non-static tree_entry array Victoria Dye via GitGitGadget
2024-06-19 21:57 ` [PATCH v2 04/17] update-index: generalize 'read_index_info' Victoria Dye via GitGitGadget
2024-06-19 21:57 ` [PATCH v2 05/17] index-info.c: return unrecognized lines to caller Victoria Dye via GitGitGadget
2024-06-19 21:57 ` [PATCH v2 06/17] index-info.c: parse object type in provided in read_index_info Victoria Dye via GitGitGadget
2024-06-19 21:57 ` [PATCH v2 07/17] mktree: use read_index_info to read stdin lines Victoria Dye via GitGitGadget
2024-06-20 20:18 ` Junio C Hamano
2024-06-19 21:57 ` [PATCH v2 08/17] mktree.c: do not fail on mismatched submodule type Victoria Dye via GitGitGadget
2024-06-19 21:57 ` [PATCH v2 09/17] mktree: add a --literally option Victoria Dye via GitGitGadget
2024-06-19 21:57 ` [PATCH v2 10/17] mktree: validate paths more carefully Victoria Dye via GitGitGadget
2024-06-19 21:57 ` [PATCH v2 11/17] mktree: overwrite duplicate entries Victoria Dye via GitGitGadget
2024-06-20 22:05 ` Junio C Hamano
2024-06-19 21:58 ` [PATCH v2 12/17] mktree: create tree using an in-core index Victoria Dye via GitGitGadget
2024-06-20 22:26 ` Junio C Hamano [this message]
2024-06-19 21:58 ` [PATCH v2 13/17] mktree: use iterator struct to add tree entries to index Victoria Dye via GitGitGadget
2024-06-26 21:10 ` Junio C Hamano
2024-06-19 21:58 ` [PATCH v2 14/17] mktree: add directory-file conflict hashmap Victoria Dye via GitGitGadget
2024-06-19 21:58 ` [PATCH v2 15/17] mktree: optionally add to an existing tree Victoria Dye via GitGitGadget
2024-06-26 21:23 ` Junio C Hamano
2024-06-19 21:58 ` [PATCH v2 16/17] mktree: allow deeper paths in input Victoria Dye via GitGitGadget
2024-06-27 19:29 ` Junio C Hamano
2024-06-19 21:58 ` [PATCH v2 17/17] mktree: remove entries when mode is 0 Victoria Dye via GitGitGadget
2024-06-25 23:26 ` [PATCH v2 00/17] mktree: support more flexible usage Junio C Hamano
2024-07-10 21:40 ` Junio C Hamano
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=xmqqtthnqmim.fsf@gitster.g \
--to=gitster@pobox$(echo .)com \
--cc=git@vger$(echo .)kernel.org \
--cc=gitgitgadget@gmail$(echo .)com \
--cc=ps@pks$(echo .)im \
--cc=sunshine@sunshineco$(echo .)com \
--cc=vdye@github$(echo .)com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox