public inbox for git@vger.kernel.org 
From: Junio C Hamano <gitster@pobox•com>
To: "Victoria Dye via GitGitGadget" <gitgitgadget@gmail•com>
Cc: git@vger•kernel.org,  Victoria Dye <vdye@github•com>
Subject: Re: [PATCH 04/16] update-index: generalize 'read_index_info'
Date: Tue, 11 Jun 2024 15:45:59 -0700
Message-ID: <xmqqa5jrt7x4.fsf@gitster.g>
In-Reply-To: <9d0689e9c285b375b0067760929011038c085d65.1718130288.git.gitgitgadget@gmail.com> (Victoria Dye via GitGitGadget's message of "Tue, 11 Jun 2024 18:24:36 +0000")

"Victoria Dye via GitGitGadget" <gitgitgadget@gmail•com> writes:

> From: Victoria Dye <vdye@github•com>
>
> Move 'read_index_info()' into a new header 'index-info.h' and generalize the
> function to call a provided callback for each parsed line. Update
> 'update-index.c' to use this generalized 'read_index_info()', adding the
> callback 'apply_index_info()' to verify the parsed line and update the index
> according to its contents.
>
> The input parsing done by 'read_index_info()' is similar to, but more
> flexible than, the parsing done in 'mktree' by 'mktree_line()' (handling not
> only 'git ls-tree' output but also the outputs of 'git apply --index-info'
> and 'git ls-files --stage' outputs). To make 'mktree' more flexible, a later
> patch will replace mktree's custom parsing with 'read_index_info()'.

"git apply --index-info"?  

That is a blast from the past.  It no longer exists since 7a988699
(apply: get rid of --index-info in favor of --build-fake-ancestor,
2007-09-17).

As to the scriptability, supporting "ls-files -s" and "ls-tree -r"
output as our input does help, but the third one is not natively
emitted and it is very unlikely that there are third-party tools
that give output in that format.  After all these years, I suspect
that it is sufficient to say

    "update-index --index-info" and "mktree" both read information
    necessary to eventually build trees, but having two separate
    parsers is a maintenance burden, so we are massaging the code
    from the former to be reusable.

without mentioning where the old third format comes from.
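
For reference, the three line shapes under discussion can be told
apart roughly as below.  This is only a sketch under stated
assumptions: the enum and the function names are hypothetical, it
ignores path quoting entirely, and it is not how read_index_info()
actually parses its input.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical names for illustration; not identifiers from git. */
enum line_format {
	FMT_MALFORMED = 0,
	FMT_LS_TREE,	/* <mode> SP <type> SP <oid> TAB <path> */
	FMT_LS_FILES,	/* <mode> SP <oid> SP <stage> TAB <path> */
	FMT_MODE_OID	/* <mode> SP <oid> TAB <path> */
};

/* True if s[0..len) is a full-length SHA-1 or SHA-256 hex string. */
static int is_hex_run(const char *s, size_t len)
{
	size_t i;

	if (len != 40 && len != 64)
		return 0;
	for (i = 0; i < len; i++)
		if (!isxdigit((unsigned char)s[i]))
			return 0;
	return 1;
}

/* True if s[0..len) names an object type that can appear in a tree. */
static int is_object_type(const char *s, size_t len)
{
	static const char *const types[] = { "blob", "tree", "commit" };
	size_t i;

	for (i = 0; i < sizeof(types) / sizeof(types[0]); i++)
		if (strlen(types[i]) == len && !memcmp(s, types[i], len))
			return 1;
	return 0;
}

enum line_format classify_line(const char *line)
{
	const char *tab, *f1_end, *f2, *f2_end, *f3;
	const char *p;

	assert(line != NULL);

	/* Everything before the first TAB is metadata; the rest is the path. */
	tab = strchr(line, '\t');
	if (!tab || tab[1] == '\0')
		return FMT_MALFORMED;

	/* Field 1: a non-empty octal mode terminated by a space. */
	f1_end = memchr(line, ' ', (size_t)(tab - line));
	if (!f1_end || f1_end == line)
		return FMT_MALFORMED;
	for (p = line; p < f1_end; p++)
		if (*p < '0' || *p > '7')
			return FMT_MALFORMED;

	/* Only one more field before the TAB: it must be the object name. */
	f2 = f1_end + 1;
	f2_end = memchr(f2, ' ', (size_t)(tab - f2));
	if (!f2_end)
		return is_hex_run(f2, (size_t)(tab - f2)) ?
			FMT_MODE_OID : FMT_MALFORMED;

	/* Two more fields: either "<oid> SP <stage>" or "<type> SP <oid>". */
	f3 = f2_end + 1;
	if (is_hex_run(f2, (size_t)(f2_end - f2)))
		return (tab - f3 == 1 && *f3 >= '0' && *f3 <= '3') ?
			FMT_LS_FILES : FMT_MALFORMED;
	if (is_object_type(f2, (size_t)(f2_end - f2)) &&
	    is_hex_run(f3, (size_t)(tab - f3)))
		return FMT_LS_TREE;
	return FMT_MALFORMED;
}
```

With this sketch, a line without a TAB, a stage outside 0..3, or a
short object name all end up as FMT_MALFORMED, which mirrors the
kinds of input the new test in this patch feeds to the command.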

> diff --git a/builtin/update-index.c b/builtin/update-index.c
> index d343416ae26..77df380cb54 100644
> --- a/builtin/update-index.c
> +++ b/builtin/update-index.c
> @@ -11,6 +11,7 @@
>  #include "gettext.h"
>  #include "hash.h"
>  #include "hex.h"
> +#include "index-info.h"
>  #include "lockfile.h"
>  #include "quote.h"
>  #include "cache-tree.h"
> @@ -509,100 +510,29 @@ static void update_one(const char *path)
>  	report("add '%s'", path);
>  }
>  
> +static int apply_index_info(unsigned int mode, struct object_id *oid, int stage,
> +			    const char *path_name, void *cbdata UNUSED)
>  {
> +	if (!verify_path(path_name, mode)) {
> +		fprintf(stderr, "Ignoring path %s\n", path_name);
> +		return 0;
> +	}
>  
> +	if (!mode) {
> +		/* mode == 0 means there is no such path -- remove */
> +		if (remove_file_from_index(the_repository->index, path_name))
> +			die("git update-index: unable to remove %s", path_name);

This changes the error message.  We used to feed "ptr" (no longer
visible to this function, as the caller unquotes before calling us)
that pointed at the original, possibly quoted, path the user gave to
the program; now we report the path_name, which is the result of the
unquoting.

> +	}
> +	else {
> +		/* mode ' ' sha1 '\t' name
> +		 * ptr[-1] points at tab,
> +		 * ptr[-41] is at the beginning of sha1
>  		 */
> +		if (add_cacheinfo(mode, oid, path_name, stage))
> +			die("git update-index: unable to update %s", path_name);

But this side used to report the path_name as the result of
unquoting in the original.  So the above change would probably be OK
in the name of consistency?

973d6a20 (update-index --index-info: adjust for funny-path quoting.,
2005-10-16) was the origin of the unquoting, and looking at that
commit, I have a feeling that the "ptr" thing above (i.e., the one I
pointed out as changing the behaviour) was simply forgotten (as
opposed to deliberately made to report the original) while updating
the code to deal with quoted original into unquoted paths.

So I think the change is more than OK.  It is a very welcome (belated)
bugfix for 973d6a20 ;-).

>  	}
> +
> +	return 0;
>  }

It looks a bit disappointing that we die in the callback like above,
when the main parser loop that moved to the other file to be more
reusable is now capable of returning to the caller with an error,
but at this step, it is a good place to stop.  This is a refactor
that does not change the behaviour.

Nicely done.
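
To illustrate the alternative to dying in the callback, here is a
minimal sketch of a loop that hands each line to a callback and
passes the first non-zero return value back to its caller.  All of
the names below are hypothetical, and the callback signature is
deliberately simpler than the one in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical callback type: return 0 on success, non-zero on error. */
typedef int (*line_fn)(const char *line, void *cbdata);

/*
 * Call 'fn' on each of the 'nr' lines in order.  Stop at the first
 * non-zero result and return it, so that the caller (not the callback)
 * decides whether the error is fatal.
 */
int for_each_line(const char *const *lines, size_t nr,
		  line_fn fn, void *cbdata)
{
	size_t i;

	assert(lines != NULL && fn != NULL);

	for (i = 0; i < nr; i++) {
		int ret = fn(lines[i], cbdata);
		if (ret)
			return ret;
	}
	return 0;
}

/*
 * Example callback: count non-empty lines, and report an empty line
 * by returning -1 instead of terminating the process.
 */
int count_nonempty(const char *line, void *cbdata)
{
	int *count = cbdata;

	if (!*line) {
		fprintf(stderr, "error: empty input line\n");
		return -1;
	}
	(*count)++;
	return 0;
}
```

A caller that wants the current behaviour can still die when
for_each_line() returns non-zero, while another caller is free to
clean up and carry on.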

> diff --git a/t/t2107-update-index-basic.sh b/t/t2107-update-index-basic.sh
> index cc72ead79f3..29696ade0d0 100755
> --- a/t/t2107-update-index-basic.sh
> +++ b/t/t2107-update-index-basic.sh
> @@ -142,4 +142,31 @@ test_expect_success '--index-version' '
>  	test_must_be_empty actual
>  '
>  
> +test_expect_success '--index-info fails on malformed input' '
> +	# empty line
> +	echo "" |
> +	test_must_fail git update-index --index-info 2>err &&
> +	grep "malformed input line" err &&

Using "test_grep" would make it easier to diagnose when the test breaks.
A failing "grep" will be silent.  A failing "test_grep" will tell us
"I was told to find THIS, but didn't find any in THAT".

> +	# bad whitespace
> +	printf "100644 $EMPTY_BLOB A" |
> +	test_must_fail git update-index --index-info 2>err &&
> +	grep "malformed input line" err &&
> +
> +	# invalid stage value
> +	printf "100644 $EMPTY_BLOB 5\tA" |
> +	test_must_fail git update-index --index-info 2>err &&
> +	grep "malformed input line" err &&
> +
> +	# invalid OID length
> +	printf "100755 abc123\tA" |
> +	test_must_fail git update-index --index-info 2>err &&
> +	grep "malformed input line" err &&
> +
> +	# bad quoting
> +	printf "100644 $EMPTY_BLOB\t\"A" |
> +	test_must_fail git update-index --index-info 2>err &&
> +	grep "bad quoting of path name" err
> +'
> +
>  test_done
