public inbox for git@vger.kernel.org 
 help / color / mirror / Atom feed
From: Jeff King <peff@peff•net>
To: Jonatan Holmgren <jonatan@jontes•page>
Cc: git@vger•kernel.org, gitster@pobox•com,
	"D . Ben Knoble" <benknoble@gmail•com>,
	"brian m . carlson" <sandals@crustytoothpaste•net>
Subject: Re: [PATCH v1] alias: support UTF-8 characters via subsection syntax
Date: Tue, 10 Feb 2026 02:44:41 -0500	[thread overview]
Message-ID: <20260210074441.GE1756549@coredump.intra.peff.net> (raw)
In-Reply-To: <20260209220115.461109-1-jonatan@jontes.page>

On Mon, Feb 09, 2026 at 11:01:15PM +0100, Jonatan Holmgren wrote:

> Git aliases are currently restricted to ASCII characters due to
> config key syntax limitations. This prevents non-English speakers
> from creating aliases in their native languages.
> 
> Add support for UTF-8 alias names using config subsections:
> 
>     [alias "förgrena"]
>         command = branch
> 
> The subsection name is matched verbatim (case-sensitive), while the
> existing flat syntax (alias.name) remains case-insensitive for
> backward compatibility. This approach uses existing config
> infrastructure and avoids complex Unicode normalization.

Thanks, this mostly looks quite good. I have a few comments below that
range from a few small bugs to project-specific gotchas to
style/readability suggestions.

> diff --git a/Documentation/RelNotes/2.54.0.adoc b/Documentation/RelNotes/2.54.0.adoc
> [...]

Usually patches do not fill out their own release notes, as it would
create many conflicts when merging topics in arbitrary order. Instead,
the maintainer generally writes up a blurb that goes in the merge
commits and is eventually placed in the release notes by script.
Suggesting a blurb does save work for the maintainer, but it should go
in the cover letter (so for a single patch, after the "---" lines).

There's a bit on this in Documentation/SubmittingPatches, under the
section "the-topic-summary".

> diff --git a/Documentation/config/alias.adoc b/Documentation/config/alias.adoc
> index 80ce17d2de..feba1e2022 100644
> --- a/Documentation/config/alias.adoc
> +++ b/Documentation/config/alias.adoca

Yay, I was happy to see nice comprehensive documentation here. And you
do not seem to have gotten caught by any of the (many) formatting
gotchas.

> +After defining these, you can run `git co`, `git hämta`, `git gömma`,
> +and `git "my alias with whitespace"` (note the quotes for aliases with spaces).

The use of literal backticks here makes sense, and if you haven't disabled it
with a build-time knob, they should be shown in bold. Curiously in my
build of the docs this doesn't quite work, and I get "git h" in bold,
but "ämta" unbolded. Weird, and presumably a bug in asciidoc (the
generated docbook xml shows the same thing, as does html). I get the
same with asciidoctor, too. I'm not sure if it is worth working around
it (or even how we would do so).

> diff --git a/alias.c b/alias.c
> index 1a1a141a0a..f0c5f12fdd 100644
> --- a/alias.c
> +++ b/alias.c
> @@ -17,19 +17,45 @@ static int config_alias_cb(const char *key, const char *value,
>  			   const struct config_context *ctx UNUSED, void *d)
>  {
>  	struct config_alias_data *data = d;
> -	const char *p;
> +	const char *cmd, *subkey;
> +	size_t cmd_len;
> +	int is_subsection;
>  
> -	if (!skip_prefix(key, "alias.", &p))
> +	/* Use parse_config_key() to handle both 2-level and 3-level keys */
> +	if (parse_config_key(key, "alias", &cmd, &cmd_len, &subkey) < 0)
>  		return 0;
>  
> +	/*
> +	 * Support two syntaxes:
> +	 * 1. alias.name = value (simple, 2-level key)
> +	 * 2. [alias "name"] command = value (new, 3-level key)
> +	 */
> +	if (cmd) {
> +		if (strcmp(subkey, "command"))
> +			return 0;
> +		is_subsection = 1;
> +	} else {
> +		cmd = subkey;
> +		cmd_len = strlen(cmd);
> +		is_subsection = 0;
> +	}

OK, so versus what I showed earlier, we are keeping an is_subsection
flag in order to differentiate the cases. And then we use that below...

>  	if (data->alias) {
> -		if (!strcasecmp(p, data->alias)) {
> +		int match;
> +		if (is_subsection) {
> +			match = (strlen(data->alias) == cmd_len &&
> +				 !strncmp(data->alias, cmd, cmd_len));
> +		} else {
> +			match = (strlen(data->alias) == cmd_len &&
> +				 !strncasecmp(data->alias, cmd, cmd_len));
> +		}

...to decide whether to be case-insensitive or not. Makes sense.

I was going to suggest some simplifications here: if is_subsection is
false, then we know that cmd is a NUL-terminated string, and we could
just use strcasecmp() as before. And in fact, with that conditional we
do not need the "cmd" aliasing at all. Which also means we don't need a
separate flag to distinguish the cases.

You could just write:

  if (cmd) {
	match = (strlen(data->alias) == cmd_len &&
	         !strncmp(data->alias, cmd, cmd_len));
  } else {
	match = !strcasecmp(data->alias, subkey);
  }

But I guess we do use the redirection in the listing path later anyway:

>  	} else if (data->list) {
> -		string_list_append(data->list, p);
> +		string_list_append_nodup(data->list, xmemdupz(cmd, cmd_len));
>  	}

So maybe it is better as-is. I dunno. I have a feeling this would all
need even more refactoring even if we ever added any more alias.foo.*
config entries. So we can probably just leave it until then for more
rearranging.


One thing I did find a little funny is the use of the word "subkey" for
one variable. Usually we just call this the "key", but that name is
annoyingly already taken by the function parameter (well before your
patch). Usually we call that "var" instead. I don't know if it's worth
changing or not. It's probably only confusing to me because I've stared
at so many other Git config callback functions.

> diff --git a/t/t0014-alias-utf8.sh b/t/t0014-alias-utf8.sh

Is there any reason to add a new script here, and not just put these in
the existing t0014? If you were bailing from the script when the prereq
was not met, we'd need a separate script. But since you are using
per-test prereqs, they can just exist alongside the other alias tests.

And also, you can't have two scripts with the same number. ;) Running
"make test" should complain about that (I'll assume you ran your tests,
but manually as ./t0014).

> +# Skip if filesystem/locale doesn't support UTF-8
> +test_lazy_prereq UTF8_LOCALE '
> +	test_have_prereq !MINGW &&
> +	test_set_prereq UTF8_LOCALE
> +'

This is a slight mis-use of set_prereq. You should just need to return
success from the lazy-prereq function, and it will be set automatically.
So more like:

  test_lazy_prereq UTF8_LOCALE '
	test_have_prereq !MINGW
  '

At which point I wonder if there is much value in having UTF8_LOCALE at
all, and not just putting "!MINGW" in each of the tests.

Beyond that, my larger question is: what are we asking of the platform
and why does MINGW not work? We do not care about actual utf8 filesystem
support, since these are just aliases. As long as the config file can
store them and we can pass them on the command-line, that should be
enough.

> +test_expect_success UTF8_LOCALE 'setup UTF-8 aliases' '
> +	git config alias."förgrena".command branch &&
> +	git config alias."分支".command "branch --list" &&
> +	git config alias."test name".command status
> +'

Usually we'd just define these in the tests that use them, which keeps
each individual test a bit more self-contained. So more like:

  test_expect_success 'UTF-8 alias with Swedish characters' '
	test_config alias."förgrena".command branch &&
	git förgrena >output &&
	...
  '

But we use them in the listing test, too, so they are each used twice. I
dunno. It might make sense for the listing test to just define its own
expected set.

> +test_expect_success UTF8_LOCALE 'UTF-8 alias with Swedish characters' '
> +	git förgrena >output &&
> +	test_grep -E "^(\* )?(main|master)" output
> +'

So I see we're aliasing "branch" here, which gives us this complicated
regex for checking the output. Would something like "!echo ran my alias"
make the test easier to understand?

> +test_expect_success 'list UTF-8 aliases' '
> +	git config --get-regexp "^alias\\..*\\.command" >output &&
> +	test_line_count -ge 3 output
> +'

This one needs the UTF8_LOCALE prereq, too (since we'd only have set up
those aliases if we had it).

Is doing --get-regexp here all that interesting? It's not testing your
new code at all, and would have passed already. I expected us to look at
the output of "git help -a" to see that the entries are listed.

And trying that with your patch:

  ./git -c alias.föo.command=branch help -a

shows some possible issues:

  - we show it as "föo.command"

  - the alignment with other aliases is not quite right. Probably it is
    using a byte-count instead of utf8_strwidth()

I was puzzled that we would not parse it as föo correctly, since your
patch modified the listing code. It looks like help.c has its own
separate parser for this. Yuck. :( I think this also means that "git
help föo" will fail.

So probably we want a preparatory patch to teach help.c to rely on
list_aliases() and alias_lookup() from alias.[ch].

> +test_expect_success 'flat syntax still works' '
> +	git config alias.testlegacy status &&
> +	git testlegacy >output &&
> +	test_grep "On branch" output
> +'

This probably is covered elsewhere already, but OK. :)

> +test_expect_success 'new subsection syntax works' '
> +	git config alias.testnew.command status &&
> +	git testnew >output &&
> +	test_grep "On branch" output
> +'

Nice basic test without utf8. Good.

> +test_expect_success 'subsection syntax only accepts command key' '
> +	git config alias.invalid.notcommand "value" &&
> +	test_must_fail git invalid 2>error &&
> +	test_grep -i "not a git command" error
> +'

Good thinking.

> +test_expect_success 'simple syntax is case-insensitive' '
> +	git config alias.LegacyCase status &&
> +	git legacycase >output 2>&1 &&
> +	test_grep "On branch" output
> +'
> +
> +test_expect_success 'subsection syntax is case-sensitive' '
> +	test_commit case-test &&
> +	git config alias.SubCase.command "log --oneline" &&
> +	git config alias.subcase.command status &&
> +	git SubCase >upper.out 2>&1 &&
> +	git subcase >lower.out 2>&1 &&
> +	! test_cmp upper.out lower.out
> +'

And these are both very nice to see.

-Peff

  reply	other threads:[~2026-02-10  7:44 UTC|newest]

Thread overview: 88+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-02-08 15:30 [RFC] Support UTF-8 characters in Git alias names Jonatan Holmgren
2026-02-08 16:07 ` D. Ben Knoble
2026-02-08 23:21 ` brian m. carlson
2026-02-09 14:55   ` Junio C Hamano
2026-02-09 15:19     ` Jonatan Holmgren
2026-02-09 17:59       ` Junio C Hamano
2026-02-09 22:40     ` brian m. carlson
2026-02-09 23:14       ` Junio C Hamano
2026-02-10  0:45         ` Ben Knoble
2026-02-10  1:04           ` Junio C Hamano
2026-02-10  6:59             ` Jeff King
2026-02-09  7:36 ` Jeff King
2026-02-09 13:59   ` Theodore Tso
2026-02-09 22:01 ` [PATCH v1] alias: support UTF-8 characters via subsection syntax Jonatan Holmgren
2026-02-10  7:44   ` Jeff King [this message]
2026-02-10  8:30   ` Torsten Bögershausen
2026-02-10 16:35   ` Junio C Hamano
2026-02-10 18:31 ` [PATCH v2 0/2] support UTF-8 in alias names Jonatan Holmgren
2026-02-10 18:31   ` [PATCH v2 1/2] help: use list_aliases() for alias listing and lookup Jonatan Holmgren
2026-02-10 19:27     ` Junio C Hamano
2026-02-10 18:31   ` [PATCH v2 2/2] alias: support non-alphanumeric names via subsection syntax Jonatan Holmgren
2026-02-10 19:47     ` Junio C Hamano
2026-02-10 22:29       ` Jonatan Holmgren
2026-02-23  9:29     ` Kristoffer Haugsbakk
2026-02-23 16:07       ` Kristoffer Haugsbakk
2026-02-23 20:22         ` Junio C Hamano
2026-02-23 20:25           ` Kristoffer Haugsbakk
2026-02-24 10:27     ` Patrick Steinhardt
2026-02-10 22:27 ` [PATCH 0/3] support UTF-8 in alias names Jonatan Holmgren
2026-02-10 22:27   ` [PATCH 1/3] help: use list_aliases() for alias listing Jonatan Holmgren
2026-02-10 23:17     ` Junio C Hamano
2026-02-10 22:27   ` [PATCH 2/3] alias: prepare for subsection aliases Jonatan Holmgren
2026-02-10 22:27   ` [PATCH 3/3] alias: support non-alphanumeric names via subsection syntax Jonatan Holmgren
2026-02-11 21:18 ` [PATCH v4 0/3] support UTF-8 in alias names Jonatan Holmgren
2026-02-11 21:18   ` [PATCH v4 1/3] help: use list_aliases() for alias listing Jonatan Holmgren
2026-02-11 22:29     ` Junio C Hamano
2026-02-11 21:18   ` [PATCH v4 2/3] alias: prepare for subsection aliases Jonatan Holmgren
2026-02-11 21:53     ` Junio C Hamano
2026-02-11 21:18   ` [PATCH v4 3/3] alias: support non-alphanumeric names via subsection syntax Jonatan Holmgren
2026-02-11 22:28     ` Junio C Hamano
2026-02-12 11:16     ` Richard Kerry
2026-02-12 15:34       ` Jonatan Holmgren
2026-02-12 18:52     ` Jonatan Holmgren
2026-02-12 10:27   ` [PATCH v4 0/3] support UTF-8 in alias names Torsten Bögershausen
2026-02-12 15:35     ` Jonatan Holmgren
2026-02-16 16:15 ` [PATCH v5 0/4] support uTF-8 " Jonatan Holmgren
2026-02-16 16:15   ` [PATCH v5 1/4] help: use list_aliases() for alias listing Jonatan Holmgren
2026-02-16 16:15   ` [PATCH v5 2/4] alias: prepare for subsection aliases Jonatan Holmgren
2026-02-16 16:15   ` [PATCH v5 3/4] alias: support non-alphanumeric names via subsection syntax Jonatan Holmgren
2026-02-16 16:15   ` [PATCH v5 4/4] completion: fix zsh alias listing for subsection aliases Jonatan Holmgren
2026-02-16 18:32     ` D. Ben Knoble
2026-02-17 20:01     ` Junio C Hamano
2026-02-18 14:52 ` [PATCH v6 0/4] support UTF-8 in alias names Jonatan Holmgren
2026-02-18 14:52   ` [PATCH v6 1/4] help: use list_aliases() for alias listing Jonatan Holmgren
2026-02-18 14:52   ` [PATCH v6 2/4] alias: prepare for subsection aliases Jonatan Holmgren
2026-02-18 16:21     ` Kristoffer Haugsbakk
2026-02-18 14:52   ` [PATCH v6 3/4] alias: support non-alphanumeric names via subsection syntax Jonatan Holmgren
2026-02-18 14:52   ` [PATCH v6 4/4] completion: fix zsh alias listing for subsection aliases Jonatan Holmgren
2026-02-18 21:57 ` [PATCH v7 0/4] support UTF-8 in alias names Jonatan Holmgren
2026-02-18 21:57   ` [PATCH v7 1/4] help: use list_aliases() for alias listing Jonatan Holmgren
2026-02-24 22:19     ` Jacob Keller
2026-02-24 22:41       ` Junio C Hamano
2026-02-25 20:45         ` Junio C Hamano
2026-02-26 23:33           ` Jacob Keller
2026-02-24 22:21     ` Jacob Keller
2026-02-18 21:57   ` [PATCH v7 2/4] alias: prepare for subsection aliases Jonatan Holmgren
2026-02-18 21:57   ` [PATCH v7 3/4] alias: support non-alphanumeric names via subsection syntax Jonatan Holmgren
2026-02-24 10:55     ` Kristoffer Haugsbakk
2026-02-24 14:48       ` Jonatan Holmgren
2026-02-24 23:23         ` Kristoffer Haugsbakk
2026-02-18 21:57   ` [PATCH v7 4/4] completion: fix zsh alias listing for subsection aliases Jonatan Holmgren
2026-02-19 18:17   ` [PATCH v7 0/4] support UTF-8 in alias names Junio C Hamano
2026-02-19 18:54     ` Jonatan Holmgren
2026-02-24 17:12 ` [PATCH 0/2] Fix small issues in alias subsection handling Jonatan Holmgren
2026-02-24 17:12   ` [PATCH 1/2] doc: fix list continuation in alias subsection example Jonatan Holmgren
2026-02-24 19:11     ` Junio C Hamano
2026-02-24 19:14       ` Kristoffer Haugsbakk
2026-02-24 20:23         ` Junio C Hamano
2026-02-24 17:12   ` [PATCH 2/2] alias: treat empty subsection [alias ""] as plain [alias] Jonatan Holmgren
2026-02-26 17:00   ` [PATCH 0/2] Fix small issues in alias subsection handling Junio C Hamano
2026-02-26 20:53 ` [PATCH v2 0/3] " Jonatan Holmgren
2026-02-26 20:53   ` [PATCH v2 1/3] doc: fix list continuation in alias subsection example Jonatan Holmgren
2026-03-03  9:41     ` Kristoffer Haugsbakk
2026-03-03 15:13       ` [PATCH v2 1/3] doc: fix list continuation in alias subsection example! Jonatan Holmgren
2026-02-26 20:53   ` [PATCH v2 2/3] alias: treat empty subsection [alias ""] as plain [alias] Jonatan Holmgren
2026-02-26 20:53   ` [PATCH v2 3/3] git, help: fix memory leaks in alias listing Jonatan Holmgren
2026-02-26 21:08   ` [PATCH v2 0/3] Fix small issues in alias subsection handling Junio C Hamano
2026-03-03 15:12 ` [PATCH] doc: fix list continuation in alias.adoc Jonatan Holmgren

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260210074441.GE1756549@coredump.intra.peff.net \
    --to=peff@peff$(echo .)net \
    --cc=benknoble@gmail$(echo .)com \
    --cc=git@vger$(echo .)kernel.org \
    --cc=gitster@pobox$(echo .)com \
    --cc=jonatan@jontes$(echo .)page \
    --cc=sandals@crustytoothpaste$(echo .)net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox