On 2026-02-09 at 14:55:51, Junio C Hamano wrote:
> "brian m. carlson" <sandals@crustytoothpaste.net> writes:
> 
> > I don't think we have any Unicode normalization code at all in Git,
> > though, so if you want a quality implementation, that may be a thing we
> > need.
> 
> Isn't NKC/NKD a macOS-only issue in practice?  Anything on the
> command line "git" potty and "git-blah" built-in commands receive
> goes through precompose_argv_prefix() to be normalized on that
> platform.

Normalization is not a macOS-only issue.  Many accented characters can
be written in multiple ways, one composed and one decomposed.  If the
alias in the file is composed and what's on the command line is
decomposed, they will not match bytewise even though they are logically
and graphically identical.

For instance, here is the word for "where" in French, first composed,
then decomposed:

où
où

The former is U+006F U+00F9 and the latter is U+006F U+0075 U+0300.
Obviously, if I write one of those in my config file and the other on
the command line, I intend to execute the same alias, but the two are
not bytewise identical unless both are normalized to the same form.
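
To make that concrete, here is a minimal sketch in Python using the
standard unicodedata module.  It is only an illustration, not anything
Git does today, and same_alias() is a hypothetical helper name:

```python
import unicodedata

composed = "o\u00f9"      # U+006F U+00F9
decomposed = "ou\u0300"   # U+006F U+0075 U+0300


def same_alias(a, b, form="NFC"):
    """Hypothetical helper: compare two names after normalizing both."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


print(composed.encode("utf-8").hex(" "))    # 6f c3 b9
print(decomposed.encode("utf-8").hex(" "))  # 6f 75 cc 80
print(composed == decomposed)               # False
print(same_alias(composed, decomposed))     # True
```

Which form is used for the comparison does not matter, as long as both
sides are normalized to the same one.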

This is why many websites don't accept Unicode in passwords: because
logging in on different systems can produce different sequences and they
must be properly normalized to avoid hard-to-reproduce problems.

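There are also canonical (NFC and NFD) and compatibility (NFKC and NFKD)
normalizations.  For instance, the ligature "ﬁ" (U+FB01) looks like the
two letters "fi".  Canonical normalizations preserve this distinction,
but compatibility ones do not.  (The Greek question mark, U+037E, which
looks like an English semicolon, is a different case: it canonically
decomposes to U+003B, so even NFC and NFD replace it.)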
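
The difference between the two families can be seen with a short Python
sketch, again using the standard unicodedata module.  The ligature
U+FB01 is chosen here purely for illustration:

```python
import unicodedata

ligature = "\ufb01"   # LATIN SMALL LIGATURE FI, renders like "fi"

for form in ("NFC", "NFD", "NFKC", "NFKD"):
    result = unicodedata.normalize(form, ligature)
    # Print the code points that each normalization form produces.
    print(form, [f"U+{ord(c):04X}" for c in result])
```

The canonical forms leave U+FB01 untouched, while the compatibility
forms rewrite it as U+0066 U+0069.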

I'll note that the Mac-native normalizations do not match any standard
Unicode normalizations for any version, so we'd need separate
normalization code.  I also don't think UTF-8-MAC is available in all
versions of libiconv.
-- 
brian m. carlson (they/them)
Toronto, Ontario, CA