On 2026-02-09 at 14:55:51, Junio C Hamano wrote:
> "brian m. carlson" writes:
>
> > I don't think we have any Unicode normalization code at all in Git,
> > though, so if you want a quality implementation, that may be a thing we
> > need.
>
> Isn't NKC/NKD a macOS-only issue in practice?  Anything on the
> command line "git" potty and "git-blah" built-in commands receive
> goes through precompose_argv_prefix() to be normalized on that
> platform.

Normalization is not a macOS-only issue.  Many accented characters can
be written in two ways, one composed and one decomposed.  If the alias
in the file is composed and what's on the command line is decomposed,
they will not match bytewise even though they are logically and
graphically identical.

For instance, here is the word for "where" in French, first composed,
then decomposed:

  où
  où

The former is U+006F U+00F9 and the latter is U+006F U+0075 U+0300.
Obviously, if I write one of those in my config file and the other on
the command line, I intended to execute the same alias, but they are not
bytewise identical unless both are normalized to the same form.

This is why many websites don't accept Unicode in passwords: logging in
from different systems can produce different byte sequences, and those
must be normalized consistently to avoid hard-to-reproduce problems.

There are also two families of normalization: canonical (NFC and NFD)
and compatibility (NFKC and NFKD).  For instance, the ligature "ﬁ"
(U+FB01) looks much like the two letters "fi".  Canonical normalizations
preserve this distinction, but compatibility ones do not.

I'll note that the Mac-native normalizations do not match any standard
Unicode normalization for any version, so we'd need separate
normalization code.  I also don't think UTF-8-MAC is available in all
versions of libiconv.
-- 
brian m. carlson (they/them)
Toronto, Ontario, CA
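
As a rough illustration of the mismatch described above, here is a minimal Python sketch using the standard `unicodedata` module. It is not Git code, and Git has no such helper: `same_alias` is a hypothetical name chosen for this example, and the ligature U+FB01 is simply a convenient code point for contrasting canonical and compatibility forms.

```python
import unicodedata


def same_alias(a: str, b: str, form: str = "NFC") -> bool:
    """Hypothetical helper: compare two names after normalizing both
    to the same Unicode normalization form."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


# The French word for "where", written two ways.
composed = "o\u00f9"      # U+006F U+00F9
decomposed = "ou\u0300"   # U+006F U+0075 U+0300

# The raw UTF-8 byte sequences differ ...
print(composed.encode("utf-8") == decomposed.encode("utf-8"))  # False

# ... but the strings compare equal once both sides use the same form.
print(same_alias(composed, decomposed, "NFC"))  # True
print(same_alias(composed, decomposed, "NFD"))  # True

# Canonical forms keep a compatibility character such as the "fi"
# ligature intact, while compatibility forms fold it to plain letters.
ligature = "\ufb01"
print(unicodedata.normalize("NFC", ligature) == ligature)  # True
print(unicodedata.normalize("NFKC", ligature) == "fi")     # True
```

The point of the sketch is only that both the configured name and the command-line argument need to pass through the same normalization form before comparison; which form to use, and where to apply it, is a separate design question.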