From: "brian m. carlson" <sandals@crustytoothpaste•net>
To: Collin Funk <collin.funk1@gmail•com>
Cc: Matthieu Beauchamp <matthieu.beauchamp.boulay@gmail•com>,
Matthieu Beauchamp-Boulay via GitGitGadget
<gitgitgadget@gmail•com>,
git@vger•kernel.org, Matheus Tavares <matheus.tavb@gmail•com>,
Johannes Schindelin <johannes.schindelin@gmx•de>
Subject: Re: [PATCH] ignores: handle non UTF-8 exclude files
Date: Wed, 7 Jan 2026 23:38:53 +0000
Message-ID: <aV7ujZ2FeO7EleT5@fruit.crustytoothpaste.net>
In-Reply-To: <87secimchc.fsf@gmail.com>
On 2026-01-07 at 01:35:11, Collin Funk wrote:
> An unfortunate trend that I have seen with Rust programs is that they
> completely disregard the system's locale. E.g. using
> LC_ALL=en_US.ISO-8859-1 and passing an "À" character as an option will
> typically fail since it is encoded as 0xC0 which is not a valid UTF-8
> character.
Git does not usually directly read input and then convert it to other
encodings unless specifically asked to (e.g., `working-tree-encoding`),
so I fully expect that nothing will change there. However, in many
cases, Git also currently does not honour LC_ALL, such as for commit
messages.
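The byte-level problem described in the quoted text can be seen in a minimal sketch. It assumes nothing about Git's code: `decode_latin1` and `is_valid_utf8` are hypothetical helper names introduced only for illustration. It shows that the single byte 0xC0 is "À" in ISO-8859-1 but is rejected by a strict UTF-8 decoder.

```rust
/// Hypothetical helper: decode ISO-8859-1 (Latin-1) bytes into a String.
/// Every Latin-1 byte maps directly to the Unicode code point with the
/// same numeric value, so the conversion cannot fail.
pub fn decode_latin1(bytes: &[u8]) -> String {
    bytes.iter().map(|&b| char::from(b)).collect()
}

/// Hypothetical helper: report whether the bytes are well-formed UTF-8.
pub fn is_valid_utf8(bytes: &[u8]) -> bool {
    std::str::from_utf8(bytes).is_ok()
}
```

A program that only accepts `String` arguments would fail on the 0xC0 byte, whereas one that keeps the raw bytes can still decide how to interpret them later.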
> I figured it was worth bringing up since Git may want to think about it
> some before introducing more Rust. I think it can be worked around by
> using OsString [1], but I guess many people choose not to.
The people who have been working on Rust have been very careful to not
make assumptions that all data is UTF-8, and I don't expect that to
change.
OsString is slightly problematic because it is effectively UTF-8-ish (on
Windows, it's actually WTF-8 and on Unix it allows arbitrary bytes) but
there is no portable way to get any consistent byte encoding out of it.
(In versions of Rust too new for us to use, there is a function that
provides a byte encoding but it's not guaranteed to be stable across
versions.) I have some custom code in one of my branches to handle the
conversion between OsString and a consistent byte encoding, using some
traits to paper over the operating system differences.
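As a rough sketch of what such a trait-based conversion might look like (this is not the code from the branch mentioned above; the trait and method names `OsStrBytesExt`, `OsStringBytesExt`, `to_portable_bytes` and `from_portable_bytes` are hypothetical): on Unix the bytes pass through unchanged, while the Windows side in this sketch simply goes through UTF-8 and is therefore lossy for unpaired surrogates, which a real implementation would likely want to handle differently.

```rust
use std::borrow::Cow;
use std::ffi::{OsStr, OsString};

/// Hypothetical trait: view an OS string as bytes in one agreed-upon encoding.
pub trait OsStrBytesExt {
    fn to_portable_bytes(&self) -> Cow<'_, [u8]>;
}

/// Hypothetical trait: rebuild an OS string from those bytes.
pub trait OsStringBytesExt: Sized {
    fn from_portable_bytes(bytes: &[u8]) -> Option<Self>;
}

#[cfg(unix)]
impl OsStrBytesExt for OsStr {
    fn to_portable_bytes(&self) -> Cow<'_, [u8]> {
        use std::os::unix::ffi::OsStrExt;
        // Unix OS strings are arbitrary bytes; borrow them as-is.
        Cow::Borrowed(self.as_bytes())
    }
}

#[cfg(unix)]
impl OsStringBytesExt for OsString {
    fn from_portable_bytes(bytes: &[u8]) -> Option<Self> {
        use std::os::unix::ffi::OsStringExt;
        // Any byte sequence is acceptable on Unix, so this never fails.
        Some(OsString::from_vec(bytes.to_vec()))
    }
}

#[cfg(windows)]
impl OsStrBytesExt for OsStr {
    fn to_portable_bytes(&self) -> Cow<'_, [u8]> {
        // Assumption of this sketch: UTF-8 is the agreed encoding, and
        // unpaired surrogates are replaced with U+FFFD (lossy).
        match self.to_string_lossy() {
            Cow::Borrowed(s) => Cow::Borrowed(s.as_bytes()),
            Cow::Owned(s) => Cow::Owned(s.into_bytes()),
        }
    }
}

#[cfg(windows)]
impl OsStringBytesExt for OsString {
    fn from_portable_bytes(bytes: &[u8]) -> Option<Self> {
        // Only well-formed UTF-8 can be turned back into an OS string here.
        std::str::from_utf8(bytes).ok().map(OsString::from)
    }
}
```

With both traits in scope, callers can write `OsString::from_portable_bytes(raw)` and `path.to_portable_bytes()` without any platform-specific code at the call site.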
In general, I expect we will continue to use some C-based interfaces
(possibly called via Rust wrappers) because Rust also does not expose
things like file descriptors on Windows or the full range of stat or
other information we need.
One assumption I do think is safe to make is that arbitrary Unicode can
be printed to the terminal, such as in error messages. Considering that
virtually everybody sets IUTF8 in Unix terminals and we effectively do
that right now with localized text, I think that's okay.
--
brian m. carlson (they/them)
Toronto, Ontario, CA