From: Collin Funk <collin.funk1@gmail•com>
To: "brian m. carlson" <sandals@crustytoothpaste•net>
Cc: Matthieu Beauchamp <matthieu.beauchamp.boulay@gmail•com>,
Matthieu Beauchamp-Boulay via GitGitGadget
<gitgitgadget@gmail•com>,
git@vger•kernel.org, Matheus Tavares <matheus.tavb@gmail•com>,
Johannes Schindelin <johannes.schindelin@gmx•de>
Subject: Re: [PATCH] ignores: handle non UTF-8 exclude files
Date: Wed, 07 Jan 2026 17:13:05 -0800
Message-ID: <87ldi8aov2.fsf@gmail.com>
In-Reply-To: <aV7ujZ2FeO7EleT5@fruit.crustytoothpaste.net>
"brian m. carlson" <sandals@crustytoothpaste•net> writes:
> On 2026-01-07 at 01:35:11, Collin Funk wrote:
>> An unfortunate trend that I have seen with Rust programs is that they
>> completely disregard the system's locale. E.g. using
>> LC_ALL=en_US.ISO-8859-1 and passing an "À" character as an option will
>> typically fail since it is encoded as 0xC0, which is not a valid UTF-8
>> sequence.
>
> Git does not usually directly read input and then convert it to other
> encodings unless specifically asked to (e.g., `working-tree-encoding`),
> so I fully expect that nothing will change there. However, in many
> cases, Git also currently does not honour LC_ALL, such as for commit
> messages.
That makes sense.
>> I figured it was worth bringing up since Git may want to think about it
>> some before introducing more Rust. I think it can be worked around by
>> using OsString [1], but I guess many people choose not to.
>
> The people who have been working on Rust have been very careful to not
> make assumptions that all data is UTF-8, and I don't expect that to
> change.
Great, glad that it was considered. I guess you have to worry about
crates, but I think I recall wide agreement that Git was going to be
careful with what it decides to use.
> OsString is slightly problematic because it is effectively UTF-8-ish (on
> Windows, it's actually WTF-8 and on Unix it allows arbitrary bytes) but
> there is no portable way to get any consistent byte encoding out of it.
> (In versions of Rust too new for us to use, there is a function that
> provides a byte encoding but it's not guaranteed to be stable across
> versions.) I have some custom code in one of my branches to handle the
> conversion to and from OsString to a consistent byte encoding using some
> traits to paper over the operating system differences.
Interesting, good to know. Thanks.
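
For anyone following along, here is a minimal sketch of the Unix-only half
of that idea, assuming nothing about the custom code mentioned above. The
helper names os_to_bytes and bytes_to_os are hypothetical; the traits come
from std::os::unix::ffi, so this does not address the Windows (WTF-8) side
at all.

```rust
use std::ffi::{OsStr, OsString};
// Unix-only extension traits: they expose an OsStr as arbitrary bytes.
use std::os::unix::ffi::{OsStrExt, OsStringExt};

/// Hypothetical helper: borrow the raw bytes of an OS string (Unix only).
fn os_to_bytes(s: &OsStr) -> &[u8] {
    s.as_bytes()
}

/// Hypothetical helper: build an OS string from arbitrary bytes (Unix only).
fn bytes_to_os(b: Vec<u8>) -> OsString {
    OsString::from_vec(b)
}

fn main() {
    // args_os() accepts non-UTF-8 arguments, whereas args() panics on them.
    for arg in std::env::args_os().skip(1) {
        let bytes = os_to_bytes(&arg);
        match std::str::from_utf8(bytes) {
            Ok(s) => println!("UTF-8: {s}"),
            Err(_) => println!("raw bytes: {bytes:02x?}"),
        }
    }
}
```

With this, an argument such as the single byte 0xC0 is carried through as
bytes instead of being rejected; deciding how to interpret it under the
current locale is left to the caller.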
Unrelated to encoding, I noticed two other things about Rust. The
runtime sets SIGPIPE to SIG_IGN before main() runs, which can be seen
with the programs below:
$ cat main.rs
use std::io::{self, Write};
fn main() -> io::Result<()> {
    io::stdout().write_all(b"hello world\n")?;
    Ok(())
}
$ cat main.c
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>
int
main (void)
{
  static const char message[] = "hello world\n";
  if (write (STDOUT_FILENO, message, sizeof message - 1) < 0)
    {
      fprintf (stderr, "%s\n", strerror (errno));
      return EXIT_FAILURE;
    }
  return EXIT_SUCCESS;
}
$ rustc main.rs
$ gcc main.c
$ ./main | :
Error: Os { code: 32, kind: BrokenPipe, message: "Broken pipe" }
$ echo ${PIPESTATUS[@]}
1 0
$ ./a.out | :
$ echo ${PIPESTATUS[@]}
141 0
When a child program is spawned through the standard library (e.g. with
std::process::Command), SIGPIPE is reset to SIG_DFL in the child. That
is better than leaving it ignored, but both behaviors mean that the
usual behavior of inheriting signal actions from the parent process is
impossible without hacks or an unstable feature that has unfortunately
been stagnant for years [1].
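
A minimal sketch of the kind of hack I mean, assuming Linux (where SIGPIPE
is 13 and SIG_DFL is the null handler) and declaring signal(2) by hand
rather than pulling in a crate. The function name restore_default_sigpipe
is hypothetical. Note that this forces SIG_DFL; it cannot recover whatever
action the parent process actually had, since the runtime has already
overwritten it by the time main() runs.

```rust
use std::io::{self, Write};

// Assumption: Linux values. SIGPIPE is 13 and SIG_DFL is address 0 there.
const SIGPIPE: i32 = 13;
const SIG_DFL: usize = 0;

extern "C" {
    // signal(2) from the C library, with the handler passed as an address.
    fn signal(signum: i32, handler: usize) -> usize;
}

/// Hypothetical helper: set SIGPIPE back to the default action and
/// return the previous disposition as reported by signal(2).
fn restore_default_sigpipe() -> usize {
    unsafe { signal(SIGPIPE, SIG_DFL) }
}

fn main() -> io::Result<()> {
    // Undo the SIG_IGN that the Rust runtime installed before main().
    restore_default_sigpipe();
    io::stdout().write_all(b"hello world\n")?;
    Ok(())
}
```

With the default action restored, I would expect a write to a closed pipe
to terminate the process with SIGPIPE, as the C program does, instead of
returning a BrokenPipe error.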
Before main() runs, the runtime also makes sure that all standard file
descriptors are open, opening any that were closed. While reasonable in
many cases, this is not the desired behavior for all programs. Using the
same example programs:
$ ./main >&-
$ echo $?
0
$ ./a.out >&-
Bad file descriptor
$ echo $?
1
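
One way to see this from inside the Rust program is to ask the kernel
whether each descriptor is open once main() is running. This is only a
sketch, assuming a Unix system where F_GETFD is 1 (as on Linux) and
declaring fcntl(2) by hand; the name fd_is_open is hypothetical.

```rust
extern "C" {
    // fcntl(2) from the C library; it is variadic in C.
    fn fcntl(fd: i32, cmd: i32, ...) -> i32;
}

// Assumption: F_GETFD is 1, as on Linux.
const F_GETFD: i32 = 1;

/// Hypothetical helper: true if `fd` refers to an open file descriptor.
fn fd_is_open(fd: i32) -> bool {
    // F_GETFD fails with -1 (EBADF) when the descriptor is not open.
    unsafe { fcntl(fd, F_GETFD) != -1 }
}

fn main() {
    for fd in 0..3 {
        // Report on stderr so the result stays visible when stdout is
        // redirected or closed by the shell.
        eprintln!("fd {fd}: open = {}", fd_is_open(fd));
    }
}
```

Since the check runs after the runtime's startup code, I would expect it
to report all three descriptors as open even when the program is started
with '>&-', which means the program itself cannot tell that stdout was
closed by the parent.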
I'm not sure if either of those will affect 'git' at all, assuming it is
mostly library code that is called from C.
But it will likely have to be considered if someone wants to write a
program that goes in libexec that is executed by 'git'.
Collin
[1] https://dev-doc.rust-lang.org/beta/unstable-book/language-features/unix-sigpipe.html
Thread overview (13+ messages):
2026-01-03 22:16 [PATCH] ignores: handle non UTF-8 exclude files Matthieu Beauchamp-Boulay via GitGitGadget
2026-01-04 2:54 ` Junio C Hamano
2026-01-06 19:52 ` Matthieu Beauchamp
2026-01-04 17:35 ` Torsten Bögershausen
2026-01-06 20:32 ` Matthieu Beauchamp
2026-01-07 14:36 ` Phillip Wood
2026-01-04 19:40 ` brian m. carlson
2026-01-06 20:45 ` Matthieu Beauchamp
2026-01-06 23:22 ` brian m. carlson
2026-01-07 1:35 ` Collin Funk
2026-01-07 14:28 ` Phillip Wood
2026-01-07 23:38 ` brian m. carlson
2026-01-08 1:13 ` Collin Funk [this message]