From: Shreyansh Paliwal <shreyanshpaliwalcmsmn@gmail•com>
To: git@vger•kernel.org
Cc: ben.knoble@gmail•com, gitster@pobox•com
Subject: Re: [RFC] send-email: UTF-8 encoding in subject line
Date: Sun, 22 Feb 2026 21:22:01 +0530
Message-ID: <20260222155559.1777883-1-shreyanshpaliwalcmsmn@gmail.com>
In-Reply-To: <CALnO6CBhB+O-CBCw3f+2n5yaHO7Wk7-Adaa9_4shXZvciGpUPA@mail.gmail.com>
> On Sun, Feb 22, 2026 at 9:07 AM Shreyansh Paliwal
> <shreyanshpaliwalcmsmn@gmail•com> wrote:
> >
> > > > That makes sense, I tried it below.
> > > > I also wondered whether, in addition to this, it might be helpful to warn on
> > > > an invalid charset, and/or possibly fall back to UTF-8.
> > >
> > > Agreed on the first half of the statement, if we have an easy and
> > > portable way to tell if a given random string names a valid charset.
> > > I do not recommend to "fall back" to anything, if we are asking an
> > > input from the user.
> >
> > Following up on this, I tried adding a warning when the provided charset
> > does not appear to be valid. Current flow is,
> >
> > Which 8bit encoding should I declare [UTF-8]? y
> > Are you sure you want to use <y> [y/N]? y
> >
> > With the additional check, it becomes,
> >
> > Which 8bit encoding should I declare [default: UTF-8]? y
> > warning: 'y' does not appear to be a valid charset name.
> > Are you sure you want to use <y> [y/N]?
> >
> > This uses find_encoding() from Perl’s Encode module to detect any
> > unrecognized charset names.
> >
> > Let me know what you think.
> > Also, is there any new test that should be added for this change?
> >
> > Signed-off-by: Shreyansh Paliwal <shreyanshpaliwalcmsmn@gmail•com>
> > ---
> > git-send-email.perl | 23 ++++++++++++++++++++---
> > 1 file changed, 20 insertions(+), 3 deletions(-)
> >
> > diff --git a/git-send-email.perl b/git-send-email.perl
> > index cd4b316ddc..e62fa259ba 100755
> > --- a/git-send-email.perl
> > +++ b/git-send-email.perl
> > @@ -23,6 +23,7 @@
> > use Git::LoadCPAN::Error qw(:try);
> > use Git;
> > use Git::I18N;
> > +use Encode qw(find_encoding);
> >
> > Getopt::Long::Configure qw/ pass_through /;
> >
> > @@ -1044,9 +1045,25 @@ sub file_declares_8bit_cte {
> > foreach my $f (sort keys %broken_encoding) {
> > print " $f\n";
> > }
> > - $auto_8bit_encoding = ask(__("Which 8bit encoding should I declare [UTF-8]? "),
> > - valid_re => qr/.{4}/, confirm_only => 1,
> > - default => "UTF-8");
> > + while (1) {
> > + my $encoding = ask(__("Which 8bit encoding should I declare [default: UTF-8]? "),
> > + valid_re => qr/^\S+$/,
> > + default => "UTF-8");
>
> Here we change things, right?
>
> - The original validation is "at least 4 characters", the new
> validation is "at least one non-blank." I'm not sure why we'd prefer
> one or the other, frankly. The original goes to 852a15d748
> (send-email: ask confirmation if given encoding name is very short,
> 2015-02-13), which is motivated by the same problem we're discussing
> here!

I see.

My understanding of the earlier change (852a15d748) is that the length
check was a heuristic to catch obviously invalid inputs like "y" and
trigger an extra confirmation, on the assumption that charset names are
at least 4 characters long.

With the additional find_encoding() check, the validation becomes semantic
rather than length-based: recognized charset names are accepted directly,
while unrecognized ones trigger a warning and still require explicit
confirmation. The relaxed regex (one or more non-whitespace characters) is
only meant to ensure we receive some non-empty input before passing it to
find_encoding().
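
To illustrate the difference between the two checks, here is a rough
standalone sketch. It is not part of the patch, and is_known_charset() is
a hypothetical helper name used only for this example; the only real API
it relies on is find_encoding() from Encode.

```perl
use strict;
use warnings;
use Encode qw(find_encoding);

# Hypothetical helper for illustration; not part of git-send-email.perl.
# find_encoding() returns an encoding object for a name Encode recognizes,
# and undef otherwise.
sub is_known_charset {
    my ($name) = @_;
    return defined(find_encoding($name)) ? 1 : 0;
}

for my $name ("UTF-8", "yyyy", "y") {
    # Old heuristic from 852a15d748: at least 4 characters.
    my $old = ($name =~ /.{4}/) ? "passes" : "fails";
    # New check: ask Encode whether it knows the name.
    my $new = is_known_charset($name) ? "recognized" : "not recognized";
    print "$name: $old the length check, $new by Encode\n";
}
```

The "yyyy" case is the interesting one: it is long enough to pass the
length heuristic without any confirmation, but Encode does not recognize
it, so the new flow would warn and ask.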
> - We get rid of confirm_only, since we're about to roll our own
> confirmation below:
>
> > + next unless defined $encoding;
> > + if (find_encoding($encoding)) {
> > + $auto_8bit_encoding = $encoding;
> > + last;
> > + }
> > + printf STDERR __("warning: '%s' does not appear to be a valid charset name.\n"), $encoding;
> > + my $yesno = ask(
> > + sprintf(__("Are you sure you want to use <%s> [y/N]? "), $encoding),
> > + valid_re => qr/^(?:y|n)/i,
> > + default => 'n');
>
> …which might want refactored a bit so it can stay close to the original? idk.
>
Actually, the flow needed to change slightly to insert the validity warning
before the final confirmation step. Since ask() handles confirmation
internally via confirm_only and is used in multiple places, it seemed
simpler to keep the additional confirmation local here rather than modify
ask() itself.

Let me know what you think.

Best,
Shreyansh