* [GSOC] Discuss: Refactoring in order to reduce Git’s global state
From: Shreyansh Paliwal @ 2026-02-19 18:02 UTC (permalink / raw)
To: git
Cc: gitster, christian.couder, karthik.188, jltobler, ayu.chandekar,
siddharthasthana31, lucasseikioshiro
Hi everyone,
I have been around Git for some time and am interested in the “Refactoring
in order to reduce Git’s global state” project for GSoC 2026.
So far I have built Git from source, completed a microproject, and explored
some related areas in worktree and wt-status. I have also gone through the
blog posts by Ayush and Bello Olamide, which were very helpful for learning
about the ongoing and previous work in this area. From what I gathered:
- In Outreachy, recent work has focused on moving core.attributesfile and
core.sparseCheckout into local structs and on handling the issue of
lazy loading, but it is still a work in progress.
- In last year’s GSoC work, the focus included removing uses of
the_repository and other globals in areas such as preload-index
(core_preload_index), builtin/prune (repository_format_precious_objects),
and builtin/fmt-merge-msg (merge_log_config).
I still have a few questions about the project, for better clarity:
- Should the primary focus be on core library code rather than builtin?
(ref. [1])
- Is it preferable to approach the project file-wise (e.g. cleaning up one
file to make it completely free of the_repository) or variable-wise (e.g.
identifying one global from environment.c and eliminating it across the
codebase)?
- Are there any globals which are best not to be removed currently?
For example, editor.c mainly uses two globals:
- editor_program, which appears to be used only within the file and does
not depend on the repository. Would it be preferable to remove it from
environment.c and localize it within editor.c, move it into struct
repository_settings / repo_config_values, or keep it as is?
- the_repository: there is only one use, in git_sequence_editor() in
editor.c. That function could be changed to take a struct repository
from its callers, but it is also called from builtin/var.c, where no
local repository instance is available. In that case, would it be
acceptable to pass the_repository there, or is there another way?
I have also surveyed files that use #define USE_THE_REPOSITORY_VARIABLE to
roughly analyse the usage of globals, and I could see that much of the
library code still depends on the_repository. Could reducing that
dependency be taken on as a priority, to cut down the usage of
the_repository throughout the codebase?
Thanks,
Shreyansh
[1]- https://lore.kernel.org/git/7b5dd0c4-0ca0-458e-89db-621a70dac9ae@gmail.com/
* Re: [GSOC] Discuss: Refactoring in order to reduce global state
From: Shreyansh Paliwal @ 2026-02-19 18:17 UTC (permalink / raw)
To: git
Cc: gitster, christian.couder, karthik.188, jltobler, ayu.chandekar,
siddharthasthana31, lucasseikioshiro
I think there was an issue with git send-email, which is why the subject
line appears malformed. Please ignore that.
* Re: [GSOC] Discuss: Refactoring in order to reduce global state
From: Shreyansh Paliwal @ 2026-02-23 8:26 UTC (permalink / raw)
To: git
Cc: gitster, christian.couder, karthik.188, jltobler, ayu.chandekar,
siddharthasthana31, lucasseikioshiro
ping.
* Re: [GSOC] Discuss: Refactoring in order to reduce global state
From: Shreyansh Paliwal @ 2026-02-24 7:55 UTC (permalink / raw)
To: git; +Cc: ben.knoble, gitster
> > On 22 Feb 2026, at 10:56, Shreyansh Paliwal <shreyanshpaliwalcmsmn@gmail•com> wrote:
> >
> >
> >>
> >>> On Sun, Feb 22, 2026 at 9:07 AM Shreyansh Paliwal
> >>> <shreyanshpaliwalcmsmn@gmail•com> wrote:
> >>>
> >>>>> That makes sense, I tried it below.
> >>>>> I also wondered whether, in addition to this, it might be helpful to warn on
> >>>>> an invalid charset, and/or possibly fall back to UTF-8.
> >>>>
> >>>> Agreed on the first half of the statement, if we have an easy and
> >>>> portable way to tell if a given random string names a valid charset.
> >>>> I do not recommend "falling back" to anything if we are asking for
> >>>> input from the user.
> >>>
> >>> Following up on this, I tried adding a warning when the provided charset
> >>> does not appear to be valid. The current flow is:
> >>>
> >>> Which 8bit encoding should I declare [UTF-8]? y
> >>> Are you sure you want to use <y> [y/N]? y
> >>>
> >>> With the additional check, it becomes:
> >>>
> >>> Which 8bit encoding should I declare [default: UTF-8]? y
> >>> warning: 'y' does not appear to be a valid charset name.
> >>> Are you sure you want to use <y> [y/N]?
> >>>
> >>> This uses find_encoding() from Perl’s Encode module to detect any
> >>> unrecognized charset names.
> >>>
> >>> Let me know what you think.
> >>> Also, is there any new test that should be added for this change?
> >>>
> >>> Signed-off-by: Shreyansh Paliwal <shreyanshpaliwalcmsmn@gmail•com>
> >>> ---
> >>> git-send-email.perl | 23 ++++++++++++++++++++---
> >>> 1 file changed, 20 insertions(+), 3 deletions(-)
> >>>
> >>> diff --git a/git-send-email.perl b/git-send-email.perl
> >>> index cd4b316ddc..e62fa259ba 100755
> >>> --- a/git-send-email.perl
> >>> +++ b/git-send-email.perl
> >>> @@ -23,6 +23,7 @@
> >>> use Git::LoadCPAN::Error qw(:try);
> >>> use Git;
> >>> use Git::I18N;
> >>> +use Encode qw(find_encoding);
> >>>
> >>> Getopt::Long::Configure qw/ pass_through /;
> >>>
> >>> @@ -1044,9 +1045,25 @@ sub file_declares_8bit_cte {
> >>> foreach my $f (sort keys %broken_encoding) {
> >>> print " $f\n";
> >>> }
> >>> - $auto_8bit_encoding = ask(__("Which 8bit encoding should I declare [UTF-8]? "),
> >>> - valid_re => qr/.{4}/, confirm_only => 1,
> >>> - default => "UTF-8");
> >>> + while (1) {
> >>> + my $encoding = ask(__("Which 8bit encoding should I declare [default: UTF-8]? "),
> >>> + valid_re => qr/^\S+$/,
> >>> + default => "UTF-8");
> >>
> >> Here we change things, right?
> >>
> >> - The original validation is "at least 4 characters"; the new
> >> validation is "at least one non-blank." I'm not sure why we'd prefer
> >> one or the other, frankly. The original goes to 852a15d748
> >> (send-email: ask confirmation if given encoding name is very short,
> >> 2015-02-13), which is motivated by the same problem we're discussing
> >> here!
> >
> > I see.
> > My understanding of the earlier change (852a15d748) is that the
> > length check was intended as a heuristic to catch obviously invalid
> > inputs like "y" and trigger an extra confirmation, on the assumption that
> > charset names are at least 4 characters long.
> >
> > With the additional find_encoding() check, the validation becomes semantic
> > rather than length-based: recognized charset names are accepted directly,
> > while unrecognized ones trigger a warning and still require explicit
> > confirmation. The relaxed regex (at least one non-blank) is only meant to
> > ensure we receive some non-empty input before passing it to find_encoding().
> >
> >> - We get rid of confirm_only, since we're about to roll our own
> >> confirmation below:
> >>
> >>> + next unless defined $encoding;
> >>> + if (find_encoding($encoding)) {
> >>> + $auto_8bit_encoding = $encoding;
> >>> + last;
> >>> + }
> >>> + printf STDERR __("warning: '%s' does not appear to be a valid charset name.\n"), $encoding;
> >>> + my $yesno = ask(
> >>> + sprintf(__("Are you sure you want to use <%s> [y/N]? "), $encoding),
> >>> + valid_re => qr/^(?:y|n)/i,
> >>> + default => 'n');
> >>
> >> …which might want refactoring a bit so it can stay close to the original? idk.
> >>
> >
> > Actually, the flow needed to change slightly to insert the validity warning
> > before the final confirmation step. Since ask() handles confirmation internally
> > using confirm_only and is used in multiple places, it seemed simpler to keep the
> > additional confirmation local here rather than modifying ask() itself.
> >
> > Let me know what you think.
> >
> > Best,
> > Shreyansh
>
> Ah, my mistake for being ambiguous. I meant:
>
> The code is similar enough to the original that perhaps a helper can be
> introduced, or at least we should keep the equivalent strings together to
> help those who change one.
Thanks for clarifying, that makes sense.
I'll refactor and send a revised patch on this.
Best,
Shreyansh
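For illustration only, the flow discussed above (ask for a charset, accept recognized names directly, otherwise warn and require an explicit "y") can be sketched in C with all the prompt strings kept side by side, in the spirit of Ben's suggestion. This is a hypothetical sketch, not the send-email code: toy_is_known_charset() is a tiny fixed table standing in for Perl's find_encoding(), and choose_charset(), scripted_input and scripted_next() are made-up names that let the flow run without a terminal.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Prompt strings kept together so that changing one makes the others visible. */
#define PROMPT_ENCODING "Which 8bit encoding should I declare [default: UTF-8]? "
#define WARN_UNKNOWN "warning: '%s' does not appear to be a valid charset name.\n"
#define PROMPT_CONFIRM "Are you sure you want to use <%s> [y/N]? "

/* Case-insensitive ASCII comparison; returns 1 when the strings are equal. */
static int ascii_equal_ci(const char *a, const char *b)
{
	while (*a && *b) {
		if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
			return 0;
		a++;
		b++;
	}
	return *a == *b;
}

/* Hypothetical stand-in for Encode::find_encoding(): a tiny fixed table. */
static int toy_is_known_charset(const char *name)
{
	static const char *const known[] = { "UTF-8", "ISO-8859-1", "US-ASCII", NULL };

	for (int i = 0; known[i]; i++)
		if (ascii_equal_ci(name, known[i]))
			return 1;
	return 0;
}

/* Scripted answers (hypothetical) so the sketch can run without a terminal. */
struct scripted_input {
	const char **items;
	size_t count;
	size_t pos;
};

static const char *scripted_next(void *ctx)
{
	struct scripted_input *in = ctx;

	return in->pos < in->count ? in->items[in->pos++] : NULL;
}

/*
 * Keep asking until a charset is accepted: known names are taken as-is,
 * unknown ones need an explicit "y". Returns NULL at end of input.
 */
static const char *choose_charset(const char *(*next)(void *), void *ctx, FILE *out)
{
	for (;;) {
		const char *enc, *yesno;

		fputs(PROMPT_ENCODING, out);
		if (!(enc = next(ctx)))
			return NULL;
		if (!*enc)
			enc = "UTF-8"; /* an empty answer takes the default */
		if (toy_is_known_charset(enc))
			return enc;
		fprintf(out, WARN_UNKNOWN, enc);
		fprintf(out, PROMPT_CONFIRM, enc);
		if (!(yesno = next(ctx)))
			return NULL;
		if (yesno[0] == 'y' || yesno[0] == 'Y')
			return enc;
		/* anything else, including an empty answer, means "no": ask again */
	}
}

/* Usage: "y" is questioned, the user refuses, then UTF-8 is accepted. */
static void toy_demo(void)
{
	const char *answers[] = { "y", "n", "UTF-8" };
	struct scripted_input in = { answers, 3, 0 };

	assert(strcmp(choose_charset(scripted_next, &in, stdout), "UTF-8") == 0);
}
```

Keeping the three strings as adjacent definitions is one way to make sure that anyone editing the warning also sees the prompts it sits between; whether a shared helper is worth it in the Perl code is the judgment call Ben raised.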
* Re: [GSOC] Discuss: Refactoring in order to reduce Git’s global state
From: Karthik Nayak @ 2026-02-24 13:40 UTC (permalink / raw)
To: Shreyansh Paliwal, git
Cc: gitster, christian.couder, jltobler, ayu.chandekar,
siddharthasthana31, lucasseikioshiro
Shreyansh Paliwal <shreyanshpaliwalcmsmn@gmail•com> writes:
> Hi everyone,
>
> I have been around Git for some time and am interested in the “Refactoring
> in order to reduce Git’s global state” project for GSoC 2026.
>
> So far I have built Git from source, completed a microproject, and explored
> some related areas in worktree and wt-status. I have also gone through the
> blog posts by Ayush and Bello Olamide, which were very helpful for learning
> about the ongoing and previous work in this area. From what I gathered:
>
> - In Outreachy, recent work has focused on moving core.attributesfile and
> core.sparseCheckout into local structs and on handling the issue of
> lazy loading, but it is still a work in progress.
>
> - In last year’s GSoC work, the focus included removing uses of
> the_repository and other globals in areas such as preload-index
> (core_preload_index), builtin/prune (repository_format_precious_objects),
> and builtin/fmt-merge-msg (merge_log_config).
>
> I still have a few questions about the project, for better clarity:
>
> - Should the primary focus be on core library code rather than builtin?
> (ref. [1])
>
Phillip does make a good point: replacing global variable usage in the
library code is indeed more useful.
However, cleaning up some of the global config variables could involve
touching the builtin code.
> - Is it preferable to approach the project file-wise (e.g. cleaning up one
> file to make it completely free of the_repository) or variable-wise (e.g.
> identifying one global from environment.c and eliminating it across the
> codebase)?
>
It depends. Some variables (e.g. the_repository) are spread so broadly
that going variable-wise might not make much sense for them.
> - Are there any globals which are best not to be removed currently?
>
> For example, editor.c mainly uses two globals:
>
> - editor_program, which appears to be used only within the file and does
> not depend on the repository. Would it be preferable to remove it from
> environment.c and localize it within editor.c, move it into struct
> repository_settings / repo_config_values, or keep it as is?
>
It makes sense to localize it within editor.c. What's more important is to
understand that `editor_program` is currently set up inside
`git_default_core_config()`. What would the new flow look like?
Also, with a global variable, it's parsed once and available until
execution ends. Will that still be the case?
> - the_repository: there is only one use, in git_sequence_editor() in
> editor.c. That function could be changed to take a struct repository
> from its callers, but it is also called from builtin/var.c, where no
> local repository instance is available. In that case, would it be
> acceptable to pass the_repository there, or is there another way?
>
Yes, that's how I would tackle it. Moving the dependency to upper layers is
a valid way to go about this. We do want to avoid it if the upper layer is
already free of such variables and has access to an alternative. In your
case, 'builtin/var.c' already uses 'the_repository', so this should be
acceptable.
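A minimal C sketch of the approach described above: the library-level function takes the repository as a parameter, and only the builtin, which already uses the global, passes the_repository down. This is a toy model, not Git's code: struct repository is reduced to two hypothetical fields, the environment-variable checks of the real git_sequence_editor() are left out, and toy_sequence_editor() and toy_var_sequence_editor() are made-up names.

```c
#include <assert.h>
#include <string.h>

/* Toy stand-in for Git's struct repository (hypothetical fields). */
struct repository {
	const char *sequence_editor; /* pretend value of sequence.editor */
	const char *core_editor;     /* pretend value of core.editor */
};

/* The process-wide global that builtin code may still rely on. */
static struct repository the_repo_storage = { NULL, NULL };
static struct repository *the_repository = &the_repo_storage;

/*
 * Library layer: the repository arrives as a parameter, so this
 * function no longer touches the_repository at all.
 * (Environment variables such as GIT_SEQUENCE_EDITOR are omitted.)
 */
static const char *toy_sequence_editor(struct repository *r)
{
	if (r->sequence_editor)
		return r->sequence_editor;
	if (r->core_editor)
		return r->core_editor;
	return "vi"; /* simplified fallback */
}

/*
 * Builtin layer (think builtin/var.c): it already uses the global,
 * so it is the one place that passes the_repository down.
 */
static const char *toy_var_sequence_editor(void)
{
	return toy_sequence_editor(the_repository);
}

/* Usage: a caller with its own repository, and the builtin path. */
static void toy_demo(void)
{
	struct repository other = { "nano", NULL };

	assert(strcmp(toy_sequence_editor(&other), "nano") == 0);
	assert(strcmp(toy_var_sequence_editor(), "vi") == 0);
}
```

The global does not disappear; it moves one layer up, to a file that already depends on it, which is the trade-off Karthik describes as acceptable here.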
> I have also surveyed files that use #define USE_THE_REPOSITORY_VARIABLE to
> roughly analyse the usage of globals, and I could see that much of the
> library code still depends on the_repository. Could reducing that
> dependency be taken on as a priority, to cut down the usage of
> the_repository throughout the codebase?
>
> Thanks,
> Shreyansh
>
> [1]- https://lore.kernel.org/git/7b5dd0c4-0ca0-458e-89db-621a70dac9ae@gmail.com/
* Re: [GSOC] Discuss: Refactoring in order to reduce Git’s global state
From: Shreyansh Paliwal @ 2026-02-24 16:15 UTC (permalink / raw)
To: git
Cc: gitster, christian.couder, karthik.188, jltobler, ayu.chandekar,
siddharthasthana31, lucasseikioshiro
> Shreyansh Paliwal <shreyanshpaliwalcmsmn@gmail•com> writes:
>
> > Hi everyone,
> >
> > I have been around Git for some time and am interested in the “Refactoring
> > in order to reduce Git’s global state” project for GSoC 2026.
> >
> > So far I have built Git from source, completed a microproject, and explored
> > some related areas in worktree and wt-status. I have also gone through the
> > blog posts by Ayush and Bello Olamide, which were very helpful for learning
> > about the ongoing and previous work in this area. From what I gathered:
> >
> > - In Outreachy, recent work has focused on moving core.attributesfile and
> > core.sparseCheckout into local structs and on handling the issue of
> > lazy loading, but it is still a work in progress.
> >
> > - In last year’s GSoC work, the focus included removing uses of
> > the_repository and other globals in areas such as preload-index
> > (core_preload_index), builtin/prune (repository_format_precious_objects),
> > and builtin/fmt-merge-msg (merge_log_config).
> >
> > I still have a few questions about the project, for better clarity:
> >
> > - Should the primary focus be on core library code rather than builtin?
> > (ref. [1])
> >
>
> Phillip does make a good point: replacing global variable usage in the
> library code is indeed more useful.
>
> However, cleaning up some of the global config variables could involve
> touching the builtin code.
Right, got it.
> > - Is it preferable to approach the project file-wise (e.g. cleaning up one
> > file to make it completely free of the_repository) or variable-wise (e.g.
> > identifying one global from environment.c and eliminating it across the
> > codebase)?
> >
>
> It depends. Some variables (e.g. the_repository) are spread so broadly
> that going variable-wise might not make much sense for them.
>
> > - Are there any globals which are best not to be removed currently?
> >
> > For example, editor.c mainly uses two globals:
> >
> > - editor_program, which appears to be used only within the file and does
> > not depend on the repository. Would it be preferable to remove it from
> > environment.c and localize it within editor.c, move it into struct
> > repository_settings / repo_config_values, or keep it as is?
> >
>
> It makes sense to localize it within editor.c. What's more important is to
> understand that `editor_program` is currently set up inside
> `git_default_core_config()`. What would the new flow look like?
> Also, with a global variable, it's parsed once and available until
> execution ends. Will that still be the case?
Hmm. I will look at how we can localize editor_program while keeping the
parsing and availability the same as with the global. I think Junio also
pointed out something related to lazy loading of global variables in a
recent discussion; I will look into that as well and follow up with an RFC
patch on this, which may clarify things further.
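One possible shape for this, sketched in C under stated assumptions: editor_program becomes a file-local variable in editor.c that is read from config on first use and then cached, so it is still parsed once and stays available until the process exits. toy_config_get_string() and get_editor_program() are hypothetical names; the first stands in for Git's config API and returns a fixed value. The sketch ignores thread safety and processes that work with more than one repository.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical stand-in for Git's config lookup. It counts calls so
 * the "parsed once" property can be checked.
 */
static int config_lookups;

static const char *toy_config_get_string(const char *key)
{
	config_lookups++;
	if (!strcmp(key, "core.editor"))
		return "nano"; /* pretend core.editor = nano */
	return NULL;
}

/* File-local replacement for the global that lives in environment.c today. */
static char *editor_program;
static int editor_program_loaded;

/* Read core.editor on first use, then reuse the cached copy for the rest of the process. */
static const char *get_editor_program(void)
{
	if (!editor_program_loaded) {
		const char *value = toy_config_get_string("core.editor");

		if (value) {
			size_t len = strlen(value) + 1;

			editor_program = malloc(len);
			if (editor_program)
				memcpy(editor_program, value, len);
		}
		editor_program_loaded = 1;
	}
	return editor_program;
}

/* Usage: repeated calls consult the config only once. */
static void toy_demo(void)
{
	assert(strcmp(get_editor_program(), "nano") == 0);
	assert(strcmp(get_editor_program(), "nano") == 0);
	assert(config_lookups == 1);
}
```

One difference from parsing in git_default_core_config() is timing: with lazy loading, a problem with core.editor would surface at first use rather than at startup, which is one of the questions such a patch would need to settle.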
> > - the_repository: there is only one use, in git_sequence_editor() in
> > editor.c. That function could be changed to take a struct repository
> > from its callers, but it is also called from builtin/var.c, where no
> > local repository instance is available. In that case, would it be
> > acceptable to pass the_repository there, or is there another way?
> >
>
> Yes, that's how I would tackle it. Moving the dependency to upper layers is
> a valid way to go about this. We do want to avoid it if the upper layer is
> already free of such variables and has access to an alternative. In your
> case, 'builtin/var.c' already uses 'the_repository', so this should be
> acceptable.
Understood. That makes sense.
Thanks for the guidance,
Shreyansh
* Re: [GSOC] Discuss: Refactoring in order to reduce Git’s global state
From: Shreyansh Paliwal @ 2026-03-04 14:57 UTC (permalink / raw)
To: git
Cc: gitster, christian.couder, karthik.188, jltobler, ayu.chandekar,
siddharthasthana31, lucasseikioshiro
Hi Karthik,
Following up on this, I recently sent a patch on `editor_program` [1],
but the discussion hasn’t reached a clear conclusion yet. I would really
appreciate your thoughts and feedback on it, including what you think
would be the most appropriate way forward.
Thanks,
Shreyansh
[1]- https://lore.kernel.org/git/20260301105228.1738388-1-shreyanshpaliwalcmsmn@gmail.com/