* [GSOC] [PROPOSAL v2]: Refactoring in order to reduce Git’s global state
2025-03-26 5:26 [GSOC] [PROPOSAL V1]: " Ayush Chandekar
@ 2025-04-04 8:51 ` Ayush Chandekar
2025-04-04 14:45 ` Karthik Nayak
2025-04-07 8:42 ` Ayush Chandekar
0 siblings, 2 replies; 15+ messages in thread
From: Ayush Chandekar @ 2025-04-04 8:51 UTC (permalink / raw)
To: ayu.chandekar
Cc: christian.couder, git, karthik.188, ps, shejialuo,
shyamthakkar001
Hello,
This is my GSoC 2025 proposal for the project "Refactoring in order to reduce Git’s global state".
You can view the Google Docs version here:
https://docs.google.com/document/d/1tJrtWxo1UGKChB3hu5eZ-ljm0FtU_fsv0TnIRwu3EKY/edit?usp=sharing
---------
Refactoring in order to reduce Git’s global state
My Information:
---------------
Name: Ayush Chandekar
Email: ayu.chandekar@gmail•com
Mobile No: (+91) 9372496874
Education: UG Sophomore, IIT Roorkee
Github: https://github.com/ayu-ch
Blog: https://ayu-ch.github.io
About me:
---------
I'm Ayush Chandekar, a UG Sophomore studying at Indian Institute of
Technology, Roorkee. I like participating in various software development
and tech-development endeavors, usually hackathons, CTFs, and projects at
SDSLabs. SDSLabs is a student-run technical group that includes passionate
developers and designers interested in various fields and involved in multiple
software development projects that aim to foster a software development
culture on campus. Being a part of this group has exposed me to different
software development methodologies, tools and frameworks and helped me become
comfortable contributing to an open-source project with multiple contributors.
Some open-source contributions I made here are: [1], [2] & [3]
I see this project as a meaningful opportunity to deepen my involvement in
the Git community and to build a foundation for continued contributions to
open source development in the future.
Overview:
---------
Git currently uses a global object called `the_repository`, which refers to a
single instance of `struct repository`. Many internal functions rely on this
global object rather than accepting a `struct repository` as an explicit
parameter. This design inherently assumes a single active repository,
making it difficult to support multi-repository use cases and obstructing
the long-term goal of libification of Git.
A key architectural limitation is that while `struct repository` encapsulates
some repository-specific information, many important environment variables
and configuration settings that logically belong to a repository are still
stored as global variables, primarily in `environment.c`, not within the
`repository` struct. As a result, even if multiple repositories were to
exist concurrently, they would still share this global state, leading to
incorrect behavior, race conditions, or subtle bugs.
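A minimal sketch of this limitation follows. It does not use Git's real definitions: `struct toy_repository`, `toy_global_is_bare`, and both functions are hypothetical names invented purely for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for Git's `struct repository`; not the real layout. */
struct toy_repository {
	const char *gitdir;
	int is_bare; /* per-repository copy of the setting */
};

/* Hypothetical global, standing in for a variable kept in environment.c. */
int toy_global_is_bare;

/* Global-state style: the answer cannot depend on which repository is meant. */
int toy_is_bare_global(void)
{
	return toy_global_is_bare;
}

/* Explicit-parameter style: each repository answers for itself. */
int toy_is_bare(const struct toy_repository *repo)
{
	assert(repo != NULL);
	return repo->is_bare;
}
```

With two `toy_repository` instances that differ in `is_bare`, only the second function can report the correct value for each of them; the first reports whatever was written to the global last.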
This project aims to refactor Git’s environment handling by relocating global
variables into more appropriate local contexts, primarily within
struct repository and struct repo_settings. However, some global variables may
only apply to specific subsystems. In such cases, rather than placing them in
struct repository or struct repo_settings, they should be moved into a
context that better reflects their scope.
This change will not only make the environment state repository-specific but
also improve the modularity and maintainability of the codebase. The work
involves identifying environment-related global variables, determining the
most suitable structure to house them, and updating all affected code paths
accordingly.
The difficulty of this project is medium, and it is estimated to take
175 to 350 hours.
Pre-GSOC:
---------
I started exploring Git’s codebase and documentation around the end of
January, familiarizing myself with its structure and development practices. I
submitted a microproject, which helped me navigate the code and contribution
workflow.
After selecting the project on refactoring Git’s state, I studied the
surrounding code and reviewed past patches ([4], [5], [6], [7], [8] & [9])
to understand the reasoning behind previous changes.
To better prepare for the GSoC timeline, I submitted a patch related to the
project, to gain hands-on experience with both the implementation details
and the submission process. The patch focused on refactoring access to
`core.attributesfile`.
Through discussions and feedback from the community, I gained a clearer
understanding of a key aspect of the project:
determining whether certain variables should belong to repo_settings/
repository or be part of a separate subsystem.
Junio pointed out in his feedback that not all global variables should
be blindly moved into `repo_settings`.
Specifically, for `git_attributes_file`, adding it to the repository struct
doesn’t make sense. He explained that it’s similar to how index_state is
handled: while index_state knows which repository it belongs to, the
repository struct only holds a pointer to a single index_state instance
and isn’t aware of other instances.
Following this approach, instead of placing `git_attributes_file` in the
repository struct, we can house it within an attribute set and pass a
pointer to that set wherever needed.
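One way to picture that suggestion is sketched below. It is only an illustration under assumptions: `struct toy_attr_set`, `toy_attr_set_init()`, and `toy_attr_set_file()` are hypothetical names and do not correspond to Git's actual attr API.

```c
#include <assert.h>
#include <stddef.h>

struct toy_repository; /* opaque in this sketch */

/* Hypothetical attribute set that owns the attributes-file setting. */
struct toy_attr_set {
	struct toy_repository *repo;  /* the set knows its repository */
	const char *attributes_file;  /* configured path, or NULL if unset */
};

/* Record the owning repository and the configured path (may be NULL). */
void toy_attr_set_init(struct toy_attr_set *set,
		       struct toy_repository *repo,
		       const char *configured_file)
{
	assert(set != NULL);
	set->repo = repo;
	set->attributes_file = configured_file;
}

/* Return the configured path, or the caller's fallback when none is set. */
const char *toy_attr_set_file(const struct toy_attr_set *set,
			      const char *fallback)
{
	assert(set != NULL);
	return set->attributes_file ? set->attributes_file : fallback;
}
```

Callers would receive a `struct toy_attr_set *` instead of reading a global, so two sets can carry different paths at the same time.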
This practice patch gave me a clearer understanding of the project.
Patches:
--------
For git:
+ (Microproject) t6423: fix suppression of Git’s exit code in tests
Thread:
https://public-inbox.org/git/20250202120926.322417-1-ayu.chandekar@gmail.com/
Status: Merged into master
Commit Hash: 7c1d34fe5d1229362f2c3ecf2d493167a1f555a2
Description: Instead of executing a Git command as the upstream component of
a pipe, which can result in the exit status being lost, redirect
its output to a file and then process that file in two steps to
ensure the exit status is properly preserved.
+ midx: implement progress reporting for QSORT operation
Thread:
https://public-inbox.org/git/20250210074623.136599-1-ayu.chandekar@gmail.com/
Status: Dropped
Description: Add progress reporting during the QSORT operation in
multi-pack-index verification. While going through the code,
I found this TODO, which I thought was interesting. However, my
approach assumed that the qsort() operation processes elements
in a structured order, which isn't guaranteed.
+ Stop depending on `the_repository` for core.attributesfile
Thread:
https://public-inbox.org/git/20250310151048.69825-1-ayu.chandekar@gmail.com/
Status: WIP, needs more discussion.
Description: This patch refactors access to the `core.attributesfile`
configuration by moving it into the `repo_settings` struct.
It eliminates the global variable `git_attributes_file` and
updates relevant code paths to pass the `struct repository`
as a parameter.
For git.github.io:
+ GSoC-participants: add GSoC 2024 participants to the list #762
Status: Merged into master
Description: Adding GSoC 2024 participants will help new
contributors understand their journey, making it easier for them
to navigate the program and the project.
Proposed Plan:
--------------
I have been reviewing global variables across the codebase to understand their
dependencies and impact. To do this, I examined `config.c` and cross-referenced
it with `environment.c` to see how these variables are currently managed. The
goal of this project is to eliminate global variables by moving their
configurations into their local contexts.
The general approach for handling a global variable begins with understanding
its purpose. This involves tracing its usage across the codebase and identifying
the subsystem it should belong to. If the variable is closely tied to
repository-related functionality, it may belong in struct repository or
struct repo_settings. Otherwise, it should be placed in a more suitable
context based on its scope.
Additionally, it's important to review previous attempts or related patches
to understand past design decisions and ensure consistency with ongoing efforts.
Finally, the global instance is eliminated by relocating the variable into the
appropriate context and passing it through the relevant code paths.
Example: Handling `is_bare_repository_cfg`
The variable `is_bare_repository_cfg` determines whether a repository is bare,
meaning it lacks a working directory. Since this property is fundamental to
how a repository functions, it should be placed in struct repository.
I have also gone through the code paths and analyzed how this variable is
initialized. We can initialize it similarly to how hash_algo is set through
the repository format. The repository format already contains an `is_bare`
field, which we can use to set this variable inside struct repository.
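A rough sketch of that initialization is shown below. All names (`toy_repository_format`, `toy_repository`, `toy_repo_apply_format()`, `toy_repo_is_bare()`) are hypothetical, and the shape of the final check is an assumption made for illustration, not a copy of Git's code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the parsed repository format. */
struct toy_repository_format {
	int hash_algo;
	int is_bare;
};

/* Hypothetical repository that carries the bare flag itself. */
struct toy_repository {
	int hash_algo;
	int is_bare;
	const char *worktree; /* NULL when there is no working tree */
};

/* Copy format-derived fields into the repository in one place. */
void toy_repo_apply_format(struct toy_repository *repo,
			   const struct toy_repository_format *format)
{
	assert(repo != NULL && format != NULL);
	repo->hash_algo = format->hash_algo;
	repo->is_bare = format->is_bare;
}

/* Assumed check: configured as bare and no working tree set. */
int toy_repo_is_bare(const struct toy_repository *repo)
{
	assert(repo != NULL);
	return repo->is_bare && !repo->worktree;
}
```

Keeping the copy in a single function means the bare flag is set at the same point as the hash algorithm, so no caller needs to consult a global afterwards.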
However, I still have some questions regarding why the is_bare_repository()
function checks for `repo->worktree` and why the `worktree struct` itself has
an `is_bare` variable. If a repository is considered bare when !repo->worktree
is true, the role of `worktree->is_bare` needs further clarification. I believe
that by engaging with the community, my understanding will become clearer.
I also went through [4] to see how John Cai approached this.
The same approach can be applied to the other global variables.
Through multiple iterations, this approach will be refined based on feedback,
edge cases, and community input.
Timeline:
---------
Pre-GSOC:
(Until 8 May)
- Explore the codebase more, focusing on environment-related code paths.
- Document how each global variable is used and how it can be moved to
repository settings.
- Study Git’s Coding Guidelines and the Pro Git Book to align with best practices.
----------
Community Bonding:
(May 8 - June 1)
- Engage with mentors to discuss different environment variables, their
dependencies, and the best approach for refactoring.
- Finalize an implementation plan based on discussions.
- Since I will be on summer vacation, I can start coding early and make progress
on the project.
----------
Coding Period:
(June 2 - August 25)
- Identify the appropriate subsystem for each global variable and relocate it
into struct repository, struct repo_settings, or other suitable contexts.
- Modify function signatures to pass the new contexts explicitly, replacing
reliance on global variables.
- Continuously submit patches for review and incorporate feedback from mentors
and the community.
- Write weekly blog posts documenting the work done that week.
----------
Final Week:
(August 25 - September 1)
- Write a detailed report on the entire project.
- Fix bugs if any.
- Reflect on the project, noting challenges faced and lessons learned.
Blogging:
---------
I have also set up a blogging page at [10]. While reading blogs from previous
GSoC contributors, I found them useful in understanding the challenges
they faced and how they approached their projects. Their experiences gave
me a better idea of what to expect and how to navigate the development
process. Inspired by this, I decided to start my own blog to document my
journey throughout GSoC. This will not only help me track my own progress but
also serve as a resource for future contributors who might work on similar
projects. I plan to share updates on my work, challenges encountered and
insights gained from discussions with mentors and the community.
Additionally, I hope my blog encourages more people to contribute to open
source by providing a transparent look into the development process. Writing
about my experience will also help me reflect on my work and improve my
ability to communicate technical ideas effectively.
I liked the format and structure of Chandra's blog, so I decided to use the
same template for my own blogging page.
Availability:
-------------
As a college student, I intend to utilise my summer break from May to July
to work on the project. After completing my University exams in April, I can
start working in May. I can dedicate 40 hours a week from May to July, while
in August, after classes commence, I can dedicate about 25 hours a week.
I have no exams, vacations, or other commitments planned during the coding
period. I will keep the community posted on my status and maintain
transparency throughout the project.
Post-GSOC:
----------
Beyond contributing code, I strongly believe in giving back to the community
and helping others grow. Open source thrives on mentorship, knowledge sharing,
and long-term involvement, and I would love to continue contributing even
after GSoC ends.
I have always valued mentorship, both as a mentee and as someone who enjoys
guiding others. If given the opportunity, I would be more than happy to
mentor/co-mentor future GSoC contributors. By staying involved in the
community, whether through contributing, reviewing patches, or mentoring,
I hope to help sustain and expand the project’s reach. I look at GSoC not
just as a one-time contribution but as a step toward a longer-term relationship
with open source.
I will continue to be involved with Git even after GSoC by contributing patches,
reviewing code, and participating in discussions. My work on refactoring Git’s
state aligns with long-term improvements to the codebase, and I plan to keep
refining it beyond the program. I see GSoC as just the beginning of my journey
with Git.
Appreciation:
-------------
I appreciate the Git community for its excellent documentation, which made it
much easier for me to understand Git in depth. The well-structured resources
helped me navigate the codebase and gain a deeper understanding of how Git
works internally.
Beyond the documentation, I am also grateful for how welcoming and supportive
the community has been. Whether through discussions on the mailing list or
feedback on my patches, the information and guidance I received made my
experience even better.
Additionally, I read the blogs and proposals of Chandra, Jialuo, and Ghanashyam,
which provided valuable insights into their journeys and helped me shape my
own approach to contributing.
Thanks for reviewing this proposal.
References:
-----------
[1] https://github.com/sdslabs/beast/pull/374
[2] https://github.com/sdslabs/beast/tree/add-teams-with-hint
[3] https://github.com/sdslabs/playCTF/pull/177
[4] https://public-inbox.org/git/pull.1826.git.git.1730926082.gitgitgadget@gmail.com/
[5] https://public-inbox.org/git/20250303-b4-pks-objects-without-the-repository-v1-0-c5dd43f2476e@pks.im/
[6] https://public-inbox.org/git/20250206-b4-pks-path-drop-the-repository-v1-0-4e77f0313206@pks.im/
[7] https://public-inbox.org/git/pull.1829.git.1731653548549.gitgitgadget@gmail.com/#t
[8] https://public-inbox.org/git/cover.1733236936.git.karthik.188@gmail.com/
[9] https://public-inbox.org/git/cover.1724923648.git.ps@pks.im/
[10] https://ayu-ch.github.io
* Re: [GSOC] [PROPOSAL v2]: Refactoring in order to reduce Git’s global state
2025-04-04 8:51 ` [GSOC] [PROPOSAL v2]: " Ayush Chandekar
@ 2025-04-04 14:45 ` Karthik Nayak
2025-04-06 10:44 ` Ayush Chandekar
2025-04-07 8:42 ` Ayush Chandekar
1 sibling, 1 reply; 15+ messages in thread
From: Karthik Nayak @ 2025-04-04 14:45 UTC (permalink / raw)
To: Ayush Chandekar; +Cc: christian.couder, git, ps, shejialuo, shyamthakkar001
Ayush Chandekar <ayu.chandekar@gmail•com> writes:
[snip]
> Proposed Plan:
> --------------
>
> I have been reviewing global variables across the codebase to understand their
> dependencies and impact. To do this, I examined `config.c` and cross-referenced
> it with `environment.c` to see how these variables are currently managed. The
> goal of this project is to eliminate global variables by moving their
> configurations into their local contexts.
>
> The general approach for handling a global variable begins with understanding
> its purpose. This involves tracing its usage across the codebase and identifying
> the subsystem it should belong to. If the variable is closely tied to
> repository-related functionality, it may belong in struct repository or
> struct repo_settings. Otherwise, it should be placed in a more suitable
> context based on its scope.
>
> Additionally, it's important to review previous attempts or related patches
> to understand past design decisions and ensure consistency with ongoing efforts.
> Finally, the global instance is eliminated by relocating the variable into the
> appropriate context and passing it through the relevant code paths.
>
> Example: Handling `is_bare_repository_cfg`
> The variable `is_bare_repository_cfg` determines whether a repository is bare,
> meaning it lacks a working directory. Since this property is fundamental to
> how a repository functions, it should be placed in struct repository.
>
> I have also gone through the code paths and analyzed how this variable is
> initialized. We can initialize it similarly to how hash_algo is set through
> the repository format. The repository format already contains an `is_bare`
> field, which we can use to set this variable inside struct repository.
>
> However, I still have some questions regarding why the is_bare_repository()
> function checks for `repo->worktree` and why the `worktree struct` itself has
> an `is_bare` variable. If a repository is considered bare when !repo->worktree
> is true, the role of `worktree->is_bare` needs further clarification. I believe
> that by engaging with the community, my understanding will become clearer.
> I also went through [4] to see how John Cai's approach was.
>
> This is how we can also approach for other global variables.
> Through multiple iterations, this approach will be refined based on feedback,
> edge cases, and community input.
>
So the approach you suggest is to comb through the global variables and
config and find new locations for them to be stored. While this is
definitely a big chunk of the problem, shouldn't we also talk about
how we can reduce usage of some of these variables?
In particular, I'm wondering how you'd want to tackle 'the_repository'
usage. There is some previous work done here, where Patrick added the
'#define USE_THE_REPOSITORY_VARIABLE' definition, which tracks usage of
the global variable across different files.
A possible approach which has been followed is to simply go from the
bottom layers of the code upwards, cleaning up usage of global variables
and ensuring we can remove '#define USE_THE_REPOSITORY_VARIABLE' from
files. This is also the approach taken in some of the patches that
you've linked.
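To illustrate the opt-in pattern, here is a small single-file sketch. The `toy_` names and `TOY_USE_THE_REPOSITORY_VARIABLE` are hypothetical stand-ins; in a real tree the guarded declaration would sit in a shared header and the definition in one .c file.

```c
#include <assert.h>
#include <stddef.h>

struct toy_repository {
	const char *gitdir;
};

/* A file that has not been converted yet opts in explicitly. */
#define TOY_USE_THE_REPOSITORY_VARIABLE

/* --- what the shared header would contain --- */
#ifdef TOY_USE_THE_REPOSITORY_VARIABLE
extern struct toy_repository *toy_the_repository;
#endif

/* Converted lower-layer helper: takes the repository explicitly. */
const char *toy_repo_gitdir(const struct toy_repository *repo)
{
	assert(repo != NULL);
	return repo->gitdir;
}

/*
 * Unconverted caller: still reaches for the global. Deleting the
 * #define above makes this function fail to compile, which is how
 * the remaining users in a file are found.
 */
const char *toy_gitdir(void)
{
	return toy_repo_gitdir(toy_the_repository);
}

/* --- the definition, as it would live in a single .c file --- */
struct toy_repository *toy_the_repository;
```

Going bottom-up, helpers like `toy_repo_gitdir()` are converted first; once every function in a file looks like that, the `#define` can be dropped from it.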
[snip]
* [GSoC PROPOSAL v2] Refactoring in order to reduce Git’s global state
2025-04-02 18:14 [GSoC PROPOSAL v1] Refactoring in order to reduce Git’s " Arnav Bhate
@ 2025-04-05 18:41 ` Arnav Bhate
0 siblings, 0 replies; 15+ messages in thread
From: Arnav Bhate @ 2025-04-05 18:41 UTC (permalink / raw)
To: git; +Cc: Patrick Steinhardt
## Personal Information
- Full name: Arnav Akshaya Bhate
- Email address: bhatearnav@gmail•com
- Mobile no.: +91 8291328838
- Time zone: UTC+05:30
- Education: IIT Bombay
- Year: Second year
- GitHub: https://github.com/arnavbhate
## About Me
I'm Arnav Bhate, a second-year UG student at Indian Institute of
Technology Bombay. I love coding and so I am a member of IIT Bombay's
Developers' Community (DevCom), which is a group of roughly 40 people
developing software for use by students and staff of the institute. Most
of the software developed is not open source, so I can not include
examples of my work there in this proposal. Being a member of DevCom has
exposed me to collaborative software development.
A common link in all software I have worked on is that Git has been used
for version control. I thus see this project as my way of giving back to
the Git community in particular and open source in general. This will be
my first significant contribution to the open source community, and I
wish to stick around afterwards.
## Overview
Git currently uses many global variables, most significantly
`the_repository`, which are included in roughly 290 files. Apart from
`the_repository`, there are many global variables, some of which
logically belong in struct repository, as they represent information
specific to a repository. So even if all instances of the_repository
were converted into an extra repository argument for the function, there
would still be many global variables left.
The use of such variables assumes that Git will only operate on one
repository at a time, which renders multi-repository handling
impossible without kludges.
This project aims to move such variables from global scope into more
appropriate local contexts, mainly `struct repository` and
`struct repo_settings`. This will not only make the environment
repository-specific, allowing easy multi-repository handling, but also
make maintaining the code easier.
The project involves identifying suitable locations for environment
variables in repository specific structs, moving them there and updating
all the code affected by the move.
## Pre-GSoC
I first got into Git's codebase in February 2025, with my first
contribution in March. My first patch was on my microproject and since
then I have submitted two more patches on a similar topic.
### Patches
- (Microproject) decorate: fix sign comparison warnings
Thread: https://lore.kernel.org/git/afa6b428-3190-42ae-9eac-540c95b576fd@gmail.com/
Status: Merged into master
Commit hash: 2bfd3b368572cbf1ce287de09db08b7e7e429ecd
Description: Refactoring of decorate.c to replace signed variables
with unsigned ones when they are used to iterate over arrays whose
sizes are represented by unsigned variables, and remove 2 unnecessary
variables which just hold the value of another variable without being
modified, replacing them with the variable whose value they were
holding.
- rm: fix sign comparison warnings
Thread: https://lore.kernel.org/git/38de63ce-6d4e-4f1f-95b1-049df78d9cfc@gmail.com/
Status: Under discussion
Description: Refactoring of rm.c to make iterators over arrays whose
sizes are represented by unsigned variables unsigned. Specifically in
`get_ours_cache_pos`, where before a signed variable was being passed
and then inverted in the function, now the already inverted variable
is passed as an unsigned variable, with the inversion moved to the
function call.
- pathspec: fix sign comparison warnings
Thread: https://lore.kernel.org/git/a3aa5f99-63ce-4be5-8d64-fb6e226b3bf9@gmail.com/
Status: Under discussion
Description: Refactoring of pathspec.c to make array iterator
variables match the type of the variable storing the array's size.
Where replacing the variable's type is not possible, because of the
large-scale cascade replacements it would cause, an appropriate cast
has been added.
- environment.h: remove unused variables
Thread: https://lore.kernel.org/git/2c547567-2b72-476c-9fc5-71cac050fa15@gmail.com/
Status: Under discussion
Description: Removing two variables which did not have any references
in the codebase, as they had been moved to `struct repo_settings`, but
were not removed from environment.h.
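For readers unfamiliar with the sign-comparison patches listed above, the following sketch shows the general kind of change involved. `toy_count_nonzero()` is a hypothetical function, not code taken from decorate.c, rm.c, or pathspec.c.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Count the non-zero entries of an array whose length is a size_t.
 * Declaring the index as `int` would compare a signed value against
 * the unsigned `nr`; using size_t for the index keeps both sides of
 * the comparison the same type.
 */
size_t toy_count_nonzero(const int *values, size_t nr)
{
	size_t count = 0;

	assert(values != NULL || nr == 0);
	for (size_t i = 0; i < nr; i++)
		if (values[i])
			count++;
	return count;
}
```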
## Proposed Plan
- Identifying global variables in environment.c that should be moved and
identifying suitable locations: some could be moved directly into
`struct repository`, some into its existing sub-structs, and some into
newly created sub-structs.
- Identifying and updating occurrences of these variables to reference
their new locations.
- Identifying all occurrences of `the_repository` and updating them to
use a `struct repository` passed to the function.
It makes sense that all the variables need not be in the same struct, as
separation would keep the codebase organised, and thus easier to
maintain. It would also make it easier to introduce these changes
systematically, as a group of related variables, combined together in a
struct, could be introduced in a single patch series.
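As a sketch of what such a grouping might look like, the block below bundles a few filesystem-related flags into one sub-struct. `struct toy_fs_settings`, its field names, the default values, and the helper functions are all illustrative assumptions, not a proposed final layout.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical group of related settings, introduced as one unit. */
struct toy_fs_settings {
	int trust_executable_bit;
	int has_symlinks;
	int ignore_case;
};

/* Hypothetical repository embedding the group. */
struct toy_repository {
	const char *gitdir;
	struct toy_fs_settings fs;
};

/* Fill in defaults (chosen arbitrarily for this sketch). */
void toy_fs_settings_init(struct toy_fs_settings *fs)
{
	assert(fs != NULL);
	fs->trust_executable_bit = 1;
	fs->has_symlinks = 1;
	fs->ignore_case = 0;
}

/* Accessor that replaces reading a global flag. */
int toy_repo_ignores_case(const struct toy_repository *repo)
{
	assert(repo != NULL);
	return repo->fs.ignore_case;
}
```

Because the flags travel together, a single patch series could add the sub-struct, convert its users, and delete the corresponding globals.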
### Timeline
#### Pre-GSoC (Until May 8)
- Explore the codebase, identifying locations where global variables
from environment.c are used.
- Identify suitable locations for these global variables.
#### Community Bonding Period (May 8 - June 1)
- Interact with mentor, discussing the locations I have decided, and
refining the plan if required.
- Start coding early, as my summer break will have started. (See coding
period)
#### Coding Period (June 2 - August 25)
- Move global variables to their new locations in various structs,
and refactor functions that depend on them to use their new locations.
- Variables which represent settings from config (7 weeks)
- Core (5 weeks)
- Others (2 weeks)
- Variables not from config (3 weeks)
- Modify functions to add a `struct repository` argument where they
depend on `the_repository` and replace all occurrences of it in the
function.
#### Final Week (August 25 - September 1)
- Fix any bugs that may be left.
- Write final report.
### Availability
My summer break from college lasts from May to July. I am currently
planning on taking a vacation during this period of about 1 week,
however, the dates have not been decided. Outside of this vacation, I
am not occupied in the break and can devote up to 60 hours a week
towards the project. In August, once classes recommence, I will be
available for 20 hours a week.
## Post-GSoC
After completing my project, I plan on staying active, contributing
patches, and starting to review code.
--
Regards,
Arnav Bhate
(He/Him)
* Re: [GSOC] [PROPOSAL v2]: Refactoring in order to reduce Git’s global state
2025-04-04 14:45 ` Karthik Nayak
@ 2025-04-06 10:44 ` Ayush Chandekar
2025-04-07 9:06 ` Christian Couder
0 siblings, 1 reply; 15+ messages in thread
From: Ayush Chandekar @ 2025-04-06 10:44 UTC (permalink / raw)
To: Karthik Nayak; +Cc: christian.couder, git, ps, shejialuo, shyamthakkar001
>
> So the approach you suggest is to comb through the global variables and
> config and find new locations for them to be stored. While this is
> definitely a big chunk of the problem, shouldn't we also talk about
> how we can reduce usage of some of these variables?
>
> In particular, I'm wondering how you'd want to tackle 'the_repository'
> usage. There is some previous work done here, where Patrick added the
> '#define USE_THE_REPOSITORY_VARIABLE' definition which tracks usage of
> global variable and usage of them in different files.
>
> A possible approach which has been followed is to simply go from the
> bottom layers of the code upwards, cleaning up usage of global variables
> and ensuring we can remove '#define USE_THE_REPOSITORY_VARIABLE' from
> files. This is also the approach taken in some of the patches that
> you've linked.
>
Your approach makes a lot of sense to me, that is, picking a specific
subsystem or file and aiming to remove the `#define USE_THE_REPOSITORY_VARIABLE`
definition and thus 'the_repository' eventually. This was the method
used by Patrick to tackle the object subsystem in [1] and the path
subsystem in [2], and by you to tackle the packfile in [3].
This approach also helps in removing some of the global variables used
within that particular subsystem, which is a nice bonus.
However, this approach might not be feasible for the global variables that
aren't tightly tied to a single subsystem. So what I can do is, for removing
`the_repository`, I can follow the approach you mentioned, and for relocating
the more general global variables, I can use the approach which I
talked about in the proposal.
What do you think?
[1]: https://public-inbox.org/git/20250303-b4-pks-objects-without-the-repository-v1-0-c5dd43f2476e@pks.im/
[2]: https://public-inbox.org/git/20250206-b4-pks-path-drop-the-repository-v1-0-4e77f0313206@pks.im/
[3]: https://public-inbox.org/git/cover.1733236936.git.karthik.188@gmail.com/
* Re: [GSOC] [PROPOSAL v2]: Refactoring in order to reduce Git’s global state
2025-04-04 8:51 ` [GSOC] [PROPOSAL v2]: " Ayush Chandekar
2025-04-04 14:45 ` Karthik Nayak
@ 2025-04-07 8:42 ` Ayush Chandekar
1 sibling, 0 replies; 15+ messages in thread
From: Ayush Chandekar @ 2025-04-07 8:42 UTC (permalink / raw)
To: Ayush Chandekar, Patrick Steinhardt
Cc: christian.couder, git, karthik nayak, shejialuo,
Ghanshyam Thakkar
Hey Patrick,
It would be great if you could take a look at my proposal, especially since
you've worked on this area before. Any feedback would be really appreciated!
Thanks!
Ayush
* Re: [GSOC] [PROPOSAL v2]: Refactoring in order to reduce Git’s global state
2025-04-06 10:44 ` Ayush Chandekar
@ 2025-04-07 9:06 ` Christian Couder
2025-04-07 10:07 ` Ayush Chandekar
0 siblings, 1 reply; 15+ messages in thread
From: Christian Couder @ 2025-04-07 9:06 UTC (permalink / raw)
To: Ayush Chandekar; +Cc: Karthik Nayak, git, ps, shejialuo, shyamthakkar001
On Sun, Apr 6, 2025 at 12:44 PM Ayush Chandekar <ayu.chandekar@gmail•com> wrote:
>
> >
> > So the approach you suggest is to comb through the global variables and
> > config and find new locations for them to be stored. While this is
> > definitely a big chunk of the problem, shouldn't we also talk about
> > how we can reduce usage of some of these variables?
> >
> > In particular, I'm wondering how you'd want to tackle 'the_repository'
> > usage. There is some previous work done here, where Patrick added the
> > '#define USE_THE_REPOSITORY_VARIABLE' definition which tracks usage of
> > global variable and usage of them in different files.
> >
> > A possible approach which has been followed is to simply go from the
> > bottom layers of the code upwards, cleaning up usage of global variables
> > and ensuring we can remove '#define USE_THE_REPOSITORY_VARIABLE' from
> > files. This is also the approach taken in some of the patches that
> > you've linked.
> >
>
> Your approach makes a lot of sense to me, that is, picking a specific
> subsystem or file and aiming to remove the `#define USE_THE_REPOSITORY_VARIABLE`
> definition and thus 'the_repository' eventually. This was the method
> used by Patrick to tackle
> the object subsystem in [1] and the path subsystem in [2] and you to
> tackle the packfile in [3].
> This approach also helps in removing some of the global variables used
> within that particular
> subsystem, which is a nice bonus.
>
> However, this approach might not be feasible for the global variables that
> aren't tightly tied to a single subsystem.
Well, initially 'the_repository' wasn't tightly tied to a single
subsystem and even now I am not sure we could say it's tightly tied to
a single subsystem. Or maybe I don't understand what you mean.
Do you mean that it's tightly tied because it needs `#define
USE_THE_REPOSITORY_VARIABLE`?
But for other global variables it could be possible to define and use
similar macros. This way it might be possible to remove those
variables step by step only in some files.
> So what I can do is, for removing
> `the_repository`, I can follow the approach you mentioned, and for relocating
> the more general global variables, I can use the approach which I
> talked about in the
> proposal.
>
> What do you think?
If removing `the_repository` is part of your proposal, then yeah,
describing the approach you will use to remove it is a good idea.
* Re: [GSOC] [PROPOSAL v2]: Refactoring in order to reduce Git’s global state
2025-04-07 9:06 ` Christian Couder
@ 2025-04-07 10:07 ` Ayush Chandekar
0 siblings, 0 replies; 15+ messages in thread
From: Ayush Chandekar @ 2025-04-07 10:07 UTC (permalink / raw)
To: Christian Couder; +Cc: Karthik Nayak, git, ps, shejialuo, shyamthakkar001
>
> Well, initially 'the_repository' wasn't tightly tied to a single
> subsystem and even now I am not sure we could say it's tightly tied to
> a single subsystem. Or maybe I don't understand what you mean.
>
> Do you mean that it's tightly tied because it needs `#define
> USE_THE_REPOSITORY_VARIABLE`?
>
Sorry if I was not clear earlier. I wasn't referring to 'the_repository'
being tied to a subsystem; I meant other global variables that are tied
to one.
What I meant is that the approach of picking a subsystem and removing the
`#define USE_THE_REPOSITORY_VARIABLE` is really effective for removing
'the_repository'.
It also helps in localizing the global variables from environment.h
that are specific to that subsystem, either into the subsystem itself or
into struct repository / repo_settings.
But if a global variable is shared by two or three different subsystems,
this approach would not be feasible for that variable. I would need to
tackle such a variable individually, which is the approach I described
in my proposal.
So I can move forward by using whichever of these two approaches fits
each case.
> But for other global variables it could be possible to define and use
> similar macros. This way it might be possible to remove those
> variables step by step only in some files.
>
Yes, I still need to think through how that would align with the
approach I mentioned.
Defining a single macro like `#define USE_GLOBAL_VARIABLES` is
something I can look into.
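A minimal sketch of how such an opt-in macro could gate a global, modeled on the way `USE_THE_REPOSITORY_VARIABLE` gates access to `the_repository`. The macro name `USE_GLOBAL_VARIABLES` comes from the message above; `struct settings`, `default_abbrev_len`, and `abbrev_len_for()` are hypothetical names used only for illustration and are not part of Git.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a setting that currently lives in a global. */
struct settings {
	int abbrev_len;
};

/*
 * The global is declared only for translation units that opt in by
 * defining the macro before this point. A file that has been converted
 * drops the #define, and any leftover use of the global then fails to
 * compile instead of going unnoticed.
 */
#ifdef USE_GLOBAL_VARIABLES
extern int default_abbrev_len;
#endif

/* Converted code receives the setting explicitly and ignores the global. */
int abbrev_len_for(const struct settings *s)
{
	assert(s != NULL);
	return s->abbrev_len;
}
```

With this layout, files can be migrated one at a time: each file keeps the `#define` until its last use of the global is gone.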
> > So what I can do is, for removing
> > `the_repository`, I can follow the approach you mentioned, and for relocating
> > the more general global variables, I can use the approach which I
> > talked about in the
> > proposal.
> >
> > What do you think?
>
> If removing `the_repository` is part of your proposal, then yeah,
> describing the approach you will use to remove is a good idea.
Yes, it is a part of the project, but I haven't added this specific
approach to the proposal yet, hence I was asking whether I can.
Thanks :)
* [GSOC][PROPOSAL]: Refactoring in order to reduce Git’s global state
@ 2026-03-06 14:57 Shreyansh Paliwal
2026-03-07 10:33 ` Christian Couder
2026-03-07 20:04 ` [GSOC][PROPOSAL v2]: " Shreyansh Paliwal
0 siblings, 2 replies; 15+ messages in thread
From: Shreyansh Paliwal @ 2026-03-06 14:57 UTC (permalink / raw)
To: git
Cc: christian.couder, karthik.188, jltobler, ayu.chandekar,
siddharthasthana31
Hello all,
This is the first draft of my GSoC 2026 proposal for the project
'Refactoring in order to reduce Git’s global state'.
Doc version can be read at:
https://docs.google.com/document/d/16MRNUv6dJi6vtNvI5Ro0WmHf20dRRBHjFLpmhAuaUOA/edit?usp=sharing
Any feedback or suggestions would be greatly appreciated.
Thanks for reading.
---
Refactoring in order to reduce Git's global state
Personal Information:
---------------------
Name: Shreyansh Paliwal
Email: Shreyanshpaliwalcmsmn@gmail•com
Alternate Email: Shreyansh.01014803123@it•mait.ac.in
Mobile No.: +91-9335120023
Education: GGSIPU, New Delhi, India
Year: III / IV
Degree: Bachelor of Technology in Information Technology
Github: https://github.com/shreyp135
Time-zone: UTC +5:30 (IST)
About Me:
---------
I am Shreyansh Paliwal, a pre-final year undergraduate student at Guru
Gobind Singh Indraprastha University, New Delhi, India. I am a technology
enthusiast, who began programming in 2018 with Java as my first language
and later transitioned to C/C++ in 2023 as my primary focus. I enjoy
exploring new technologies and programming languages, and I have developed
solid experience building applications using TypeScript, React.js, Node.js,
and AWS. I actively participate in technical events and have organized
multiple hackathons, tech-fests, and related activities at my college as
the SIG-Head of IOSD, a tech-focused student community.
I started using Git in 2023, which is also when I made my first open-source
contribution to the Git project. I was a winner of Augtoberfest 2024, an
open-source competition organized by C4GT India. Over the past several
months, I have been involved with the Git project, studying the codebase,
submitting patches, and incorporating review feedback. I am motivated to
improve the experience of Git for end users, and this project is an
excellent opportunity to continue that work.
Overview:
---------
Git relies heavily on global state for managing environment variables and
configuration data. In particular, many parts of the codebase depend on the
global struct repository instance, the_repository, which represents the
currently active repository. Instead of passing a repository instance
explicitly, several internal functions implicitly rely on this global
object. Additionally, various configuration derived values and
environment-related variables such as the_hash_algo, default_abbrev, and
comment_line_str are stored globally, most of them defined in
environment.c.
This design assumes that only one repository is active within a process at
a time. As a result, the repository state becomes shared across the entire
process, weakening isolation and making behavior implicitly dependent on
global context. Such global dependencies make the code harder to reason
about, test, and maintain, and can introduce subtle bugs when operations
interact with multiple repositories. They also limit long-term goals such
as safely supporting multiple repositories within a single process and
continuing Git’s ongoing libification efforts.
To address these issues, global environment and configuration state should
be refactored into better-scoped contexts. Repository-specific data can be
moved into struct repository or related structures, while
subsystem-specific state should be localized appropriately. Passing
repository instances explicitly through function interfaces will improve
modularity, reduce hidden dependencies, and make the codebase easier to
maintain while moving Git closer to supporting multiple repositories safely
within a single process.
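The difference between implicit and explicit repository access can be sketched as follows. This is a simplified illustration, not Git's actual code: the structures below are cut-down stand-ins that only borrow the names `struct repository` and `the_repository`, and `oid_rawsz_global()` and `oid_rawsz()` are hypothetical helpers. The raw sizes used are the 20 bytes of SHA-1 and the 32 bytes of SHA-256.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified, hypothetical stand-ins for Git's real structures. */
struct hash_algo {
	const char *name;
	size_t rawsz; /* size of a raw object ID in bytes */
};

struct repository {
	const struct hash_algo *hash_algo;
};

static const struct hash_algo sha1_algo = { "sha1", 20 };
static const struct hash_algo sha256_algo = { "sha256", 32 };

/* Process-wide state: there can be only one "current" repository. */
static struct repository *the_repository;

/* Before: the result silently depends on whatever the global points at. */
size_t oid_rawsz_global(void)
{
	assert(the_repository != NULL);
	return the_repository->hash_algo->rawsz;
}

/* After: the dependency is visible in the function signature. */
size_t oid_rawsz(const struct repository *repo)
{
	assert(repo != NULL);
	return repo->hash_algo->rawsz;
}
```

With the explicit variant, two repositories that use different hash algorithms can be handled in the same process, whereas the global variant can only ever answer for one of them at a time.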
The difficulty of this project is medium, and it is estimated to take 175
to 350 hours.
Pre-GSOC:
---------
I first explored the Git codebase in December 2023, when I submitted a
small patch fixing the wording of an error message that I noticed while
browsing the source code. At that time I had recently started using Git and
GitHub for version control in my projects, which sparked my curiosity about
how Git works internally.
A few months ago, when I had some free time from college, I decided to
start contributing to Git more actively. I built Git from source, read
parts of the documentation, and familiarized myself with the mailing list
workflow. While going through the documentation, I noticed a few
inconsistencies in the MyFirstContribution page and submitted patches to
fix them. I also completed a microproject involving a test cleanup, and
later worked on adding a warning for a quiet fallback.
During this process, I attempted to remove the usage of the_repository from
a file. However, after discussion on the mailing list, Phillip pointed out
that the change was not particularly useful in that context and could
introduce segfaults that would not justify the effort for builtin code.
Based on this feedback, I dropped that attempt and instead focused on
understanding the broader global state refactoring effort. To better
understand the project area, I studied previous patches and blog posts by
Ayush Chandekar and Olamide Bello, followed discussions on the mailing
list, and explored parts of the codebase such as the wt-status and worktree
subsystems. This helped me understand the ongoing effort to reduce Git’s
reliance on global state and motivated me to work further in this area.
The following is a list of my contributions, ordered from earliest to most
recent:
Patches for Git:
----------------
* test-lib-functions.sh: fix test_grep fail message wording
Status: Merged into master
Mailing List: https://lore.kernel.org/git/20231203171956.771-1-shreyanshpaliwalcmsmn@gmail.com/
Merge Commit: 37e8d795bed7b93d3f12bcdd3fbb86dfe57921e6
Log: This was my first patch to Git in 2023. While browsing the
source code and past issues, I noticed that even after
the test_i18ngrep function was deprecated, an error message
referring to test_grep was left behind. I updated the
wording to correctly reference test_i18ngrep.
* doc: MyFirstContribution: fix missing dependencies and clarify build steps
Status: Merged into master
Mailing List: https://lore.kernel.org/git/20260112195625.391821-1-shreyanshpaliwalcmsmn@gmail.com/
Merge Commit: 81021871eaa8b16a892b9c8791a0c905ab26e342
Log: While getting familiar with the codebase, I followed the
MyFirstContribution documentation and encountered a few
issues. Some include headers were missing, the synopsis
format was incorrect, and the explanation for -j$(nproc)
was absent. I submitted fixes to improve the clarity and
correctness of the documentation.
* t5500: simplify test implementation and fix git exit code suppression (Microproject)
Status: Merged into master
Mailing List: https://lore.kernel.org/git/20260121130012.888299-1-shreyanshpaliwalcmsmn@gmail.com/
Merge Commit: a824421d3644f39bfa8dfc75876db8ed1c7bcdbf
Log: This was completed as a microproject for GSoC. Instead of
constructing the pack protocol using a complex combination
of here-docs and echo commands, the patch captures command
outputs beforehand and uses the test-tool pkt-line pack
helper to construct the protocol input in a temporary file
before feeding it to git upload-pack.
* show-index: add warning and wrap error messages with gettext
Status: Merged into master
Mailing List: https://lore.kernel.org/git/20260130153603.290196-1-shreyanshpaliwalcmsmn@gmail.com/
Merge Commit: ea39808a22714b8f61b9472de7ef467ced15efea,
227e2cc4e1415c4aeadceef527dd33e478ad5ec3
Log: While exploring the code, I noticed a TODO comment suggesting
automatic hash detection. After discussion on the mailing
list, it was concluded that there was no future-proof
approach to implement this until a new index file format
came into use. Instead, an explicit warning was added rather
than silently falling back to SHA-1. Additionally, several
error messages were missing gettext wrapping, which was also
fixed.
* wt-status: reduce reliance on global state
Status: Merged into seen
Mailing List: https://lore.kernel.org/git/20260218175654.66004-1-shreyanshpaliwalcmsmn@gmail.com/
Merge Commit: a7cd24de0b3b679c16ae3ee8215af06aeea1e6a3,
9d0d2ba217f3ceefb0315b556f012edb598b9724,
4631e22f925fa2af8d8548af97ee2215be101409
Log: This has been the most significant patch series in my journey
so far. It began with a suggestion from Phillip to clean up
some the_repository usages in wt-status.c. I extended the
effort to remove all usages of the_repository and
the_hash_algo from the file. During review discussions, it
was suggested that some worktree API cleanup should happen
first, particularly regarding the representation of worktrees
as NULL. Some related changes were later moved to a separate
series, after which this refactoring proceeded.
* worktree: change representation and usage of primary worktree
Status: Continued by Phillip Wood [1]
Mailing List: https://lore.kernel.org/git/20260213120529.15475-1-shreyanshpaliwalcmsmn@gmail.com/
Log: This worktree API cleanup series started while I was working
on wt-status. The intention was to modify the representation
of the current worktree so that struct worktree would not be
NULL. During discussion, Phillip clarified that NULL actually
represents the current worktree rather than the primary
worktree. Since Phillip already had a patch based on the right
logic, he continued the series and it was eventually merged
into master.
* tree-diff: remove the usage of the_hash_algo global
Status: Merged into master
Mailing List: https://lore.kernel.org/git/20260220175331.1250726-1-shreyanshpaliwalcmsmn@gmail.com/
Merge Commit: 1e50d839f8592daf364778298a61670c4b998654
Log: This was a straightforward patch that removed the remaining
usages of the global the_hash_algo in tree-diff.c by using the
repository’s local instance instead.
* send-email: UTF-8 encoding in subject line
Status: Merged into seen
Mailing List: https://lore.kernel.org/git/20260228112210.270273-1-shreyanshpaliwalcmsmn@gmail.com/
Merge Commit: c52f085a477c8eece87821c5bbc035e5a900eb12
Log: This patch was motivated by an issue I personally encountered
while sending a GSoC discussion email [2]. Initially the
change only modified the wording of the prompt, but after
discussion on the mailing list it was extended to include
proper validation to prevent invalid charset encodings from
being used in git send-email and to reduce confusion.
* Remove global state from editor.c
Status: Waiting for further feedback
Mailing List: https://lore.kernel.org/git/20260301105228.1738388-1-shreyanshpaliwalcmsmn@gmail.com/
Log: This was based on my question about localizing editor_program in
editor.c [2]. The patch received mixed feedback from
contributors and is currently awaiting additional guidance
from the mentors and/or the maintainer regarding the appropriate
direction.
Patches for git.github.io:
--------------------------
* SoC-2026-ideas: Remove an extra backtick
Status: Merged into master
PR Link: https://github.com/git/git.github.io/pull/831
Merge Commit: c1e4aa87a54430953eaa7355061139fdf1ff6796
Log: Minor Typo fix.
* rn-132: fixed 2 typos
Status: Merged into master
PR Link: https://github.com/git/git.github.io/pull/832
Merge Commit: 92876114d855d472ce2e0e5337e72a4b97b81681
Log: Fixed typos in Git Rev News Edition 132.
I have also been involved in additional discussions on the Git mailing
list [3][4][5][6].
History / Background:
---------------------
Efforts to reduce Git’s reliance on global state started when several Git
subsystems began moving toward libification, where Git’s internal
functionality could be reused as a library. Early examples of this
direction include major patch series such as the libification of git
mailinfo by Junio [7] and git apply by Christian [8]. These large patch
series exposed the limitations of relying on process-wide global state and
highlighted the need for better encapsulation of repository-related data.
One important step in this direction was the introduction of struct
repository, through refactoring work by Stefan Beller [9] and Brandon
Williams [10]. The motivation behind this structure was to centralize
repository-related state instead of relying on scattered global variables.
This change improved code clarity and made it easier to reason about Git’s
internal behavior. It also laid the groundwork for future improvements such
as safer multithreading and the possibility of handling submodules within
the same process. Later, additional refactoring work by Patrick further
removed reliance on the global the_repository in config [11] and path [12]
subsystems. As part of this work, several variables were consolidated into
environment.c from config.c so that environment-related state could be
managed in a single location [13]. The macro #define
USE_THE_REPOSITORY_VARIABLE was also introduced to help transition code
away from implicit global repository access [14].
This project area was further explored during GSoC 2025 by Ayush Chandekar
[15], who continued removing usages of the_repository across different parts
of the codebase and relocated several global configuration variables (such as
core_preload_index and merge_log_config) into repository-scoped structures.
More recently, Olamide Bello, during the Outreachy program, made significant
progress in improving how configuration values are stored [16] [17]. His work
introduced a new structure, repo_config_values, which stores repository
specific configuration values, linked to struct repository. This allows
configuration values to be associated with a specific repository instance
rather than stored globally. Along with this, a private structure
config_values_private was added to support initialization and internal
handling of these values. During discussions around these changes, an
important design consideration also emerged: moving global variables directly
into repository structures or introducing lazy-loading helpers can lead to
user-experience regressions if configuration errors are detected later.
These efforts collectively form the foundation of the ongoing work to
gradually remove Git’s reliance on global state and move toward a more
modular, repository-scoped architecture.
Proposed Plan:
--------------
I started exploring the codebase by browsing relevant files and identifying
global variables by temporarily removing the USE_THE_REPOSITORY_VARIABLE
macro. My primary focus was on core library files rather than builtin code
[18]. Through this exploration, I observed that a large number of files still
depend on the_repository.
To tackle this project systematically, I propose classifying these files into
two categories:
1. Files using the_repository or the_hash_algo where a repository instance
already exists: These files rely on global variables even though a
struct repository instance is available somewhere in the call stack. In
such cases, the refactor primarily involves passing the repository
instance through the function call stack and replacing the global
usages. In some cases, a repository instance may not be directly
available in the file itself. In those situations, I will trace the
callers and propagate repository instances from higher levels in the call
hierarchy. Examples of such files include alias.c, archive*.c,
walker.c, xdiff-interface.c. These cases generally require localized
refactoring and are good candidates for incremental patches.
2. Files relying on other global variables defined in environment.c: Some
files rely on additional global variables which are parsed and accessed
through environment.c. In these cases, there is no existing
repository-scoped instance, which makes refactoring slightly more
technical. Examples include wt-status.c (default_abbrev,
comment_line_str), apply.c (has_symlink, ignore_case,
trust_executable_bit, apply_default_whitespace,
apply_default_ignorewhitespace). For such variables, I plan to evaluate
whether they should be moved into a repository-scoped structure (e.g.,
repo_settings, repo_config_values), or they should instead be localized
and passed explicitly where needed. The appropriate approach will depend
on how widely the variable is used and whether it makes sense from a
multi-repository standpoint.
I plan to begin with the first category, addressing straightforward
refactors file by file. In parallel, I will analyze and work on specific
groups of global variables from the second category, designing appropriate
repository-scoped replacements.
The end goal is to remove reliance on global state and eventually eliminate
the USE_THE_REPOSITORY_VARIABLE macro from these files.
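For the second category, one possible shape of the refactor is sketched below, assuming the values end up in a per-repository structure. All names here (`struct repo_values_sketch`, `struct repo_sketch`, `repo_sketch_init()`, `is_comment_line()`) are hypothetical and do not reflect the actual layout of repo_settings or repo_config_values; the two fields merely stand in for default_abbrev and comment_line_str.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-repository home for values that are globals today. */
struct repo_values_sketch {
	int abbrev;              /* stands in for default_abbrev */
	const char *comment_str; /* stands in for comment_line_str */
};

struct repo_sketch {
	struct repo_values_sketch values;
};

/* Give every repository its own defaults instead of sharing one global. */
void repo_sketch_init(struct repo_sketch *repo)
{
	assert(repo != NULL);
	repo->values.abbrev = -1; /* placeholder for "not configured" */
	repo->values.comment_str = "#";
}

/* Callers read the value through the repository they were handed. */
int is_comment_line(const struct repo_sketch *repo, const char *line)
{
	size_t len;

	assert(repo != NULL && line != NULL);
	len = strlen(repo->values.comment_str);
	return strncmp(line, repo->values.comment_str, len) == 0;
}
```

Because each repository carries its own copy, changing the comment string for one repository leaves every other repository in the process unaffected, which a single global cannot offer.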
Project Timeline:
-----------------
* Community Bonding (Until May 24):
- Discuss the project direction and design approaches with mentors.
- Identify and prioritize two main areas of work:
+ files that rely on the_repository.
+ global variables defined in environment.c.
- Study the previous patches by Olamide Bello and Ayush in depth and
discuss their approaches and challenges with them.
- Interact with all the people involved in this work to better
understand design decisions and potential pitfalls.
- Experiment with small RFC patches, if needed, to validate approaches.
* Coding period (May 25 - August 16):
- Review the work done by Olamide Bello on moving values parsed by
git_default_config() into the repo_config_values structure and
identify any remaining tasks.
- Complete any remaining cleanup or refactoring related to the worktree
API [19].
- Identify straightforward refactors to remove usages of the_repository
in files such as xdiff-interface.c, archive*.c, fsmonitor*.c etc.
- Work file by file with the goal of eliminating
#define USE_THE_REPOSITORY_VARIABLE by replacing global usages
with explicit repository instances.
- Concurrently maintain at least two parallel patch series:
+ Small / straightforward refactors and replacements of globals like
the_hash_algo or the_repository.
+ Larger structural refactors involving globals such as
DEFAULT_ABBREV, comment_line_str etc.
- Publish weekly or biweekly blog updates documenting progress and design
decisions.
* Final week (August 17 - August 24):
- Address any remaining tasks or pending patches.
- Receive final feedback from mentors and reviewers.
- Prepare a detailed report summarizing the work completed during the project.
Blogging:
---------
I believe blogging is an important part of any open-source project. It
helps others understand the ongoing work and also enables the contributor
to develop a deeper understanding and keep better track of their own
progress. I experienced this firsthand: early in my journey I was unsure
about various aspects, but reading the blogs of Ayush and Olamide Bello
gave me valuable insight into the contributor perspective and their overall
work.
With the goal of helping future contributors in a similar way, I plan to
document my journey and project progress through regular blog posts. I will
publish updates on a weekly or biweekly basis, depending on the amount of
meaningful progress made. I have set up my blogging area on Medium, and my
posts will be available at [20].
Availability:
-------------
The main coding period runs from June to August. Most of June and July
coincide with my summer vacation, which allows me to dedicate significant
time to the project. My final exams are scheduled for May and will last
approximately one week, but they will be completed before the coding period
begins and should not affect my availability.
During June and July, I will be able to dedicate around 40 hours per week to
the project. In August, when my regular semester resumes, I expect to
contribute approximately 25–30 hours per week.
I do not have any other exams, internships, or planned vacations during the
coding period. Apart from this project, I have no other major commitments
for the summer.
I will keep the community regularly updated on my progress throughout the
project. My primary mode of communication will be email, and I will also be
available for calls or meetings if/when required. My preferred availability
window is 13:00–19:00 UTC.
Post GSoC:
----------
Being part of the Git community and contributing to the codebase has been a
very valuable experience for me. The process of understanding Git’s internals,
submitting patches, and receiving feedback on the mailing list has helped me
grow significantly as a developer. The feeling of working on code that is used
by millions of developers and companies around the world is very rewarding.
I plan to remain involved with the Git community even after GSoC by continuing
to contribute patches, review code, and participate in discussions to help make
Git better for end users. The work on refactoring Git’s global state is part of
a long-term effort, and I would love to continue working on it beyond the GSoC
timeline.
I would also be happy to mentor, co-mentor, or volunteer in the future to help
new and upcoming contributors whenever I get the chance. I see GSoC as the
starting point of a long-term relationship with the Git community.
Closing & Appreciation:
-----------------------
I would like to thank the Git community for the excellent documentation and the
welcoming environment. I am also grateful for the patience and guidance shown
in the feedback and discussions on the mailing list by Junio, Phillip, Karthik,
Ben, and others, which have helped me improve my understanding and contributions.
I also read blogs and proposals by Ayush, Lucas, Kousik Sanagavarapu, and Olamide
Bello, which provided valuable insights and helped shape my approach to contributing.
Thank you for reviewing my proposal :)
References:
-----------
[1]- https://lore.kernel.org/git/cover.1771511192.git.phillip.wood@dunelm.org.uk/
[2]- https://lore.kernel.org/git/20260304145823.189440-1-shreyanshpaliwalcmsmn@gmail.com/T/#m65b9b4547036991a7b7f3c861b9663428891f588
[3]- https://lore.kernel.org/git/20260114143238.536312-1-shreyanshpaliwalcmsmn@gmail.com/
[4]- https://lore.kernel.org/git/20260115211609.17420-1-shreyanshpaliwalcmsmn@gmail.com/
[5]- https://lore.kernel.org/git/20260204111343.71975-1-shreyanshpaliwalcmsmn@gmail.com/
[6]- https://lore.kernel.org/git/20260205131132.44282-1-shreyanshpaliwalcmsmn@gmail.com/
[7]- https://lore.kernel.org/git/1444778207-859-1-git-send-email-gitster@pobox.com/
[8]- https://lore.kernel.org/git/20160511131745.2914-1-chriscool@tuxfamily.org/
[9]- https://lore.kernel.org/git/20180205235508.216277-1-sbeller@google.com/
[10]- https://lore.kernel.org/git/20170531214417.38857-1-bmwill@google.com/
[11]- https://lore.kernel.org/git/cover.1715339393.git.ps@pks.im/
[12]- https://lore.kernel.org/git/20250206-b4-pks-path-drop-the-repository-v1-16-4e77f0313206@pks.im/
[13]- https://lore.kernel.org/git/20250717-pks-config-wo-the-repository-v1-20-d888e4a17de1@pks.im/
[14]- https://lore.kernel.org/git/cover.1718347699.git.ps@pks.im/
[15]- https://ayu-ch.github.io/2025/08/29/gsoc-final-report.html
[16]- https://cloobtech.hashnode.dev/week-5-and-6-design-reviews-rfcs-and-refining-the-path-forward
[17]- https://lore.kernel.org/all/cover.1771258573.git.belkid98@gmail.com/
[18]- https://lore.kernel.org/git/7b5dd0c4-0ca0-458e-89db-621a70dac9ae@gmail.com/
[19]- https://lore.kernel.org/git/20260217163909.55094-1-shreyanshpaliwalcmsmn@gmail.com/
[20]- https://medium.com/@shreyanshpaliwal18
* Re: [GSOC][PROPOSAL]: Refactoring in order to reduce Git’s global state
2026-03-06 14:57 [GSOC][PROPOSAL]: Refactoring in order to reduce Git’s global state Shreyansh Paliwal
@ 2026-03-07 10:33 ` Christian Couder
2026-03-07 12:46 ` Shreyansh Paliwal
2026-03-07 20:04 ` [GSOC][PROPOSAL v2]: " Shreyansh Paliwal
1 sibling, 1 reply; 15+ messages in thread
From: Christian Couder @ 2026-03-07 10:33 UTC (permalink / raw)
To: Shreyansh Paliwal
Cc: git, karthik.188, jltobler, ayu.chandekar, siddharthasthana31
Hi Shreyansh,
On Fri, Mar 6, 2026 at 4:16 PM Shreyansh Paliwal
<shreyanshpaliwalcmsmn@gmail•com> wrote:
>
> Hello all,
>
> This is my first draft of GSoC 2026 proposal for the project
> 'Refactoring in order to reduce Git’s global state'.
Thanks for your interest in Git.
> I am Shreyansh Paliwal, a pre-final year undergraduate student at Guru
> Gobind Singh Indraprastha University, New Delhi, India. I am a technology
> enthusiast, who began programming in 2018 with Java as my first language
> and later transitioned to C/C++ in 2023 as my primary focus. I enjoy
> exploring new technologies and programming languages, and I have developed
> solid experience building applications using TypeScript, React.js, Node.js,
> and AWS. I actively participate in technical events and have organized
> multiple hackathons, tech-fests, and related activities at my college as
> the SIG-Head of IOSD, a tech-focused student community.
Interesting. Do you have links about these?
> Pre-GSOC:
> ---------
> During this process, I attempted to remove the usage of the_repository from
> a file. However, after discussion on the mailing list, Phillip pointed out
> that the change was not particularly useful in that context and could
> introduce segfaults that would not justify the effort for builtin code.
> Based on this feedback, I dropped that attempt and instead focused on
> understanding the broader global state refactoring effort. To better
> understand the project area, I studied previous patches and blog posts by
> Ayush Chandekar and Olamide Bello, followed discussions on the mailing
> list, and explored parts of the codebase such as the wt-status and worktree
> subsystems. This helped me understand the ongoing effort to reduce Git’s
> reliance on global state and motivated me to work further in this area.
>
> The following is a list of my contributions, ordered from earliest to most
> recent:
>
> Patches for Git:
> ----------------
>
> * test-lib-functions.sh: fix test_grep fail message wording
> Status: Merged into master
The status should be "Released as part of v2.43.1" or something like
that as far as I can see.
> Mailing List: https://lore.kernel.org/git/20231203171956.771-1-shreyanshpaliwalcmsmn@gmail.com/
> Merge Commit: 37e8d795bed7b93d3f12bcdd3fbb86dfe57921e6
If you say "Merge Commit" we expect the commit that merged your work.
It looks like this commit contains your work, so I think it's better
to just say "Commit" instead.
> Log: This was my first patch to Git in 2023. While browsing the
> source code and past issues, I noticed that even after
> the test_i18ngrep function was deprecated, an error message
> referring to test_grep was left behind. I updated the
> wording to correctly reference test_i18ngrep.
I think it should be something like:
... even after the test_i18ngrep function was deprecated, an error
message referring to test_i18ngrep was left behind. I updated the
wording to correctly reference test_grep.
> * doc: MyFirstContribution: fix missing dependencies and clarify build steps
> Status: Merged into master
> Mailing List: https://lore.kernel.org/git/20260112195625.391821-1-shreyanshpaliwalcmsmn@gmail.com/
> Merge Commit: 81021871eaa8b16a892b9c8791a0c905ab26e342
Same thing about "Merge Commit" vs "Commit". Below too.
> Log: While getting familiar with the codebase, I followed the
> MyFirstContribution documentation and encountered a few
> issues. Some include headers were missing, the synopsis
> format was incorrect, and the explanation for -j$(nproc)
> was absent. I submitted fixes to improve the clarity and
> correctness of the documentation.
>
> * t5500: simplify test implementation and fix git exit code suppression (Microproject)
> Status: Merged into master
> Mailing List: https://lore.kernel.org/git/20260121130012.888299-1-shreyanshpaliwalcmsmn@gmail.com/
> Merge Commit: a824421d3644f39bfa8dfc75876db8ed1c7bcdbf
> Log: This was completed as a microproject for GSoC. Instead of
> constructing the pack protocol using a complex combination
> of here-docs and echo commands, the patch captures command
> outputs beforehand and uses the test-tool pkt-line pack
> helper to construct the protocol input in a temporary file
> before feeding it to git upload-pack.
>
> * show-index: add warning and wrap error messages with gettext
> Status: Merged into master
> Mailing List: https://lore.kernel.org/git/20260130153603.290196-1-shreyanshpaliwalcmsmn@gmail.com/
> Merge Commit: ea39808a22714b8f61b9472de7ef467ced15efea,
> 227e2cc4e1415c4aeadceef527dd33e478ad5ec3
> Log: While exploring the code, I noticed a TODO comment suggesting
> automatic hash detection. After discussion on the mailing
> list, it was concluded that there was no future-proof
> approach to implement this until a new index file format
> came into use. Instead, an explicit warning was added rather
> than silently falling back to SHA-1. Additionally, several
> error messages were missing gettext wrapping, which was also
> fixed.
>
> * wt-status: reduce reliance on global state
> Status: Merged into seen
When a patch series isn't yet merged into next, it's better to give
its status from Junio's latest "What's cooking in git.git ..."
email. For this one, it looks like it is "Will merge to 'next'.".
> Mailing List: https://lore.kernel.org/git/20260218175654.66004-1-shreyanshpaliwalcmsmn@gmail.com/
> Merge Commit: a7cd24de0b3b679c16ae3ee8215af06aeea1e6a3,
> 9d0d2ba217f3ceefb0315b556f012edb598b9724,
> 4631e22f925fa2af8d8548af97ee2215be101409
> Log: This has been the most significant patch series in my journey
> so far. It began with a suggestion from Phillip to clean up
> some the_repository usages in wt-status.c. I extended the
> effort to remove all usages of the_repository and
> the_hash_algo from the file. During review discussions, it
> was suggested that some worktree API cleanup should happen
> first, particularly regarding the representation of worktrees
> as NULL. Some related changes were later moved to a separate
> series, after which this refactoring proceeded.
>
> * worktree: change representation and usage of primary worktree
> Status: Continued by Phillip Wood [1]
Here you can also say that they have been merged into master. Maybe:
"Status: Merged into master after being continued by Phillip Wood"
> Mailing List: https://lore.kernel.org/git/20260213120529.15475-1-shreyanshpaliwalcmsmn@gmail.com/
> Log: This worktree API cleanup series started while I was working
> on wt-status. The intention was to modify the representation
> of the current worktree so that struct worktree would not be
> NULL. During discussion, Phillip clarified that NULL actually
> represents the current worktree rather than the primary
> worktree. Since Phillip already had a patch based on the right
> logic, he continued the series and it was eventually merged
> into master.
>
> * tree-diff: remove the usage of the_hash_algo global
> Status: Merged into master
> Mailing List: https://lore.kernel.org/git/20260220175331.1250726-1-shreyanshpaliwalcmsmn@gmail.com/
> Merge Commit: 1e50d839f8592daf364778298a61670c4b998654
> Log: This was a straightforward patch that removed the remaining
> usages of the global the_hash_algo in tree-diff.c by using the
> repository’s local instance instead.
>
> * send-email: UTF-8 encoding in subject line
> Status: Merged into seen
> Mailing List: https://lore.kernel.org/git/20260228112210.270273-1-shreyanshpaliwalcmsmn@gmail.com/
> Merge Commit: c52f085a477c8eece87821c5bbc035e5a900eb12
> Log: This patch was motivated by an issue I personally encountered
> while sending a GSoC discussion email [2]. Initially the
> change only modified the wording of the prompt, but after
> discussion on the mailing list it was extended to include
> proper validation to prevent invalid charset encodings from
> being used in git send-email and to reduce confusion.
>
> * Remove global state from editor.c
> Status: Waiting for further feedback
> Mailing List: https://lore.kernel.org/git/20260301105228.1738388-1-shreyanshpaliwalcmsmn@gmail.com/
> Log: This was based on my doubt on localizing editor_program in
> editor.c [2]. The patch received mixed feedback from
> contributors and is currently awaiting additional guidance
> from mentor and/or maintainer regarding the appropriate
> direction.
>
> Patches for git.github.io:
> --------------------------
>
> * SoC-2026-ideas: Remove an extra backtick
> Status: merged into master
> PR Link: https://github.com/git/git.github.io/pull/831
> Merge Commit: c1e4aa87a54430953eaa7355061139fdf1ff6796
> Log: Minor Typo fix.
>
> * rn-132: fixed 2 typos
> Status: merged into master
> PR Link: https://github.com/git/git.github.io/pull/832
> Merge Commit: 92876114d855d472ce2e0e5337e72a4b97b81681
> Log: Fixed typos in Git Rev News Edition 132.
>
> I have also been involved in additional discussions on the Git mailing
> list [3][4][5][6].
[...]
> Project Timeline:
> ----------------
>
> * Community Bonding (Until May 24):
> - Discuss the project direction and design approaches with mentors.
> - Identify and prioritize two main areas of work:
> + files that rely on the_repository.
> + global variables defined in environment.c.
> - Study the previous patches by Olamide Bello and Ayush in depth and
> also discuss with them about their approaches and challenges.
> - Interact with all the people involved in this work to better
> understand design decisions and potential pitfalls.
> - Experiment with small RFC patches, if needed to validate approaches.
>
> * Coding period (May 25 - August 16):
> - Review the work done by Olamide Bello on moving values parsed by
> git_default_config() into the repo_config_values structure and
> identify any remaining tasks.
I think this should be part of the Community Bonding period.
> - Complete remaining cleanup or refactoring related to the worktree API,
> if left any [19].
> - Identify straightforward refactors to remove usages of the_repository
> in files such as xdiff-interface.c, archive*.c, fsmonitor*.c etc.
> - Work file by file with the goal of eliminating
> #define USE_THE_REPOSITORY_VARIABLE by replacing global usages
> with explicit repository instances.
> - Concurrently maintain at least two parallel patch series:
> + Small / straightforward refactors and replacements like
> the_hash_algo or the_repostitory.
> + Larger structural refactors involving globals such as
> DEFAULT_ABBREV, comment_line_str etc.
> - Publish weekly or biweekly blog updates documenting progress and design
> decisions.
>
> * Final week (august 17 - august 24):
> - Address any remaining tasks or pending patches.
> - Recieve final feedback from mentors and reviewers.
s/Recieve/Receive/
> - Prepare a detailed report summarizing the work completed during the project.
Thanks for your proposal!
* Re: [GSOC][PROPOSAL]: Refactoring in order to reduce Git’s global state
2026-03-07 10:33 ` Christian Couder
@ 2026-03-07 12:46 ` Shreyansh Paliwal
0 siblings, 0 replies; 15+ messages in thread
From: Shreyansh Paliwal @ 2026-03-07 12:46 UTC (permalink / raw)
To: git
Cc: christian.couder, karthik.188, jltobler, ayu.chandekar,
siddharthasthana31
> Hi Shreyansh,
>
> On Fri, Mar 6, 2026 at 4:16 PM Shreyansh Paliwal
> <shreyanshpaliwalcmsmn@gmail•com> wrote:
> >
> > Hello all,
> >
> > This is my first draft of GSoC 2026 proposal for the project
> > 'Refactoring in order to reduce Git’s global state'.
>
> Thanks for your interest in Git.
>
> > I am Shreyansh Paliwal, a pre-final year undergraduate student at Guru
> > Gobind Singh Indraprastha University, New Delhi, India. I am a technology
> > enthusiast, who began programming in 2018 with Java as my first language
> > and later transitioned to C/C++ in 2023 as my primary focus. I enjoy
> > exploring new technologies and programming languages, and I have developed
> > solid experience building applications using TypeScript, React.js, Node.js,
> > and AWS. I actively participate in technical events and have organized
> > multiple hackathons, tech-fests, and related activities at my college as
> > the SIG-Head of IOSD, a tech-focused student community.
>
> Interesting. Do you have links about these?
Yup, I can gather some related links for these, will add them.
>
> > Pre-GSOC:
> > ---------
>
> > During this process, I attempted to remove the usage of the_repository from
> > a file. However, after discussion on the mailing list, Phillip pointed out
> > that the change was not particularly useful in that context and could
> > introduce segfaults that would not justify the effort for builtin code.
> > Based on this feedback, I dropped that attempt and instead focused on
> > understanding the broader global state refactoring effort. To better
> > understand the project area, I studied previous patches and blog posts by
> > Ayush Chandekar and Olamide Bello, followed discussions on the mailing
> > list, and explored parts of the codebase such as the wt-status and worktree
> > subsystems. This helped me understand the ongoing effort to reduce Git’s
> > reliance on global state and motivated me to work further in this area.
> >
> > The following is a list of my contributions, ordered from earliest to most
> > recent:
> >
> > Patches for Git:
> > ----------------
> >
> > * test-lib-functions.sh: fix test_grep fail message wording
> > Status: Merged into master
>
> The status should be "Released as part of v2.43.1" or something like
> that as far as I can see.
Right, got it.
> > Mailing List: https://lore.kernel.org/git/20231203171956.771-1-shreyanshpaliwalcmsmn@gmail.com/
> > Merge Commit: 37e8d795bed7b93d3f12bcdd3fbb86dfe57921e6
>
> If you say "Merge Commit" we expect the commit that merged your work.
> It looks like this commit contains your work, so I think it's better
> to just say "Commit" instead.
>
Understood. I will change "Merge Commit" to "Commit" for all the patches.
> > Log: This was my first patch to Git in 2023. While browsing the
> > source code and past issues, I noticed that even after
> > the test_i18ngrep function was deprecated, an error message
> > referring to test_grep was left behind. I updated the
> > wording to correctly reference test_i18ngrep.
>
> I think it should be something like:
>
> ... even after the test_i18ngrep function was deprecated, an error
> message referring to test_i18ngrep was left behind. I updated the
> wording to correctly reference test_grep.
>
Oops, I'll fix the wording.
> > * doc: MyFirstContribution: fix missing dependencies and clarify build steps
> > Status: Merged into master
> > Mailing List: https://lore.kernel.org/git/20260112195625.391821-1-shreyanshpaliwalcmsmn@gmail.com/
> > Merge Commit: 81021871eaa8b16a892b9c8791a0c905ab26e342
>
> Same thing about "Merge Commit" vs "Commit". Below too.
>
> > Log: While getting familiar with the codebase, I followed the
> > MyFirstContribution documentation and encountered a few
> > issues. Some include headers were missing, the synopsis
> > format was incorrect, and the explanation for -j$(nproc)
> > was absent. I submitted fixes to improve the clarity and
> > correctness of the documentation.
> >
> > * t5500: simplify test implementation and fix git exit code suppression (Microproject)
> > Status: Merged into master
> > Mailing List: https://lore.kernel.org/git/20260121130012.888299-1-shreyanshpaliwalcmsmn@gmail.com/
> > Merge Commit: a824421d3644f39bfa8dfc75876db8ed1c7bcdbf
> > Log: This was completed as a microproject for GSoC. Instead of
> > constructing the pack protocol using a complex combination
> > of here-docs and echo commands, the patch captures command
> > outputs beforehand and uses the test-tool pkt-line pack
> > helper to construct the protocol input in a temporary file
> > before feeding it to git upload-pack.
> >
> > * show-index: add warning and wrap error messages with gettext
> > Status: Merged into master
> > Mailing List: https://lore.kernel.org/git/20260130153603.290196-1-shreyanshpaliwalcmsmn@gmail.com/
> > Merge Commit: ea39808a22714b8f61b9472de7ef467ced15efea,
> > 227e2cc4e1415c4aeadceef527dd33e478ad5ec3
> > Log: While exploring the code, I noticed a TODO comment suggesting
> > automatic hash detection. After discussion on the mailing
> > list, it was concluded that there was no future-proof
> > approach to implement this until a new index file format
> > came into use. Instead, an explicit warning was added rather
> > than silently falling back to SHA-1. Additionally, several
> > error messages were missing gettext wrapping, which was also
> > fixed.
> >
> > * wt-status: reduce reliance on global state
> > Status: Merged into seen
>
> When a patch series isn't yet merged into next, it's better to tell
> what's its status in Junio's latest "What's cooking in git.git ..."
> email. For this one, it looks like it is "Will merge to 'next'.".
>
Yes, merging to next was just confirmed in the latest Mar 2026 #03;
before this it still had a question mark and was pending comments.
I will update the status, including for the send-email patch.
> > Mailing List: https://lore.kernel.org/git/20260218175654.66004-1-shreyanshpaliwalcmsmn@gmail.com/
> > Merge Commit: a7cd24de0b3b679c16ae3ee8215af06aeea1e6a3,
> > 9d0d2ba217f3ceefb0315b556f012edb598b9724,
> > 4631e22f925fa2af8d8548af97ee2215be101409
> > Log: This has been the most significant patch series in my journey
> > so far. It began with a suggestion from Phillip to clean up
> > some the_repository usages in wt-status.c. I extended the
> > effort to remove all usages of the_repository and
> > the_hash_algo from the file. During review discussions, it
> > was suggested that some worktree API cleanup should happen
> > first, particularly regarding the representation of worktrees
> > as NULL. Some related changes were later moved to a separate
> > series, after which this refactoring proceeded.
> >
> > * worktree: change representation and usage of primary worktree
> > Status: Continued by Phillip Wood [1]
>
> Here you can also say that they have been merged into master. Maybe:
> "Status: Merged into master after being continued by Phillip Wood"
>
Makes sense. I'll update this.
> > Mailing List: https://lore.kernel.org/git/20260213120529.15475-1-shreyanshpaliwalcmsmn@gmail.com/
> > Log: This worktree API cleanup series started while I was working
> > on wt-status. The intention was to modify the representation
> > of the current worktree so that struct worktree would not be
> > NULL. During discussion, Phillip clarified that NULL actually
> > represents the current worktree rather than the primary
> > worktree. Since Phillip already had a patch based on the right
> > logic, he continued the series and it was eventually merged
> > into master.
[...]
> >
> > * Coding period (May 25 - August 16):
> > - Review the work done by Olamide Bello on moving values parsed by
> > git_default_config() into the repo_config_values structure and
> > identify any remaining tasks.
>
> I think this should be part of the Community Bonding period.
>
> > - Complete remaining cleanup or refactoring related to the worktree API,
> > if left any [19].
> > - Identify straightforward refactors to remove usages of the_repository
> > in files such as xdiff-interface.c, archive*.c, fsmonitor*.c etc.
> > - Work file by file with the goal of eliminating
> > #define USE_THE_REPOSITORY_VARIABLE by replacing global usages
> > with explicit repository instances.
> > - Concurrently maintain at least two parallel patch series:
> > + Small / straightforward refactors and replacements like
> > the_hash_algo or the_repostitory.
> > + Larger structural refactors involving globals such as
> > DEFAULT_ABBREV, comment_line_str etc.
> > - Publish weekly or biweekly blog updates documenting progress and design
> > decisions.
> >
> > * Final week (august 17 - august 24):
> > - Address any remaining tasks or pending patches.
> > - Recieve final feedback from mentors and reviewers.
>
> s/Recieve/Receive/
>
> > - Prepare a detailed report summarizing the work completed during the project.
>
>
> Thanks for your proposal!
Thanks, Christian, for reading and for the suggestions. I'll revise
and send an updated version.
* [GSOC][PROPOSAL v2]: Refactoring in order to reduce Git’s global state
2026-03-06 14:57 [GSOC][PROPOSAL]: Refactoring in order to reduce Git’s global state Shreyansh Paliwal
2026-03-07 10:33 ` Christian Couder
@ 2026-03-07 20:04 ` Shreyansh Paliwal
2026-03-09 14:42 ` Christian Couder
2026-03-20 18:12 ` [GSOC][PROPOSAL v3]: " Shreyansh Paliwal
1 sibling, 2 replies; 15+ messages in thread
From: Shreyansh Paliwal @ 2026-03-07 20:04 UTC (permalink / raw)
To: git
Cc: christian.couder, karthik.188, jltobler, ayu.chandekar,
siddharthasthana31
Hello,
This is the second draft of my GSoC 2026 proposal for the project
'Refactoring in order to reduce Git’s global state'.
Doc version can be read at:
https://docs.google.com/document/d/16MRNUv6dJi6vtNvI5Ro0WmHf20dRRBHjFLpmhAuaUOA/edit?usp=sharing
Any feedback or suggestions would be greatly appreciated.
Thanks for reading.
---
Changes in v2:
- Added links in the 'About Me' section and updated reference numbering.
- Rephrased and revised the 'Pre-GSoC', 'History' and 'Proposed Plan' sections.
- Updated patch statuses and changed some wordings.
---
Refactoring in order to reduce Git's global state
Personal Information:
---------------------
Name: Shreyansh Paliwal
Email: Shreyanshpaliwalcmsmn@gmail•com
Alternate Email: Shreyansh.01014803123@it•mait.ac.in
Mobile No.: +91-9335120023
Education: GGSIPU, New Delhi, India
Year: III / IV
Degree: Bachelor of Technology in Information Technology
Github: https://github.com/shreyp135
Time-zone: UTC +5:30 (IST)
About Me:
---------
I am Shreyansh Paliwal, a pre-final year undergraduate student at Guru
Gobind Singh Indraprastha University, New Delhi, India. I am a technology
enthusiast, who began programming in 2018 with Java as my first language
and later transitioned to C/C++ in 2023 as my primary focus. I enjoy
exploring new technologies and programming languages, and have developed
solid experience building applications such as [1] using TypeScript,
React.js, Node.js, and AWS. I actively participate in technical events and
have organized multiple hackathons [2], tech-fests [3], and related
activities at my college as the SIG-Head of IOSD [4], a tech-focused
student community.
I started using Git in 2023, which is also when I made my first open-source
contribution to the Git project. I was a winner of Augtoberfest 2024 [5],
an open-source competition organized by C4GT India. Over the past several
months, I have been involved with the Git project, studying the codebase,
submitting patches, and incorporating review feedback. I am motivated to
improve the experience of Git for end users, and this project is an
excellent opportunity to continue that work.
Overview:
---------
Git relies heavily on global state for managing environment variables and
configuration data. In particular, many parts of the codebase depend on the
global struct repository instance, the_repository, which represents the
currently active repository. Instead of passing a repository instance
explicitly, several internal functions implicitly rely on this global
object. Additionally, various configuration derived values and
environment-related variables such as the_hash_algo, default_abbrev, and
comment_line_str are stored globally, most of them defined in
environment.c.
This design assumes that only one repository is active within a process at
a time. As a result, the repository state becomes shared across the entire
process, weakening isolation and making behavior implicitly dependent on
global context. Such global dependencies make the code harder to reason
about, test, and maintain, and can introduce subtle bugs when operations
interact with multiple repositories. They also limit long-term goals such
as safely supporting multiple repositories within a single process and
continuing Git’s ongoing libification efforts.
To address these issues, global environment and configuration state should
be refactored into better-scoped contexts. Repository-specific data can be
moved into struct repository or related structures, while
subsystem-specific state should be localized appropriately. Passing
repository instances explicitly through function interfaces will improve
modularity, reduce hidden dependencies, and make the codebase easier to
maintain while moving Git closer to supporting multiple repositories safely
within a single process.
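As a minimal sketch of the difference between implicit and explicit
repository access, assuming a simplified stand-in for struct repository
(the abbrev_len field and both helper functions are hypothetical and are
not part of Git's API):

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for Git's struct repository; fields are hypothetical. */
struct repository {
	const char *gitdir;
	int abbrev_len;	/* a config-derived value scoped to this repository */
};

/* Before: one process-wide instance that helpers reach for implicitly. */
static struct repository global_repo = { ".git", 7 };
static struct repository *the_repository = &global_repo;

static int abbrev_len_implicit(void)
{
	/* Always answers for the global repository, whatever the caller meant. */
	return the_repository->abbrev_len;
}

/* After: the caller states which repository it is asking about. */
static int abbrev_len_explicit(const struct repository *repo)
{
	assert(repo != NULL);
	return repo->abbrev_len;
}
```

With the explicit form, a second repository in the same process (for
example a submodule) can be queried without touching shared state.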
The difficulty of this project is medium, and it is estimated to take 175
to 350 hours.
Pre-GSOC:
---------
I first explored the Git codebase in December 2023, when I submitted a
small patch fixing the wording of an error message that I noticed while
browsing the source code. At that time I had recently started using Git and
GitHub for version control in my projects, which sparked my curiosity about
how Git works internally. A few months ago, when I had some free time
from college, I decided to start contributing to Git more actively. I built
Git from source, read parts of the documentation, and familiarized myself
with the mailing list workflow. While going through the documentation, I
noticed a few inconsistencies in the MyFirstContribution page and submitted
patches to fix them. I also completed a microproject involving a test
cleanup, and later worked on adding a warning for a quiet fallback.
During this process, I attempted to remove the usage of the_repository from
a file. After discussion on the mailing list [23], Phillip directed me
towards wt-status, which led me to explore parts of the codebase such as
the wt-status and worktree subsystems. Through this, I learned that such
refactors are generally more valuable in core library code. Following this
discussion, I shifted my focus toward understanding the broader global
state refactoring effort. To better understand the project area, I studied
previous patches and blog posts by Ayush Chandekar and Olamide Bello,
followed related discussions on the mailing list, and explored the relevant
parts of the codebase. This motivated me to work further in this area and
shaped my interest in this project.
The following is a list of my contributions, ordered from earliest to most
recent:
Patches for Git:
----------------
* test-lib-functions.sh: fix test_grep fail message wording
Status: Released in v2.43.1
Mailing List: https://lore.kernel.org/git/20231203171956.771-1-shreyanshpaliwalcmsmn@gmail.com/
Commit: 37e8d795bed7b93d3f12bcdd3fbb86dfe57921e6
Log: This was my first patch to Git in 2023. While browsing the
source code and past issues, I noticed that even after
the test_i18ngrep function was deprecated, an error message
referring to test_i18ngrep was left behind. I updated
the wording to correctly reference test_grep.
* doc: MyFirstContribution: fix missing dependencies and clarify build steps
Status: Merged into master
Mailing List: https://lore.kernel.org/git/20260112195625.391821-1-shreyanshpaliwalcmsmn@gmail.com/
Commit: 81021871eaa8b16a892b9c8791a0c905ab26e342
Log: While getting familiar with the codebase, I followed the
MyFirstContribution documentation and encountered a few
issues. Some include headers were missing, the synopsis
format was incorrect, and the explanation for -j$(nproc)
was absent. I submitted fixes to improve the clarity and
correctness of the documentation.
* t5500: simplify test implementation and fix git exit code suppression (Microproject)
Status: Merged into master
Mailing List: https://lore.kernel.org/git/20260121130012.888299-1-shreyanshpaliwalcmsmn@gmail.com/
Commit: a824421d3644f39bfa8dfc75876db8ed1c7bcdbf
Log: This was completed as a microproject for GSoC. Instead of
constructing the pack protocol using a complex combination
of here-docs and echo commands, the patch captures command
outputs beforehand and uses the test-tool pkt-line pack
helper to construct the protocol input in a temporary file
before feeding it to git upload-pack.
* show-index: add warning and wrap error messages with gettext
Status: Merged into master
Mailing List: https://lore.kernel.org/git/20260130153603.290196-1-shreyanshpaliwalcmsmn@gmail.com/
Commit: ea39808a22714b8f61b9472de7ef467ced15efea,
227e2cc4e1415c4aeadceef527dd33e478ad5ec3
Log: While exploring the code, I noticed a TODO comment suggesting
automatic hash detection. After discussion on the mailing
list, it was concluded that there was no future-proof
approach to implement this until a new index file format
came into use. Instead, an explicit warning was added rather
than silently falling back to SHA-1. Additionally, several
error messages were missing gettext wrapping, which was also
fixed.
* wt-status: reduce reliance on global state
Status: Will merge to next
Mailing List: https://lore.kernel.org/git/20260218175654.66004-1-shreyanshpaliwalcmsmn@gmail.com/
Commit: a7cd24de0b3b679c16ae3ee8215af06aeea1e6a3,
9d0d2ba217f3ceefb0315b556f012edb598b9724,
4631e22f925fa2af8d8548af97ee2215be101409
Log: This has been the most significant patch series in my journey
so far. It began with a suggestion from Phillip to clean up
some the_repository usages in wt-status.c. I extended the
effort to remove all usages of the_repository and
the_hash_algo from the file. During review discussions, it
was suggested that some worktree API cleanup should happen
first, particularly regarding the representation of worktrees
as NULL. Some related changes were later moved to a separate
series, after which this refactoring proceeded.
* worktree: change representation and usage of primary worktree
Status: Merged into master after being continued by Phillip Wood [6]
Mailing List: https://lore.kernel.org/git/20260213120529.15475-1-shreyanshpaliwalcmsmn@gmail.com/
Log: This worktree API cleanup series started while I was working
on wt-status. The intention was to modify the representation
of the current worktree so that struct worktree would not be
NULL. During discussion, Phillip clarified that NULL actually
represents the current worktree rather than the primary
worktree. Since Phillip already had a patch based on the right
logic, he continued the series and it was eventually merged
into master.
* tree-diff: remove the usage of the_hash_algo global
Status: Merged into master
Mailing List: https://lore.kernel.org/git/20260220175331.1250726-1-shreyanshpaliwalcmsmn@gmail.com/
Commit: 1e50d839f8592daf364778298a61670c4b998654
Log: This was a straightforward patch that removed the remaining
usages of the global the_hash_algo in tree-diff.c by using the
repository’s local instance instead.
* send-email: UTF-8 encoding in subject line
Status: Will merge to master
Mailing List: https://lore.kernel.org/git/20260228112210.270273-1-shreyanshpaliwalcmsmn@gmail.com/
Commit: c52f085a477c8eece87821c5bbc035e5a900eb12
Log: This patch was motivated by an issue I personally encountered
while sending a GSoC discussion email [7]. Initially the
change only modified the wording of the prompt, but after
discussion on the mailing list it was extended to include
proper validation to prevent invalid charset encodings from
being used in git send-email and to reduce confusion.
* Remove global state from editor.c
Status: Waiting for further feedback
Mailing List: https://lore.kernel.org/git/20260301105228.1738388-1-shreyanshpaliwalcmsmn@gmail.com/
Log: This originated from a question I had about localizing
editor_program in editor.c [7]. The patch received some
mixed feedback on whether editor_program state should
instead become repository-scoped, since it can also be set
via git config --local. I am currently awaiting further
guidance from mentors on the appropriate direction.
Patches for git.github.io:
--------------------------
* SoC-2026-ideas: Remove an extra backtick
Status: Merged into master
PR Link: https://github.com/git/git.github.io/pull/831
Merge Commit: c1e4aa87a54430953eaa7355061139fdf1ff6796
Log: Minor typo fix.
* rn-132: fixed 2 typos
Status: Merged into master
PR Link: https://github.com/git/git.github.io/pull/832
Merge Commit: 92876114d855d472ce2e0e5337e72a4b97b81681
Log: Fixed typos in Git Rev News Edition 132.
I have also been involved in additional discussions on the Git mailing
list [8][9][10][11].
History / Background:
--------------------
Efforts to reduce Git’s reliance on global state began as several
subsystems moved toward libification, enabling Git’s internal functionality
to be reused as a library. Early examples include the libification of git
mailinfo by Junio [12] and git apply by Christian [13]. These large patch
series exposed the limitations of relying on global state and highlighted
the need for better encapsulation of repository-related data. A key step
was the introduction of struct repository through refactoring by Stefan
Beller [14] and Brandon Williams [15], motivated by the goal of centralizing
repository-related state instead of relying on scattered global variables.
This improved code clarity while laying groundwork for future improvements
such as safer multithreading and handling submodules in the same process. Later
work by Patrick further reduced reliance on the global the_repository in
the config [16] and path [17] subsystems, consolidating several variables
into environment.c so environment-related state could be managed in one
place [18]. The macro #define USE_THE_REPOSITORY_VARIABLE was also
introduced to help transition code away from implicit global repository
access [19].
During GSoC 2025, Ayush Chandekar [20] removed additional usages of
the_repository across the codebase and moved several global configuration
variables (such as core_preload_index and merge_log_config) into
repository-scoped structures. More recently, during Outreachy, Olamide
Bello improved configuration handling by introducing repo_config_values, a
structure linked to struct repository that stores repository-specific
configuration values [21][22]. A supporting private structure,
config_values_private, was added for initialization and internal handling.
Discussions around this work also highlighted an important design
constraint: directly moving globals into repository structures or
introducing lazy loading helpers can cause user experience regressions if
configuration errors are detected later.
These efforts collectively form the foundation of the ongoing work to
gradually remove Git’s reliance on global state and move toward a more
modular, repository-scoped architecture.
Proposed Plan:
-------------
I started exploring the codebase by browsing relevant files and identifying
global variables by temporarily removing the USE_THE_REPOSITORY_VARIABLE
macro. My primary focus was on core library files rather than builtin code
[23]. Through this exploration, I observed that a large number of files still
depend on the_repository.
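The exploration step above relies on the macro gating access to the
global. A minimal sketch of that gating, assuming simplified stand-in
declarations rather than Git's real headers (the struct layout, the
storage variable, and current_gitdir() are hypothetical):

```c
#include <assert.h>
#include <string.h>

/* Simplified stand-in for Git's struct repository; the field is illustrative. */
struct repository {
	const char *gitdir;
};

/*
 * A file opts in to the global by defining the macro before the
 * declaration is seen. Deleting this one line makes every remaining
 * use of the_repository in the file fail to compile, which turns the
 * compiler output into a to-do list for the refactor.
 */
#define USE_THE_REPOSITORY_VARIABLE

#ifdef USE_THE_REPOSITORY_VARIABLE
static struct repository the_repo_storage = { ".git" };
static struct repository *the_repository = &the_repo_storage;
#endif

/* Hypothetical helper that still depends on the gated global. */
static const char *current_gitdir(void)
{
	return the_repository->gitdir;
}
```

Once a file compiles without the define, it no longer depends on the
global repository instance.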
To tackle this project systematically, I propose classifying these files into
two categories:
1. Files using the_repository or the_hash_algo where a repository
instance already exists: These files rely on global variables even
though a struct repository instance is available somewhere in the
call stack. A simple example is my patch in tree-diff.c, where a
repository instance was already available through struct diff_options
*opt, but the_hash_algo was still used. I replaced it with
opt->repo->hash_algo.
In such cases, the refactor mainly involves passing the repository
instance through the function call stack and replacing the global
usages. If a repository instance is not directly available in the
file, I will trace the callers and propagate it from higher levels in
the call hierarchy.
Examples of such files include alias.c, archive*.c, walker.c, and
xdiff-interface.c. These typically require localized refactoring and
are good candidates for incremental patches.
2. Files relying on other global variables defined in environment.c:
Some files depend on additional global variables that are parsed and
accessed through environment.c. In these cases, there is no existing
repository-scoped instance, making the refactor slightly more involved.
Examples include wt-status.c (default_abbrev, comment_line_str) and
apply.c (has_symlinks, ignore_case, trust_executable_bit,
apply_default_whitespace, apply_default_ignorewhitespace).
For such variables, I will evaluate whether they should be moved into
repository-scoped structures (e.g., repo_settings or
repo_config_values), or instead be localized and passed explicitly
where needed. The appropriate approach will depend on how widely the
variable is used and whether it logically belongs to a single repository
from a multi-repository standpoint.
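The first category can be sketched as follows. This assumes simplified
stand-ins for Git's types that model only the fields used here; the
two oid_rawsz_*() helpers are hypothetical and only mirror the shape of
the tree-diff.c change, not its actual code:

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for Git's types; only the fields used here are modelled. */
struct git_hash_algo {
	size_t rawsz;	/* size of a raw object ID in bytes */
};

struct repository {
	const struct git_hash_algo *hash_algo;
};

struct diff_options {
	struct repository *repo;
};

static const struct git_hash_algo sha1_algo = { 20 };
static const struct git_hash_algo sha256_algo = { 32 };

/* Stand-in for the global; in this sketch it is pinned to SHA-1. */
static const struct git_hash_algo *the_hash_algo = &sha1_algo;

/* Before: the repository is reachable through opt, yet the global is used. */
static size_t oid_rawsz_before(const struct diff_options *opt)
{
	(void)opt;
	return the_hash_algo->rawsz;
}

/* After: the hash algorithm comes from the repository carried by opt. */
static size_t oid_rawsz_after(const struct diff_options *opt)
{
	assert(opt != NULL && opt->repo != NULL);
	return opt->repo->hash_algo->rawsz;
}
```

The two helpers agree only while the global happens to match the
repository in opt. For a SHA-256 repository handled in a process whose
global points at SHA-1, only the second one returns the right size.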
I plan to begin with the first category, addressing straightforward
refactors file by file. In parallel, I will analyze and work on specific
groups of global variables from the second category, designing
appropriate repository-scoped replacements while preserving the
original parsing timing and availability of those variables.
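One way to picture "preserving the parsing timing" for the second
category is a config callback that validates and stores the value in a
repository-scoped structure while the configuration is being read. The
sketch below is an assumption-laden illustration: repo_sketch,
repo_values_sketch, and parse_config_sketch() are hypothetical names,
and the accepted range of 4 to 40 is a simplification of how
core.abbrev is really handled:

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical repository-scoped home for a value that used to be global. */
struct repo_values_sketch {
	int abbrev;
};

struct repo_sketch {
	struct repo_values_sketch values;
};

/*
 * Hypothetical config callback. The value is validated and stored while
 * the configuration is being read, so a malformed entry is reported at
 * the same point as before the refactor instead of at first use.
 * Returns 0 on success (or for keys it ignores) and -1 on a bad value.
 */
static int parse_config_sketch(struct repo_sketch *repo,
			       const char *key, const char *value)
{
	char *end;
	long n;

	assert(repo != NULL && key != NULL);
	if (strcmp(key, "core.abbrev"))
		return 0;
	if (!value || !*value)
		return -1;
	errno = 0;
	n = strtol(value, &end, 10);
	if (errno || *end || n < 4 || n > 40)
		return -1;
	repo->values.abbrev = (int)n;
	return 0;
}
```

Because each repository owns its own copy, parsing the configuration of
one repository leaves the values of another untouched, and a bad value
is still rejected eagerly rather than by a lazy accessor later on.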
The end goal is to remove reliance on global state and eventually eliminate
the USE_THE_REPOSITORY_VARIABLE macro from these files.
Project Timeline:
----------------
* Community Bonding (Until May 24):
- Discuss the project direction and design approaches with mentors.
- Identify and prioritize two main areas of work:
+ files that rely on the_repository.
+ global variables defined in environment.c.
- Study the previous patches by Olamide Bello and Ayush Chandekar
in depth, and identify any remaining tasks while discussing
their approaches and challenges with them.
- Interact with all the people involved in this work to better
understand design decisions and potential pitfalls.
- Experiment with small RFC patches, if needed, to validate approaches.
* Coding period (May 25 - August 16):
- Send patches for any remaining cleanup or refactoring related to
git_default_config() and repo_config_values [22], as well as
the worktree API [24], if any.
- Identify straightforward refactors to remove usages of the_repository
in files such as xdiff-interface.c, archive*.c, fsmonitor*.c etc.
- Work file by file with the goal of eliminating
#define USE_THE_REPOSITORY_VARIABLE by replacing global usages
with explicit repository instances.
- Concurrently maintain at least two parallel patch series:
+ Small / straightforward refactors and replacements like
the_hash_algo or the_repository.
+ Larger structural refactors involving globals such as
DEFAULT_ABBREV, comment_line_str etc.
- Publish weekly or biweekly blog updates documenting progress and design
decisions.
* Final week (August 17 - August 24):
- Address any remaining tasks or pending patches.
- Receive final feedback from mentors and reviewers.
- Prepare a detailed report summarizing the work completed during the project.
Blogging:
---------
I believe blogging is an important part of any open-source project. It
helps others understand the ongoing work and also enables the contributor
to develop a deeper understanding and keep better track of their own
progress. I experienced this firsthand: early in my journey, I was unsure
about various aspects, but reading the blogs of Ayush and Olamide Bello
gave me valuable insight into the contributor perspective and their overall
work.
With the goal of helping future contributors in a similar way, I plan to
document my journey and project progress through regular blog posts. I will
publish updates on a weekly or biweekly basis, depending on the amount of
meaningful progress made. I have set up my blogging area on Medium, and my
posts will be available at [25].
Availability:
-------------
The main coding period runs from June to August. Most of June and July
coincide with my summer vacation, which allows me to dedicate significant
time to the project. My final exams are scheduled for May and will last
approximately one week, but they will be completed before the coding period
begins and should not affect my availability.
During June and July, I will be able to dedicate around 40 hours per week to
the project. In August, when my regular semester resumes, I expect to
contribute approximately 25–30 hours per week.
I do not have any other exams, internships, or planned vacations during the
coding period. Apart from this project, I have no other major commitments
for the summer.
I will keep the community regularly updated on my progress throughout the
project. My primary mode of communication will be email, and I will also be
available for calls or meetings if/when required. My preferred availability
window is 13:00–19:00 UTC.
Post GSoC:
----------
Being part of the Git community and contributing to the codebase has been a
very valuable experience for me. The process of understanding Git’s internals,
submitting patches, and receiving feedback on the mailing list has helped me
grow significantly as a developer. The feeling of working on code that is used
by millions of developers and companies around the world is very rewarding.
I plan to remain involved with the Git community even after GSoC by continuing
to contribute patches, review code, and participate in discussions to help make
Git better for end users. The work on refactoring Git’s global state is part of
a long-term effort, and I would love to continue working on it beyond the GSoC
timeline.
I would also be happy to mentor, co-mentor, or volunteer in the future to help
new and upcoming contributors whenever I get the chance. I see GSoC as the
starting point of a long-term relationship with the Git community.
Closing & Appreciation:
-----------------------
I would like to thank the Git community for the excellent documentation and the
welcoming environment. I am also grateful for the patience and guidance shown
in the feedback and discussions on the mailing list by Junio, Phillip, Karthik,
Ben, and others, which have helped me improve my understanding and contributions.
I also read blogs and proposals by Ayush, Lucas, Kousik Sanagavarapu, and Olamide
Bello, which provided valuable insights and helped shape my approach to contributing.
Thank you for reviewing my proposal :)
References:
-----------
[1]- https://github.com/shreyp135/Alethea
[2]- https://unstop.com/hackathons/hackmait-50-iosd-impulse-2024-maharaja-agrasen-institute-of-technology-mait-new-delhi-941779
[3]- https://cse.mait.ac.in/index.php/academics/9-computer-center/1249-iosd-mait-impulse-25, https://unstop.com/college-fests/impulse-2025-maharaja-agrasen-institute-of-technology-mait-new-delhi-348321
[4]- https://iosd-web.vercel.app/
[5]- https://www.linkedin.com/posts/code-for-goodtech_augtoberfest-c4gt2024-activity-7242923677032312834-XMul
[6]- https://lore.kernel.org/git/cover.1771511192.git.phillip.wood@dunelm.org.uk/
[7]- https://lore.kernel.org/git/20260304145823.189440-1-shreyanshpaliwalcmsmn@gmail.com/T/#m65b9b4547036991a7b7f3c861b9663428891f588
[8]- https://lore.kernel.org/git/20260114143238.536312-1-shreyanshpaliwalcmsmn@gmail.com/
[9]- https://lore.kernel.org/git/20260115211609.17420-1-shreyanshpaliwalcmsmn@gmail.com/
[10]- https://lore.kernel.org/git/20260204111343.71975-1-shreyanshpaliwalcmsmn@gmail.com/
[11]- https://lore.kernel.org/git/20260205131132.44282-1-shreyanshpaliwalcmsmn@gmail.com/
[12]- https://lore.kernel.org/git/1444778207-859-1-git-send-email-gitster@pobox.com/
[13]- https://lore.kernel.org/git/20160511131745.2914-1-chriscool@tuxfamily.org/
[14]- https://lore.kernel.org/git/20180205235508.216277-1-sbeller@google.com/
[15]- https://lore.kernel.org/git/20170531214417.38857-1-bmwill@google.com/
[16]- https://lore.kernel.org/git/cover.1715339393.git.ps@pks.im/
[17]- https://lore.kernel.org/git/20250206-b4-pks-path-drop-the-repository-v1-16-4e77f0313206@pks.im/
[18]- https://lore.kernel.org/git/20250717-pks-config-wo-the-repository-v1-20-d888e4a17de1@pks.im/
[19]- https://lore.kernel.org/git/cover.1718347699.git.ps@pks.im/
[20]- https://ayu-ch.github.io/2025/08/29/gsoc-final-report.html
[21]- https://cloobtech.hashnode.dev/week-5-and-6-design-reviews-rfcs-and-refining-the-path-forward
[22]- https://lore.kernel.org/all/cover.1771258573.git.belkid98@gmail.com/
[23]- https://lore.kernel.org/git/7b5dd0c4-0ca0-458e-89db-621a70dac9ae@gmail.com/
[24]- https://lore.kernel.org/git/20260217163909.55094-1-shreyanshpaliwalcmsmn@gmail.com/
[25]- https://medium.com/@shreyanshpaliwal18
* Re: [GSOC][PROPOSAL v2]: Refactoring in order to reduce Git’s global state
2026-03-07 20:04 ` [GSOC][PROPOSAL v2]: " Shreyansh Paliwal
@ 2026-03-09 14:42 ` Christian Couder
2026-03-10 14:58 ` Shreyansh Paliwal
2026-03-20 18:12 ` [GSOC][PROPOSAL v3]: " Shreyansh Paliwal
1 sibling, 1 reply; 15+ messages in thread
From: Christian Couder @ 2026-03-09 14:42 UTC (permalink / raw)
To: Shreyansh Paliwal
Cc: git, karthik.188, jltobler, ayu.chandekar, siddharthasthana31
Hi Shreyansh,
On Sat, Mar 7, 2026 at 9:09 PM Shreyansh Paliwal
<shreyanshpaliwalcmsmn@gmail•com> wrote:
> Changes in v2:
> - Added links in the 'About Me' section and updated reference numbering.
> - Rephrased and revised the 'Pre-GSoC', 'History' and 'Proposed Plan' sections.
> - Updated patch statuses and changed some wordings.
Thanks. Your proposal looks good to me now.
* Re: [GSOC][PROPOSAL v2]: Refactoring in order to reduce Git’s global state
2026-03-09 14:42 ` Christian Couder
@ 2026-03-10 14:58 ` Shreyansh Paliwal
0 siblings, 0 replies; 15+ messages in thread
From: Shreyansh Paliwal @ 2026-03-10 14:58 UTC (permalink / raw)
To: git
Cc: christian.couder, karthik.188, jltobler, ayu.chandekar,
siddharthasthana31
> Hi Shreyansh,
>
> On Sat, Mar 7, 2026 at 9:09 PM Shreyansh Paliwal
> <shreyanshpaliwalcmsmn@gmail•com> wrote:
>
> > Changes in v2:
> > - Added links in the 'About Me' section and updated reference numbering.
> > - Rephrased and revised the 'Pre-GSoC', 'History' and 'Proposed Plan' sections.
> > - Updated patch statuses and changed some wordings.
>
> Thanks. Your proposal looks good to me now.
Thanks Christian for taking the time to review it.
If there are any updates to the patch list or the proposal content,
I will send a v3 in a few days, before the final submission.
Best,
Shreyansh
* [GSOC][PROPOSAL v3]: Refactoring in order to reduce Git’s global state
2026-03-07 20:04 ` [GSOC][PROPOSAL v2]: " Shreyansh Paliwal
2026-03-09 14:42 ` Christian Couder
@ 2026-03-20 18:12 ` Shreyansh Paliwal
1 sibling, 0 replies; 15+ messages in thread
From: Shreyansh Paliwal @ 2026-03-20 18:12 UTC (permalink / raw)
To: git
Cc: christian.couder, karthik.188, jltobler, ayu.chandekar,
siddharthasthana31
Hello,
This is the third draft of my GSoC 2026 proposal for the project
'Refactoring in order to reduce Git’s global state'.
Doc version can be read at:
https://docs.google.com/document/d/16MRNUv6dJi6vtNvI5Ro0WmHf20dRRBHjFLpmhAuaUOA/edit?usp=sharing
I have also uploaded this draft to the GSoC website. Any
final feedback or suggestions would be greatly appreciated.
Thanks for reading.
---
Changes in v3:
- Updated patch list and their statuses.
- Minor wording and grammar changes.
---
Refactoring in order to reduce Git's global state
Personal Information:
---------------------
Name: Shreyansh Paliwal
Email: Shreyanshpaliwalcmsmn@gmail•com
Alternate Email: Shreyansh.01014803123@it•mait.ac.in
Mobile No.: +91-9335120023
Education: GGSIPU, New Delhi, India
Year: III / IV
Degree: Bachelor of Technology in Information Technology
Github: https://github.com/shreyp135
Time-zone: UTC +5:30 (IST)
About Me:
---------
I am Shreyansh Paliwal, a pre-final year undergraduate student at Guru
Gobind Singh Indraprastha University, New Delhi, India. I began programming
in 2018 with Java as my first language and later transitioned to C/C++ in
2023, and it has been my primary focus since then. I also enjoy exploring
new technologies and building applications such as [1], which I developed
using TypeScript, React.js, and AWS. I have also organized multiple
hackathons and technical fests [2][3] at my college as the SIG-Head of
IOSD [4], a tech-focused student community.
I started using Git in 2023, which is also when I made my first open-source
contribution to the Git project. I was also a winner of Augtoberfest 2024
[5], an open-source competition organized by C4GT India. Over the past
several months, I have been actively contributing to Git by studying the
codebase, becoming familiar with the mailing list workflow, and submitting
multiple patches after incorporating review feedback. I am motivated to
improve the experience of Git for end users, and this project is an
excellent opportunity to continue that work.
Overview:
---------
Git relies heavily on global state for managing environment variables and
configuration data. In particular, many parts of the codebase depend on the
global struct repository instance, the_repository, which represents the
currently active repository. Instead of passing a repository instance
explicitly, several internal functions implicitly rely on this global
object. Additionally, various configuration-derived values and
environment-related variables such as the_hash_algo, default_abbrev, and
comment_line_str are stored globally, most of them defined in
environment.c.
This design assumes that only one repository is active within a process at
a time. As a result, the repository state becomes shared across the entire
process, weakening isolation and making behavior implicitly dependent on
global context. Such global dependencies make the code harder to reason
about, test, and maintain, and can introduce subtle bugs when operations
interact with multiple repositories. They also limit long-term goals such
as safely supporting multiple repositories within a single process and
continuing Git’s ongoing libification efforts.
To address these issues, global environment and configuration state should
be refactored into better-scoped contexts. Repository-specific data can be
moved into struct repository or related structures, while
subsystem-specific state should be localized appropriately. Passing
repository instances explicitly through function interfaces will improve
modularity, reduce hidden dependencies, and make the codebase easier to
maintain while moving Git closer to supporting multiple repositories safely
within a single process.
The difficulty of this project is medium, and it is estimated to take 175
to 350 hours.
Pre-GSOC:
---------
I first explored the Git codebase in December 2023, when I submitted a
small patch fixing the wording of an error message that I noticed while
browsing the source code. At that time I had recently started using Git and
GitHub for version control in my projects, which sparked my curiosity about
how Git works internally. A few months ago, when I had some free time
from college, I decided to start contributing to Git more actively. I built
Git from source, read parts of the documentation, and familiarized myself
with the mailing list workflow. While going through the documentation, I
noticed a few inconsistencies in the MyFirstContribution page and submitted
patches to fix them. I also completed a microproject involving a test
cleanup, and later worked on adding a warning for a quiet fallback.
During this process, I attempted to remove the usage of the_repository from
a file. After discussion on the mailing list [23], Phillip directed me
towards wt-status, which led me to explore parts of the codebase such as
the wt-status and worktree subsystems. Through this, I learned that such
refactors are generally more valuable in core library code. Following this
discussion, I shifted my focus toward understanding the broader global
state refactoring effort. To better understand the project area, I studied
previous patches and blog posts by Ayush Chandekar and Olamide Bello,
followed related discussions on the mailing list, and explored the relevant
parts of the codebase. This motivated me to work further in this area and
shaped my interest in this project.
The following is a list of my contributions, ordered from earliest to most
recent:
Patches for Git:
----------------
* test-lib-functions.sh: fix test_grep fail message wording
Status: Released in v2.43.1
Mailing List: https://lore.kernel.org/git/20231203171956.771-1-shreyanshpaliwalcmsmn@gmail.com/
Commit: 37e8d795bed7b93d3f12bcdd3fbb86dfe57921e6
Log: This was my first patch to Git in 2023. While browsing the
source code and past issues, I noticed that even after
the test_i18ngrep function was deprecated, an error message
referring to test_i18ngrep was left behind. I updated
the wording to correctly reference test_grep.
* doc: MyFirstContribution: fix missing dependencies and clarify build steps
Status: Merged into master
Mailing List: https://lore.kernel.org/git/20260112195625.391821-1-shreyanshpaliwalcmsmn@gmail.com/
Commit: 81021871eaa8b16a892b9c8791a0c905ab26e342
Log: While getting familiar with the codebase, I followed the
MyFirstContribution documentation and encountered a few
issues. Some include headers were missing, the synopsis
format was incorrect, and the explanation for -j$(nproc)
was absent. I submitted fixes to improve the clarity and
correctness of the documentation.
* t5500: simplify test implementation and fix git exit code suppression (Microproject)
Status: Merged into master
Mailing List: https://lore.kernel.org/git/20260121130012.888299-1-shreyanshpaliwalcmsmn@gmail.com/
Commit: a824421d3644f39bfa8dfc75876db8ed1c7bcdbf
Log: This was completed as a microproject for GSoC. Instead of
constructing the pack protocol using a complex combination
of here-docs and echo commands, the patch captures command
outputs beforehand and uses the test-tool pkt-line pack
helper to construct the protocol input in a temporary file
before feeding it to git upload-pack.
* show-index: add warning and wrap error messages with gettext
Status: Merged into master
Mailing List: https://lore.kernel.org/git/20260130153603.290196-1-shreyanshpaliwalcmsmn@gmail.com/
Commit: ea39808a22714b8f61b9472de7ef467ced15efea,
227e2cc4e1415c4aeadceef527dd33e478ad5ec3
Log: While exploring the code, I noticed a TODO comment suggesting
automatic hash detection. After discussion on the mailing
list, it was concluded that there was no future-proof
approach to implement this until a new index file format
came into use. Instead, an explicit warning was added rather
than silently falling back to SHA-1. Additionally, several
error messages were missing gettext wrapping, which was also
fixed.
* wt-status: reduce reliance on global state
Status: Merged into master
Mailing List: https://lore.kernel.org/git/20260218175654.66004-1-shreyanshpaliwalcmsmn@gmail.com/
Commit: a7cd24de0b3b679c16ae3ee8215af06aeea1e6a3,
9d0d2ba217f3ceefb0315b556f012edb598b9724,
4631e22f925fa2af8d8548af97ee2215be101409
Log: This has been the most significant patch series in my journey
so far. It began with a suggestion from Phillip to clean up
some the_repository usages in wt-status.c. I extended the
effort to remove all usages of the_repository and
the_hash_algo from the file. During review discussions, it
was suggested that some worktree API cleanup should happen
first, particularly regarding the representation of worktrees
as NULL. Some related changes were later moved to a separate
series, after which this refactoring proceeded.
* worktree: change representation and usage of primary worktree
Status: Merged into master after being continued by Phillip Wood [6]
Mailing List: https://lore.kernel.org/git/20260213120529.15475-1-shreyanshpaliwalcmsmn@gmail.com/
Log: This worktree API cleanup series started while I was working
on wt-status. The intention was to modify the representation
of the current worktree so that struct worktree would not be
NULL. During discussion, Phillip clarified that NULL actually
represents the current worktree rather than the primary
worktree. Since Phillip already had a patch based on the right
logic, he continued the series and it was eventually merged
into master.
* tree-diff: remove the usage of the_hash_algo global
Status: Merged into master
Mailing List: https://lore.kernel.org/git/20260220175331.1250726-1-shreyanshpaliwalcmsmn@gmail.com/
Commit: 1e50d839f8592daf364778298a61670c4b998654
Log: This was a straightforward patch that removed the remaining
usages of the global the_hash_algo in tree-diff.c by using the
repository’s local instance instead.
* send-email: UTF-8 encoding in subject line
Status: Merged into master
Mailing List: https://lore.kernel.org/git/20260228112210.270273-1-shreyanshpaliwalcmsmn@gmail.com/
Commit: c52f085a477c8eece87821c5bbc035e5a900eb12
Log: This patch was motivated by an issue I personally encountered
while sending a GSoC discussion email [7]. Initially the
change only modified the wording of the prompt, but after
discussion on the mailing list it was extended to include
proper validation to prevent invalid charset encodings from
being used in git send-email and to reduce confusion.
* Remove global state from editor.c
Status: Awaiting further feedback / not yet picked up
Mailing List: https://lore.kernel.org/git/20260310174519.676851-1-shreyanshpaliwalcmsmn@gmail.com/
Log: This originated from a question I had about localizing
editor_program in editor.c [7]. The patch had some
discussion on whether editor_program state should become
repository-scoped, since it can also be set via
git config --local. Though it was approved by Karthik, it
has not been picked up by Junio yet and may be awaiting
further review.
* add-patch: use repository instance from add_p_state instead of the_repository
Status: Needs Review
Mailing List: https://lore.kernel.org/git/20260318090546.1213077-1-shreyanshpaliwalcmsmn@gmail.com/
Commit: 3cfe355ca74aae5cf90a4eca73a341732b0eb456
Log: This was also a straightforward change: add-patch's config
structs used the_repository even though a local struct
repository instance was available. Some of my changes
overlapped with a recent patch by Patrick, so I learned how
to check for overlapping changes and how to base my work on
top of them. I have sent a revised version, which needs to
replace the one currently in the seen branch.
Patches for git.github.io:
--------------------------
* SoC-2026-ideas: Remove an extra backtick
Status: merged into master
PR Link: https://github.com/git/git.github.io/pull/831
Merge Commit: c1e4aa87a54430953eaa7355061139fdf1ff6796
Log: Minor Typo fix.
* rn-132: fixed 2 typos
Status: Merged into master
PR Link: https://github.com/git/git.github.io/pull/832
Merge Commit: 92876114d855d472ce2e0e5337e72a4b97b81681
Log: Fixed typos in Git Rev News Edition 132.
* Add Outreachy 2026 participant
Status: merged into master
PR Link: https://github.com/git/git.github.io/pull/836
Merge Commit: 519170970ce7cf29661ee2707aa4e0411cbd2dac
Log: Added Bello Caleb Olamide as Outreachy participant.
I have also been involved in additional discussions on the Git mailing
list [8][9][10][11][26].
History / Background:
--------------------
Efforts to reduce Git’s reliance on global state began as several
subsystems moved toward libification, enabling Git’s internal functionality
to be reused as a library. Early examples include the libification of git
mailinfo [12] and git apply [13]; these large patch series exposed the
limitations of relying on global state and highlighted the need for better
encapsulation of repository-related data. A key step was the introduction
of struct repository through refactoring by Stefan Beller [14] and Brandon
Williams [15], which was motivated by the need to centralize
repository-related state instead of relying on scattered global variables,
improving code clarity
while laying groundwork for future improvements such as safer
multithreading and handling submodules in the same process. Later work by
Patrick further reduced reliance on the global the_repository in the config
[16] and path [17] subsystems, consolidating several variables into
environment.c so environment-related state could be managed in one place
[18]. The macro #define USE_THE_REPOSITORY_VARIABLE was also introduced to
help transition code away from implicit global repository access [19].
During GSoC 2025, Ayush Chandekar [20] removed additional usages of
the_repository across the codebase and moved several global configuration
variables (such as core_preload_index and merge_log_config) into
repository-scoped structures. More recently, during Outreachy, Olamide
Bello improved configuration handling by introducing repo_config_values, a
structure linked to struct repository that stores repository-specific
configuration values [21][22]. A supporting private structure,
config_values_private, was added for initialization and internal handling.
Discussions around this work also highlighted an important design
constraint: directly moving globals into repository structures or
introducing lazy loading helpers can cause user experience regressions if
configuration errors are detected later.
These efforts collectively form the foundation of the ongoing work to
gradually remove Git’s reliance on global state and move toward a more
modular, repository-scoped architecture.
Proposed Plan:
-------------
I started exploring the codebase by browsing relevant files and identifying
global variables by temporarily removing the USE_THE_REPOSITORY_VARIABLE
macro. My primary focus was on core library files rather than builtin code
[23]. Through this exploration, I observed that a large number of files still
depend on the_repository.
To tackle this project systematically, I propose classifying these files into
two categories:
1. Files using the_repository or the_hash_algo where a repository
instance already exists: These files rely on global variables even
though a struct repository instance is available somewhere in the
call stack. A simple example is my patch in tree-diff.c, where a
repository instance was already available through struct diff_options
*opt, but the_hash_algo was still used. I replaced it with
opt->repo->hash_algo.
In such cases, the refactor mainly involves passing the repository
instance through the function call stack and replacing the global
usages. If a repository instance is not directly available in the
file, I will trace the callers and propagate it from higher levels in
the call hierarchy.
Examples of such files include alias.c, archive*.c, walker.c, and
xdiff-interface.c. These typically require localized refactoring and
are good candidates for incremental patches.
2. Files relying on other global variables defined in environment.c:
Some files depend on additional global variables that are parsed and
accessed through environment.c. In these cases, there is no existing
repository-scoped instance, making the refactor slightly more involved.
Examples include wt-status.c (default_abbrev, comment_line_str) and
apply.c (has_symlinks, ignore_case, trust_executable_bit,
apply_default_whitespace, apply_default_ignorewhitespace).
For such variables, I will evaluate whether they should be moved into
repository-scoped structures (e.g., repo_settings or
repo_config_values), or instead be localized and passed explicitly
where needed. The appropriate approach will depend on how widely the
variable is used and whether, from a multi-repository standpoint, it
logically belongs to a single repository.
I plan to begin with the first category, addressing straightforward
refactors file by file. In parallel, I will analyze and work on specific
groups of global variables from the second category, designing
appropriate repository-scoped replacements while preserving the
original parsing timing and availability of those variables.
The end goal is to remove reliance on global state and eventually eliminate
the USE_THE_REPOSITORY_VARIABLE macro from these files.
Project Timeline:
----------------
* Community Bonding (Until May 24):
- Discuss the project direction and design approaches with mentors.
- Identify and prioritize two main areas of work:
+ files that rely on the_repository.
+ global variables defined in environment.c.
- Study the previous patches by Olamide Bello and Ayush Chandekar
in depth, and identify any remaining tasks while discussing
their approaches and challenges with them.
- Interact with all the people involved in this work to better
understand design decisions and potential pitfalls.
- Experiment with small RFC patches, if needed, to validate approaches.
* Coding period (May 25 - August 16):
- Send patches for any remaining cleanup or refactoring related to
git_default_config() and repo_config_values [22], as well as
the worktree API [24], if any.
- Identify straightforward refactors to remove usages of the_repository
in files such as xdiff-interface.c, archive*.c, fsmonitor*.c etc.
- Work file by file with the goal of eliminating
#define USE_THE_REPOSITORY_VARIABLE by replacing global usages
with explicit repository instances.
- Concurrently maintain at least two parallel patch series:
+ Small / straightforward refactors and replacements like
the_hash_algo or the_repository.
+ Larger structural refactors involving globals such as
DEFAULT_ABBREV, comment_line_str etc.
- Publish weekly or biweekly blog updates documenting progress and design
decisions.
* Final week (August 17 - August 24):
- Address any remaining tasks or pending patches.
- Receive final feedback from mentors and reviewers.
- Prepare a detailed report summarizing the work completed during the project.
Blogging:
---------
I believe blogging is an important part of any open-source project. It
helps others understand the ongoing work and also enables the contributor
to develop a deeper understanding and keep better track of their own
progress. I experienced this firsthand: early in my journey, I was unsure
about various aspects, but reading the blogs of Ayush and Olamide Bello
gave me valuable insight into the contributor perspective and their overall
work.
With the goal of helping future contributors in a similar way, I plan to
document my journey and project progress through regular blog posts. I will
publish updates on a weekly or biweekly basis, depending on the amount of
meaningful progress made. I have set up my blogging area on Medium, and my
posts will be available at [25].
Availability:
-------------
The main coding period runs from June to August. Most of June and July
coincide with my summer vacation, which allows me to dedicate significant
time to the project. My final exams are scheduled for May and will last
approximately one week, but they will be completed before the coding period
begins and should not affect my availability.
During June and July, I will be able to dedicate around 40 hours per week to
the project. In August, when my regular semester resumes, I expect to
contribute approximately 25–30 hours per week.
I do not have any other exams, internships, or planned vacations during the
coding period. Apart from this project, I have no other major commitments
for the summer.
I will keep the community regularly updated on my progress throughout the
project. My primary mode of communication will be email, and I will also be
available for calls or meetings if/when required. My preferred availability
window is 13:00–19:00 UTC.
Post GSoC:
----------
Being part of the Git community and contributing to the codebase has been a
very valuable experience for me. The process of understanding Git’s internals,
submitting patches, and receiving feedback on the mailing list has helped me
grow significantly as a developer. The feeling of working on code that is used
by millions of developers and companies around the world is very rewarding.
I plan to remain involved with the Git community even after GSoC by continuing
to contribute patches, review code, and participate in discussions to help make
Git better for end users. The work on refactoring Git’s global state is part of
a long-term effort, and I would love to continue working on it beyond the GSoC
timeline.
I would also be happy to mentor, co-mentor, or volunteer in the future to help
new and upcoming contributors whenever I get the chance. I see GSoC as the
starting point of a long-term relationship with the Git community.
Closing & Appreciation:
-----------------------
I would like to thank the Git community for the excellent documentation and the
welcoming environment. I am also grateful for the patience and guidance shown
in the feedback and discussions on the mailing list by Junio, Phillip, Karthik,
Ben, and others, which have helped me improve my understanding and contributions.
I also read blogs and proposals by Ayush, Lucas, Kousik Sanagavarapu, and Olamide
Bello, which provided valuable insights and helped shape my approach to contributing.
Thank you for reading my proposal :)
References:
-----------
[1]- https://github.com/shreyp135/Alethea
[2]- https://unstop.com/college-fests/impulse-2025-maharaja-agrasen-institute-of-technology-mait-new-delhi-348321
[3]- https://cse.mait.ac.in/index.php/academics/9-computer-center/1249-iosd-mait-impulse-25
[4]- https://iosd-web.vercel.app/
[5]- https://www.linkedin.com/posts/code-for-goodtech_augtoberfest-c4gt2024-activity-7242923677032312834-XMul
[6]- https://lore.kernel.org/git/cover.1771511192.git.phillip.wood@dunelm.org.uk/
[7]- https://lore.kernel.org/git/20260304145823.189440-1-shreyanshpaliwalcmsmn@gmail.com/T/#m65b9b4547036991a7b7f3c861b9663428891f588
[8]- https://lore.kernel.org/git/20260114143238.536312-1-shreyanshpaliwalcmsmn@gmail.com/
[9]- https://lore.kernel.org/git/20260115211609.17420-1-shreyanshpaliwalcmsmn@gmail.com/
[10]- https://lore.kernel.org/git/20260204111343.71975-1-shreyanshpaliwalcmsmn@gmail.com/
[11]- https://lore.kernel.org/git/20260205131132.44282-1-shreyanshpaliwalcmsmn@gmail.com/
[12]- https://lore.kernel.org/git/1444778207-859-1-git-send-email-gitster@pobox.com/
[13]- https://lore.kernel.org/git/20160511131745.2914-1-chriscool@tuxfamily.org/
[14]- https://lore.kernel.org/git/20180205235508.216277-1-sbeller@google.com/
[15]- https://lore.kernel.org/git/20170531214417.38857-1-bmwill@google.com/
[16]- https://lore.kernel.org/git/cover.1715339393.git.ps@pks.im/
[17]- https://lore.kernel.org/git/20250206-b4-pks-path-drop-the-repository-v1-16-4e77f0313206@pks.im/
[18]- https://lore.kernel.org/git/20250717-pks-config-wo-the-repository-v1-20-d888e4a17de1@pks.im/
[19]- https://lore.kernel.org/git/cover.1718347699.git.ps@pks.im/
[20]- https://ayu-ch.github.io/2025/08/29/gsoc-final-report.html
[21]- https://cloobtech.hashnode.dev/week-5-and-6-design-reviews-rfcs-and-refining-the-path-forward
[22]- https://lore.kernel.org/all/cover.1771258573.git.belkid98@gmail.com/
[23]- https://lore.kernel.org/git/7b5dd0c4-0ca0-458e-89db-621a70dac9ae@gmail.com/
[24]- https://lore.kernel.org/git/20260217163909.55094-1-shreyanshpaliwalcmsmn@gmail.com/
[25]- https://medium.com/@shreyanshpaliwal18
[26]- https://lore.kernel.org/git/20260319092441.1283001-1-shreyanshpaliwalcmsmn@gmail.com/
* Re: [GSoC Proposal v2] Refactoring in order to reduce Git's global state
2026-03-17 17:54 [GSoC Proposal] Refactoring in order to reduce Git's " Francesco Paparatto
@ 2026-03-24 19:31 ` Francesco Paparatto
0 siblings, 0 replies; 15+ messages in thread
From: Francesco Paparatto @ 2026-03-24 19:31 UTC (permalink / raw)
To: git
Cc: Christian Couder, Ayush Chandekar, jltobler, Siddharth Asthana,
karthik nayak
This is the second version of my GSoC 2026 proposal for the project
'Refactoring in order to reduce Git’s global state'.
Doc version: https://docs.google.com/document/d/1xknrv88MnFPidpCbGoK43oAH3rlb_Iiu7Ufx42A3krw/edit?usp=sharing
Changes from v1:
- Added Doc version of the proposal
- Added commit IDs for patches merged to master.
- Added reference to Olamide Bello's latest series [10].
- Added "Remaining Work" section with a variable classification
based on a codebase analysis of what remains after Olamide Bello's
latest series, as suggested by Christian [11].
---
Refactoring in order to reduce Git's global state
Personal Information
--------------------
Name: Francesco Paparatto
Pronouns: he/him
Location: Milan, Italy
Timezone: CET (UTC+1)
Email: francescopaparatto@gmail•com
GitHub: https://github.com/frapaparatto
LinkedIn: https://www.linkedin.com/in/francesco-paparatto/
About Me
--------
I am Francesco Paparatto, a self-taught programmer who dropped out
of a degree in Management to dedicate myself full-time to software
engineering.
My goal is to work as a Backend/Infrastructure Engineer,
and to reach that goal I am balancing CS fundamentals through
theoretical courses with challenging projects that help me develop
strong engineering skills, not only from a code perspective but also
from a system thinking point of view. I also like building
fundamental things from scratch in order to understand how they work.
This is my first time contributing to open source and I am fascinated
by this world. I hope to become a cornerstone of an open source community.
Git Experience and Contributions
---------------------------------
I started learning Git in depth at the beginning of 2026, when I
began working on my cgit project [1], a small reimplementation of
Git's core plumbing commands. I built it to understand how those
commands really work under the hood, and as a way to start reading
real codebases and learn how to design and structure code properly.
So far, I have made the following contributions:
* [GSoC PATCH v2] t3310: replace test -f/-d with
test_path_is_file/test_path_is_dir
Status: Graduated to 'master'.
Link: https://lore.kernel.org/git/20260228005939.9012-1-francescopaparatto@gmail.com/
Commit: f31b322008c526693660770e66c12f4bcfd29558
* [PATCH v4] t3310: avoid hiding failures from rev-parse in
command substitutions
Status: Graduated to 'master'.
Link: https://lore.kernel.org/git/20260307103631.89829-1-francescopaparatto@gmail.com/
Commit: d3edca979a1e916518bc2376e468609ddae2a217
Overview
--------
Git's internal functions rely heavily on global state stored in
environment.c. Configuration values like trust_executable_bit,
editor_program, and git_commit_encoding are declared as file-scope
globals and populated at startup through git_default_config() and
its sub-handlers like git_default_core_config().
This design assumes a single repository per process. When Git is
used as a library (libification) or needs to handle multiple
repositories in the same process, globals from one repository
overwrite values from another. For example, two threads formatting
commits for repositories with different i18n.commitEncoding settings
would race on the same git_commit_encoding pointer.
The goal of this project is to move these global variables into
per-repository structures within struct repository, following the
pattern established by Olamide Bello's Outreachy work with struct
repo_config_values [2].
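The difference between the two layouts can be sketched as follows. This
is a minimal illustration using toy types and hypothetical names
(toy_repository, toy_config_values, load_into_global, load_into_repo);
it does not reproduce Git's real definitions.

```c
/* Toy stand-in types; NOT Git's real struct repository. */
struct toy_config_values {
	const char *commit_encoding;
};

struct toy_repository {
	struct toy_config_values config_values;
};

/* Before: one process-wide slot shared by every repository. */
const char *toy_global_commit_encoding;

void load_into_global(const char *value)
{
	/* The last repository to load its config wins. */
	toy_global_commit_encoding = value;
}

/* After: each repository owns its own copy of the value. */
void load_into_repo(struct toy_repository *repo, const char *value)
{
	repo->config_values.commit_encoding = value;
}
```

Loading two repositories through load_into_global() leaves only the
second value visible, while load_into_repo() keeps both.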
Context and Prior Work
-----------------------
Not all config variables can be treated in the same way. There is
a fundamental distinction between eagerly and lazily parsed
variables, and conflating the two causes regressions.
Variables set in git_default_core_config() are eagerly parsed. They
are read at startup, and if a value is invalid, Git calls die()
immediately with a clear error before doing any real work. The user
gets early feedback and can fix their config.
Variables in struct repo_settings are lazily parsed. They are
populated on first access via prepare_repo_settings(). If an eagerly
parsed variable is naively moved into this struct, invalid config
that used to crash at startup now crashes mid-operation.
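The timing difference can be illustrated with a small sketch. All names
here (toy_repo, toy_parse_bool, toy_startup, toy_lazy_filemode) are
hypothetical, and the functions return -1 where Git would call die(),
so that the behavior is easy to observe.

```c
#include <string.h>

/* Hypothetical helper: 1/0 for a valid boolean, -1 if malformed. */
int toy_parse_bool(const char *value)
{
	if (!strcmp(value, "true"))
		return 1;
	if (!strcmp(value, "false"))
		return 0;
	return -1;
}

struct toy_repo {
	const char *raw_filemode; /* text as found in the config file */
	int eager_filemode;       /* filled during startup */
	int lazy_filemode;        /* filled on first use */
	int lazy_ready;           /* nonzero once lazy_filemode is cached */
};

/* Eager: the value is validated at startup, before any real work. */
int toy_startup(struct toy_repo *repo)
{
	repo->eager_filemode = toy_parse_bool(repo->raw_filemode);
	return repo->eager_filemode < 0 ? -1 : 0;
}

/* Lazy: validation is deferred until an operation asks for the value. */
int toy_lazy_filemode(struct toy_repo *repo)
{
	if (!repo->lazy_ready) {
		repo->lazy_filemode = toy_parse_bool(repo->raw_filemode);
		repo->lazy_ready = 1;
	}
	return repo->lazy_filemode;
}
```

With an invalid value, toy_startup() reports the problem immediately,
whereas the lazy path only notices it when toy_lazy_filemode() is
first called, possibly in the middle of an operation.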
During GSoC 2025, Ayush Chandekar moved several global configuration
variables into repository-scoped structures [3]. Through this work
and subsequent review discussions, the eager/lazy problem became
visible [4].
Ayush's work also surfaced the getter/setter debate. When he
introduced getter and setter functions for repo_settings fields,
reviewers pointed out they added no value without calling
prepare_repo_settings() internally. From this discussion, Junio
suggested two approaches for repo_settings variables that must
not be mixed [5]:
- Common variables: populated in prepare_repo_settings(), accessed
directly via repo->settings.foo. No getter, no setter.
- Rare variables: prepare_repo_settings() does not touch the field.
A lazy getter checks a sentinel value (e.g. -1), reads from
config on first access, and caches the result.
The appropriate pattern for each variable will require reasoning
and discussion on the mailing list.
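The two patterns can be sketched side by side. This is only an
illustration under assumptions: toy_settings, toy_prepare_settings(),
toy_get_rare_value() and the config_reads counter are hypothetical
stand-ins, and the config lookup is simulated.

```c
/* Toy stand-ins; field and function names are hypothetical. */
struct toy_settings {
	int initialized;
	int common_value; /* filled by toy_prepare_settings() */
	int rare_value;   /* -1 is the "not read yet" sentinel */
};

struct toy_repo {
	struct toy_settings settings;
	int config_reads; /* counts simulated config lookups */
};

void toy_repo_init(struct toy_repo *repo)
{
	repo->settings.initialized = 0;
	repo->settings.common_value = 0;
	repo->settings.rare_value = -1;
	repo->config_reads = 0;
}

/* Simulated config lookup that always yields the given value. */
static int toy_config_get_int(struct toy_repo *repo, int value)
{
	repo->config_reads++;
	return value;
}

/* Common variables: populated once, then read directly. */
void toy_prepare_settings(struct toy_repo *repo)
{
	if (repo->settings.initialized)
		return;
	repo->settings.common_value = toy_config_get_int(repo, 1);
	repo->settings.initialized = 1;
}

/* Rare variables: untouched by prepare, read and cached on demand. */
int toy_get_rare_value(struct toy_repo *repo)
{
	if (repo->settings.rare_value < 0)
		repo->settings.rare_value = toy_config_get_int(repo, 7);
	return repo->settings.rare_value;
}
```

In this sketch the rare value costs a config lookup only if something
actually asks for it, and only the first time.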
Phillip Wood suggested a third approach: passing a
repository pointer through git_default_config() via the void *cb
callback data parameter, so handlers can populate per-repo structs
without touching globals [6].
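A simplified sketch of this idea follows. The callback signature is
reduced to three parameters for illustration (Git's real config
callbacks take more), and toy_repo and toy_default_config() are
hypothetical names.

```c
#include <string.h>

/* Toy stand-in for the per-repository destination struct. */
struct toy_repo {
	int trust_ctime;
};

/*
 * Simplified config callback: the caller passes the repository
 * through the opaque cb pointer, so no global is written.
 */
int toy_default_config(const char *var, const char *value, void *cb)
{
	struct toy_repo *repo = cb;

	if (!strcmp(var, "core.trustctime"))
		repo->trust_ctime = !strcmp(value, "true");
	return 0;
}
```

Calling the same handler with two different cb pointers fills two
independent structs, which is the property the approach relies on.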
Building on these lessons, Olamide Bello during the Outreachy
program introduced struct repo_config_values [2], a structure
linked to struct repository that stores eagerly parsed configuration
values while preserving their startup-time error detection. An
accessor function repo_config_values() enforces safety by preventing
access from uninitialized repositories and guarding against access
from secondary repository instances that do not yet have their
config populated.
So we now have two structs living inside struct repository:
repo_settings for lazily parsed variables, and repo_config_values
for eagerly parsed variables.
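The guarding behavior of such an accessor might look roughly like the
sketch below. This is an assumption-laden illustration: the names are
hypothetical, the checks are simplified, and the sketch returns NULL
where the real accessor may instead abort.

```c
#include <stddef.h>

/* Toy stand-ins; NOT the structures from the actual series. */
struct toy_config_values {
	int branch_track;
};

struct toy_repo {
	int initialized;
	struct toy_config_values config_values;
};

/* Plays the role of the primary repository (the_repository). */
struct toy_repo toy_primary_repo;

struct toy_config_values *toy_repo_config_values(struct toy_repo *repo)
{
	/* Refuse access before the repository is set up. */
	if (!repo || !repo->initialized)
		return NULL;
	/* Refuse secondary instances whose config is not populated. */
	if (repo != &toy_primary_repo)
		return NULL;
	return &repo->config_values;
}
```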
Approach
--------
I will follow the pattern established in Olamide Bello's approved
patch series [2], which provides the concrete workflow for each
variable:
1. Add a new field to struct repo_config_values in environment.h.
2. Initialize the field in repo_config_values_init().
3. Update the config callback: get cfg via
repo_config_values(the_repository), write to cfg->field instead
of the global.
4. Update all call sites: replace the global with cfg->field.
5. Remove the global from environment.c and the extern from
environment.h.
6. Run tests and check fuzz targets.
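Steps 1 to 4 can be sketched on a toy example. Everything below is a
simplified stand-in with hypothetical names (toy_config_values,
toy_core_config, toy_effective_mode); it mirrors the shape of the
workflow, not the actual code in environment.c.

```c
#include <string.h>

/* Step 1: the new field lives in the per-repository struct. */
struct toy_config_values {
	int trust_executable_bit;
};

struct toy_repo {
	struct toy_config_values config_values;
};

/* Stand-in for the_repository. */
struct toy_repo toy_the_repository;

/* Step 2: give the field its default value. */
void toy_config_values_init(struct toy_config_values *cfg)
{
	cfg->trust_executable_bit = 1;
}

/* Step 3: the config callback writes to cfg->field, not a global. */
int toy_core_config(const char *var, const char *value)
{
	struct toy_config_values *cfg = &toy_the_repository.config_values;

	if (!strcmp(var, "core.filemode"))
		cfg->trust_executable_bit = !strcmp(value, "true");
	return 0;
}

/* Step 4: a call site reads the field instead of the global. */
unsigned int toy_effective_mode(const struct toy_repo *repo,
				unsigned int fs_mode,
				unsigned int index_mode)
{
	if (repo->config_values.trust_executable_bit)
		return fs_mode;
	return index_mode;
}
```

Step 5 then consists of deleting the old global and its extern
declaration, which the compiler helps verify: any missed reader fails
to build.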
Additionally, when a variable is also written by CLI options (e.g.,
OPT_INTEGER or OPT_BOOL in builtin/*.c), those option definitions
must also be updated to point to cfg->field. If only the config
path is updated and the CLI path is missed, CLI values silently
stop working. This was caught during review of Bello's
pack_compression_level patch [10].
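The failure mode can be reproduced with a toy option parser. This is a
hedged sketch: toy_option and toy_parse_option() are hypothetical and
much simpler than Git's parse-options machinery, but they store the
parsed value through a pointer in the same spirit.

```c
#include <stdlib.h>
#include <string.h>

struct toy_config_values {
	int pack_compression_level;
};

/* Toy option: a name plus the address the parsed value is written to. */
struct toy_option {
	const char *name;
	int *value;
};

/* Parses "name=N" and stores N through opt->value; -1 if no match. */
int toy_parse_option(const struct toy_option *opt, const char *arg)
{
	size_t len = strlen(opt->name);

	if (strncmp(arg, opt->name, len) || arg[len] != '=')
		return -1;
	*opt->value = atoi(arg + len + 1);
	return 0;
}
```

If the option still points at a leftover variable while readers
consult the struct field, parsing succeeds but the field never
changes, which is why the option definition has to be repointed
together with the config path.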
This workflow is not purely mechanical. Each variable requires
case-by-case analysis:
- Is the variable per-repository? Some variables like
editor_program are user preferences. As Phillip Wood asked [7],
variables where per-repo scoping does not make semantic sense
may be better handled by localizing them to their subsystem.
- How deep is the call chain? As preparation for this proposal, I
traced askpass_program end-to-end. It has a single reader in
prompt.c, which looks simple. But git_prompt() is called from
two paths: the credential system and the bisect system. The
difficulty of a variable is not about reader count; it is
about call chain depth.
- Are there initialization ordering constraints? Some variables
like is_bare_repository_cfg are set during .git directory
discovery, before struct repository is fully initialized.
Moving them into the repository struct creates a chicken-and-egg
problem that requires design discussion on the mailing list.
- Are there dependent variables? Some variables must be migrated
together. For example, comment_line_str_to_free and
auto_comment_line_char are set in the same config callback and
read together in builtin/commit.c. Migrating one without the
other would leave half the state global and half per-repo.
- Does the variable have CLI interaction? Variables written by
command-line options via OPT_INTEGER, OPT_BOOL, etc. need both
the config path and the CLI path updated.
The macro #define USE_THE_REPOSITORY_VARIABLE, introduced by
Patrick Steinhardt [8], controls access to the global
the_repository variable. The macro serves both as a migration indicator and a
technical gate. When all globals in a file have been migrated
and all functions receive struct repository * explicitly,
the macro can be removed.
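The gating mechanism can be sketched as a declaration that only exists
when the macro is defined. The names below are prefixed with toy_ to
mark them as hypothetical; this is an illustration of the idea, not a
copy of Git's header.

```c
/* Toy stand-in for struct repository. */
struct toy_repo {
	int id;
};

/*
 * The global is only declared for files that opt in. A file that
 * does not define the macro cannot name the variable at all, so
 * any leftover use becomes a compile error.
 */
#ifdef TOY_USE_THE_REPOSITORY_VARIABLE
extern struct toy_repo *toy_the_repository;
#endif

/* A fully migrated function: the repository arrives as a parameter. */
int toy_repo_id(const struct toy_repo *repo)
{
	return repo->id;
}
```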
Following Stolee's two-step migration model [9], I will first
move variables into repo_config_values using the_repository
(Step 1: safe, mechanical, no behavior change). For selected
variables with shallow call chains, I will also thread struct
repository *repo through callers to begin replacing direct
the_repository usage (Step 2).
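The two steps can be contrasted on a toy function. The names are
hypothetical and the comparison logic is only a placeholder; the point
is where the repository comes from in each step.

```c
#include <ctype.h>

struct toy_config_values {
	int ignore_case;
};

struct toy_repo {
	struct toy_config_values config_values;
};

/* Stand-in for the_repository. */
struct toy_repo toy_the_repository;

/* Step 1: the value lives in the struct, reached via the global repo. */
int toy_chars_match_step1(char a, char b)
{
	if (toy_the_repository.config_values.ignore_case)
		return tolower((unsigned char)a) == tolower((unsigned char)b);
	return a == b;
}

/* Step 2: the caller passes the repository explicitly. */
int toy_chars_match_step2(const struct toy_repo *repo, char a, char b)
{
	if (repo->config_values.ignore_case)
		return tolower((unsigned char)a) == tolower((unsigned char)b);
	return a == b;
}
```

Step 1 changes where the value is stored without touching any function
signature; Step 2 is what finally lets two repositories with different
settings coexist in one process.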
I propose a dual approach for organizing the work:
- Variable-focused migration: move environment.c globals into
repo_config_values following Bello's pattern. This is the
primary track. For each variable, I classify it, trace readers,
migrate it, and remove the global.
- File-focused cleanup: for files where only a few the_repository
usages remain after variable migration, complete the cleanup
and remove USE_THE_REPOSITORY_VARIABLE entirely. This is a
natural side effect of the first track.
Some variables may need a hybrid approach: when a variable is
used across many files but heavily concentrated in one subsystem,
it may make sense to migrate it alongside other globals in that
subsystem rather than in isolation.
The two tracks reinforce each other: migrating a variable often
removes the last reason a file needs the macro.
Remaining Work and Variable Classification
--------------------------------------------
Olamide Bello's merged series [2] migrated: git_attributes_file,
core_apply_sparse_checkout, and git_branch_track.
His latest series [10] addresses: trust_ctime, check_stat,
zlib_compression_level, pack_compression_level, precomposed_unicode,
core_sparse_checkout_cone, sparse_expect_files_outside_of_patterns,
and warn_on_object_refname_ambiguity.
After those series, more than 20 variables remain in
environment.c. I analyzed them and classified a representative
set below, grouped by difficulty and type of challenge they
present.
Straightforward per-repo booleans (few readers, no CLI
interaction, clearly filesystem-dependent):
* trust_executable_bit (core.filemode)
Eagerly parsed in git_default_core_config() at
environment.c:307. Determines whether the filesystem
correctly represents executable bits. Per-repo because
different repos may live on different filesystems (e.g.,
FAT32 does not support executable bits, ext4 does). Git
probes this during init/clone.
Reader files: apply.c, read-cache.c, read-cache.h (3 files).
No CLI interaction.
Note: used together with has_symlinks in read-cache.c:744, so
migrating both in the same series would be clean.
* has_symlinks (core.symlinks)
Eagerly parsed in git_default_core_config(). Determines
whether the filesystem supports symbolic links. Same
rationale as trust_executable_bit: filesystem-dependent,
clearly per-repo.
Reader files: apply.c, combine-diff.c, compat/mingw.c,
entry.c, read-cache.c, read-cache.h (6 files).
No CLI interaction with the global. Note: builtin/difftool.c
has its own local has_symlinks field inside struct
difftool_options. This is a separate variable with the same
name, not the global.
Ambiguous per-repo semantics (require mailing list discussion):
* editor_program (core.editor)
Eagerly parsed in git_default_core_config() at
environment.c:438. Sets the default editor. Phillip Wood
questioned whether per-repo scoping makes sense [7], since
it is a user preference rather than a repository property.
Reader files: editor.c (1 file). Very shallow call chain
but the design question must be resolved first.
No CLI interaction. No dependencies.
Dependent variables (must be migrated together):
* comment_line_str_to_free and auto_comment_line_char
(core.commentchar, core.commentstring)
Both eagerly parsed in the same config callback in
git_default_core_config(). auto_comment_line_char is a
boolean flag controlling whether Git auto-selects a comment
character that does not conflict with the commit message.
comment_line_str_to_free stores the actual string used.
They are set together and read together in
builtin/commit.c. Migrating one without the other would
leave half the state global and half per-repo.
Reader files: builtin/commit.c (1 file for both).
No CLI interaction.
High reader count (significant effort):
* ignore_case (core.ignorecase)
Eagerly parsed in git_default_core_config(). Enables Git
to work on case-insensitive filesystems. Clearly per-repo
(filesystem-dependent, probed during init/clone).
Reader files: apply.c, dir.c, fsmonitor.c, name-hash.c,
read-cache.c, refs/files-backend.c, submodule.c, ... (15+ files)
Note: many builtin/ files (grep.c, branch.c, tag.c,
for-each-ref.c) have their own ignore_case fields in local
structs. These are separate from the global. Careful
analysis is needed to distinguish global usage from local
usage.
Other remaining variables that will be classified during the
community bonding period:
minimum_abbrev, default_abbrev, assume_unchanged,
git_commit_encoding, git_log_output_encoding,
apply_default_whitespace, apply_default_ignorewhitespace,
fsync_object_files, use_fsync, fsync_method,
fsync_components, askpass_program, excludes_file,
auto_crlf, core_eol, global_conv_flags_eol,
check_roundtrip_encoding, autorebase, push_default,
object_creation_mode, grafts_keep_true_parents,
pack_size_limit_cfg, protect_hfs, protect_ntfs,
git_work_tree_cfg.
Timeline
--------
Project size: 175 hours.
Community Bonding (May 1 - May 25):
- Discuss project direction and design approaches with mentors.
- Study Olamide Bello's and Ayush Chandekar's patches in depth.
Review remaining repo_config_values work and identify
unfinished tasks.
- Complete classification of remaining variables listed above.
- Start discussions for ambiguous cases on the mailing list.
- Submit an RFC patch following Bello's pattern to validate
the workflow before the coding period begins.
Coding Period (May 26 - August 16):
- Start with straightforward variables: filesystem-dependent
booleans like trust_executable_bit and has_symlinks. These
have few readers, clear per-repo semantics, and no complex
parsing.
- Progressively move to more involved variables: string-type
values like excludes_file, dependent pairs like
comment_line_str_to_free and auto_comment_line_char, and
high-reader-count variables like ignore_case.
- Apply the dual approach described above:
+ Variable-focused migration: classify, trace, migrate, and
remove globals following Bello's pattern.
+ File-focused cleanup: where variable migration removes the
last global dependency in a file, complete the cleanup and
remove USE_THE_REPOSITORY_VARIABLE.
- Submit small patch series (3-5 patches each) frequently to
respect reviewers' time and maintain steady velocity.
- Maintain two parallel series: one in review and one being
written, to account for review cycle delays.
- Continuously iterate: incorporate mailing list feedback,
reroll patches (v2/v3), and refine the approach based on
community input.
- Publish weekly blog updates documenting progress and design
decisions.
Final period (August 17 - August 24):
- Address any remaining tasks or pending patches.
- Update internal documentation.
- Receive final feedback from mentors and reviewers.
- Prepare and submit the final project report.
A 30% buffer is built into the schedule to account for
unexpected review delays and design discussions.
Blogging
--------
I believe blogging is an important part of growing as a developer
and an effective way to learn, because writing forces you to
truly understand what you are working on.
I plan to publish weekly updates documenting my journey through this
project: progress, design decisions, challenges, and lessons
learned. I also want these posts to serve as a valuable resource
for anyone who, like me today, will look for guidance on
contributing to Git or to open source projects in general.
Availability
------------
Git will be my top priority. I have no other commitments
scheduled during the GSoC period, so I will be able to work on
this full-time. I plan to devote 35-40 hours or more per week
to the Git project. My preferred working window is 9:00-18:00 CET.
Post-GSoC
---------
Contributing to Git has been an invaluable experience, not only on
a personal level, because it pushed me out of my comfort zone and
challenged me, but also, and above all, on a professional level.
The feeling of working on code used by millions of developers and
companies around the world is incredibly rewarding.
This iterative process of discussing, writing code, and receiving
feedback helps you grow tremendously, and quickly, as a developer.
Being exposed to a codebase like Git’s forces you to think much more
deeply, to understand how everything works and how it connects
to the rest of the program. For these reasons, I intend to continue
working on Git even after GSoC by contributing patches, participating
in discussions, and reviewing new members’ code.
Furthermore, this refactoring process is a long-term effort,
and I’d like to keep working on it.
References
----------
[1] https://github.com/frapaparatto/cgit
[2] https://lore.kernel.org/git/cover.1768217572.git.belkid98@gmail.com/
[3] https://lore.kernel.org/git/20250603131806.14915-1-ayu.chandekar@gmail.com/
[4] https://lore.kernel.org/git/17b7f51c-0c3d-4d63-a501-47ce829f7345@gmail.com/
[5] https://lore.kernel.org/git/xmqqbjquge0c.fsf@gitster.g/
[6] https://lore.kernel.org/git/d61c966b-61ae-4ba9-b983-c8dab6e2c292@gmail.com/
[7] https://lore.kernel.org/git/8e657184-ee0b-453a-9f2d-a98080d3582e@gmail.com/
[8] https://lore.kernel.org/git/cover.1718347699.git.ps@pks.im/
[9] https://lore.kernel.org/git/47d09c43-6d27-40ff-8dbc-22cc4a5949ed@gmail.com/
[10] https://lore.kernel.org/git/cover.1773127785.git.belkid98@gmail.com/
[11] https://lore.kernel.org/git/CAP8UFD1H8ZsxfGSnnvX9xkKLSSpDjA3e3KNZ7eHN3ruq-sC7fw@mail.gmail.com/
end of thread, other threads:[~2026-03-24 19:31 UTC | newest]
Thread overview: 15+ messages
2026-03-06 14:57 [GSOC][PROPOSAL]: Refactoring in order to reduce Git’s global state Shreyansh Paliwal
2026-03-07 10:33 ` Christian Couder
2026-03-07 12:46 ` Shreyansh Paliwal
2026-03-07 20:04 ` [GSOC][PROPOSAL v2]: " Shreyansh Paliwal
2026-03-09 14:42 ` Christian Couder
2026-03-10 14:58 ` Shreyansh Paliwal
2026-03-20 18:12 ` [GSOC][PROPOSAL v3]: " Shreyansh Paliwal
-- strict thread matches above, loose matches on Subject: below --
2026-03-17 17:54 [GSoC Proposal] Refactoring in order to reduce Git's " Francesco Paparatto
2026-03-24 19:31 ` [GSoC Proposal v2] " Francesco Paparatto
2025-04-02 18:14 [GSoC PROPOSAL v1] Refactoring in order to reduce Git’s " Arnav Bhate
2025-04-05 18:41 ` [GSoC PROPOSAL v2] " Arnav Bhate
2025-03-26 5:26 [GSOC] [PROPOSAL V1]: " Ayush Chandekar
2025-04-04 8:51 ` [GSOC] [PROPOSAL v2]: " Ayush Chandekar
2025-04-04 14:45 ` Karthik Nayak
2025-04-06 10:44 ` Ayush Chandekar
2025-04-07 9:06 ` Christian Couder
2025-04-07 10:07 ` Ayush Chandekar
2025-04-07 8:42 ` Ayush Chandekar