public inbox for git@vger.kernel.org 
* [Outreachy][Proposal]: Refactor in order to reduce Git’s global state
@ 2025-10-29  1:18 Bello Olamide
  2025-10-29 15:51 ` Christian Couder
  2025-10-30 14:49 ` [Outreachy][Proposal v2]: Refactor in order to reduce Git’s global state Olamide Caleb Bello
  0 siblings, 2 replies; 7+ messages in thread
From: Bello Olamide @ 2025-10-29  1:18 UTC (permalink / raw)
  To: git, Usman Akinyemi, Christian Couder

Hello,
This is my proposal for the project
"Refactor in order to reduce Git’s global state" for the 2025 Outreachy
Internship program.

Personal Bio:
===========
Full Name: Bello Caleb Olamide
Email: belkid98@gmail•com
Personal Blog: https://cloobtech.hashnode.dev/
GitHub: https://github.com/cloobtech

About Me:
=========
I'm Bello Olamide. I am passionate about software engineering and
I love to figure things out. I like participating in tech
events such as hackathons, but this will be my first open source experience
and I have relished the opportunity and experience so far.
I love being part of a community that strives to achieve a goal, and one that
I have found myself in is a small but growing community that helps to guide and
mentor younger boys to find their way into the tech ecosystem. I have developed
my coding skills through various sources including personal learning, freelancing,
collaboration with other developers and the ALX Software Engineering
program.

Past Experience with Git:
===================
I have been a Git user for some time now, mainly for collaborating with other
developers and tracking version changes to files. During this contribution
stage, I have learned the ropes of how to send patches to Git.

Contributions to the Git Community:
==========================
I have been able to send some patches to the Git codebase with the guidance
and direction of community members.

Microproject:
==========
Link: https://lore.kernel.org/git/cover.1761217100.git.belkid98@gmail.com/
Branch: ob/gpg-interface-cleanup
Status: Merged to next
Commit ID: ce6d041635
Description: strbuf_split*() to split a string into multiple strbufs
is often a wrong API to use.
A few uses of it have been removed by simplifying the code.

Project Overview
=============
Git uses a single global `struct repository` object called `the_repository`
which internal functions rely on to store, access and modify environment
and configuration variables.
With this approach, handling multiple repositories in the same process
can lead to inconsistent behaviour and race conditions.
By refactoring the code to stop storing repository-scoped configuration in
global variables in `environment.c`, that is, by moving the appropriate
global variables into localised state within `struct repository` and
`struct repo_settings`, the codebase becomes more maintainable and easier
to test, and future work such as libifying Git becomes feasible.
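
A minimal sketch of the problem described above, using hypothetical
stand-ins (`toy_is_bare_cfg`, `struct toy_repository`,
`struct toy_repo_settings`) rather than Git's real definitions. It only
illustrates why one process-wide variable cannot describe two repositories at
once, while a field inside each repository instance can.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; these are not Git's definitions. */

/* Before: one process-wide value, shared by every repository. */
static int toy_is_bare_cfg = -1;

/* After: the same setting stored inside each repository instance. */
struct toy_repo_settings {
	int is_bare;
};

struct toy_repository {
	const char *gitdir;
	struct toy_repo_settings settings;
};

/* Global style: loading a second repository overwrites the first value. */
static void toy_load_global(int is_bare)
{
	toy_is_bare_cfg = is_bare;
}

/* Per-repository style: each instance keeps its own value. */
static void toy_load_repo(struct toy_repository *repo, int is_bare)
{
	assert(repo != NULL);
	repo->settings.is_bare = is_bare;
}
```

Loading a bare repository and then a non-bare one through `toy_load_global()`
leaves only the second value behind, whereas `toy_load_repo()` keeps both
answers available side by side.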

Internship Objectives and Plans
========================
The project aims to identify repository-scoped global variables in
`environment.c` and related files that can be moved to local scope within
`struct repository` and `struct repo_settings`, find an appropriate strategy
to move them to local scope and implement the changes. This architectural
improvement will make the codebase more maintainable and enable better
multi-repository handling in the future.

From a high-level overview, environment.[ch] exposes some global
variables that reflect per-repository state. Examples include
git_work_tree_cfg, is_bare_repository_cfg and core.* settings, as well as
functions which depend on `the_repository` such as have_git_dir() and
is_bare_repository().
From a brief study of related work already done on the project,
it is important to understand the purpose of each identified global variable
and how it is used across the code base, observing how it relates to other
subsystems. The variable is then moved to `struct repository` or
`struct repo_settings` if its use is repository specific; otherwise an
appropriate context is chosen based on its scope and that context is used
in the accessor functions.
For example in [1], Patrick Steinhardt observes that `core.hooksPath`
is repository specific and is stored in the global variable `git_hooks_path`.
The variable is then moved into local scope in `struct repo_settings`
and a new accessor function `repo_settings_get_hooks_path()` is written and
used to set the `hooks_path` of the repo-specific struct which the path
subsystem reads from.
Similarly in [2], `core.sharedRepository` is tracked via the global variables
`the_shared_repository` and `need_shared_repository`. These are then
moved into `struct repo_settings`, with new accessor functions written to
modify them, and callers in the path subsystem are updated to use the new
accessors in place of the old ones which modified the global variables.
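
The accessor pattern described for `core.hooksPath` can be sketched roughly
as follows. This is a simplified model and not the code from [1]: every
`toy_` name, the key/value config array and the lookup helper are
assumptions made for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified model; names prefixed with toy_ are hypothetical. */
struct toy_config_entry {
	const char *key;
	const char *value;
};

struct toy_repo_settings {
	const char *hooks_path;   /* NULL until first requested */
};

struct toy_repository {
	const struct toy_config_entry *config;  /* this repository's config */
	size_t config_nr;
	struct toy_repo_settings settings;
};

/* Look up a key in one repository's own configuration. */
static const char *toy_config_get(const struct toy_repository *repo,
				  const char *key)
{
	size_t i;

	for (i = 0; i < repo->config_nr; i++)
		if (!strcmp(repo->config[i].key, key))
			return repo->config[i].value;
	return NULL;
}

/* Accessor: read the setting on first use, then serve the cached value. */
static const char *toy_settings_get_hooks_path(struct toy_repository *repo)
{
	assert(repo != NULL);
	if (!repo->settings.hooks_path)
		repo->settings.hooks_path =
			toy_config_get(repo, "core.hookspath");
	return repo->settings.hooks_path;
}
```

Because the cached value lives in the repository instance, two repositories
with different hook directories can be queried in the same process without
one overwriting the other.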

I also studied [3] and [4] by Ayush Chandekar and [5] by John Cai to broaden my
understanding of the project.

Proposed Project Execution Timeline
=========================

1. Study Code Base To Identify Suitable Candidates (Now - December 8, 2025):
------------------------------------------------------------------------
- The first step will be familiarising myself with the code base to
   understand how these global variables in environment.c are initialised,
   used and how they interact with other subsystems.

2. Community Feedback Bonding (December 9 - December 15, 2025):
------------------------------------------------------------
- Discuss environment variables with mentors and community members
- Understand best refactoring approach based on feedback from mentors

3. Review Existing Patch and Define Criteria (December 16 - January 9, 2026):
-------------------------------------------------------------
- Thoroughly examine the existing patch series submitted to the mailing
    list to understand:
    * What criteria make a global variable a suitable candidate to be
       moved to `struct repository` or `struct repo_settings`.
    * What context it should be moved into based on its
       interactions with other subsystems.
    * Whether remaining a global variable is the best approach in its case.
- This information can be obtained by paying attention to the discussions
  on the patches and by engaging with my mentors and the Git community.

4. Implement Candidates and Submit PRs (January 10 - February 28, 2026):
--------------------------------------------------------------------------
- With collaboration from mentors and the Git community, identify
  suitable candidates for relocation.
- Relocate them into `struct repository`, `struct repo_settings` and
  other appropriate contexts.
- Pass the repository parameter to accessor functions to replace the
  global dependence.
- Write new accessor functions if necessary.
- Modify accessor callers to reflect the new changes while ensuring
  all affected code paths work correctly.
- Update tests and documentation.
- Iteratively submit patches for review, engage in discussions and
  implement suggestions.

5. Final Report on Project (March 1 - March 6, 2026):
--------------------------------
- Document final report in my blog with details on my experience
- Finalize any pending tasks or reviews on any submitted patch

Availability
========
I am currently not enrolled in any school or employed in any job, so I will be
able to give 30 hours a week or more to make the project a success.

Blogging
=======
I have set up my blog where I will document my progress, insights,
challenges and experience weekly.

Post Outreachy
====================
The welcoming and patient atmosphere during this short contribution
period with the Git community has made me want to keep getting involved
with the community. I am committed to continuously contributing to Git and
to becoming a part of the next set of contributors to champion the
continuous development of Git.

Appreciation
==========
To Junio and Christian, I really appreciate your guidance, patience
and direction while reviewing and helping with my patches. To Usman, thank
you for your inputs, and to every member of the Git community, I thank you all.

References
=========
[1]: https://public-inbox.org/git/20250207-b4-pks-path-drop-the-repository-v2-14-13cad3c11b8a@pks.im/#Z31config.c
[2]: https://public-inbox.org/git/20250206-b4-pks-path-drop-the-repository-v1-15-4e77f0313206@pks.im/
[3]: https://lore.kernel.org/git/d0e2042b3061320fac8a8fdf9043c6ab4dbed5a2.1752882401.git.ayu.chandekar@gmail.com/
[4]: https://lore.kernel.org/git/c82620a1f54ea6760bff204fd2b5fe5c2df1896c.1753804956.git.ayu.chandekar@gmail.com/
[5]: https://public-inbox.org/git/pull.1826.git.git.1730926082.gitgitgadget@gmail.com/


* Re: [Outreachy][Proposal]: Refactor in order to reduce Git’s global state
  2025-10-29  1:18 [Outreachy][Proposal]: Refactor in order to reduce Git’s global state Bello Olamide
@ 2025-10-29 15:51 ` Christian Couder
  2025-10-30 10:59   ` Bello Olamide
  2025-10-30 14:49 ` [Outreachy][Proposal v2]: Refactor in order to reduce Git’s global state Olamide Caleb Bello
  1 sibling, 1 reply; 7+ messages in thread
From: Christian Couder @ 2025-10-29 15:51 UTC (permalink / raw)
  To: Bello Olamide; +Cc: git, Usman Akinyemi

Hi,

On Wed, Oct 29, 2025 at 2:18 AM Bello Olamide <belkid98@gmail•com> wrote:
>
> Hello,
> This is my proposal for the project
> "Refactor in order to reduce Git’s global state" for the 2025 Outreachy
> Internship program.

Thanks for this proposal.

[...]

> From a high level overview, environment.[ch] exposes some global
> variables that reflect a per-repository state and examples of such include
> git_work_tree_cfg, is_bare_repository_cfg, and core.* settings and functions
> which also depend on `the_repository` such as have_git_dir(),
> is_bare_repository().
> After a brief study of some related work done on the project,
> it is important to understand the purpose of the identified global variable
> and how it is used across the code base, observing how it relates with other
> subsystems and moving it to the `struct repository` or `struct
> repo-settings` if its
> use is repository specific, or specify an appropriate context based on its scope
> and use this context in the accessor functions.
> For example in [1], Patrick Steinhardt observes that `core.hooksPath`
> is repository specific and is stored in the global variable `git_hooks_path`.
> The variable is then moved into local scope in the repo-settings
> struct and a new
> accessor function `repo_settings_get_hooks_path()` is written and used to
> set the `hooks_path` of the repo specific struct which the path subsystem
> reads from.
> Similarly in [2], `core.sharedRepository` is tracked via the global variables
> `the_shared_repository ` and `need_shared_repository`. These are then
> moved into the repo-settings struct, with new accessors functions
> written to modify them,
> and calls to the accessors in the path subsystem are then modified to
> replace the old
> accessors which modify the global variables.

Nit: the above paragraph looks very big. Maybe it could be split a bit.

> I also studied [3], [4] by Ayush Chandeker,] and [5] by John Cai to broaden my
> understanding of the project.

Are there some cases where strategies other than writing new accessor
functions were used?

Are there pieces of work on this that were started but not finished?
Are you planning to finish them?

What are the roadblocks that were faced when working on this?

> 3. Review Existing Patch and Define Criteria (December 16 - January 9, 2026):
> -------------------------------------------------------------
> - Thoroughly examine the existing patch series submitted to the mailing
>     list  to understand;
>     * What criteria makes a global variable a suitable candidate to be
>        moved to the `struct repository` or `struct repo-settings`
>     * What appropriate context it should be moved into based on its
>        interactions with other subsystems.
>     * If remaining a global variable is the best approach in its case.
> - This information can be gotten by paying attention to the discussions
> in the patches and also engaging with my mentors and the Git community.

Are you sure that it will be possible to define clear criteria?

Thanks.


* Re: [Outreachy][Proposal]: Refactor in order to reduce Git’s global state
  2025-10-29 15:51 ` Christian Couder
@ 2025-10-30 10:59   ` Bello Olamide
  2025-10-30 12:55     ` Christian Couder
  0 siblings, 1 reply; 7+ messages in thread
From: Bello Olamide @ 2025-10-30 10:59 UTC (permalink / raw)
  To: Christian Couder; +Cc: git, Usman Akinyemi

On Wed, 29 Oct 2025 at 16:51, Christian Couder
<christian.couder@gmail•com> wrote:
>
> Hi,
>
> On Wed, Oct 29, 2025 at 2:18 AM Bello Olamide <belkid98@gmail•com> wrote:
> >
> > Hello,
> > This is my proposal for the project
> > "Refactor in order to reduce Git’s global state" for the 2025 Outreachy
> > Internship program.
>
> Thanks for this proposal.
>
> [...]
>
> > From a high level overview, environment.[ch] exposes some global
> > variables that reflect a per-repository state and examples of such include
> > git_work_tree_cfg, is_bare_repository_cfg, and core.* settings and functions
> > which also depend on `the_repository` such as have_git_dir(),
> > is_bare_repository().
> > After a brief study of some related work done on the project,
> > it is important to understand the purpose of the identified global variable
> > and how it is used across the code base, observing how it relates with other
> > subsystems and moving it to the `struct repository` or `struct
> > repo-settings` if its
> > use is repository specific, or specify an appropriate context based on its scope
> > and use this context in the accessor functions.
> > For example in [1], Patrick Steinhardt observes that `core.hooksPath`
> > is repository specific and is stored in the global variable `git_hooks_path`.
> > The variable is then moved into local scope in the repo-settings
> > struct and a new
> > accessor function `repo_settings_get_hooks_path()` is written and used to
> > set the `hooks_path` of the repo specific struct which the path subsystem
> > reads from.
> > Similarly in [2], `core.sharedRepository` is tracked via the global variables
> > `the_shared_repository ` and `need_shared_repository`. These are then
> > moved into the repo-settings struct, with new accessors functions
> > written to modify them,
> > and calls to the accessors in the path subsystem are then modified to
> > replace the old
> > accessors which modify the global variables.
>
> Nit: the above paragraph looks very big. Maybe it could be split a bit.

Okay I will do that, thank you
>
> > I also studied [3], [4] by Ayush Chandeker,] and [5] by John Cai to broaden my
> > understanding of the project.
>
> Are there some cases where strategies other than writing new accessors
> functions were used?

Yes, there were cases where functions were adapted to take
exactly what they need down the call chain rather than writing new
accessor functions.
An example is
https://public-inbox.org/git/20250306-b4-pks-objects-without-the-repository-v2-1-f3465327be69@pks.im/#Z31csum-file.h
where the global variable `the_hash_algo` is replaced with an explicit parameter
`const struct git_hash_algo *algo` in low-level functions such as
`static struct hashfile *hashfd_internal()`, and the call sites are adapted
to pass r->hash_algo, or the_repository->hash_algo in places where the
subsystem has not yet gotten rid of `the_repository`.

This is also a strategy that can be used to replace global variables.
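
A rough sketch of this "pass exactly what is needed" strategy, assuming
simplified hypothetical types (`struct toy_hash_algo`, `struct toy_hashfile`,
`struct toy_repository`) in place of Git's real ones. The low-level helper
takes the algorithm as a parameter and the caller decides which repository it
comes from.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified model; toy_ names are hypothetical, not Git's definitions. */
struct toy_hash_algo {
	const char *name;
	size_t rawsz;            /* digest length in bytes */
};

static const struct toy_hash_algo toy_sha1 = { "sha1", 20 };
static const struct toy_hash_algo toy_sha256 = { "sha256", 32 };

struct toy_repository {
	const struct toy_hash_algo *hash_algo;
};

struct toy_hashfile {
	const struct toy_hash_algo *algo;
	size_t trailer_len;      /* bytes reserved for the trailing checksum */
};

/*
 * Low-level helper: it receives only the algorithm it needs, so it no
 * longer has to consult any global repository.
 */
static void toy_hashfile_init(struct toy_hashfile *f,
			      const struct toy_hash_algo *algo)
{
	assert(f != NULL && algo != NULL);
	f->algo = algo;
	f->trailer_len = algo->rawsz;
}

/* Call site: the caller already holds a repository and passes its algorithm. */
static void toy_repo_open_hashfile(const struct toy_repository *r,
				   struct toy_hashfile *f)
{
	toy_hashfile_init(f, r->hash_algo);
}
```

No accessor is involved here: the dependency is narrowed from "some
repository" to "one hash algorithm", which keeps the helper usable for any
repository the caller happens to hold.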
>
> Are there pieces of work on this that were started but not finished?
> Are you planning to finish them?
>
> What are the roadblocks that were faced when working on this?
>

Yes. There were pieces of work that were started but not finished, which I plan
to finish.
As an example, the patch
https://lore.kernel.org/git/20250309153321.254844-1-ayu.chandekar@gmail.com/
attempts to move the `git_attributes_file` global variable to
`struct repository`.
However, the global variable is used by the attributes subsystem, and
a single repository can have more than one set of attributes, that is
the work-tree attributes and the index attributes, so placing the variable
into a repository instance and passing it around in the call chain is not
appropriate. Also, most of the functions in the attributes subsystem take
the `index_state` as a parameter and not the repository.
This is because an index knows its repository but a repository only knows its
primary index. Therefore the repository that belongs to an index has to be
reached through the index.

As Junio pointed out in the discussion on the thread:
"As the attribute system is all about giving extra information on the
paths that appear in the index and in the working tree, it may make
sense for the API to go from the index state which is about the
index and the working tree to access the attributes, rather than
from the repository structure, which controls a lot wider concept
and moving anything and everything there will easily and quickly
make it a messy kitchen sink."

So, given that the `index_state` struct has a repo member, we can move
`git_attributes_file` into the repo struct but access it through the
`index_state`.
By doing that, it stays clear that the index owns the attributes.
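
The idea of storing the path in the repository but reaching it through the
index could look roughly like this. It is only a sketch under assumed,
simplified types; `toy_attr_file_for_index()` and the `toy_` structs are
hypothetical and do not reflect a design the community has agreed on.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified model; toy_ names are hypothetical. */
struct toy_repository {
	const char *attributes_file;   /* per-repository attributes file path */
};

struct toy_index_state {
	struct toy_repository *repo;   /* an index knows its repository */
};

/* Attribute code takes an index and reaches the repository through it. */
static const char *toy_attr_file_for_index(const struct toy_index_state *istate)
{
	assert(istate != NULL && istate->repo != NULL);
	return istate->repo->attributes_file;
}
```

Callers in the attributes subsystem would keep their existing
index-based signatures, and any secondary index of the same repository
resolves to the same path.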

There is also `is_bare_repository_cfg` as seen in
https://lore.kernel.org/git/pull.1826.git.git.1730926082.gitgitgadget@gmail.com/
So far I have only skimmed through the discussions and patches, so I do not
yet know why it was not finished.
I will do an in-depth study to understand why it was not completed and what
it takes to finish it.

> > 3. Review Existing Patch and Define Criteria (December 16 - January 9, 2026):
> > -------------------------------------------------------------
> > - Thoroughly examine the existing patch series submitted to the mailing
> >     list  to understand;
> >     * What criteria makes a global variable a suitable candidate to be
> >        moved to the `struct repository` or `struct repo-settings`
> >     * What appropriate context it should be moved into based on its
> >        interactions with other subsystems.
> >     * If remaining a global variable is the best approach in its case.
> > - This information can be gotten by paying attention to the discussions
> > in the patches and also engaging with my mentors and the Git community.
>
> Are you sure that it will be possible to define clear criteria?

Yes, it will be possible to define clear criteria per global variable.
For example, from my brief study of previous work, if the variable value is:

1. meant to be different for different repositories, it is a candidate
    to move; if not, it is left as is, like the case of `local_repo_env[]`.

2. used during early startup, it cannot be moved blindly but will need
    a closer inspection and refactoring of the startup code, as is the case
    with `have_git_dir()` noted by Patrick and Shejialuo in
    https://lore.kernel.org/git/20250305104650.238392-1-ayu.chandekar@gmail.com/.

Its relationship with other subsystems is also a criterion to consider,
as in the case of `git_attributes_file` mentioned above.
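
The criteria above can be read as a small decision procedure. The sketch
below is purely illustrative: the enum, the struct fields and `toy_triage()`
are hypothetical names, and the order of the checks is an assumption about
how the criteria might be prioritised.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical triage helper; the categories mirror the criteria above. */
enum toy_verdict {
	TOY_KEEP_GLOBAL,            /* same value for every repository */
	TOY_NEEDS_STARTUP_REFACTOR, /* read before a repository is set up */
	TOY_NEEDS_OTHER_CONTEXT,    /* owned by a subsystem such as the index */
	TOY_MOVE_TO_REPO            /* plain per-repository setting */
};

struct toy_candidate {
	const char *name;
	int per_repository;         /* may differ between repositories? */
	int used_in_early_startup;
	int tied_to_subsystem;
};

/* Apply the criteria in order; the first one that matches decides. */
static enum toy_verdict toy_triage(const struct toy_candidate *c)
{
	assert(c != NULL);
	if (!c->per_repository)
		return TOY_KEEP_GLOBAL;
	if (c->used_in_early_startup)
		return TOY_NEEDS_STARTUP_REFACTOR;
	if (c->tied_to_subsystem)
		return TOY_NEEDS_OTHER_CONTEXT;
	return TOY_MOVE_TO_REPO;
}
```

In practice each flag would be the outcome of reading the code and the
mailing list discussion for that variable, so the function only records the
conclusion rather than producing it.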

Thanks


* Re: [Outreachy][Proposal]: Refactor in order to reduce Git’s global state
  2025-10-30 10:59   ` Bello Olamide
@ 2025-10-30 12:55     ` Christian Couder
  2025-10-31 11:43       ` Bello Olamide
  0 siblings, 1 reply; 7+ messages in thread
From: Christian Couder @ 2025-10-30 12:55 UTC (permalink / raw)
  To: Bello Olamide; +Cc: git, Usman Akinyemi

On Thu, Oct 30, 2025 at 11:59 AM Bello Olamide <belkid98@gmail•com> wrote:
> On Wed, 29 Oct 2025 at 16:51, Christian Couder
> <christian.couder@gmail•com> wrote:
> > On Wed, Oct 29, 2025 at 2:18 AM Bello Olamide <belkid98@gmail•com> wrote:

> > > I also studied [3], [4] by Ayush Chandeker,] and [5] by John Cai to broaden my
> > > understanding of the project.
> >
> > Are there some cases where strategies other than writing new accessors
> > functions were used?
>
> Yes there were cases where the functions were adapted to use
> exactly what it needs down the call chain rather than writing new
> accessor functions.
> An example is
> https://public-inbox.org/git/20250306-b4-pks-objects-without-the-repository-v2-1-f3465327be69@pks.im/#Z31csum-file.h
> where the global variable `the_hash_algo` is replaced with an explicit parameter
> `const struct git_hash_algo *algo` in low-level functions such as
> `static struct hashfile *hashfd_internal()` and the call sites adapted
> to use r->hash_algo
> or the_repository->hash_algo in places where the subsystem has not gotten rid of
> `the-repository`.
>
> This is also a strategy that can be used to replace global variables.

Your answers are appreciated, but, just to be clear, I think it would
be nice if the answers to my questions like this one were part of a v2
of your proposal. If I don't see a v2, I am less tempted to discuss
this further (which could hopefully help move the analysis forward and
make your proposal better).

Thanks.


* [Outreachy][Proposal v2]: Refactor in order to reduce Git’s global state
  2025-10-29  1:18 [Outreachy][Proposal]: Refactor in order to reduce Git’s global state Bello Olamide
  2025-10-29 15:51 ` Christian Couder
@ 2025-10-30 14:49 ` Olamide Caleb Bello
  2025-11-01 19:13   ` [Outreachy][Proposal v2]: Refactor in order to reduce Git’s global state Bello Olamide
  1 sibling, 1 reply; 7+ messages in thread
From: Olamide Caleb Bello @ 2025-10-30 14:49 UTC (permalink / raw)
  To: git; +Cc: christian.couder


Hello,
This is the second iteration on my proposal for the project
"Refactor in order to reduce Git’s global state" for the 2025 Outreachy
Internship program.

The changes from v1 include answers to Christian's questions about
refactoring strategies other than writing new accessors, previous unfinished
work and the roadblocks encountered.

Personal Bio:
===========
Full Name: Bello Caleb Olamide
Email: belkid98@gmail•com
Personal Blog: https://cloobtech.hashnode.dev/
GitHub: https://github.com/cloobtech

About Me:
=========
I'm Bello Olamide. I am passionate about software engineering and
I love to figure things out. I like participating in tech
events such as hackathons, but this will be my first open source experience
and I have relished the opportunity and experience so far.
I love being part of a community that strives to achieve a goal, and one that
I have found myself in is a small but growing community that helps to guide and
mentor younger boys to find their way into the tech ecosystem. I have developed
my coding skills through various sources including personal learning, freelancing,
collaboration with other developers and the ALX Software Engineering
program.

Past Experience with Git:
===================
I have been a Git user for some time now, mainly for collaborating with other
developers and tracking version changes to files. During this contribution
stage, I have learned the ropes of how to send patches to Git.

Contributions to the Git Community:
==========================
I have been able to send some patches to the Git codebase with the guidance
and direction of community members.

Microproject:
==========
Link: https://lore.kernel.org/git/cover.1761217100.git.belkid98@gmail.com/
Branch: ob/gpg-interface-cleanup
Status: Merged to next
Commit ID: ce6d041635
Description: strbuf_split*() to split a string into multiple strbufs
is often a wrong API to use.
A few uses of it have been removed by simplifying the code.

Project Overview
=============
Git uses a single global `struct repository` object called `the_repository`
which internal functions rely on to store, access and modify environment
and configuration variables.
With this approach, handling multiple repositories in the same process
can lead to inconsistent behaviour and race conditions.
By refactoring the code to stop storing repository-scoped configuration in
global variables in `environment.c`, that is, by moving the appropriate
global variables into localised state within `struct repository` and
`struct repo_settings`, the codebase becomes more maintainable and easier
to test, and future work such as libifying Git becomes feasible.

Internship Objectives and Plans
========================
The project aims to identify repository-scoped global variables in
`environment.c` and related files that can be moved to local scope within
`struct repository` and `struct repo_settings`, find an appropriate strategy
to move them to local scope and implement the changes. This architectural
improvement will make the codebase more maintainable and enable better
multi-repository handling in the future.

From a high-level overview, environment.[ch] exposes some global
variables that reflect per-repository state. Examples include
git_work_tree_cfg, is_bare_repository_cfg and core.* settings, as well as
functions which depend on `the_repository` such as have_git_dir() and
is_bare_repository().

Review of Previous Work and Refactor Strategies:
================================================
From a brief study of related work already done on the project,
it is important to understand the purpose of each identified global variable
and how it is used across the code base, observing how it relates to other
subsystems. The variable is then moved to `struct repository` or
`struct repo_settings` if its use is repository specific; otherwise an
appropriate context is chosen based on its scope and that context is used in
the accessor functions.
For example in [1], Patrick Steinhardt observes that `core.hooksPath`
is repository specific and is stored in the global variable `git_hooks_path`.
The variable is then moved into local scope in `struct repo_settings`
and a new accessor function `repo_settings_get_hooks_path()` is written
and used to set the `hooks_path` of the repo-specific struct which the path
subsystem reads from.

Similarly in [2], `core.sharedRepository` is tracked via the global variables
`the_shared_repository` and `need_shared_repository`. These are then
moved into `struct repo_settings`, with new accessor functions written to
modify them, and callers in the path subsystem are updated to use the new
accessors in place of the old ones which modified the global variables.
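
A getter and setter pair for a setting like `core.sharedRepository` might be
shaped as below. This is a simplified model, not the code from [2]: all
`toy_` names are hypothetical, and `configured_shared` merely stands in for
whatever value would be parsed from the repository's configuration.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified model; toy_ names are hypothetical. */
struct toy_repo_settings {
	int shared_repository;
	int shared_repository_initialized;
};

struct toy_repository {
	int configured_shared;   /* stands in for the parsed config value */
	struct toy_repo_settings settings;
};

/* Setter: an explicit override for this repository only. */
static void toy_settings_set_shared_repository(struct toy_repository *repo,
					       int value)
{
	assert(repo != NULL);
	repo->settings.shared_repository = value;
	repo->settings.shared_repository_initialized = 1;
}

/* Getter: fall back to the repository's own configuration on first use. */
static int toy_settings_get_shared_repository(struct toy_repository *repo)
{
	assert(repo != NULL);
	if (!repo->settings.shared_repository_initialized)
		toy_settings_set_shared_repository(repo,
						   repo->configured_shared);
	return repo->settings.shared_repository;
}
```

The separate "initialized" flag plays the role that a second global variable
would otherwise play: it distinguishes "not read yet" from a legitimately
stored value of zero.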

There were also cases where functions were adapted to take exactly what they
need down the call chain rather than writing new accessor functions.
An example is [3], where the global variable `the_hash_algo` is replaced with
an explicit parameter `const struct git_hash_algo *algo` in low-level
functions such as `static struct hashfile *hashfd_internal()`, and the call
sites are adapted to pass r->hash_algo, or the_repository->hash_algo in places
where the subsystem has not yet gotten rid of `the_repository`.
This is also a strategy that can be used to replace global variables.


Completion of Previous Unfinished Works
---------------------------------------
There were also some pieces of work that were started but not finished, which
I plan to finish.
* As an example, [4] attempts to move the `git_attributes_file`
   global variable to `struct repository`.
   However, the global variable is used by the attributes subsystem, and
   a single repository can have more than one set of attributes, that is
   the work-tree attributes and the index attributes, so placing the variable
   into a repository instance and passing it around in the call chain is not
   appropriate. Also, most of the functions in the attributes subsystem take
   the `index_state` as a parameter and not the repository. This is because an
   index knows its repository but a repository only knows its primary index.
   Therefore the repository that belongs to an index has to be reached through
   the index.

   As Junio pointed out in the discussion on the thread:
   "As the attribute system is all about giving extra information on the
   paths that appear in the index and in the working tree, it may make
   sense for the API to go from the index state which is about the
   index and the working tree to access the attributes, rather than
   from the repository structure, which controls a lot wider concept
   and moving anything and everything there will easily and quickly
   make it a messy kitchen sink."

   So, given that the `index_state` struct has a repo member, we can move
   `git_attributes_file` into the repo struct but access it through the
   `index_state`. By doing that, it stays clear that the index owns the
   attributes.

*  There is also `is_bare_repository_cfg` as seen in [5].
   So far I have only skimmed through the discussions and patches, so I do
   not yet know why it was not finished.
   I will do an in-depth study to understand why it was not completed and what
   it takes to finish it.


Proposed Project Execution Timeline
===================================

1. Study Code Base To Identify Suitable Candidates (Now - December 8, 2025):
------------------------------------------------------------------------
- The first step will be familiarising myself with the code base to
   understand how these global variables in environment.c are initialised,
   used and how they interact with other subsystems.

2. Community Feedback Bonding (December 9 - December 15, 2025):
------------------------------------------------------------
- Discuss environment variables with mentors and community members
- Understand best refactoring approach based on feedback from mentors

3. Review Existing Patch and Define Criteria (December 16 - January 9, 2026):
-------------------------------------------------------------
- Thoroughly examine the existing patch series submitted to the mailing
    list to understand:
    * What criteria make a global variable a suitable candidate to be
       moved to `struct repository` or `struct repo_settings`.
    * What context it should be moved into based on its
       interactions with other subsystems.
    * Whether remaining a global variable is the best approach in its case.
- This information can be obtained by paying attention to the discussions
  on the patches and by engaging with my mentors and the Git community.

To illustrate the above points from my brief study of previous work,
if the variable value is:
i. meant to be different for different repositories, it is a candidate to move;
   if not, it is left as is, like the case of `local_repo_env[]`.

ii. used during early startup, it cannot be moved blindly but will need
    a closer inspection and refactoring of the startup code, as is the case
    with `have_git_dir()` noted by Patrick and Shejialuo in [7].

Its relationship with other subsystems is also a criterion to consider,
as in the case of `git_attributes_file` mentioned above.

4. Implement Candidates and Submit PRs ( January 10 - February 28, 2026):
--------------------------------------------------------------------------
- In collaboration with mentors and the Git community, identify
  suitable candidates for relocation.
- Relocate them into `struct repository`, `struct repo_settings` or
  other appropriate contexts.
- Pass the repository as a parameter to accessor functions to remove the
  dependence on global state.
- Write new accessor functions where necessary; otherwise pass the context
  directly to the functions that need it.
- Update the callers to reflect the new changes while ensuring that
  all affected code paths work correctly.
- Update tests and documentation.
- Iteratively submit patches for review, engage in discussions and
  implement suggestions.
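
The two refactoring styles listed above can be sketched as follows. This
is only a minimal sketch under stated assumptions: the types and
functions (`toy_repository`, `toy_repo_settings`, `toy_hash_algo`,
`toy_repo_settings_get_hooks_path()`, `toy_hex_len()`) and the fallback
value "hooks" are hypothetical and are not Git's actual definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical types for illustration; these are not Git's real structs. */
struct toy_hash_algo {
	const char *name;
	size_t rawsz; /* digest size in bytes */
};

struct toy_repo_settings {
	int initialized;
	const char *hooks_path; /* NULL means "not configured" */
};

struct toy_repository {
	struct toy_repo_settings settings;
	const struct toy_hash_algo *hash_algo;
};

/* Accessor style: callers pass the repository instead of reading a global. */
const char *toy_repo_settings_get_hooks_path(struct toy_repository *repo)
{
	assert(repo != NULL);
	if (!repo->settings.initialized) {
		/* A real implementation would read the repository's config here. */
		repo->settings.initialized = 1;
	}
	/* Assumed fallback for this sketch when nothing is configured. */
	return repo->settings.hooks_path ? repo->settings.hooks_path : "hooks";
}

/* Direct-context style: the low-level helper takes only what it needs. */
size_t toy_hex_len(const struct toy_hash_algo *algo)
{
	assert(algo != NULL);
	return algo->rawsz * 2;
}
```

A caller that already holds a repository would call
`toy_hex_len(repo->hash_algo)`, so the helper itself never needs to know
about the repository or about any global.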

5. Final Report on Project (March 1 - March 6, 2026):
--------------------------------
- Publish a final report on my blog detailing my experience.
- Finalize any pending tasks or reviews on submitted patches.

Availability
============
I am currently not enrolled in school or employed, so I will be able to give
30 hours a week or more to make the project a success.

Blogging
=========
I have set up my blog where I will document my progress, insights,
challenges and experience weekly.

Post Outreachy
==============
The welcoming and patient atmosphere of the Git community during this
short contribution period has made me want to stay involved with the
community. I am committed to continuing to contribute to Git and to
becoming part of the next set of contributors who champion the
continued development of Git.

Appreciation
============
To Junio and Christian, I really appreciate your guidance, patience
and direction while reviewing and helping with my patches. To Usman,
thank you for your input, and to every member of the Git community,
I thank you all.


References
==========
[1]: https://public-inbox.org/git/20250207-b4-pks-path-drop-the-repository-v2-14-13cad3c11b8a@pks.im/#Z31config.c
[2]: https://public-inbox.org/git/20250206-b4-pks-path-drop-the-repository-v1-15-4e77f0313206@pks.im/
[3]: https://public-inbox.org/git/20250306-b4-pks-objects-without-the-repository-v2-1-f3465327be69@pks.im/#Z31csum-file.h
[4]: https://lore.kernel.org/git/20250309153321.254844-1-ayu.chandekar@gmail.com/
[5]: https://public-inbox.org/git/pull.1826.git.git.1730926082.gitgitgadget@gmail.com/
[6]: https://lore.kernel.org/git/d0e2042b3061320fac8a8fdf9043c6ab4dbed5a2.1752882401.git.ayu.chandekar@gmail.com/
[7]: https://lore.kernel.org/git/c82620a1f54ea6760bff204fd2b5fe5c2df1896c.1753804956.git.ayu.chandekar@gmail.com/

* Re: [Outreachy][Proposal]: Refactor in order to reduce Git’s global state
  2025-10-30 12:55     ` Christian Couder
@ 2025-10-31 11:43       ` Bello Olamide
  0 siblings, 0 replies; 7+ messages in thread
From: Bello Olamide @ 2025-10-31 11:43 UTC (permalink / raw)
  To: Christian Couder; +Cc: git, Usman Akinyemi

On Thu, 30 Oct 2025 at 13:55, Christian Couder
<christian.couder@gmail•com> wrote:
>
> > Yes there were cases where the functions were adapted to use
> > exactly what it needs down the call chain rather than writing new
> > accessor functions.
> > An example is
> > https://public-inbox.org/git/20250306-b4-pks-objects-without-the-repository-v2-1-f3465327be69@pks.im/#Z31csum-file.h
> > where the global variable `the_hash_algo` is replaced with an explicit parameter
> > `const struct git_hash_algo *algo` in low-level functions such as
> > `static struct hashfile *hashfd_internal()` and the call sites adapted
> > to use r->hash_algo
> > or the_repository->hash_algo in places where the subsystem has not gotten rid of
> > `the-repository`.
> >
> > This is also a strategy that can be used to replace global variables.
>
> Your answers are appreciated, but, just to be clear, I think it would
> be nice if the answers to my questions like this one were part of a v2
> of your proposal. If I don't see a v2, I am less tempted to discuss
> this further (which could hopefully help move the analysis forward and
> make your proposal better).
>
> Thanks.

Hello Christian
Thank you very much.
I have added the answers to your questions and submitted a v2 of the
proposal.

Bello

* Re: [Outreachy][Proposal v2]: Refactor in order to reduce Git’s global state
  2025-10-30 14:49 ` [Outreachy][Proposal v2]: Refactor in order to reduce Git’s global state Olamide Caleb Bello
@ 2025-11-01 19:13   ` Bello Olamide
  0 siblings, 0 replies; 7+ messages in thread
From: Bello Olamide @ 2025-11-01 19:13 UTC (permalink / raw)
  To: git; +Cc: christian.couder

Hello Christian
Please kindly refer to v3.
I noticed the subject did not have the correct format on the mailing list.
Something went wrong when I used git send-email.

Thanks

Thread overview: 7+ messages
2025-10-29  1:18 [Outreachy][Proposal]: Refactor in order to reduce Git’s global state Bello Olamide
2025-10-29 15:51 ` Christian Couder
2025-10-30 10:59   ` Bello Olamide
2025-10-30 12:55     ` Christian Couder
2025-10-31 11:43       ` Bello Olamide
2025-10-30 14:49 ` [Outreachy][Proposal v2]: Refactor in order to reduce Git’s global state Olamide Caleb Bello
2025-11-01 19:13   ` [Outreachy][Proposal v2]: Refactor in order to reduce Git’s global state Bello Olamide