* Re: [PATCH v2 2/2] refs: add GIT_REF_URI to specify reference backend and directory @ 2025-11-28 5:21 Natee Korn 0 siblings, 0 replies; 6+ messages in thread From: Natee Korn @ 2025-11-28 5:21 UTC (permalink / raw) To: karthik.188; +Cc: git, gitster, jltobler, jn.avila, sunshine, toon 2.. ^ permalink raw reply [flat|nested] 6+ messages in thread
* [PATCH v2 0/2] refs: allow setting the reference directory
@ 2025-11-26 11:11 Karthik Nayak
2025-11-26 11:12 ` [PATCH v2 2/2] refs: add GIT_REF_URI to specify reference backend and directory Karthik Nayak
0 siblings, 1 reply; 6+ messages in thread
From: Karthik Nayak @ 2025-11-26 11:11 UTC (permalink / raw)
To: git; +Cc: Karthik Nayak, jltobler, gitster, toon, sunshine,
Jean-Noël Avila
While Git allows users to select different reference backends, unlike
with objects, there is no flexibility in selecting the reference
directory. Currently, the reference format is obtained from the config
of the repository and the reference directory is set to the $GIT_DIR.
This patch series adds a new ENV variable 'GIT_REF_URI' which takes the
reference backend and path in a URI form:
<reference_backend>://<URI-for-resource>
For e.g. 'reftable:///foo' or 'files://$GIT_DIR/ref_migration.0xBsa0'.
One use case for this is migration between different backends. On the
server side, migrating from the files backend to the newly introduced
reftable backend can be achieved by running 'git refs migrate'. However,
for large repositories with millions of references, this migration can
take from seconds to minutes.
For some background, at GitLab, the criteria for our migration was to
reduce the downtime of the migrate ideally to zero. So running 'git refs
migrate --ref-format=reftable' by itself wouldn't work, since it scales
with the number of references and we have repos with millions of
references, so we need to migrate without loosing any information. We
came up with the following plan:
1. Run git-pack-refs(1) and note timestamp of the generated packed-refs
file.
2. Run git refs migrate –dry-run.
3. If there are no ongoing reference requests (read/write)
a. Lock the repository by blocking incoming requests (done on a
layer above git, in Gitaly [1]).
b. If the timestamp of the packed-refs file has changed, unlock
the repo and repeat from step 1.
c. Apply all the loose refs to the dry-run reftable folder (this
requires support in Git to write refs to arbitrary folder).
d. Move the reftable dry-run folder into the GIT_DIR.
e. Swap the repo config
f. Unlock repo access
Using such a route, scales much better since we only have to worry about
blocking the repository by O(ref written between #1 and #3a) and not
O(refs in repo). But for doing so, we need to be able to write to a
arbitrary reference backend + path. This is to add the missing
references to the dry-run reftable folder. This series, achieves that.
The first commit adds the required changes to create a 'ref_store' for a
given path. The second commit parses the URI if available when creating
the main ref store.
This is based on top of 9a2fb147f2 (Git 2.52, 2025-11-17).
[1]: https://gitlab.com/gitlab-org/gitaly
---
Changes in v2:
- Added more clarification and proper intent in the cover message.
- Changed the format from '<ref_backend>://<path>' to
`<ref_backend>://<URI-for-resource>` as it much clearer.
- Added logic to check for the '//' in the provided URI and a test for
the same.
- In the tests:
- Use test_must_fail() instead of ! git
- Fix looped tests not using the variables correctly and ensure that
the test description is correct.
- Link to v1: https://patch.msgid.link/20251119-kn-alternate-ref-dir-v1-0-4cf4a94c8bed@gmail.com
---
Documentation/git.adoc | 8 ++++
environment.h | 1 +
refs.c | 71 +++++++++++++++++++++++++++--
t/meson.build | 1 +
t/t1423-ref-backend.sh | 121 +++++++++++++++++++++++++++++++++++++++++++++++++
5 files changed, 199 insertions(+), 3 deletions(-)
Karthik Nayak (2):
refs: support obtaining ref_store for given dir
refs: add GIT_REF_URI to specify reference backend and directory
Range-diff versus v1:
1: f6e8aa37fe ! 1: c925726efd refs: support obtaining ref_store for given dir
@@ Commit message
The refs subsystem uses the `get_main_ref_store()` to obtain the main
ref_store for a given repository. In the upcoming patches we also want
to create a ref_store for any given reference directory, which may exist
- in arbitrary paths. To support such behavior, extract out the core logic
- for creating out the ref_store from `get_main_ref_store()` into a new
- function `get_ref_store_for_dir()` which can provide the ref_store for a
+ in arbitrary paths. For the files backend and the reftable backend, the
+ reference directory is generally the $GIT_DIR.
+
+ To support such behavior, extract out the core logic for creating out
+ the ref_store from `get_main_ref_store()` into a new function
+ `get_ref_store_for_dir()` which can provide the ref_store for a
given (repository, directory, reference format) combination.
Signed-off-by: Karthik Nayak <karthik.188@gmail•com>
2: 5e30fa334e ! 2: b859ebad64 refs: add GIT_REF_URI to specify reference backend and directory
@@ Commit message
Add a new environment variable 'GIT_REF_URI' that specifies both the
reference backend and directory path using a URI format:
- <ref_backend>://<path>
+ <ref_backend>://<URI-for-resource>
When set, this variable is used to obtain the main reference store for
all Git commands. The variable is checked in `get_main_ref_store()`
@@ Commit message
Add a new test file 't1423-ref-backend.sh' to test this environment
variable.
+ Helped-by: Jean-Noël Avila <jn.avila@free•fr>
Signed-off-by: Karthik Nayak <karthik.188@gmail•com>
## Documentation/git.adoc ##
@@ Documentation/git.adoc: double-quotes and respecting backslash escapes. E.g., th
See `--ref-format` in linkgit:git-init[1].
+`GIT_REF_URI`::
-+ Specify which reference backend and path to be used, if not specified the
-+ backend is inferred from the configuration and $GIT_DIR is used as the
-+ path.
++ Specify which reference backend to be used along with its URI. Reference
++ backends like the files, reftable backend use the $GIT_DIR as their URI.
++
-+Expects the format '<ref_backend>://<path>', where the 'backend' specifies the
-+reference backend and the 'path' specifies the directory used by the backend.
++Expects the format `<ref_backend>://<URI-for-resource>`, where the
++_<ref_backend>_ specifies the reference backend and the _<URI-for-resource>_
++specifies the URI used by the backend.
+
Git Commits
~~~~~~~~~~~
@@ refs.c: static struct ref_store *get_ref_store_for_dir(struct repository *r,
+ }
+
+ format_string = ref_backend_info.items[0].string;
++ if (!starts_with(ref_backend_info.items[1].string, "//")) {
++ error("invalid reference backend uri format '%s'", uri);
++ goto cleanup;
++ }
++ dir = ref_backend_info.items[1].string + 2;
++
++ format_string = ref_backend_info.items[0].string;
+ dir = ref_backend_info.items[1].string + 2;
+
+ if (!dir || !dir[0]) {
@@ t/t1423-ref-backend.sh (new)
+ cd repo &&
+ GIT_REF_URI="" &&
+ export GIT_REF_URI &&
-+ ! git refs list 2>err &&
++ test_must_fail git refs list 2>err &&
+ test_grep "reference backend uri is empty" err
+ )
+'
@@ t/t1423-ref-backend.sh (new)
+ cd repo &&
+ GIT_REF_URI="reftable@/home/reftable" &&
+ export GIT_REF_URI &&
-+ ! git refs list 2>err &&
++ test_must_fail git refs list 2>err &&
+ test_grep "invalid reference backend uri format" err
+ )
+'
@@ t/t1423-ref-backend.sh (new)
+ cd repo &&
+ GIT_REF_URI="reftable://" &&
+ export GIT_REF_URI &&
-+ ! git refs list 2>err &&
++ test_must_fail git refs list 2>err &&
+ test_grep "invalid path in uri" err
+ )
+'
+
++test_expect_success 'uri ends at colon' '
++ test_when_finished "rm -rf repo" &&
++ git init --ref-format=files repo &&
++ (
++ cd repo &&
++ GIT_REF_URI="reftable:" &&
++ export GIT_REF_URI &&
++ test_must_fail git refs list 2>err &&
++ test_grep "invalid reference backend uri format" err
++ )
++'
++
+test_expect_success 'unknown reference backend' '
+ test_when_finished "rm -rf repo" &&
+ git init --ref-format=files repo &&
@@ t/t1423-ref-backend.sh (new)
+ cd repo &&
+ GIT_REF_URI="db://.git" &&
+ export GIT_REF_URI &&
-+ ! git refs list 2>err &&
++ test_must_fail git refs list 2>err &&
+ test_grep "unknown reference backend" err
+ )
+'
@@ t/t1423-ref-backend.sh (new)
+ continue
+ fi
+
-+ test_expect_success 'read from other reference backend' '
++ test_expect_success "read from $to_format backend" '
+ test_when_finished "rm -rf repo" &&
-+ git init --ref-format=files repo &&
++ git init --ref-format=$from_format repo &&
+ (
+ cd repo &&
+ test_commit 1 &&
+ test_commit 2 &&
+ test_commit 3 &&
+
-+ git refs migrate --dry-run --ref-format=reftable >out &&
-+ REFTABLE_PATH=$(cat out | sed "s/.* ${SQ}\(.*\)${SQ}/\1/") &&
++ git refs migrate --dry-run --ref-format=$to_format >out &&
++ BACKEND_PATH=$(cat out | sed "s/.* ${SQ}\(.*\)${SQ}/\1/") &&
+ git refs list >expect &&
-+ GIT_REF_URI="reftable://$REFTABLE_PATH" git refs list >actual &&
++ GIT_REF_URI="$to_format://$BACKEND_PATH" git refs list >actual &&
+ test_cmp expect actual
+ )
+ '
+
-+ test_expect_success 'write to other reference backend' '
++ test_expect_success "write to $to_format backend" '
+ test_when_finished "rm -rf repo" &&
-+ git init --ref-format=files repo &&
++ git init --ref-format=$from_format repo &&
+ (
+ cd repo &&
+ test_commit 1 &&
+ test_commit 2 &&
+ test_commit 3 &&
+
-+ git refs migrate --dry-run --ref-format=reftable >out &&
++ git refs migrate --dry-run --ref-format=$to_format >out &&
+ git refs list >expect &&
+
-+ REFTABLE_PATH=$(cat out | sed "s/.* ${SQ}\(.*\)${SQ}/\1/") &&
-+ GIT_REF_URI="reftable://$REFTABLE_PATH" git tag -d 1 &&
++ BACKEND_PATH=$(cat out | sed "s/.* ${SQ}\(.*\)${SQ}/\1/") &&
++ GIT_REF_URI="$to_format://$BACKEND_PATH" git tag -d 1 &&
+
+ git refs list >actual &&
+ test_cmp expect actual &&
+
-+ GIT_REF_URI="reftable://$REFTABLE_PATH" git refs list >expect &&
++ GIT_REF_URI="$to_format://$BACKEND_PATH" git refs list >expect &&
+ git refs list >out &&
+ cat out | grep -v "refs/tags/1" >actual &&
+ test_cmp expect actual
base-commit: 9a2fb147f2c61d0cab52c883e7e26f5b7948e3ed
change-id: 20251105-kn-alternate-ref-dir-3e572e8cd0ef
Thanks
- Karthik
^ permalink raw reply [flat|nested] 6+ messages in thread* [PATCH v2 2/2] refs: add GIT_REF_URI to specify reference backend and directory 2025-11-26 11:11 [PATCH v2 0/2] refs: allow setting the reference directory Karthik Nayak @ 2025-11-26 11:12 ` Karthik Nayak 2025-11-26 16:17 ` Junio C Hamano 0 siblings, 1 reply; 6+ messages in thread From: Karthik Nayak @ 2025-11-26 11:12 UTC (permalink / raw) To: git; +Cc: Karthik Nayak, jltobler, gitster, toon, sunshine, Jean-Noël Avila Git allows setting a different object directory via 'GIT_OBJECT_DIRECTORY', but provides no equivalent for references. This asymmetry makes it difficult to test different reference backends or use alternative reference storage locations without modifying the repository structure. Add a new environment variable 'GIT_REF_URI' that specifies both the reference backend and directory path using a URI format: <ref_backend>://<URI-for-resource> When set, this variable is used to obtain the main reference store for all Git commands. The variable is checked in `get_main_ref_store()` when lazily assigning `repo->refs_private`. We cannot initialize this earlier in `repo_set_gitdir()` because the repository's hash algorithm isn't known at that point, and the reftable backend requires this information during initialization. When used with worktrees, the specified directory is treated as the reference directory for all worktree operations. Add a new test file 't1423-ref-backend.sh' to test this environment variable. Helped-by: Jean-Noël Avila <jn.avila@free•fr> Signed-off-by: Karthik Nayak <karthik.188@gmail•com> --- Documentation/git.adoc | 8 ++++ environment.h | 1 + refs.c | 60 +++++++++++++++++++++++- t/meson.build | 1 + t/t1423-ref-backend.sh | 121 +++++++++++++++++++++++++++++++++++++++++++++++++ 5 files changed, 190 insertions(+), 1 deletion(-) diff --git a/Documentation/git.adoc b/Documentation/git.adoc index ce099e78b8..8c6a3f6042 100644 --- a/Documentation/git.adoc +++ b/Documentation/git.adoc @@ -584,6 +584,14 @@ double-quotes and respecting backslash escapes. E.g., the value repositories will be set to this value. The default is "files". See `--ref-format` in linkgit:git-init[1]. +`GIT_REF_URI`:: + Specify which reference backend to be used along with its URI. Reference + backends like the files, reftable backend use the $GIT_DIR as their URI. ++ +Expects the format `<ref_backend>://<URI-for-resource>`, where the +_<ref_backend>_ specifies the reference backend and the _<URI-for-resource>_ +specifies the URI used by the backend. + Git Commits ~~~~~~~~~~~ `GIT_AUTHOR_NAME`:: diff --git a/environment.h b/environment.h index 51898c99cd..9bc380bba4 100644 --- a/environment.h +++ b/environment.h @@ -42,6 +42,7 @@ #define GIT_OPTIONAL_LOCKS_ENVIRONMENT "GIT_OPTIONAL_LOCKS" #define GIT_TEXT_DOMAIN_DIR_ENVIRONMENT "GIT_TEXTDOMAINDIR" #define GIT_ATTR_SOURCE_ENVIRONMENT "GIT_ATTR_SOURCE" +#define GIT_REF_URI_ENVIRONMENT "GIT_REF_URI" /* * Environment variable used to propagate the --no-advice global option to the diff --git a/refs.c b/refs.c index 23f46867f2..a7af228799 100644 --- a/refs.c +++ b/refs.c @@ -2186,15 +2186,73 @@ static struct ref_store *get_ref_store_for_dir(struct repository *r, return maybe_debug_wrap_ref_store(dir, ref_store); } +static struct ref_store *get_ref_store_from_uri(struct repository *repo, + const char *uri) +{ + struct string_list ref_backend_info = STRING_LIST_INIT_DUP; + enum ref_storage_format format; + struct ref_store *store = NULL; + char *format_string; + char *dir; + + if (!uri || !uri[0]) { + error("reference backend uri is empty"); + goto cleanup; + } + + if (string_list_split(&ref_backend_info, uri, ":", 2) != 2) { + error("invalid reference backend uri format '%s'", uri); + goto cleanup; + } + + format_string = ref_backend_info.items[0].string; + if (!starts_with(ref_backend_info.items[1].string, "//")) { + error("invalid reference backend uri format '%s'", uri); + goto cleanup; + } + dir = ref_backend_info.items[1].string + 2; + + format_string = ref_backend_info.items[0].string; + dir = ref_backend_info.items[1].string + 2; + + if (!dir || !dir[0]) { + error("invalid path in uri '%s'", uri); + goto cleanup; + } + + format = ref_storage_format_by_name(format_string); + if (format == REF_STORAGE_FORMAT_UNKNOWN) { + error("unknown reference backend '%s'", format_string); + goto cleanup; + } + + store = get_ref_store_for_dir(repo, dir, format); + +cleanup: + string_list_clear(&ref_backend_info, 0); + return store; +} + struct ref_store *get_main_ref_store(struct repository *r) { + char *ref_uri; + if (r->refs_private) return r->refs_private; if (!r->gitdir) BUG("attempting to get main_ref_store outside of repository"); - r->refs_private = get_ref_store_for_dir(r, r->gitdir, r->ref_storage_format); + ref_uri = getenv(GIT_REF_URI_ENVIRONMENT); + if (ref_uri) { + r->refs_private = get_ref_store_from_uri(r, ref_uri); + if (!r->refs_private) + die("failed to initialize ref store from URI: %s", ref_uri); + + } else { + r->refs_private = get_ref_store_for_dir(r, r->gitdir, + r->ref_storage_format); + } return r->refs_private; } diff --git a/t/meson.build b/t/meson.build index a5531df415..a66f8fafff 100644 --- a/t/meson.build +++ b/t/meson.build @@ -208,6 +208,7 @@ integration_tests = [ 't1420-lost-found.sh', 't1421-reflog-write.sh', 't1422-show-ref-exists.sh', + 't1423-ref-backend.sh', 't1430-bad-ref-name.sh', 't1450-fsck.sh', 't1451-fsck-buffer.sh', diff --git a/t/t1423-ref-backend.sh b/t/t1423-ref-backend.sh new file mode 100755 index 0000000000..f6756bdd2b --- /dev/null +++ b/t/t1423-ref-backend.sh @@ -0,0 +1,121 @@ +#!/bin/sh + +test_description='Test different reference backend URIs' + +. ./test-lib.sh + +test_expect_success 'empty uri provided' ' + test_when_finished "rm -rf repo" && + git init --ref-format=files repo && + ( + cd repo && + GIT_REF_URI="" && + export GIT_REF_URI && + test_must_fail git refs list 2>err && + test_grep "reference backend uri is empty" err + ) +' + +test_expect_success 'invalid uri provided' ' + test_when_finished "rm -rf repo" && + git init --ref-format=files repo && + ( + cd repo && + GIT_REF_URI="reftable@/home/reftable" && + export GIT_REF_URI && + test_must_fail git refs list 2>err && + test_grep "invalid reference backend uri format" err + ) +' + +test_expect_success 'empty path in uri' ' + test_when_finished "rm -rf repo" && + git init --ref-format=files repo && + ( + cd repo && + GIT_REF_URI="reftable://" && + export GIT_REF_URI && + test_must_fail git refs list 2>err && + test_grep "invalid path in uri" err + ) +' + +test_expect_success 'uri ends at colon' ' + test_when_finished "rm -rf repo" && + git init --ref-format=files repo && + ( + cd repo && + GIT_REF_URI="reftable:" && + export GIT_REF_URI && + test_must_fail git refs list 2>err && + test_grep "invalid reference backend uri format" err + ) +' + +test_expect_success 'unknown reference backend' ' + test_when_finished "rm -rf repo" && + git init --ref-format=files repo && + ( + cd repo && + GIT_REF_URI="db://.git" && + export GIT_REF_URI && + test_must_fail git refs list 2>err && + test_grep "unknown reference backend" err + ) +' + +ref_formats="files reftable" +for from_format in $ref_formats +do + for to_format in $ref_formats + do + if test "$from_format" = "$to_format" + then + continue + fi + + test_expect_success "read from $to_format backend" ' + test_when_finished "rm -rf repo" && + git init --ref-format=$from_format repo && + ( + cd repo && + test_commit 1 && + test_commit 2 && + test_commit 3 && + + git refs migrate --dry-run --ref-format=$to_format >out && + BACKEND_PATH=$(cat out | sed "s/.* ${SQ}\(.*\)${SQ}/\1/") && + git refs list >expect && + GIT_REF_URI="$to_format://$BACKEND_PATH" git refs list >actual && + test_cmp expect actual + ) + ' + + test_expect_success "write to $to_format backend" ' + test_when_finished "rm -rf repo" && + git init --ref-format=$from_format repo && + ( + cd repo && + test_commit 1 && + test_commit 2 && + test_commit 3 && + + git refs migrate --dry-run --ref-format=$to_format >out && + git refs list >expect && + + BACKEND_PATH=$(cat out | sed "s/.* ${SQ}\(.*\)${SQ}/\1/") && + GIT_REF_URI="$to_format://$BACKEND_PATH" git tag -d 1 && + + git refs list >actual && + test_cmp expect actual && + + GIT_REF_URI="$to_format://$BACKEND_PATH" git refs list >expect && + git refs list >out && + cat out | grep -v "refs/tags/1" >actual && + test_cmp expect actual + ) + ' + done +done + +test_done -- 2.51.2 ^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [PATCH v2 2/2] refs: add GIT_REF_URI to specify reference backend and directory 2025-11-26 11:12 ` [PATCH v2 2/2] refs: add GIT_REF_URI to specify reference backend and directory Karthik Nayak @ 2025-11-26 16:17 ` Junio C Hamano 2025-11-27 14:52 ` Karthik Nayak 0 siblings, 1 reply; 6+ messages in thread From: Junio C Hamano @ 2025-11-26 16:17 UTC (permalink / raw) To: Karthik Nayak; +Cc: git, jltobler, toon, sunshine, Jean-Noël Avila Karthik Nayak <karthik.188@gmail•com> writes: > +`GIT_REF_URI`:: > + Specify which reference backend to be used along with its URI. Reference > + backends like the files, reftable backend use the $GIT_DIR as their URI. > ++ > +Expects the format `<ref_backend>://<URI-for-resource>`, where the > +_<ref_backend>_ specifies the reference backend and the _<URI-for-resource>_ > +specifies the URI used by the backend. It is more like "<directory>" that specifies the local directory the backend is told to use to store its data. It feels way too broad for what the initial implementation achieves and what the design can potentially include, to say "URI-for-resource", I would think. > diff --git a/environment.h b/environment.h > index 51898c99cd..9bc380bba4 100644 > --- a/environment.h > +++ b/environment.h > @@ -42,6 +42,7 @@ > #define GIT_OPTIONAL_LOCKS_ENVIRONMENT "GIT_OPTIONAL_LOCKS" > #define GIT_TEXT_DOMAIN_DIR_ENVIRONMENT "GIT_TEXTDOMAINDIR" > #define GIT_ATTR_SOURCE_ENVIRONMENT "GIT_ATTR_SOURCE" > +#define GIT_REF_URI_ENVIRONMENT "GIT_REF_URI" > > /* > * Environment variable used to propagate the --no-advice global option to the > diff --git a/refs.c b/refs.c > index 23f46867f2..a7af228799 100644 > --- a/refs.c > +++ b/refs.c > @@ -2186,15 +2186,73 @@ static struct ref_store *get_ref_store_for_dir(struct repository *r, > return maybe_debug_wrap_ref_store(dir, ref_store); > } > > +static struct ref_store *get_ref_store_from_uri(struct repository *repo, > + const char *uri) > +{ > + struct string_list ref_backend_info = STRING_LIST_INIT_DUP; > + enum ref_storage_format format; > + struct ref_store *store = NULL; > + char *format_string; > + char *dir; > + > + if (!uri || !uri[0]) { > + error("reference backend uri is empty"); > + goto cleanup; > + } Equating !uri and !uri[0] and giving the same message would not help diagnosing an error, and not _("localizing") the message is of dubious value (after all, the message is not being given to somebody coming over the network, but meant to be given to the local user, right?). If we remove the !uri[0] from the check, shouldn't the later check catch it as "invalid format" anyway, and print '%s' it to show that what was given was empty clearly enough? > + if (string_list_split(&ref_backend_info, uri, ":", 2) != 2) { > + error("invalid reference backend uri format '%s'", uri); > + goto cleanup; > + } > + > + format_string = ref_backend_info.items[0].string; > + if (!starts_with(ref_backend_info.items[1].string, "//")) { > + error("invalid reference backend uri format '%s'", uri); > + goto cleanup; > + } > + dir = ref_backend_info.items[1].string + 2; Two questions. (1) do we still want the double-slash after the colon? (2) if so, would it make it simpler to string-list-split using "://" as the separator? > + format_string = ref_backend_info.items[0].string; > + dir = ref_backend_info.items[1].string + 2; These two lines are fishy. Perhaps leftover from an earlier draft that did not have an error checking before the previous 5 lines were added? > + if (!dir || !dir[0]) { > + error("invalid path in uri '%s'", uri); > + goto cleanup; > + } At this point it is very unlikely for "dir" to be NULL, no? Even if the .string member after splitting were NULL, adding 2 to it would not leave it NULL. Being defensive and checking for NULL is good, but then exactly the same question on "NULL vs an empty string" applies here. > struct ref_store *get_main_ref_store(struct repository *r) > { > + char *ref_uri; > + > if (r->refs_private) > return r->refs_private; > > if (!r->gitdir) > BUG("attempting to get main_ref_store outside of repository"); > > - r->refs_private = get_ref_store_for_dir(r, r->gitdir, r->ref_storage_format); > + ref_uri = getenv(GIT_REF_URI_ENVIRONMENT); > + if (ref_uri) { > + r->refs_private = get_ref_store_from_uri(r, ref_uri); > + if (!r->refs_private) > + die("failed to initialize ref store from URI: %s", ref_uri); > + > + } else { > + r->refs_private = get_ref_store_for_dir(r, r->gitdir, > + r->ref_storage_format); > + } > return r->refs_private; > } If this mechanism is for consumption by "git refs migrate", is it possible to reduce the blast radius by giving the command a command line option to do an equivalent of this? I really am not happy with this environment variable that can change the behaviour of such a low level layer from unsuspecting programs that are not ready. Instead of tweaking the behaviour of this function via environment that can affect any programs, can't we give these callers like "git refs migrate" with specific needs set_main_ref_store() function that takes a ref_store and a repository. Then they can use to call into get_ref_store_for_dir() to obtain a ref they need. "git refs migrate" already takes "--ref-format" variable, so all it needs is another "--ref-directory" command line option, right? If the ability to set the ref backend location for arbitrary program proves to be useful, we _could_ give the same --ref-format and --ref-direcctory command line options to "git" itself (like "git -C there" runs any subcommand in the named directory), which does the the get_ref_store_for_dir() plus set_main_ref_store() dance, modelled after how "git refs migrate" does them. Hmm? ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH v2 2/2] refs: add GIT_REF_URI to specify reference backend and directory 2025-11-26 16:17 ` Junio C Hamano @ 2025-11-27 14:52 ` Karthik Nayak 2025-11-27 20:02 ` Junio C Hamano 0 siblings, 1 reply; 6+ messages in thread From: Karthik Nayak @ 2025-11-27 14:52 UTC (permalink / raw) To: Junio C Hamano; +Cc: git, jltobler, toon, sunshine, Jean-Noël Avila [-- Attachment #1: Type: text/plain, Size: 7511 bytes --] Junio C Hamano <gitster@pobox•com> writes: > Karthik Nayak <karthik.188@gmail•com> writes: > >> +`GIT_REF_URI`:: >> + Specify which reference backend to be used along with its URI. Reference >> + backends like the files, reftable backend use the $GIT_DIR as their URI. >> ++ >> +Expects the format `<ref_backend>://<URI-for-resource>`, where the >> +_<ref_backend>_ specifies the reference backend and the _<URI-for-resource>_ >> +specifies the URI used by the backend. > > It is more like "<directory>" that specifies the local directory the > backend is told to use to store its data. It feels way too broad > for what the initial implementation achieves and what the design can > potentially include, to say "URI-for-resource", I would think. > Well I'm okay either ways, my first version was very specific as it mention '<path>'. I changed it based on the discussion with you and Toon about how the '<path>' is the URI for the reference backend. >> diff --git a/environment.h b/environment.h >> index 51898c99cd..9bc380bba4 100644 >> --- a/environment.h >> +++ b/environment.h >> @@ -42,6 +42,7 @@ >> #define GIT_OPTIONAL_LOCKS_ENVIRONMENT "GIT_OPTIONAL_LOCKS" >> #define GIT_TEXT_DOMAIN_DIR_ENVIRONMENT "GIT_TEXTDOMAINDIR" >> #define GIT_ATTR_SOURCE_ENVIRONMENT "GIT_ATTR_SOURCE" >> +#define GIT_REF_URI_ENVIRONMENT "GIT_REF_URI" >> >> /* >> * Environment variable used to propagate the --no-advice global option to the >> diff --git a/refs.c b/refs.c >> index 23f46867f2..a7af228799 100644 >> --- a/refs.c >> +++ b/refs.c >> @@ -2186,15 +2186,73 @@ static struct ref_store *get_ref_store_for_dir(struct repository *r, >> return maybe_debug_wrap_ref_store(dir, ref_store); >> } >> >> +static struct ref_store *get_ref_store_from_uri(struct repository *repo, >> + const char *uri) >> +{ >> + struct string_list ref_backend_info = STRING_LIST_INIT_DUP; >> + enum ref_storage_format format; >> + struct ref_store *store = NULL; >> + char *format_string; >> + char *dir; >> + >> + if (!uri || !uri[0]) { >> + error("reference backend uri is empty"); >> + goto cleanup; >> + } > > Equating !uri and !uri[0] and giving the same message would not help > diagnosing an error, and not _("localizing") the message is of dubious > value (after all, the message is not being given to somebody coming > over the network, but meant to be given to the local user, right?). > I think that's fair. I also missed localizing all the errors, I think someone did point that out too. > If we remove the !uri[0] from the check, shouldn't the later check > catch it as "invalid format" anyway, and print '%s' it to show that > what was given was empty clearly enough? > Yeah, it should I'll remove the latter and modify the test. >> + if (string_list_split(&ref_backend_info, uri, ":", 2) != 2) { >> + error("invalid reference backend uri format '%s'", uri); >> + goto cleanup; >> + } >> + >> + format_string = ref_backend_info.items[0].string; >> + if (!starts_with(ref_backend_info.items[1].string, "//")) { >> + error("invalid reference backend uri format '%s'", uri); >> + goto cleanup; >> + } >> + dir = ref_backend_info.items[1].string + 2; > > Two questions. (1) do we still want the double-slash after the > colon? (2) if so, would it make it simpler to string-list-split > using "://" as the separator? > (1) Yes. (2) My understanding of `string_list_split()` was that the `delim` argument are a set of characters to split the string on. So: string_list_split(l, "abc:def/ghi/jkl", "://", -1) -> ["abc", "def", "ghi", "jkl"] string_list_split(l, "reftable://foo", "://", -1) -> ["reftable", "", "", "foo", "bar"] But this isn't what we want. >> + format_string = ref_backend_info.items[0].string; >> + dir = ref_backend_info.items[1].string + 2; > > These two lines are fishy. Perhaps leftover from an earlier draft > that did not have an error checking before the previous 5 lines were > added? > Yes, will cleanup. >> + if (!dir || !dir[0]) { >> + error("invalid path in uri '%s'", uri); >> + goto cleanup; >> + } > > At this point it is very unlikely for "dir" to be NULL, no? Even if > the .string member after splitting were NULL, adding 2 to it would > not leave it NULL. > > Being defensive and checking for NULL is good, but then exactly the > same question on "NULL vs an empty string" applies here. > Yea, the '!dir[0]' should definitely be enough here. >> struct ref_store *get_main_ref_store(struct repository *r) >> { >> + char *ref_uri; >> + >> if (r->refs_private) >> return r->refs_private; >> >> if (!r->gitdir) >> BUG("attempting to get main_ref_store outside of repository"); >> >> - r->refs_private = get_ref_store_for_dir(r, r->gitdir, r->ref_storage_format); >> + ref_uri = getenv(GIT_REF_URI_ENVIRONMENT); >> + if (ref_uri) { >> + r->refs_private = get_ref_store_from_uri(r, ref_uri); >> + if (!r->refs_private) >> + die("failed to initialize ref store from URI: %s", ref_uri); >> + >> + } else { >> + r->refs_private = get_ref_store_for_dir(r, r->gitdir, >> + r->ref_storage_format); >> + } >> return r->refs_private; >> } > > If this mechanism is for consumption by "git refs migrate", is it > possible to reduce the blast radius by giving the command a command > line option to do an equivalent of this? I really am not happy with > this environment variable that can change the behaviour of such a > low level layer from unsuspecting programs that are not ready. > But the mechanism isn't for 'git refs migrate', but rather we want to add/update references via 'git update-ref' into the dry-run folder created by the 'git refs migrate'. In the broader sense, we want to manipulate references within this dry-run folder as if it is the reference folder for the underlying repository. I get the comprehension behind the environment variable and am happy to work on something alternative if we can achieve something similar. The reason to pick the ENV variable was mostly because this isn't a regular user flag which we expect users to use. Also, this is very similar to the already existing GIT_OBJECT_DIRECTORY. > Instead of tweaking the behaviour of this function via environment > that can affect any programs, can't we give these callers like "git > refs migrate" with specific needs set_main_ref_store() function that > takes a ref_store and a repository. Then they can use to call into > get_ref_store_for_dir() to obtain a ref they need. "git refs migrate" > already takes "--ref-format" variable, so all it needs is another > "--ref-directory" command line option, right? > Something like this would require us to add these flags to all commands, currently I can think of 'git update-ref' and 'git refs' but it could spread to all reference oriented commands. > If the ability to set the ref backend location for arbitrary program > proves to be useful, we _could_ give the same --ref-format and > --ref-direcctory command line options to "git" itself (like "git -C > there" runs any subcommand in the named directory), which does the > the get_ref_store_for_dir() plus set_main_ref_store() dance, > modelled after how "git refs migrate" does them. > > Hmm? This could work indeed, I would instead swap it out for a single "--ref-uri=<backend>://<uri>" which would make it much simpler for users and future implementations which might not have a 'directory' like the current backends do. Overall the ENV variable seemed the best based on the constraints and the existing similar variables. Wdyt? [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 690 bytes --] ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH v2 2/2] refs: add GIT_REF_URI to specify reference backend and directory 2025-11-27 14:52 ` Karthik Nayak @ 2025-11-27 20:02 ` Junio C Hamano 2025-11-27 21:45 ` Karthik Nayak 0 siblings, 1 reply; 6+ messages in thread From: Junio C Hamano @ 2025-11-27 20:02 UTC (permalink / raw) To: Karthik Nayak; +Cc: git, jltobler, toon, sunshine, Jean-Noël Avila Karthik Nayak <karthik.188@gmail•com> writes: > (2) My understanding of `string_list_split()` was that the `delim` > argument are a set of characters to split the string on. Ah, silly me. > But the mechanism isn't for 'git refs migrate', but rather we want to > add/update references via 'git update-ref' into the dry-run folder > created by the 'git refs migrate'. In the broader sense, we want to > manipulate references within this dry-run folder as if it is the > reference folder for the underlying repository. OK, I took the cover letter description too literally, it seems. If we want everybody in a single session to have a temporarily distorted view of the world, it has been a tried and proven way to use environment variables that override the default repository layout, e.g., GIT_DIR, GIT_WORK_TREE, and this "no reference interactions go there, not the usual place the repository configuration says" environment variable fits very well in the context. Thanks. ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH v2 2/2] refs: add GIT_REF_URI to specify reference backend and directory 2025-11-27 20:02 ` Junio C Hamano @ 2025-11-27 21:45 ` Karthik Nayak 0 siblings, 0 replies; 6+ messages in thread From: Karthik Nayak @ 2025-11-27 21:45 UTC (permalink / raw) To: Junio C Hamano; +Cc: git, jltobler, toon, sunshine, Jean-Noël Avila [-- Attachment #1: Type: text/plain, Size: 1260 bytes --] Junio C Hamano <gitster@pobox•com> writes: > Karthik Nayak <karthik.188@gmail•com> writes: > >> (2) My understanding of `string_list_split()` was that the `delim` >> argument are a set of characters to split the string on. > > Ah, silly me. > >> But the mechanism isn't for 'git refs migrate', but rather we want to >> add/update references via 'git update-ref' into the dry-run folder >> created by the 'git refs migrate'. In the broader sense, we want to >> manipulate references within this dry-run folder as if it is the >> reference folder for the underlying repository. > > OK, I took the cover letter description too literally, it seems. > I did change the cover letter for this version with the plan of how this would be used. Let me know if you think I could clarify further. > If we want everybody in a single session to have a temporarily > distorted view of the world, it has been a tried and proven way to > use environment variables that override the default repository > layout, e.g., GIT_DIR, GIT_WORK_TREE, and this "no reference > interactions go there, not the usual place the repository > configuration says" environment variable fits very well in the > context. > > Thanks. Yes! Exactly. Good to see we're on the same page :) Karthik [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 690 bytes --] ^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2025-11-28 5:24 UTC | newest] Thread overview: 6+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2025-11-28 5:21 [PATCH v2 2/2] refs: add GIT_REF_URI to specify reference backend and directory Natee Korn -- strict thread matches above, loose matches on Subject: below -- 2025-11-26 11:11 [PATCH v2 0/2] refs: allow setting the reference directory Karthik Nayak 2025-11-26 11:12 ` [PATCH v2 2/2] refs: add GIT_REF_URI to specify reference backend and directory Karthik Nayak 2025-11-26 16:17 ` Junio C Hamano 2025-11-27 14:52 ` Karthik Nayak 2025-11-27 20:02 ` Junio C Hamano 2025-11-27 21:45 ` Karthik Nayak
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox