From: Emily Shaffer <emilyshaffer@google•com>
To: git@vger•kernel.org
Cc: Emily Shaffer <emilyshaffer@google•com>,
Calvin Wan <calvinwan@google•com>,
Han Young <hanyang.tony@bytedance•com>,
Jonathan Tan <jonathantanmy@google•com>,
sokcevic@google•com
Subject: [RFC PATCH] promisor-remote: always JIT fetch with --refetch
Date: Thu, 3 Oct 2024 15:35:46 -0700 [thread overview]
Message-ID: <20241003223546.1935471-1-emilyshaffer@google.com> (raw)
By the time we decide we need to do a partial clone fetch, we already
know the object is missing, even if the_repository->parsed_objects
thinks it exists. But --refetch bypasses the local object check, so we
can guarantee that a JIT fetch will fix incorrect local caching.
This manifested at $DAYJOB in a repo with the following features:
* blob-filtered partial clone enabled
* commit graph enabled
* ref Foo pointing to commit object 6aaaca
* object 6aaaca missing[a]
With these prerequisites, we noticed that `git fetch` in the repo
produced an infinite loop:
1. `git fetch` tries to fetch, but thinks it has all objects, so it
noops.
2. At the end of cmd_fetch(), we try to write_commit_graph_reachable().
3. write_commit_graph_reachable() does a reachability walk, including
starting from Foo
4. The reachability walk tries to peel Foo, and notices it's missing
6aaaca.
5. The partial clone machinery asks for a per-object JIT fetch of
6aaaca.
6. `git fetch` (child process) is asked to fetch 6aaaca.
7. We put together the_repository->parsed_objects, adding all commit IDs
reachable from local refs to it
(fetch-pack.c:mark_complete_and_common_refs(), trace region
mark_complete_local_refs). We see Foo, so we add 6aaaca to
the_repository->parsed_objects and mark it as COMPLETE.
8. cmd_fetch notices that the object ID it was asked for is already
known, so does not fetch anything new.
9. GOTO 2.
The culprit is that we're assuming all local refs already must have
objects in place. Using --refetch means we ignore that assumption during
JIT fetch.
Signed-off-by: Emily Shaffer <emilyshaffer@google•com>
---
There are a few alternative approaches for this issue that I talked
about with some folks at $DAYJOB:
i. Just disabling the commit graph rewrite allows this to fall
eventually into a path where the fetch actually succeeds. I didn't like
this solution - it's just whack-a-mole - so I didn't look too hard into
why it succeeds that way. It *could* make sense to disable commit graph
rewrite when we do a JIT fetch with blob or tree filter provided - but
if later we want to implement commit filter (something we've talked
about at Google) then I'd worry about this situation coming up again.
ii. We could decide not to mark local refs (and commits reachable from
them) as COMPLETE in the_repository->parsed_objects. I didn't try this
solution out, and I'm not sure what the performance implications are,
but Jonathan Tan likes this solution, so I may try it out and see what
breaks shortly.
iii. We could do all the JIT fetches with --refetch. In my opinion, this
is the safest/most self-healing solution; the JIT fetch only happens
when we really know we're missing the object, so it doesn't make sense
for that fetch to be canceled by any cache. It doesn't have performance
implications as far as I can guess (except that I think we still build
the parsed_objects hash even though we are going to ignore it, but we
already were doing that anyway). Of course, that's what this patch does.
iv. We could do nothing; when cmd_fetch gets a fetch-by-object-id but
decides there is nothing more to do, it could terminate with an error.
That should stop the infinite recursion, and the error could suggest the
user to run `git fsck` and discover what the problem is. Depending on
the remediation we suggest, though, I think a direct fetch to fix this
particular loop would not work.
I'm curious to hear thoughts from people who are more expert than me on
partial clone and fetching in general, though.
This change is also still in RFC, for two reasons:
First, it's intermittently failing tests for me locally, in weirdly
flaky ways:
- t0410-partial-clone.sh fails when I run it from prove, but passes when
I run it manually, every time.
- t5601-clone.sh and t5505-remote.sh fail nonsensically on `rm -rf` that
should succeed (and does succeed if I stop the test with test_pause),
which makes me think there's something else borked in my setup, but
I'm not sure what.
- t5616-partial-clone.sh actually does fail in a way that I could see
having to do with this change (since I guess we might download more
packs than usual), but I was so confused by the other two errors I
haven't looked closely yet.
And secondly, I didn't write tests verifying the breakage and that this
change fixes it yet, either.
I'm going to work on both those things in the background, but I wanted
to get the description and RFC out early so that folks could take a look
and we could decide which approach is best.
Thanks,
- Emily
a: That commit object went missing as a byproduct of this partial clone
gc issue that Calvin, Jonathan, Han Young, and others have been
investigating:
https://lore.kernel.org/git/20241001191811.1934900-1-calvinwan@google.com/
---
promisor-remote.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/promisor-remote.c b/promisor-remote.c
index 9345ae3db2..cf00e31d3b 100644
--- a/promisor-remote.c
+++ b/promisor-remote.c
@@ -43,7 +43,7 @@ static int fetch_objects(struct repository *repo,
strvec_pushl(&child.args, "-c", "fetch.negotiationAlgorithm=noop",
"fetch", remote_name, "--no-tags",
"--no-write-fetch-head", "--recurse-submodules=no",
- "--filter=blob:none", "--stdin", NULL);
+ "--filter=blob:none", "--refetch", "--stdin", NULL);
if (!git_config_get_bool("promisor.quiet", &quiet) && quiet)
strvec_push(&child.args, "--quiet");
if (start_command(&child))
--
2.47.0.rc0.187.ge670bccf7e-goog
next reply other threads:[~2024-10-03 22:35 UTC|newest]
Thread overview: 38+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-10-03 22:35 Emily Shaffer [this message]
2024-10-06 22:43 ` [RFC PATCH] promisor-remote: always JIT fetch with --refetch Junio C Hamano
2024-10-07 0:21 ` Robert Coup
2024-10-07 0:37 ` Junio C Hamano
2024-10-11 16:40 ` Emily Shaffer
2024-10-11 17:54 ` Junio C Hamano
2024-10-23 0:28 ` [PATCH v2] fetch-pack: don't mark COMPLETE unless we have the full object Emily Shaffer
2024-10-23 18:53 ` Emily Shaffer
2024-10-23 20:11 ` Taylor Blau
2024-10-28 22:55 ` Jonathan Tan
2024-10-29 21:11 ` [PATCH 0/2] When fetching, warn if in commit graph but not obj db Jonathan Tan
2024-10-29 21:11 ` [PATCH 1/2] Revert "fetch-pack: add a deref_without_lazy_fetch_extended()" Jonathan Tan
2024-10-30 21:22 ` Josh Steadmon
2024-10-29 21:11 ` [PATCH 2/2] fetch-pack: warn if in commit graph but not obj db Jonathan Tan
2024-10-30 21:22 ` Josh Steadmon
2024-10-31 21:23 ` Jonathan Tan
2024-10-31 20:59 ` Taylor Blau
2024-10-31 21:43 ` Jonathan Tan
2024-11-01 14:33 ` Taylor Blau
2024-11-01 17:33 ` Jonathan Tan
2024-10-30 21:22 ` [PATCH 0/2] When fetching, " Josh Steadmon
2024-10-31 21:18 ` [PATCH v2 0/2] When fetching, die " Jonathan Tan
2024-10-31 21:19 ` [PATCH v2 1/2] Revert "fetch-pack: add a deref_without_lazy_fetch_extended()" Jonathan Tan
2024-10-31 21:19 ` [PATCH v2 2/2] fetch-pack: warn if in commit graph but not obj db Jonathan Tan
2024-11-01 2:22 ` Junio C Hamano
2024-11-01 4:25 ` Junio C Hamano
2024-11-01 8:59 ` [External] " Han Xin
2024-11-01 17:46 ` Jonathan Tan
2024-11-01 17:40 ` Jonathan Tan
2024-11-02 2:08 ` Junio C Hamano
2024-11-01 17:36 ` Jonathan Tan
2024-11-01 15:18 ` Taylor Blau
2024-11-01 17:49 ` Jonathan Tan
2024-10-31 22:33 ` [PATCH v2 0/2] When fetching, die " Josh Steadmon
2024-11-05 19:24 ` [PATCH v3 " Jonathan Tan
2024-11-05 19:24 ` [PATCH v3 1/2] Revert "fetch-pack: add a deref_without_lazy_fetch_extended()" Jonathan Tan
2024-11-05 19:24 ` [PATCH v3 2/2] fetch-pack: die if in commit graph but not obj db Jonathan Tan
2024-11-06 3:12 ` [PATCH v3 0/2] When fetching, " Junio C Hamano
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20241003223546.1935471-1-emilyshaffer@google.com \
--to=emilyshaffer@google$(echo .)com \
--cc=calvinwan@google$(echo .)com \
--cc=git@vger$(echo .)kernel.org \
--cc=hanyang.tony@bytedance$(echo .)com \
--cc=jonathantanmy@google$(echo .)com \
--cc=sokcevic@google$(echo .)com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox