From: Phillip Wood <phillip.wood123@gmail•com>
To: "Esteban Küber" <esteban@kuber•com.ar>,
"D. Ben Knoble" <ben.knoble@gmail•com>
Cc: git@vger•kernel.org
Subject: Re: Metadata for merge conflicts during rebase (to aid rustc) and potential for better user experience?
Date: Tue, 6 Jan 2026 14:29:58 +0000
Message-ID: <2908fbe7-73bb-4f45-8d69-c2c685a9c3a2@gmail.com>
In-Reply-To: <CAHnEOG29C1fRBZtpEkebat8znMst7D1JiWdqDAVJQceYqMZGkA@mail.gmail.com>
Hi Esteban
On 24/12/2025 15:03, Esteban Küber wrote:
> On Mon, Dec 22, 2025 at 1:56 PM D. Ben Knoble <ben.knoble@gmail•com> wrote:
>> On Mon, Dec 22, 2025 at 9:31 AM Esteban Küber <esteban@kuber•com.ar> wrote:
>>>
>>> The questions I have are:
>>> - can I *avoid* `--points-at` in any way to identify what branch we're
>>> rebasing onto?
>>
>> According to "git help rebase", ORIG_HEAD is not reliable but @{1} should be.
>
> After talking with other members of the compiler team, people have
> concerns about invoking git from the compiler, as it can be a vector
> for unwanted behavior.
If we're talking about "git rev-parse --git-path" then that does not run
any hooks or external processes. In a linked worktree or submodule,
".git" is a file rather than a directory. You will need to read the file
(which looks like "gitdir: <path>\n") to find the path to the directory.
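As a rough sketch of reading that file without running git (the function
names are made up for illustration, and resolving a relative path against
the directory that holds the ".git" file is my understanding of the layout
rather than something I have checked in every setup):

```rust
use std::fs;
use std::path::{Path, PathBuf};

/// Extract the path from the contents of a ".git" file ("gitdir: <path>\n").
fn parse_gitdir_file(contents: &str) -> Option<&str> {
    let path = contents
        .strip_prefix("gitdir: ")?
        .trim_end_matches(|c| c == '\n' || c == '\r');
    if path.is_empty() {
        None
    } else {
        Some(path)
    }
}

/// Best-effort lookup of the git directory for a worktree root.
/// Hypothetical helper: it does not handle GIT_DIR or bare repositories.
fn find_git_dir(worktree_root: &Path) -> Option<PathBuf> {
    let dot_git = worktree_root.join(".git");
    if dot_git.is_dir() {
        return Some(dot_git);
    }
    let contents = fs::read_to_string(&dot_git).ok()?;
    let target = parse_gitdir_file(&contents)?;
    // Path::join() returns an absolute `target` unchanged and resolves a
    // relative one against the directory holding the ".git" file.
    Some(worktree_root.join(target))
}
```

In a linked worktree the directory found this way is the per-worktree one,
which is where "rebase-merge" lives.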
> I would agree with that assessment, so I am
> trying to settle on a mechanism where I can parse git state myself
> (on a best-effort basis; this is only for diagnostics, so fully
> featured support for all environments is not necessary).
>
>>> - is there already a better way to identify if the rebase was triggered by
>>> `git rebase` or `git pull` (configured to rebase)?
>>
>> I haven't studied the internals on this yet, but I think the common
>> pattern is to look at REBASE_HEAD vs. MERGE_HEAD.
>
> Thank you for the additional information! That prompted me to look
> into the rest of the files once more, which gave me some hacky ideas
> on how to get the data I want, and this indeed seems to be
> sufficient to differentiate these two.
>
>>> - if neither of the above has a "yes" answer, would git consider *adding*
>>> that information, both for third-parties as well as to extend its own UI?
>>
>> I think "git status" already shows some of this (maybe not the
>> branches in question, but certainly the "it looks like you're in the
>> middle of a rebase/merge/cherry-pick/etc.").
>
> I looked around again and arrived to the following conclusions:
>
> - presence of .git/rebase-merge (and its files) is enough to
> differentiate between a rebase and a merge
Being pedantic, the presence of ".git/rebase-merge" tells us that a
rebase is in progress; it does not guarantee that the conflicts were
created by the rebase though as it is possible for the user to run "git
merge", "git cherry-pick" or "git revert" during a rebase. When a commit
is being split it is possible that the conflicts come from "git stash
pop" if the user stashes some changes, edits a file, commits and then
pops the stashed changes.
> - .git/rebase-merge/head-name is enough to identify one of the sections
Yes, that will give you the name of the branch being rebased.
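A minimal sketch of interpreting that file, assuming it holds either a full
ref such as "refs/heads/topic" or the literal text "detached HEAD" (that is
what I believe rebase writes; the type and function names are only for
illustration):

```rust
/// What ".git/rebase-merge/head-name" says about what is being rebased.
#[derive(Debug, PartialEq)]
enum RebasedHead<'a> {
    /// Short branch name, e.g. "topic" for "refs/heads/topic".
    Branch(&'a str),
    /// The rebase was started from a detached HEAD.
    Detached,
    /// Anything else is passed through untouched.
    Other(&'a str),
}

fn parse_head_name(contents: &str) -> RebasedHead<'_> {
    let name = contents.trim_end();
    if let Some(branch) = name.strip_prefix("refs/heads/") {
        RebasedHead::Branch(branch)
    } else if name == "detached HEAD" {
        RebasedHead::Detached
    } else {
        RebasedHead::Other(name)
    }
}
```

Bear in mind the caveat above: this names the branch being rebased, it does
not prove the conflict came from the rebase itself.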
> - identifying *at least* one of the sections is enough to make the
> output clear enough (even if ideally you'd identify both)
> - the sha in FETCH_HEAD matching .git/rebase-merge/onto is enough
> to identify that we're dealing with a `git rebase --rebase`
Note that FETCH_HEAD stays around until it is overwritten by the next
fetch, so if I run
git pull --rebase
followed by one of
git rebase --autosquash [--keep-base]
git rebase -i [--keep-base]
without running "git fetch" in between, ".git/rebase-merge/onto" will
match FETCH_HEAD even though I'm not running "git pull" and I'm not
rebasing onto a new base, so any conflicts come from re-arranging the
existing commits, not from changes in the upstream branch.
I think the most sensible way of solving this is for "git rebase" to
start writing a description of the "onto" commit to
".git/rebase-merge/onto-desc". That would allow the output of "git
status" to include the branch or tag that we're rebasing onto as well.
I've got a rough patch that creates that file in common cases. If the
base of the branch is not being changed the file contains "same base"
[1], if "onto" matches the upstream branch it contains "upstream <ref>"
where <ref> is the full ref of the upstream branch. If the argument
given to "--onto" is a ref then the file contains the full name of the
ref [2]. Finally, when rebasing onto a new root commit, it contains "new
root".
[1] Detecting that in the general case involves a revision walk which
I'd like to avoid so it only works in common cases like
git rebase -i HEAD~<n>
git rebase --keep-base --autostash
git rebase -i --onto ...@{u}
[2] If "--onto" is omitted then it defaults to "<upstream>" so if the
user runs "git rebase some-branch" the file will contain
"refs/heads/some-branch". Unfortunately "git pull --rebase" passes
object ids rather than refnames when it runs "git rebase", so the
branch name is only detected when rebasing onto the upstream branch.
I'll try and post a patch next week.
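For what it's worth, a consumer of that file might look something like the
sketch below. Note that "onto-desc" only exists in my rough patch, so the
exact strings, the trailing newline and all of the names here are
assumptions that may well change before anything is merged:

```rust
/// Possible contents of the proposed ".git/rebase-merge/onto-desc" file.
#[derive(Debug, PartialEq)]
enum OntoDesc<'a> {
    /// "same base": the base of the branch is not being changed.
    SameBase,
    /// "new root": rebasing onto a new root commit.
    NewRoot,
    /// "upstream <ref>": "onto" matches the upstream branch.
    Upstream(&'a str),
    /// Anything else is taken to be the full name of the "--onto" ref.
    Ref(&'a str),
}

fn parse_onto_desc(contents: &str) -> Option<OntoDesc<'_>> {
    let desc = contents.trim_end();
    match desc {
        "" => None,
        "same base" => Some(OntoDesc::SameBase),
        "new root" => Some(OntoDesc::NewRoot),
        _ => Some(match desc.strip_prefix("upstream ") {
            Some(upstream) => OntoDesc::Upstream(upstream),
            None => OntoDesc::Ref(desc),
        }),
    }
}
```

A missing file has to be treated as "unknown", both for older versions of
git and for the cases the patch does not detect.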
> - there's information that is only present in MERGE_MSG in
> free-form text, that isn't present anywhere else
I assume that's the name of the branch we're merging into HEAD. For
squash merges the equivalent file is SQUASH_MSG.
> - I can extract the "missing" information for either the
> identifying information of where we are merging, be it because of
> a `git pull --no-rebase` or `git merge`; the only issue I see is
> in having to rely that the output will not change from either of
> "Merge branch 'main' into branch-name" and
> "Merge branch 'main' of example.url:user/repo" (how much trouble
> am I inviting if I were to try and rely on this text not changing
> so that I can get 'main' and the remote url from here?)
I'd be surprised if the messages changed but I don't think anyone is
going to pledge that they'll never change. Alternatively, you could read
the object id out of MERGE_HEAD (that is always a file even if the
repository is using the reftable backend) and use "git for-each-ref
--points-at" to find the branch name.
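If you do go the route of parsing the message, a best-effort sketch for the
two forms you quote could look like this. The names are made up, it only
looks at the first line, and it gives up on anything else (for example
"Merge remote-tracking branch ..." or a branch name containing a quote):

```rust
/// What the first line of MERGE_MSG says is being merged.
#[derive(Debug, PartialEq)]
struct MergeSource<'a> {
    branch: &'a str,
    /// Present for "Merge branch '<name>' of <url>".
    remote: Option<&'a str>,
}

fn parse_merge_msg(msg: &str) -> Option<MergeSource<'_>> {
    let first_line = msg.lines().next()?;
    let rest = first_line.strip_prefix("Merge branch '")?;
    let (branch, tail) = rest.split_once('\'')?;
    // " of <url>" may itself be followed by " into <branch>".
    let remote = tail.strip_prefix(" of ").map(|t| match t.split_once(" into ") {
        Some((url, _)) => url,
        None => t,
    });
    Some(MergeSource { branch, remote })
}
```

Returning None for anything unrecognised means a wording change only
degrades the diagnostic rather than producing a wrong one.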
> First, the information present in MERGE_MSG should be available in a
> more structured format, to allow for tools to deal with git state in
> a less coupled way. (This might not be worth it, and the textual
> representation is already "stable enough" to rely on.)
That might be useful for "git status" as we could say which branch was
being merged.
> Secondly, and perhaps more importantly, when generating the diff
> markers that end up in the user files, their description includes
> only the full sha or HEAD, or the short-sha and the commit message.
> I would propose that the branch be identified as well in the
> generated code. This could look something like:
>
> `git rebase`:
> <<<<<<< HEAD [branch 'main']
In the general case HEAD isn't really the branch 'main'; it is main plus
whatever commits we've already applied. I think I saw someone suggest
[from 'main'], which might be better.
> =======
>>>>>>>> e644375 (commit message) [branch 'name']
Unless we're applying the last commit from the branch, this isn't branch
'name' but one of the commits from it.
>
> `git merge`:
> <<<<<<< HEAD [branch 'name']
> =======
> ------- between this marker and `>>>>>>>` is the code from branch 'master'
I'm skeptical that we want to inject extra text into the conflicted
region. It makes sense for rustc's diagnostics, but injecting it into
the file makes the conflict harder to resolve.
> println!("Hello, main!");
>>>>>>>> [branch 'main']
For merges [branch '<name>'] definitely makes sense for the two merge
heads; I'm not sure what we'd do for the merge base though.
> `git pull --rebase`:
> <<<<<<< HEAD [local branch 'main']
Do we really need a different label when pulling?
> =======
>>>>>>>> 8191e7e4f9f82be45bdd4e71c37d2adcf4f88aa2 [branch 'main' of example.tld:user/repo]
Ideally we'd use the remote tracking branch here when pulling from a
configured remote repository rather than giving the name of the branch
on the remote and its URL.
> `git pull --no-rebase`:
> <<<<<<< HEAD [branch 'main' of example.tld:user/repo]
> =======
>>>>>>>> ebbeec7 (commit message) [local branch 'main']
>
> The format doesn't have to match the above exactly, but having the
> commit *and branch* information will make it much easier for people
> to identify things at a glance, at the cost of some additional
> verbosity in the generated code.
>
> The source of the issue is that where "our" and "their" code is in
> the patch depends on a somewhat "arbitrary" distinction (as far as
> a non-implementer is concerned) and it *swaps places* depending on
> whether we are rebasing or merging. Adding some context to the
> resulting patches would go a long way of mitigating the confusion
> this causes.
I agree having some indication of which branch each side comes from would
be useful but I think when rebasing it needs to be clear that the branch
does not necessarily point to that particular commit.
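To illustrate the wording I have in mind, here is a small sketch of how a
diagnostic might describe the two sides. The names and the exact phrases are
only suggestions: "checked_out" is the branch the user was on when they ran
the command and "named" is the branch given on the command line.

```rust
#[derive(Clone, Copy, Debug)]
enum Operation {
    Merge,
    Rebase,
}

/// Returns (description of the "<<<<<<<" side, description of the ">>>>>>>" side).
fn describe_sides(op: Operation, checked_out: &str, named: &str) -> (String, String) {
    match op {
        // When merging, HEAD is the checked-out branch.
        Operation::Merge => (
            format!("branch '{}'", checked_out),
            format!("branch '{}'", named),
        ),
        // When rebasing the sides swap: HEAD starts at the branch we rebase
        // onto, and the other side is a single commit being replayed.
        Operation::Rebase => (
            format!("from '{}' (plus any commits already replayed)", named),
            format!("a commit from branch '{}'", checked_out),
        ),
    }
}
```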
Thanks
Phillip
> Happy holidays,
> Esteban Küber
>