public inbox for git@vger.kernel.org 
From: Phillip Wood <phillip.wood123@gmail•com>
To: "Esteban Küber" <esteban@kuber•com.ar>,
	"D. Ben Knoble" <ben.knoble@gmail•com>
Cc: git@vger•kernel.org
Subject: Re: Metadata for merge conflicts during rebase (to aid rustc) and potential for better user experience?
Date: Tue, 6 Jan 2026 14:29:58 +0000
Message-ID: <2908fbe7-73bb-4f45-8d69-c2c685a9c3a2@gmail.com>
In-Reply-To: <CAHnEOG29C1fRBZtpEkebat8znMst7D1JiWdqDAVJQceYqMZGkA@mail.gmail.com>

Hi Esteban

On 24/12/2025 15:03, Esteban Küber wrote:
> On Mon, Dec 22, 2025 at 1:56 PM D. Ben Knoble <ben.knoble@gmail•com> wrote:
>> On Mon, Dec 22, 2025 at 9:31 AM Esteban Küber <esteban@kuber•com.ar> wrote:
>>>
>>> The questions I have are:
>>>   - can I *avoid* `--points-at` in any way to identify what branch we're
>>>     rebasing onto?
>>
>> According to "git help rebase", ORIG_HEAD is not reliable but @{1} should be.
> 
> After talking with other members of the compiler team, people have
> concerns about invoking git from the compiler, as it can be a vector
> for unwanted behavior.

If we're talking about "git rev-parse --git-path" then that does not run
any hooks or external processes. If you parse the state yourself, note
that in a linked worktree or submodule ".git" is a file rather than a
directory. You will need to read the file (which looks like
"gitdir: <path>\n") to find the path to the directory.
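
A minimal sketch of that lookup in Python, for illustration only. The
name `resolve_git_dir` is hypothetical, and the sketch assumes the
caller already knows the top level of the worktree (it does not walk up
parent directories looking for ".git"):

```python
from pathlib import Path


def resolve_git_dir(worktree: Path) -> Path:
    """Return the git directory for a worktree without invoking git."""
    dot_git = worktree / ".git"
    if dot_git.is_dir():
        return dot_git
    # Linked worktrees and submodules store "gitdir: <path>\n" in a file.
    text = dot_git.read_text(encoding="utf-8")
    prefix = "gitdir: "
    if not text.startswith(prefix):
        raise ValueError(f"unrecognised .git file: {dot_git}")
    target = Path(text[len(prefix):].rstrip("\n"))
    # A relative path is taken relative to the directory holding the file.
    return target if target.is_absolute() else (worktree / target).resolve()
```

In a linked worktree the returned directory is the per-worktree one, so
state such as "rebase-merge" is looked up there.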

> I would agree with that assessment, so I am
> trying to settle on a mechanism where I can parse git state myself
> (on a best-effort basis; this is only for diagnostics, so fully
> featured support for all environments is not necessary).
> 
>>>   - is there already a better way to identify if the rebase was triggered by
>>>     `git rebase` or `git pull` (configured to rebase)?
>>
>> I haven't studied the internals on this yet, but I think the common
>> pattern is to look at REBASE_HEAD vs. MERGE_HEAD.
> 
> Thank you for the additional information! That prompted me to look
> into the rest of the files once more, which gave me some hacky ideas
> on how to get the data I want, and this indeed seems to be
> sufficient to differentiate these two.
> 
>>>   - if neither of the above has a "yes" answer, would git consider *adding*
>>>     that information, both for third-parties as well as to extend its own UI?
>>
>> I think "git status" already shows some of this (maybe not the
>> branches in question, but certainly the "it looks like you're in the
>> middle of a rebase/merge/cherry-pick/etc.").
> 
> I looked around again and arrived at the following conclusions:
> 
>   - presence of .git/rebase-merge (and its files) is enough to
>     differentiate between a rebase and a merge

Being pedantic, the presence of ".git/rebase-merge" tells us that a
rebase is in progress. It does not guarantee that the conflicts were
created by the rebase though, as it is possible for the user to run "git
merge", "git cherry-pick" or "git revert" during a rebase. When a commit
is being split, it is possible that the conflicts come from "git stash
pop" if the user stashes some changes, edits a file, commits and then
pops the stashed changes.
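
As a rough illustration, a diagnostic could collect every operation that
has left state behind rather than stopping at the first one. This is a
sketch under assumptions: `operations_in_progress` and `MARKERS` are
hypothetical names, and only the merge backend's "rebase-merge"
directory is checked for rebases:

```python
from pathlib import Path
from typing import List

# State that git leaves in the git directory while an operation is
# unfinished. Only the "rebase-merge" directory is checked for rebases.
MARKERS = {
    "rebase": "rebase-merge",
    "merge": "MERGE_HEAD",
    "cherry-pick": "CHERRY_PICK_HEAD",
    "revert": "REVERT_HEAD",
}


def operations_in_progress(git_dir: Path) -> List[str]:
    """List every operation whose state is present in git_dir."""
    return [op for op, name in MARKERS.items() if (git_dir / name).exists()]
```

A result of just ["rebase"] still does not prove the rebase created the
conflicts, because "git stash pop" leaves no such marker.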

>   - .git/rebase-merge/head-name is enough to identify one of the sections

Yes, that will give you the name of the branch being rebased.
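
A small sketch of reading it, assuming the file holds a full ref such as
"refs/heads/topic". Anything else (for example a rebase that was not
started from a branch) is treated as unknown. `rebased_branch` is a
hypothetical helper name:

```python
from pathlib import Path
from typing import Optional


def rebased_branch(git_dir: Path) -> Optional[str]:
    """Return the short name of the branch being rebased, if known."""
    path = git_dir / "rebase-merge" / "head-name"
    try:
        name = path.read_text(encoding="utf-8").strip()
    except FileNotFoundError:
        return None  # no rebase in progress
    prefix = "refs/heads/"
    # Assumption: only contents under refs/heads/ name a branch.
    return name[len(prefix):] if name.startswith(prefix) else None
```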

>   - identifying *at least* one of the sections is enough to make the
>     output clear enough (even if ideally you'd identify both)
>   - the sha in FETCH_HEAD matching .git/rebase-merge/onto is enough
>     to identify that we're dealing with a `git pull --rebase`

Note that FETCH_HEAD stays around until it is overwritten by the next
fetch, so if I run

	git pull --rebase

followed by one of

	git rebase --autosquash [--keep-base]
	git rebase -i [--keep-base]

without running "git fetch", then ".git/rebase-merge/onto" will match
FETCH_HEAD even though I'm not running "git pull" and I'm not rebasing
onto a new base. Any conflicts then come from re-arranging the existing
commits, not from changes in the upstream branch.
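
To make the limitation concrete, here is a sketch of the comparison
itself. `onto_matches_fetch_head` is a hypothetical name, and the sketch
assumes the usual tab-separated FETCH_HEAD layout (object id, an
optional "not-for-merge" flag, then a description). A True result is
only a hint, since it is also returned in the stale-FETCH_HEAD scenario
described above:

```python
from pathlib import Path


def onto_matches_fetch_head(git_dir: Path) -> bool:
    """True if rebase-merge/onto equals a for-merge entry in FETCH_HEAD."""
    try:
        onto = (git_dir / "rebase-merge" / "onto").read_text().strip()
        fetch_head = (git_dir / "FETCH_HEAD").read_text()
    except FileNotFoundError:
        return False
    if not onto:
        return False
    for line in fetch_head.splitlines():
        oid, _, rest = line.partition("\t")
        # Entries that were fetched but not meant to be merged carry a
        # "not-for-merge" flag in the second field.
        if oid == onto and not rest.startswith("not-for-merge"):
            return True
    return False
```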

I think the most sensible way of solving this is for "git rebase" to 
start writing a description of the "onto" commit to 
".git/rebase-merge/onto-desc". That would allow the output of "git 
status" to include the branch or tag that we're rebasing onto as well. 
I've got a rough patch that creates that file in common cases. If the
base of the branch is not being changed, the file contains "same base"
[1]. If "onto" matches the upstream branch, it contains "upstream <ref>",
where <ref> is the full ref of the upstream branch. If the argument
given to "--onto" is a ref, then the file contains the full name of the
ref [2]. Finally, when rebasing onto a new root commit, it contains "new
root".

[1] Detecting that in the general case involves a revision walk which
     I'd like to avoid so it only works in common cases like
         git rebase -i HEAD~<n>
         git rebase --keep-base --autostash
         git rebase -i --onto ...@{u}

[2] If "--onto" is omitted then it defaults to "<upstream>", so if the
     user runs "git rebase some-branch" the file will contain
     "refs/heads/some-branch". Unfortunately "git pull --rebase" passes
     object ids rather than refnames when it runs "git rebase", so the
     branch name is only detected when rebasing onto the upstream branch.
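
For a consumer, the four cases described above could be read with
something like the sketch below. Note that "onto-desc" does not exist in
git today: the contents are taken from the description of the rough
patch and may change, and `parse_onto_desc` and its return tags are
hypothetical:

```python
from typing import Optional, Tuple


def parse_onto_desc(text: str) -> Tuple[str, Optional[str]]:
    """Classify the contents of the proposed rebase-merge/onto-desc file.

    Returns (kind, ref) where ref is a full ref name or None.
    """
    desc = text.strip()
    if desc == "same base":
        return ("same-base", None)
    if desc == "new root":
        return ("new-root", None)
    if desc.startswith("upstream "):
        return ("upstream", desc[len("upstream "):])
    if desc.startswith("refs/"):
        return ("ref", desc)
    # Anything else is reported as unknown rather than guessed at.
    return ("unknown", None)
```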


I'll try and post a patch next week.

>   - there's information that is only present in MERGE_MSG in
>     free-form text, that isn't present anywhere else

I assume that's the name of the branch we're merging into HEAD. For 
squash merges the equivalent file is SQUASH_MSG.

>   - I can extract the "missing" information for either the
>     identifying information of where we are merging, be it because of
>     a `git pull --no-rebase` or `git merge`; the only issue I see is
>     in having to rely that the output will not change from either of
>     "Merge branch 'main' into branch-name" and
>     "Merge branch 'main' of example.url:user/repo" (how much trouble
>     am I inviting if I were to try and rely on this text not changing
>     so that I can get 'main' and the remote url from here?)

I'd be surprised if the messages changed, but I don't think anyone is
going to pledge that they'll never change. Alternatively, you could read
the object id out of MERGE_HEAD (that is always a file, even if the
repository is using the reftable backend) and use "git for-each-ref
--points-at" to find the branch name.
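
If the message text is parsed anyway, a tolerant pattern that returns
nothing on a mismatch keeps the coupling contained. The sketch below
covers only the two subject forms quoted above (plus their combination).
`parse_merge_subject` is a hypothetical name and the pattern is an
assumption about the current wording, not a stable interface:

```python
import re
from typing import Optional, Tuple

# Matches "Merge branch '<name>'" optionally followed by " of <url>"
# and/or " into <branch>".
_MERGE_RE = re.compile(
    r"^Merge branch '([^']+)'(?: of (\S+))?(?: into (\S+))?$"
)


def parse_merge_subject(
    message: str,
) -> Optional[Tuple[str, Optional[str], Optional[str]]]:
    """Return (branch, url, into) from the first line of MERGE_MSG."""
    first_line = message.split("\n", 1)[0].strip()
    match = _MERGE_RE.match(first_line)
    if match is None:
        return None  # unknown wording: fall back to a generic diagnostic
    return (match.group(1), match.group(2), match.group(3))
```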

> First, the information present in MERGE_MSG should be available in a
> more structured format, to allow for tools to deal with git state in
> a less coupled way. (This might not be worth it, and the textual
> representation is already "stable enough" to rely on.)

That might be useful for "git status" as we could say which branch was 
being merged.

> Secondly, and perhaps more importantly, when generating the diff
> markers that end up in the user files, their description includes
> only the full sha or HEAD, or the short-sha and the commit message.
> I would propose that the branch be identified as well in the
> generated code.  This could look something like:
> 
> `git rebase`:
> <<<<<<< HEAD [branch 'main']

In the general case HEAD isn't really the branch 'main'; it is main plus
whatever commits we've already applied. I think I saw someone suggest
[from 'main'], which might be better.

> =======
> >>>>>>> e644375 (commit message) [branch 'name']

Unless we're applying the last commit from the branch, this isn't branch
'name' but one of the commits from it.

> 
> `git merge`:
> <<<<<<< HEAD [branch 'name']
> =======
> ------- between this marker and `>>>>>>>` is the code from branch 'master'

I'm skeptical that we want to inject extra text into the conflicted
region. It makes sense for rustc's diagnostics, but injecting it into
the file makes it harder to resolve the conflict.

>      println!("Hello, main!");
> >>>>>>> [branch 'main']

For merges [branch '<name>'] definitely makes sense for the two merge 
heads, I'm not sure what we'd do for the merge base though.

> `git pull --rebase`:
> <<<<<<< HEAD [local branch 'main']

Do we really need a different label when pulling?

> =======
> >>>>>>> 8191e7e4f9f82be45bdd4e71c37d2adcf4f88aa2 [branch 'main' of example.tld:user/repo]

Ideally we'd use the remote tracking branch here when pulling from a
configured remote repository, rather than giving the name of the branch
on the remote and its URL.

> `git pull --no-rebase`:
> <<<<<<< HEAD [branch 'main' of example.tld:user/repo]
> =======
> >>>>>>> ebbeec7 (commit message) [local branch 'main']
> 
> The format doesn't have to match the above exactly, but having the
> commit *and branch* information will make it much easier for people
> to identify things at a glance, at the cost of some additional
> verbosity in the generated code.
> 
> The source of the issue is that where "our" and "their" code is in
> the patch depends on a somewhat "arbitrary" distinction (as far as
> a non-implementer is concerned) and it *swaps places* depending on
> whether we are rebasing or merging. Adding some context to the
> resulting patches would go a long way toward mitigating the confusion
> this causes.

I agree that having some indication of which branch each side comes from
would be useful, but I think when rebasing it needs to be clear that the
branch does not necessarily point to that particular commit.
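
One way a diagnostic could word the two sides while respecting that
caveat is sketched below. `describe_sides` and its label strings are
hypothetical, and the sketch ignores the merge-base section that some
conflict styles add:

```python
from typing import Tuple


def describe_sides(operation: str, branch: str, other: str) -> Tuple[str, str]:
    """Label the text above and below '=======' in a conflict.

    For "merge", branch is the checked-out branch and other is the one
    being merged. For "rebase", branch is the branch being rebased and
    other is what it is being rebased onto.
    """
    if operation == "merge":
        return (f"branch '{branch}'", f"branch '{other}'")
    if operation == "rebase":
        # HEAD is the new base plus any commits already applied, and the
        # other side is a single commit being replayed, so neither label
        # claims to be the branch tip.
        return (f"from '{other}'", f"a commit from '{branch}'")
    raise ValueError(f"unsupported operation: {operation}")
```

The swap between the two cases is visible in which argument ends up in
the first label.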

Thanks

Phillip

> Happy holidays,
> Esteban Küber
> 

