From: Michael J Gruber <git@drmicha•warpmail.net>
To: Junio C Hamano <gitster@pobox•com>
Cc: git@vger•kernel.org, Jeff King <peff@peff•net>
Subject: Re: [PATCH] combine-diff: use textconv for combined diff format
Date: Sat, 16 Apr 2011 12:24:08 +0200 [thread overview]
Message-ID: <4DA96E48.3050008@drmicha.warpmail.net> (raw)
In-Reply-To: <7voc47cqj0.fsf@alter.siamese.dyndns.org>
Junio C Hamano venit, vidit, dixit 15.04.2011 20:34:
> Michael J Gruber <git@drmicha•warpmail.net> writes:
>
>> git diff -m produces a combined diff!
>
> Hmm, what is the rest of your command line? I thought -m was a way to ask
> pairwise diff with each parent.
Sure, but it does not always work like that. Just look at the test from
my patch, or do any "git merge --no-commit" and then "git diff -m". I
would expect that to compare the worktree to each parent, but in fact it
runs "diff --cc".
At least I thought that's the only way how combine-diff would ever have
to deal with a merge result in the worktree as opposed to a blob. And it
seems that "diff -m" does not handle this but relays to "diff --cc" for
current git. I have not checked the "-m" codepath.
>> +static struct combine_diff_path *intersect_paths(struct combine_diff_path *curr, int n, int num_parent, int textconv)
>> {
>> struct diff_queue_struct *q = &diff_queued_diff;
>> struct combine_diff_path *p;
>> @@ -34,9 +34,13 @@ static struct combine_diff_path *intersect_paths(struct combine_diff_path *curr,
>>
>> hashcpy(p->sha1, q->queue[i]->two->sha1);
>> p->mode = q->queue[i]->two->mode;
>> + if (textconv)
>> + p->textconv = get_textconv(q->queue[i]->two);
>> hashcpy(p->parent[n].sha1, q->queue[i]->one->sha1);
>> p->parent[n].mode = q->queue[i]->one->mode;
>> p->parent[n].status = q->queue[i]->status;
>> + if (textconv)
>> + p->parent[n].textconv = get_textconv(q->queue[i]->one);
>
> This code attempts to handle different textconv set for each different
> parents. But I have to wonder if that is really worth it.
>
> The attribute to decide the content type of the blob is read from the same
> set of .gitattributes files, regardless of which parent you are looking at
> (and this is not likely to change---the exact procedure that is applied
> comes from .git/config that is not even versioned, so there is not much
> point in reading from the .gitattributes from the parent tree, trying to
> be "precise").
>
> If q->queue[i] is not a rename, p->textconv and p->parent[n].textconv
> would be the same because one and two came from the same path. If it is a
> rename, they by definition consist of similar contents, and the user would
> want the same textconv conversion applied to them to make them comparable.
> Even though using p->parent[n].textconv to convert q->queue[i]->one->sha1
> blob and using p->textconv to convert q->queue[i]->two->sha1 blob might be
> the right thing to do in theory, doing so wouldn't make a difference in
> practice. More importantly, even if the two textconvs specify different
> conversions, it is likely that it is an user error (e.g. the preimage had
> "img4433.jqg" that was renamed to img4433.jpg" in the postimage, and the
> attributes mechanism does not say ".jqg" is a JPEG that wants to get
> "exif" run to be texualized for the purpose of diffing, or something).
>
> Besides, if you really want to support "left hand side and right hand
> side, depending on which parent we are talking about, may use different
> textconv", you would need to defeat the optimization in show_patch_diff()
> that calls reuse_combine_diff() when sha1 are the same from other parent
> we have already compared---the parent we are looking at may be using a
> different textconv procedure. Even worse, if parent and child have the
> same sha1, the result of running parent textconv on the parent blob may be
> different from that of the child, which you would never even see in this
> codepath.
>
> So I suspect that using only one textconv per "struct combine_diff_path"
> would make both the code simpler, and more importantly would make the
> result more correct from the end user's point of view.
I'd be happy to take the simpler approach. While I still think the other
one is "more correct" (modulo the reuse issue) it should not matter in
most cases.
>
>> @@ -777,6 +783,12 @@ static void show_patch_diff(struct combine_diff_path *elem, int num_parent,
>> close(fd);
>> }
>>
>> + if (DIFF_OPT_TST(opt, ALLOW_TEXTCONV) && elem->textconv) {
>> + struct diff_filespec *df = alloc_filespec(elem->path);
>> + fill_filespec(df, elem->sha1, elem->mode);
>> + result_size = fill_textconv(elem->textconv, df, &result);
>> + }
>
> I suspect that these three lines have to become a small helper function to
> be used to convert the final blob (done here), and parent blob (done in
> combine_diff). With the "binary" support, it would eventually need to be
> enhanced to something like:
>
> if (DIFF_OPT_TST(opt, ALLOW_TEXTCONV)) {
> if (textconv) {
> do these three lines;
> } else if (is binary) {
> "Binary blob $SHA-1";
> }
> }
>
"diff -m --oneline" says something like
aa01ae1 (from 64c0923) Merge branch 'master' into somebranch
diff --git a/a b/a
index 72594ed..d8323da 100644
Binary files a/a and b/a differ
aa01ae1 (from e85049e) Merge branch 'master' into somebranch
diff --git a/a b/a
index 86e041d..d8323da 100644
Binary files a/a and b/a differ
so I'm wondering whether we shouldn't stay closer to that with "--cc
also", e.g.:
aa01ae1 Merge branch 'master' into somebranch
diff --cc a
index 72594ed,86e041d..d8323da
Binary files a/a and b/a differ
BTW: Currently, "--cc --oneline" produces an extra newline before the
diff line, and also note how the diff lines differ ("a/a b/a" vs. "a").
But those are different issues.
> and having a small helper function early in the series would help that
> process.
>
>> + paths = intersect_paths(paths, i, num_parent, DIFF_OPT_TST(opt, ALLOW_TEXTCONV));
>
> As an internal API within this file, I would rather see "opt" as a whole
> passed to intersect_paths(). We may probably want to determine if the
> blob is binary in that function depending on other "opt" fields.
Yep.
Michael
[Resent today, sorry. Couldn't get myself to reboot that box yesterday
after a disk gave up.]
next prev parent reply other threads:[~2011-04-16 10:24 UTC|newest]
Thread overview: 25+ messages / expand[flat|nested] mbox.gz Atom feed top
2011-04-11 17:12 textconv not invoked when viewing merge commit Peter Oberndorfer
2011-04-12 9:04 ` Michael J Gruber
2011-04-14 19:09 ` Jeff King
2011-04-14 19:15 ` Jeff King
2011-04-14 19:26 ` Junio C Hamano
2011-04-14 19:28 ` Jeff King
2011-04-14 19:35 ` Michael J Gruber
2011-04-14 19:43 ` Junio C Hamano
2011-04-14 20:06 ` Junio C Hamano
2011-04-14 20:23 ` Jeff King
2011-04-14 21:05 ` Junio C Hamano
2011-04-14 21:30 ` Jeff King
2011-04-15 15:29 ` [PATCH] combine-diff: use textconv for combined diff format Michael J Gruber
2011-04-15 18:34 ` Junio C Hamano
2011-04-16 10:24 ` Michael J Gruber [this message]
2011-04-16 17:19 ` Junio C Hamano
2011-04-16 21:37 ` Jakub Narebski
2011-04-15 23:56 ` Jeff King
2011-04-21 16:08 ` Peter Oberndorfer
2011-04-15 6:54 ` textconv not invoked when viewing merge commit Matthieu Moy
2011-04-15 20:45 ` Junio C Hamano
2011-04-16 1:47 ` Jeff King
2011-04-16 6:10 ` Junio C Hamano
2011-04-16 6:33 ` Jeff King
2011-04-16 16:23 ` Junio C Hamano
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4DA96E48.3050008@drmicha.warpmail.net \
--to=git@drmicha$(echo .)warpmail.net \
--cc=git@vger$(echo .)kernel.org \
--cc=gitster@pobox$(echo .)com \
--cc=peff@peff$(echo .)net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox