public inbox for git@vger.kernel.org 
 help / color / mirror / Atom feed
From: Michael J Gruber <git@drmicha•warpmail.net>
To: Junio C Hamano <gitster@pobox•com>
Cc: git@vger•kernel.org, Jeff King <peff@peff•net>
Subject: Re: [PATCH] combine-diff: use textconv for combined diff format
Date: Sat, 16 Apr 2011 12:24:08 +0200	[thread overview]
Message-ID: <4DA96E48.3050008@drmicha.warpmail.net> (raw)
In-Reply-To: <7voc47cqj0.fsf@alter.siamese.dyndns.org>

Junio C Hamano venit, vidit, dixit 15.04.2011 20:34:
> Michael J Gruber <git@drmicha•warpmail.net> writes:
> 
>> git diff -m produces a combined diff!
> 
> Hmm, what is the rest of your command line?  I thought -m was a way to ask
> pairwise diff with each parent.

Sure, but it does not always work like that. Just look at the test from
my patch, or do any "git merge --no-commit" and then "git diff -m". I
would expect that to compare the worktree to each parent, but in fact it
runs "diff --cc".

At least I thought that's the only way how combine-diff would ever have
to deal with a merge result in the worktree as opposed to a blob. And it
seems that "diff -m" does not handle this but relays to "diff --cc" for
current git. I have not checked the "-m" codepath.

>> +static struct combine_diff_path *intersect_paths(struct combine_diff_path *curr, int n, int num_parent, int textconv)
>>  {
>>  	struct diff_queue_struct *q = &diff_queued_diff;
>>  	struct combine_diff_path *p;
>> @@ -34,9 +34,13 @@ static struct combine_diff_path *intersect_paths(struct combine_diff_path *curr,
>>  
>>  			hashcpy(p->sha1, q->queue[i]->two->sha1);
>>  			p->mode = q->queue[i]->two->mode;
>> +			if (textconv)
>> +				p->textconv = get_textconv(q->queue[i]->two);
>>  			hashcpy(p->parent[n].sha1, q->queue[i]->one->sha1);
>>  			p->parent[n].mode = q->queue[i]->one->mode;
>>  			p->parent[n].status = q->queue[i]->status;
>> +			if (textconv)
>> +				p->parent[n].textconv = get_textconv(q->queue[i]->one);
> 
> This code attempts to handle different textconv set for each different
> parents.  But I have to wonder if that is really worth it.
> 
> The attribute to decide the content type of the blob is read from the same
> set of .gitattributes files, regardless of which parent you are looking at
> (and this is not likely to change---the exact procedure that is applied
> comes from .git/config that is not even versioned, so there is not much
> point in reading from the .gitattributes from the parent tree, trying to
> be "precise").
> 
> If q->queue[i] is not a rename, p->textconv and p->parent[n].textconv
> would be the same because one and two came from the same path.  If it is a
> rename, they by definition consist of similar contents, and the user would
> want the same textconv conversion applied to them to make them comparable.
> Even though using p->parent[n].textconv to convert q->queue[i]->one->sha1
> blob and using p->textconv to convert q->queue[i]->two->sha1 blob might be
> the right thing to do in theory, doing so wouldn't make a difference in
> practice.  More importantly, even if the two textconvs specify different
> conversions, it is likely that it is an user error (e.g. the preimage had
> "img4433.jqg" that was renamed to img4433.jpg" in the postimage, and the
> attributes mechanism does not say ".jqg" is a JPEG that wants to get
> "exif" run to be texualized for the purpose of diffing, or something).
> 
> Besides, if you really want to support "left hand side and right hand
> side, depending on which parent we are talking about, may use different
> textconv", you would need to defeat the optimization in show_patch_diff()
> that calls reuse_combine_diff() when sha1 are the same from other parent
> we have already compared---the parent we are looking at may be using a
> different textconv procedure.  Even worse, if parent and child have the
> same sha1, the result of running parent textconv on the parent blob may be
> different from that of the child, which you would never even see in this
> codepath.
> 
> So I suspect that using only one textconv per "struct combine_diff_path"
> would make both the code simpler, and more importantly would make the
> result more correct from the end user's point of view.

I'd be happy to take the simpler approach. While I still think the other
one is "more correct" (modulo the reuse issue) it should not matter in
most cases.

> 
>> @@ -777,6 +783,12 @@ static void show_patch_diff(struct combine_diff_path *elem, int num_parent,
>>  			close(fd);
>>  	}
>>  
>> +	if (DIFF_OPT_TST(opt, ALLOW_TEXTCONV) && elem->textconv) {
>> +		struct diff_filespec *df = alloc_filespec(elem->path);
>> +		fill_filespec(df, elem->sha1, elem->mode);
>> +		result_size = fill_textconv(elem->textconv, df, &result);
>> +	}
> 
> I suspect that these three lines have to become a small helper function to
> be used to convert the final blob (done here), and parent blob (done in
> combine_diff).  With the "binary" support, it would eventually need to be
> enhanced to something like:
> 
> 	if (DIFF_OPT_TST(opt, ALLOW_TEXTCONV)) {
>         	if (textconv) {
>                 	do these three lines;
> 		} else if (is binary) {
>                 	"Binary blob $SHA-1";
> 		}
> 	}
> 

"diff -m --oneline" says something like

aa01ae1 (from 64c0923) Merge branch 'master' into somebranch
diff --git a/a b/a
index 72594ed..d8323da 100644
Binary files a/a and b/a differ
aa01ae1 (from e85049e) Merge branch 'master' into somebranch
diff --git a/a b/a
index 86e041d..d8323da 100644
Binary files a/a and b/a differ


so I'm wondering whether we shouldn't stay closer to that with "--cc
also", e.g.:

aa01ae1 Merge branch 'master' into somebranch
diff --cc a
index 72594ed,86e041d..d8323da
Binary files a/a and b/a differ

BTW: Currently, "--cc --oneline" produces an extra newline before the
diff line, and also note how the diff lines differ ("a/a b/a" vs. "a").
But those are different issues.

> and having a small helper function early in the series would help that
> process.
> 
>> +		paths = intersect_paths(paths, i, num_parent, DIFF_OPT_TST(opt, ALLOW_TEXTCONV));
> 
> As an internal API within this file, I would rather see "opt" as a whole
> passed to intersect_paths().  We may probably want to determine if the
> blob is binary in that function depending on other "opt" fields.

Yep.
Michael

[Resent today, sorry. Couldn't get myself to reboot that box yesterday
after a disk gave up.]

  reply	other threads:[~2011-04-16 10:24 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-04-11 17:12 textconv not invoked when viewing merge commit Peter Oberndorfer
2011-04-12  9:04 ` Michael J Gruber
2011-04-14 19:09   ` Jeff King
2011-04-14 19:15     ` Jeff King
2011-04-14 19:26     ` Junio C Hamano
2011-04-14 19:28       ` Jeff King
2011-04-14 19:35         ` Michael J Gruber
2011-04-14 19:43       ` Junio C Hamano
2011-04-14 20:06         ` Junio C Hamano
2011-04-14 20:23           ` Jeff King
2011-04-14 21:05             ` Junio C Hamano
2011-04-14 21:30               ` Jeff King
2011-04-15 15:29                 ` [PATCH] combine-diff: use textconv for combined diff format Michael J Gruber
2011-04-15 18:34                   ` Junio C Hamano
2011-04-16 10:24                     ` Michael J Gruber [this message]
2011-04-16 17:19                       ` Junio C Hamano
2011-04-16 21:37                         ` Jakub Narebski
2011-04-15 23:56                   ` Jeff King
2011-04-21 16:08                   ` Peter Oberndorfer
2011-04-15  6:54             ` textconv not invoked when viewing merge commit Matthieu Moy
2011-04-15 20:45               ` Junio C Hamano
2011-04-16  1:47                 ` Jeff King
2011-04-16  6:10                   ` Junio C Hamano
2011-04-16  6:33                     ` Jeff King
2011-04-16 16:23                       ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4DA96E48.3050008@drmicha.warpmail.net \
    --to=git@drmicha$(echo .)warpmail.net \
    --cc=git@vger$(echo .)kernel.org \
    --cc=gitster@pobox$(echo .)com \
    --cc=peff@peff$(echo .)net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox