From: Junio C Hamano <gitster@pobox•com>
To: Elliot Wolk <elliot.wolk@gmail•com>
Cc: Robin Rosenberg <robin.rosenberg@dewire•com>, git@vger•kernel.org
Subject: Re: move detection doesnt take filename into account
Date: Tue, 01 Jul 2014 10:08:15 -0700
Message-ID: <xmqq61jhxb0g.fsf@gitster.dls.corp.google.com>
In-Reply-To: <53B2CE4A.9060509@gmail.com> (Elliot Wolk's message of "Tue, 01 Jul 2014 11:05:46 -0400")

Elliot Wolk <elliot.wolk@gmail•com> writes:

> On 07/01/2014 10:57 AM, Junio C Hamano wrote:
>> Robin Rosenberg <robin.rosenberg@dewire•com> writes:
>>
>>> I think it does, but based on filename suffix. E.g. here is a rename of
>>> three empty files with a suffix.
>>>
>>> 3 files changed, 0 insertions(+), 0 deletions(-)
>>> rename 1.a => 2.a (100%)
>>> rename 1.b => 2.b (100%)
>>> rename 1.c => 2.c (100%)
>>
>> This is nothing more than chance.
>>
>> We tie-break rename source candidates that have the same content
>> similarity score to a rename destination using "name similarity",
>> whose implementation has been diffcore-rename.c::basename_same(),
>> which scores 1 if `basename $src` and `basename $dst` are the same
>> and 0 otherwise, i.e. from 1.a to a/1.a is judged to be a better
>> rename than from 1.a to a/2.a but otherwise there is nothing that
>> favors rename from 1.a to 2.a over 1.a to 2.b.
>
> thanks for the info!
> then i suppose my bug is a petition to have name similarity instead
> use a different statistical matching algorithm.

[administrivia: please do not top-post on this list]

I didn't think it through but my gut feeling is that we could change
the name similarity score to be the length of the tail part that
matches (e.g. 1.a to a/2.a that has the same two bytes at the tail
is a better match than to a/2.b that does not share any tail, and to
a/1.a that shares the three bytes at the tail is an even better
match).
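
A rough, untested-in-git sketch of that tail-matching idea might look
like the following. The name tail_match_len() is made up for
illustration and is not anything in diffcore-rename.c, and comparing
whole paths rather than only their basenames is an assumption.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for basename_same(): instead of answering
 * "are the basenames identical?" with 1 or 0, return how many bytes
 * the two paths have in common at their tails.
 */
static size_t tail_match_len(const char *src, const char *dst)
{
	size_t s = strlen(src), d = strlen(dst), n = 0;

	/* Walk backwards from the end of both strings while bytes agree. */
	while (n < s && n < d && src[s - 1 - n] == dst[d - 1 - n])
		n++;
	return n;
}
```

With this, 1.a against a/2.b scores 0, against a/2.a scores 2, and
against a/1.a scores 3, which gives the ordering described above.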

Oh, and rename basename_same() to something else; currently it is
only used as the "name similarity", and after such a change, it will
still be the "name similarity" but will no longer be asking "are
basenames the same?".

Hint, hint...