public inbox for git@vger.kernel.org 
From: Junio C Hamano <gitster@pobox•com>
To: Elliot Wolk <elliot.wolk@gmail•com>
Cc: Robin Rosenberg <robin.rosenberg@dewire•com>, git@vger•kernel.org
Subject: Re: move detection doesnt take filename into account
Date: Tue, 01 Jul 2014 10:08:15 -0700
Message-ID: <xmqq61jhxb0g.fsf@gitster.dls.corp.google.com>
In-Reply-To: <53B2CE4A.9060509@gmail.com> (Elliot Wolk's message of "Tue, 01 Jul 2014 11:05:46 -0400")

Elliot Wolk <elliot.wolk@gmail•com> writes:

> On 07/01/2014 10:57 AM, Junio C Hamano wrote:
>> Robin Rosenberg <robin.rosenberg@dewire•com> writes:
>>
>>> I think it does, but based on filename suffix. E.g. here is a rename of
>>> three empty files with a suffix.
>>>
>>>   3 files changed, 0 insertions(+), 0 deletions(-)
>>>   rename 1.a => 2.a (100%)
>>>   rename 1.b => 2.b (100%)
>>>   rename 1.c => 2.c (100%)
>> This is not more than a chance.
>>
>> We tie-break rename source candidates that have the same content
>> similarity score to a rename destination using "name similarity",
>> whose implementation has been diffcore-rename.c::basename_same(),
>> which scores 1 if `basename $src` and `basename $dst` are the same
>> and 0 otherwise, i.e. from 1.a to a/1.a is judged to be a better
>> rename than from 1.a to a/2.a but otherwise there is nothing that
>> favors rename from 1.a to 2.a over 1.a to 2.b.
>
> thanks for the info!
> then i suppose my bug is a petition to have name similarity instead
> use a different statistical matching algorithm.

[administrivia: please do not top-post on this list]

I didn't think it through but my gut feeling is that we could change
the name similarity score to be the length of the tail part that
matches (e.g. 1.a to a/2.a that has the same two bytes at the tail
is a better match than to a/2.b that does not share any tail, and to
a/1.a that shares the three bytes at the tail is an even better
match).
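
A minimal sketch of that idea, not taken from the git tree: the
function name tail_match_len() is made up for illustration, and it
assumes the score is simply the count of identical trailing bytes of
the two full paths, with no special treatment of '/'.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical "name similarity" score: the number of bytes the two
 * paths have in common at their tails.  A larger value means a
 * closer match.
 */
size_t tail_match_len(const char *src, const char *dst)
{
	size_t src_len = strlen(src);
	size_t dst_len = strlen(dst);
	size_t n = 0;

	/* Walk backwards from the end while the bytes agree. */
	while (n < src_len && n < dst_len &&
	       src[src_len - 1 - n] == dst[dst_len - 1 - n])
		n++;
	return n;
}
```

With this, tail_match_len("1.a", "a/2.b") is 0, tail_match_len("1.a",
"a/2.a") is 2 and tail_match_len("1.a", "a/1.a") is 3, which gives the
ordering described above.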

Oh, and rename basename_same() to something else; currently it is
only used as the "name similarity", and after such a change it will
still serve as the "name similarity" but will no longer be asking
"are basenames the same?".

Hint, hint...
