From: David Kastrup <dak@gnu•org>
To: git@vger•kernel.org
Subject: Re: [PATCH 3/3] builtin/blame.c: large-scale rewrite
Date: Mon, 27 Jan 2014 20:45:06 +0100 [thread overview]
Message-ID: <87a9eh8b0d.fsf@fencepost.gnu.org> (raw)
In-Reply-To: xmqq8uu1pdqq.fsf@gitster.dls.corp.google.com
Junio C Hamano <gitster@pobox•com> writes:
> David Kastrup <dak@gnu•org> writes:
>
>> The previous implementation uses a sorted linear list of struct
>> blame_entry in a struct scoreboard for organizing all partial or
>> completed work. Every task that is done requires going through the
>> whole list where most entries are not relevant to the task at hand.
>>
>> This commit reorganizes the data structures in order to have each
>> remaining subtask work with its own sorted linear list it can work off
>> front to back. Subtasks are organized into "struct origin" chains
>> hanging off particular commits. Commits are organized into a priority
>> queue, processing them in commit date order in order to keep most of
>> the work affecting a particular blob collated even in the presence of
>> an extensive merge history. In that manner, linear searches can be
>> basically avoided anywhere. They still are done for identifying one
>> of multiple analyzed files in a given commit, but the degenerate case
>> of a single large file being assembled from a multitude of smaller
>> files in the past is not likely to occur often enough to warrant
>> special treatment.
>> ---
>
> Sign-off?
Not while this is not fit for merging because of #if 0 etc and missing
functionality. This is just for review.
> Actually, I'd like to take my previous suggestion to add this as
> blame2 (to replace blame in the future) back. That was based on my
> fear/hope to see an implementation based on a completely different
> approach, but the basic premise of having one blame_entry record per
> the lines of the final image in the scoreboard, using diff between
> parents to the child to find common lines for passing the blame up,
> etc. have not changed at all and the change is all about organizing
> the pieces of data in a *much* *better* way to avoid needlessly
> finding what we already have computed.
Yes. Basically, the call graph outside of blame.c itself should be
pretty much the same.
> Style; please make /* and */ sit on their own line without anything
> else in multi-line comments.
Will do.
>> + struct origin *next;
>> struct commit *commit;
>> + /* `suspects' contains blame entries that may be attributed to
>> + * this origin's commit or to parent commits. When a commit
>> + * is being processed, all suspects will be moved, either by
>> + * assigning them to an origin in a different commit, or by
>> + * shipping them to the scoreboard's ent list because they
>> + * cannot be attributed to a different commit.
>> + */
>> + struct blame_entry *suspects;
>> mmfile_t file;
>> unsigned char blob_sha1[20];
>> unsigned mode;
>> + /* shipped gets set when shipping any suspects to the final
>> + * blame list instead of other commits
>> + */
>> + char shipped;
>
> Unused?
More like "added, forgotten, added under yet another name":
>> + char guilty;
I'll have to decide which one to keep.
>> + /* Should be present exactly once in commit chain */
>> + for (p = o->commit->util; p; l = p, p = p->next) {
>> + if (p == o)
>> + {
>
> Style; please have { sit with the control structure that opens the
> block, unless it is the opening brace of the function body or
> struct/union definition.
Ok.
>> +static struct blame_entry *
>> +blame_merge(struct blame_entry *list1,
>> + struct blame_entry *list2)
>
> Style; we do not do GNU-ish "function name begins a line". Even
> though I personally find it easier to use for things like 'grep',
> that is not the style we use around here.
Ok. I'm also certain to have a few "space between function name and
arguments" left and will grep for those before submitting the next
version.
Next version will at least include -M option, possibly leaving -C for
later.
--
David Kastrup
next prev parent reply other threads:[~2014-01-27 19:45 UTC|newest]
Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-01-25 12:44 How to substructure rewrites? David Kastrup
2014-01-25 18:23 ` [PATCH 0/3] "Teaser" patch for rewriting blame for efficiency David Kastrup
2014-01-25 18:23 ` [PATCH 1/3] builtin/blame.c: struct blame_entry does not need a prev link David Kastrup
2014-01-25 18:23 ` [PATCH 2/3] Eliminate same_suspect function in builtin/blame.c David Kastrup
2014-01-25 18:23 ` [PATCH 3/3] builtin/blame.c: large-scale rewrite David Kastrup
2014-01-27 16:54 ` Junio C Hamano
2014-01-27 19:45 ` David Kastrup [this message]
2014-01-27 20:51 ` Junio C Hamano
2014-01-27 21:21 ` David Kastrup
2014-01-27 15:58 ` How to substructure rewrites? Junio C Hamano
2014-01-27 16:27 ` David Kastrup
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87a9eh8b0d.fsf@fencepost.gnu.org \
--to=dak@gnu$(echo .)org \
--cc=git@vger$(echo .)kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox