Re: [PATCH 3/3] builtin/blame.c: large-scale rewrite

public inbox for git@vger.kernel.org 
 help / color / mirror / Atom feed

From: David Kastrup <dak@gnu•org>
To: git@vger•kernel.org
Subject: Re: [PATCH 3/3] builtin/blame.c: large-scale rewrite
Date: Mon, 27 Jan 2014 20:45:06 +0100	[thread overview]
Message-ID: <87a9eh8b0d.fsf@fencepost.gnu.org> (raw)
In-Reply-To: xmqq8uu1pdqq.fsf@gitster.dls.corp.google.com

Junio C Hamano <gitster@pobox•com> writes:

> David Kastrup <dak@gnu•org> writes:
>
>> The previous implementation uses a sorted linear list of struct
>> blame_entry in a struct scoreboard for organizing all partial or
>> completed work.  Every task that is done requires going through the
>> whole list where most entries are not relevant to the task at hand.
>>
>> This commit reorganizes the data structures in order to have each
>> remaining subtask work with its own sorted linear list it can work off
>> front to back.  Subtasks are organized into "struct origin" chains
>> hanging off particular commits.  Commits are organized into a priority
>> queue, processing them in commit date order in order to keep most of
>> the work affecting a particular blob collated even in the presence of
>> an extensive merge history.  In that manner, linear searches can be
>> basically avoided anywhere.  They still are done for identifying one
>> of multiple analyzed files in a given commit, but the degenerate case
>> of a single large file being assembled from a multitude of smaller
>> files in the past is not likely to occur often enough to warrant
>> special treatment.
>> ---
>
> Sign-off?

Not while this is not fit for merging because of #if 0 etc and missing
functionality.  This is just for review.

> Actually, I'd like to take my previous suggestion to add this as
> blame2 (to replace blame in the future) back.  That was based on my
> fear/hope to see an implementation based on a completely different
> approach, but the basic premise of having one blame_entry record per
> the lines of the final image in the scoreboard, using diff between
> parents to the child to find common lines for passing the blame up,
> etc. have not changed at all and the change is all about organizing
> the pieces of data in a *much* *better* way to avoid needlessly
> finding what we already have computed.

Yes.  Basically, the call graph outside of blame.c itself should be
pretty much the same.

> Style; please make /* and */ sit on their own line without anything
> else in multi-line comments.

Will do.

>> +	struct origin *next;
>>  	struct commit *commit;
>> +	/* `suspects' contains blame entries that may be attributed to
>> +	 * this origin's commit or to parent commits.  When a commit
>> +	 * is being processed, all suspects will be moved, either by
>> +	 * assigning them to an origin in a different commit, or by
>> +	 * shipping them to the scoreboard's ent list because they
>> +	 * cannot be attributed to a different commit.
>> +	 */
>> +	struct blame_entry *suspects;
>>  	mmfile_t file;
>>  	unsigned char blob_sha1[20];
>>  	unsigned mode;
>> +	/* shipped gets set when shipping any suspects to the final
>> +	 * blame list instead of other commits
>> +	 */
>> +	char shipped;
>
> Unused?

More like "added, forgotten, added under yet another name":

>> +	char guilty;

I'll have to decide which one to keep.

>> +		/* Should be present exactly once in commit chain */
>> +		for (p = o->commit->util; p; l = p, p = p->next) {
>> +			if (p == o)
>> +			{
>
> Style; please have { sit with the control structure that opens the
> block, unless it is the opening brace of the function body or
> struct/union definition.

Ok.

>> +static struct blame_entry *
>> +blame_merge(struct blame_entry *list1,
>> +	    struct blame_entry *list2)
>
> Style; we do not do GNU-ish "function name begins a line".  Even
> though I personally find it easier to use for things like 'grep',
> that is not the style we use around here.

Ok.  I'm also certain to have a few "space between function name and
arguments" left and will grep for those before submitting the next
version.

Next version will at least include -M option, possibly leaving -C for
later.

-- 
David Kastrup

next prev parent reply	other threads:[~2014-01-27 19:45 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-01-25 12:44 How to substructure rewrites? David Kastrup
2014-01-25 18:23 ` [PATCH 0/3] "Teaser" patch for rewriting blame for efficiency David Kastrup
2014-01-25 18:23   ` [PATCH 1/3] builtin/blame.c: struct blame_entry does not need a prev link David Kastrup
2014-01-25 18:23   ` [PATCH 2/3] Eliminate same_suspect function in builtin/blame.c David Kastrup
2014-01-25 18:23   ` [PATCH 3/3] builtin/blame.c: large-scale rewrite David Kastrup
2014-01-27 16:54     ` Junio C Hamano
2014-01-27 19:45       ` David Kastrup [this message]
2014-01-27 20:51         ` Junio C Hamano
2014-01-27 21:21           ` David Kastrup
2014-01-27 15:58 ` How to substructure rewrites? Junio C Hamano
2014-01-27 16:27   ` David Kastrup

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87a9eh8b0d.fsf@fencepost.gnu.org \
    --to=dak@gnu$(echo .)org \
    --cc=git@vger$(echo .)kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox