public inbox for git@vger.kernel.org 
 help / color / mirror / Atom feed
From: Johan Herland <johan@herland•net>
To: "Shawn O. Pearce" <spearce@spearce•org>
Cc: git@vger•kernel.org, gitster@pobox•com
Subject: Re: [RFC/PATCHv10 01/11] fast-import: Proper notes tree manipulation
Date: Tue, 08 Dec 2009 02:44:30 +0100	[thread overview]
Message-ID: <200912080244.30390.johan@herland.net> (raw)
In-Reply-To: <20091207164130.GD17173@spearce.org>

On Monday 07 December 2009, Shawn O. Pearce wrote:
> Johan Herland <johan@herland•net> wrote:
> > +static uintmax_t do_change_note_fanout(
> > +		struct tree_entry *orig_root, struct tree_entry *root,
> > +		char *hex_sha1, unsigned int hex_sha1_len,
> > +		char *fullpath, unsigned int fullpath_len,
> > +		unsigned char fanout)
> 
> I think this function winds up processing all notes twice.  Yuck.
> 
> tree_content_set() adds a new tree entry to the end of the current
> tree.  So when converting "1a9029b006484e8b9aca06ff261beb2324bb9916"
> into "1a" (to go from fanout 0 to fanout 1) we'll place 1a at the
> end of orig_root, and this function will walk 1a/ recursively,
> examining 1a9029b006484e8b9aca06ff261beb2324bb9916 all over again.

Yep, you're right. Still, we only do the tree_content_remove()/set() once 
per note, so although performance is probably not abysmal, we are still 
clearly suboptimal.

Also, keep in mind that change_note_fanout() is only called when the number 
of notes crosses a power of 256. Thus for typical notes trees (which are 
assumed to mostly accumulate notes over their lifetime), 
change_note_fanout() will be called zero, one or two times (depending on the 
final number of notes).

> If we're here, isn't it likely that *all* notes are in the wrong
> path in the tree, and we need to move them all to a new location?
> If that's true then should we instead just build an entirely new
> tree and swap the root when we are done?

Hmm. Not always. In your earlier scenario where we add 2,000,000 notes in a 
single commit, the current code would need to rewrite 255 of them from 
fanout 0 to fanout 2, and 65,535 of them from fanout 1 to fanout 2. But the 
vast majority (1,934,465) would not require rewriting (having been added at 
the correct fanout initially). However, if we build a new tree (by which I 
assume you mean tree_content_remove() from the old tree and 
tree_content_set() to the new tree for every single note (and non-note)), we 
end up processing all 2,000,000 entries.

> As we empty out a tree the object will be recycled into a pool of
> trees which can be reused at a later point.  It might actually make
> sense to build the new tree under a different root.  We won't scan
> entries we've moved, and the memory difference should be fairly
> small as tree_content_remove() will make a subtree available for
> reuse as soon as its empty.  So we're only dealing with a handful
> of additional tree objects as we do the conversion.

I'm not sure I get the details here. How can we avoid doing the 
_remove()/_set() from/to the old/new tree for every tree_entry? In other 
words, how do we avoid removing and re-setting the 2,000,000 notes in the 
above example?


Thanks for the review!

...Johan

-- 
Johan Herland, <johan@herland•net>
www.herland.net

  reply	other threads:[~2009-12-08  1:44 UTC|newest]

Thread overview: 28+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-12-07 11:27 [RFC/PATCHv10 00/11] git notes Johan Herland
2009-12-07 11:27 ` [RFC/PATCHv10 01/11] fast-import: Proper notes tree manipulation Johan Herland
2009-12-07 16:41   ` Shawn O. Pearce
2009-12-08  1:44     ` Johan Herland [this message]
2009-12-08  2:01       ` Shawn O. Pearce
2009-12-08  2:45         ` Johan Herland
2009-12-10  9:39           ` Johan Herland
2009-12-10 14:03             ` Shawn O. Pearce
2009-12-10 14:40               ` Johan Herland
2009-12-11  3:00                 ` Junio C Hamano
2009-12-07 16:43   ` Shawn O. Pearce
2009-12-08  1:55     ` Johan Herland
2009-12-08  1:59       ` Shawn O. Pearce
2009-12-07 20:42   ` Junio C Hamano
2009-12-08  2:34     ` Johan Herland
2009-12-07 11:27 ` [RFC/PATCHv10 02/11] Rename t9301 to t9350, to make room for more fast-import tests Johan Herland
2009-12-07 11:27 ` [RFC/PATCHv10 03/11] Add more testcases to test fast-import of notes Johan Herland
2009-12-07 11:27 ` [RFC/PATCHv10 04/11] Minor style fixes to notes.c Johan Herland
2009-12-07 11:27 ` [RFC/PATCHv10 05/11] Notes API: get_commit_notes() -> format_note() + remove the commit restriction Johan Herland
2009-12-07 11:27 ` [RFC/PATCHv10 06/11] Notes API: init_notes(): Initialize the notes tree from the given notes ref Johan Herland
2009-12-07 11:27 ` [RFC/PATCHv10 07/11] Notes API: add_note(): Add note objects to the internal notes tree structure Johan Herland
2009-12-07 11:27 ` [RFC/PATCHv10 08/11] Notes API: get_note(): Return the note annotating the given object Johan Herland
2009-12-07 20:52   ` Junio C Hamano
2009-12-08  3:18     ` Johan Herland
2009-12-07 11:27 ` [RFC/PATCHv10 09/11] Notes API: for_each_note(): Traverse the entire notes tree with a callback Johan Herland
2009-12-07 11:27 ` [RFC/PATCHv10 10/11] Notes API: Allow multiple concurrent notes trees with new struct notes_tree Johan Herland
2009-12-07 11:27 ` [RFC/PATCHv10 11/11] Refactor notes concatenation into a flexible interface for combining notes Johan Herland
2009-12-08  9:25 ` [RFC/PATCHv10 00/11] git notes Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=200912080244.30390.johan@herland.net \
    --to=johan@herland$(echo .)net \
    --cc=git@vger$(echo .)kernel.org \
    --cc=gitster@pobox$(echo .)com \
    --cc=spearce@spearce$(echo .)org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox