From: Johan Herland <johan@herland•net>
To: "Shawn O. Pearce" <spearce@spearce•org>
Cc: git@vger•kernel.org, gitster@pobox•com
Subject: Re: [RFC/PATCHv10 01/11] fast-import: Proper notes tree manipulation
Date: Tue, 08 Dec 2009 02:44:30 +0100
Message-ID: <200912080244.30390.johan@herland.net>
In-Reply-To: <20091207164130.GD17173@spearce.org>
On Monday 07 December 2009, Shawn O. Pearce wrote:
> Johan Herland <johan@herland•net> wrote:
> > +static uintmax_t do_change_note_fanout(
> > + struct tree_entry *orig_root, struct tree_entry *root,
> > + char *hex_sha1, unsigned int hex_sha1_len,
> > + char *fullpath, unsigned int fullpath_len,
> > + unsigned char fanout)
>
> I think this function winds up processing all notes twice. Yuck.
>
> tree_content_set() adds a new tree entry to the end of the current
> tree. So when converting "1a9029b006484e8b9aca06ff261beb2324bb9916"
> into "1a" (to go from fanout 0 to fanout 1) we'll place 1a at the
> end of orig_root, and this function will walk 1a/ recursively,
> examining 1a9029b006484e8b9aca06ff261beb2324bb9916 all over again.
Yep, you're right. Still, we only do the tree_content_remove()/
tree_content_set() pair once per note, so although performance is probably
not abysmal, it is still clearly suboptimal.
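To make the path rewrite in your example concrete, here is a minimal sketch of how a note's name maps to its path at a given fanout. The helper name fanout_path() and its signature are made up for illustration and are not taken from the patch; it only assumes that each fanout level splits two hex characters off the front as a directory.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (name and signature are illustrative only):
 * copy hex_sha1 into path, splitting off `fanout` two-character
 * directory levels from the front. Returns the length written
 * (excluding the NUL), or 0 if the name is too short for the
 * requested fanout or the buffer is too small.
 */
size_t fanout_path(const char *hex_sha1, unsigned char fanout,
		   char *path, size_t path_size)
{
	size_t hex_len = strlen(hex_sha1);
	size_t i = 0, j = 0;

	if (hex_len <= 2 * (size_t)fanout ||
	    path_size < hex_len + fanout + 1)
		return 0;

	while (fanout--) {
		path[j++] = hex_sha1[i++];
		path[j++] = hex_sha1[i++];
		path[j++] = '/';
	}
	/* Copy the remaining hex digits, including the trailing NUL. */
	memcpy(path + j, hex_sha1 + i, hex_len - i + 1);
	return j + hex_len - i;
}
```

With fanout 1, the name 1a9029b006484e8b9aca06ff261beb2324bb9916 becomes 1a/9029b006484e8b9aca06ff261beb2324bb9916, which is the entry that ends up appended to orig_root and visited a second time.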
Also, keep in mind that change_note_fanout() is only called when the number
of notes crosses a power of 256. Thus for typical notes trees (which are
assumed to mostly accumulate notes over their lifetime),
change_note_fanout() will be called zero, one or two times (depending on the
final number of notes).
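As a rough sketch of the "power of 256" rule: the helper below is hypothetical (the name and the exact thresholds are assumptions for illustration, not copied from the patch) and simply adds one fanout level per factor of 256 in the note count.

```c
#include <stdint.h>

/*
 * Hypothetical helper, assumed for illustration: one fanout level
 * per factor of 256 in the number of notes.
 */
unsigned char fanout_for_num_notes(uintmax_t num_notes)
{
	unsigned char fanout = 0;

	/* Each 8-bit shift divides the count by 256. */
	while ((num_notes >>= 8))
		fanout++;
	return fanout;
}
```

Under that assumption, a tree that grows from empty to 2,000,000 notes changes fanout only twice: at 256 notes (0 to 1) and at 65,536 notes (1 to 2).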
> If we're here, isn't it likely that *all* notes are in the wrong
> path in the tree, and we need to move them all to a new location?
> If that's true then should we instead just build an entirely new
> tree and swap the root when we are done?
Hmm. Not always. In your earlier scenario where we add 2,000,000 notes in a
single commit, the current code would need to rewrite 255 of them from
fanout 0 to fanout 2, and 65,280 of them from fanout 1 to fanout 2 (65,535
in total). But the vast majority (1,934,465) would not require rewriting,
having been added at the correct fanout initially. However, if we build a
new tree (by which I assume you mean tree_content_remove() from the old
tree and tree_content_set() to the new tree for every single note and
non-note), we end up processing all 2,000,000 entries.
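The split can be checked with a small tally. This is only a sketch under an assumption: notes are added one at a time, and each is placed at the fanout implied by the running count including itself, with one level per factor of 256. The function name and MAX_FANOUT are hypothetical.

```c
#include <stdint.h>

#define MAX_FANOUT 3	/* hypothetical cap, enough for this example */

/*
 * Hypothetical tally: counts[f] receives the number of notes that
 * were first stored at fanout f while adding `total` notes one by one.
 */
void tally_initial_fanout(uintmax_t total, uintmax_t counts[MAX_FANOUT + 1])
{
	uintmax_t n;
	int f;

	for (f = 0; f <= MAX_FANOUT; f++)
		counts[f] = 0;

	for (n = 1; n <= total; n++) {
		uintmax_t tmp = n;
		unsigned char fanout = 0;

		/* Assumed rule: one fanout level per factor of 256. */
		while ((tmp >>= 8))
			fanout++;
		if (fanout > MAX_FANOUT)
			fanout = MAX_FANOUT;
		counts[fanout]++;
	}
}
```

For total = 2,000,000 this gives 255 notes at fanout 0, 65,280 at fanout 1 (so 65,535 below fanout 2 in total), and 1,934,465 already at fanout 2.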
> As we empty out a tree the object will be recycled into a pool of
> trees which can be reused at a later point. It might actually make
> sense to build the new tree under a different root. We won't scan
> entries we've moved, and the memory difference should be fairly
> small as tree_content_remove() will make a subtree available for
> reuse as soon as its empty. So we're only dealing with a handful
> of additional tree objects as we do the conversion.
I'm not sure I get the details here. How can we avoid doing the
_remove()/_set() from/to the old/new tree for every tree_entry? In other
words, how do we avoid removing and re-setting the 2,000,000 notes in the
above example?
Thanks for the review!
...Johan
--
Johan Herland, <johan@herland•net>
www.herland.net