From: Johan Herland <johan@herland•net>
To: Johannes Schindelin <Johannes.Schindelin@gmx•de>
Cc: git@vger•kernel.org, gitster@pobox•com, trast@student•ethz.ch,
tavestbo@trolltech•com, git@drmicha•warpmail.net,
chriscool@tuxfamily•org, spearce@spearce•org
Subject: [RFC] Use a 16-tree instead of a 256-tree for storing notes
Date: Wed, 26 Aug 2009 12:31:01 +0200 [thread overview]
Message-ID: <200908261231.01616.johan@herland.net> (raw)
In-Reply-To: <200908010436.57480.johan@herland.net>
The 256-tree structure is considerably faster than storing all entries in a
hash_map. Also, the memory consumption of the 256-tree structure is lower
than the hash_map, provided that you're only loading a few notes from a
"properly fanned-out" notes tree (i.e. 100000 notes in a 2/2/36 structure).
However, in the worst case (loading all 100000 notes), the memory usage of
the 256-tree structure (62.64 MB) is significantly worse than the hash_map
approach (10.25 MB).
This patch modifies the 256-tree structure into a 16-tree structure. This
significantly improves the memory situation. The result uses less memory
than both the 256-tree structure, and the hash_map approach, with a worst
case usage of 8.54 MB. Additionally, it seems to slightly improve the
runtime performance as well (probably because of the improved memory usage).
In conclusion, this is faster and smaller than all the previous drafts.
$ ./test_performance.sh
Timing 100 reps of 'git log -n 10 refs/heads/master >/dev/null' at fanout level 0...
15.05user 1.44system 0:16.59elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+782490minor)pagefaults 0swaps
Timing 100 reps of 'git log -n 10 refs/heads/master >/dev/null' at fanout level 1...
0.68user 0.17system 0:00.87elapsed 98%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+99585minor)pagefaults 0swaps
Timing 100 reps of 'git log -n 10 refs/heads/master >/dev/null' at fanout level 2...
0.41user 0.17system 0:00.61elapsed 97%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+94084minor)pagefaults 0swaps
Signed-off-by: Johan Herland <johan@herland•net>
---
Hi,
This patch goes on top of the 256-tree RFC I sent earlier.
If nobody suggests further improvements, this patch will be
included in the next iteration of the jh/notes topic.
Have fun! :)
...Johan
notes.c | 26 ++++++++++++++------------
1 files changed, 14 insertions(+), 12 deletions(-)
diff --git a/notes.c b/notes.c
index 9282b16..32b1e01 100644
--- a/notes.c
+++ b/notes.c
@@ -7,9 +7,9 @@
#include "tree-walk.h"
/*
- * Use a non-balancing simple 256-tree structure with struct int_node as
+ * Use a non-balancing simple 16-tree structure with struct int_node as
* internal nodes, and struct leaf_node as leaf nodes. Each int_node has a
- * 256-array of pointers to its children
+ * 16-array of pointers to its children
* The bottom 2 bits of each pointer is used to identify the pointer type
* - ptr & 3 == 0 - NULL pointer, assert(ptr == NULL)
* - ptr & 3 == 1 - pointer to next internal node - cast to struct int_node *
@@ -21,18 +21,18 @@
*
* To add a leaf_node:
* 1. Start at the root node, with n = 0
- * 2. Use the nth byte of the key as an index into a:
- * - If NULL, store the tweaked pointer directly into a[n]
- * - If an int_node, recurse into that node and increment n
- * - If a leaf_node:
+ * 2. Use the nth nibble of the key as an index into a:
+ * - If a[n] is NULL, store the tweaked pointer directly into a[n]
+ * - If a[n] is an int_node, recurse into that node and increment n
+ * - If a[n] is a leaf_node:
* 1. Check if they're equal, and handle that (abort? overwrite?)
* 2. Create a new int_node, and store both leaf_nodes there
* 3. Store the new int_node into a[n].
*
* To find a leaf_node:
* 1. Start at the root node, with n = 0
- * 2. Use the nth byte of the key as an index into a:
- * - If an int_node, recurse into that node and increment n
+ * 2. Use the nth nibble of the key as an index into a:
+ * - If a[n] is an int_node, recurse into that node and increment n
* - If a leaf_node with matching key, return leaf_node (assert note entry)
* - If a matching subtree entry, unpack that subtree entry (and remove it);
* restart search at the current level.
@@ -42,7 +42,7 @@
* subtree entry (and remove it); restart search at the current level.
*/
struct int_node {
- void *a[256];
+ void *a[16];
};
/*
@@ -79,11 +79,13 @@ static int initialized;
static void load_subtree(struct leaf_node *subtree, struct int_node *node,
unsigned int n);
+#define GET_NIBBLE(n, sha1) (((sha1[n >> 1]) >> ((n & 0x01) << 2)) & 0x0f)
+
static struct leaf_node *note_tree_find(struct int_node *tree, unsigned char n,
const unsigned char *key_sha1)
{
struct leaf_node *l;
- unsigned char i = key_sha1[n];
+ unsigned char i = GET_NIBBLE(n, key_sha1);
void *p = tree->a[i];
switch(GET_PTR_TYPE(p)) {
@@ -138,7 +140,7 @@ static int note_tree_insert(struct int_node *tree, unsigned char n,
struct int_node *new_node;
const struct leaf_node *l;
int ret;
- unsigned char i = entry->key_sha1[n];
+ unsigned char i = GET_NIBBLE(n, entry->key_sha1);
void *p = tree->a[i];
assert(GET_PTR_TYPE(entry) == PTR_TYPE_NULL);
switch(GET_PTR_TYPE(p)) {
@@ -211,7 +213,7 @@ static void load_subtree(struct leaf_node *subtree, struct int_node *node,
sha1_to_hex(subtree->val_sha1));
prefix_len = subtree->key_sha1[19];
- assert(prefix_len >= n);
+ assert(prefix_len * 2 >= n);
memcpy(commit_sha1, subtree->key_sha1, prefix_len);
while (tree_entry(&desc, &entry)) {
int len = get_sha1_hex_segment(entry.path, strlen(entry.path),
--
1.6.4.304.g1365c.dirty
next prev parent reply other threads:[~2009-08-26 10:34 UTC|newest]
Thread overview: 30+ messages / expand[flat|nested] mbox.gz Atom feed top
2009-07-29 2:25 [PATCHv3 0/8] RESEND: git notes Johan Herland
2009-07-29 2:25 ` [PATCHv3 1/8] Introduce commit notes Johan Herland
2009-07-29 8:52 ` Alex Riesen
2009-07-29 16:40 ` Johannes Schindelin
2009-07-30 0:50 ` Johan Herland
2009-07-30 1:14 ` Johannes Schindelin
2009-07-30 0:42 ` Johan Herland
2009-07-29 2:25 ` [PATCHv3 2/8] Add a script to edit/inspect notes Johan Herland
2009-07-29 2:25 ` [PATCHv3 3/8] Speed up git notes lookup Johan Herland
2009-07-29 2:25 ` [PATCHv3 4/8] Add an expensive test for git-notes Johan Herland
2009-07-29 2:25 ` [PATCHv3 5/8] Teach "-m <msg>" and "-F <file>" to "git notes edit" Johan Herland
2009-07-29 7:57 ` Thomas Rast
2009-07-30 1:02 ` Johan Herland
2009-07-29 2:25 ` [PATCHv3 6/8] First draft of notes tree parser with support for fanout subtrees Johan Herland
2009-07-29 16:45 ` Johannes Schindelin
2009-07-30 0:18 ` Testing performance of the notes lookup code (Was: [PATCHv3 6/8] First draft of notes tree parser with support for fanout subtrees) Johan Herland
2009-08-01 2:36 ` [RFC] First draft of 256-tree structure for storing notes Johan Herland
2009-08-13 3:00 ` [RFC] Store subtree entries in the same hash map as the note entries Johan Herland
2009-08-26 10:31 ` Johan Herland [this message]
2009-08-26 12:05 ` [RFC] Use a 16-tree instead of a 256-tree for storing notes Alex Riesen
2009-08-26 12:56 ` Johan Herland
2009-08-26 13:24 ` Alex Riesen
2009-08-26 13:27 ` Andreas Ericsson
2009-08-26 14:43 ` Johan Herland
2009-08-27 11:56 ` Johannes Schindelin
2009-07-29 2:25 ` [PATCHv3 7/8] fast-import: Add support for importing commit notes Johan Herland
2009-07-29 14:21 ` Shawn O. Pearce
2009-07-30 1:29 ` Johan Herland
2009-07-29 2:25 ` [PATCHv3 8/8] t3302-notes-index-expensive: Speed up create_repo() Johan Herland
2009-07-29 16:46 ` Johannes Schindelin
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=200908261231.01616.johan@herland.net \
--to=johan@herland$(echo .)net \
--cc=Johannes.Schindelin@gmx$(echo .)de \
--cc=chriscool@tuxfamily$(echo .)org \
--cc=git@drmicha$(echo .)warpmail.net \
--cc=git@vger$(echo .)kernel.org \
--cc=gitster@pobox$(echo .)com \
--cc=spearce@spearce$(echo .)org \
--cc=tavestbo@trolltech$(echo .)com \
--cc=trast@student$(echo .)ethz.ch \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox