From: Junio C Hamano <gitster@pobox•com>
To: Duy Nguyen <pclouds@gmail•com>
Cc: Shawn Pearce <spearce@spearce•org>, git <git@vger•kernel.org>
Subject: Re: [PATCH 00/32] Split index mode for very large indexes
Date: Fri, 09 May 2014 10:55:22 -0700
Message-ID: <xmqqha4yzvjp.fsf@gitster.dls.corp.google.com>
In-Reply-To: <20140509102744.GA4939@lanh> (Duy Nguyen's message of "Fri, 9 May 2014 17:27:44 +0700")

Duy Nguyen <pclouds@gmail•com> writes:

> On Mon, Apr 28, 2014 at 02:18:44PM -0700, Shawn Pearce wrote:
>> > The read penalty is not addressed here, so I still pay 14MB hashing
>> > cost. But that's an easy problem. We could cache the validated index
>> > in a daemon. Whenever git needs to load an index, it pokes the daemon.
>> > The daemon verifies that the on-disk index still has the same
>> > signature, then sends the in-mem index to git. When git updates the
>> > index, it pokes the daemon again to update in-mem index. Next time git
>> > reads the index, it does not have to pay I/O cost any more (actually
>> > it does but the cost is hidden away when you do not have to read it
>> > yet).
>>
>> If we are going this far, maybe it is worthwhile building a mmap()
>> region the daemon exports to the git client that holds the "in memory"
>> format of the index. Clients would mmap this PROT_READ, MAP_PRIVATE
>> and can then quickly access the base file information without doing
>> further validation, or copying the large(ish) data over a pipe.
>
> The patch below implements such a daemon to cache the index. It takes
> 91ms and 377ms to load a 25MB index with and without the daemon,
> respectively. I use shared memory instead of a pipe, but the format is
> still "on disk", not "in memory", for simplicity. I think we're good
> even without the in-memory format.

Interesting ;-).
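
For readers following the quoted discussion, here is a rough sketch of the client side of the idea: map the cached copy with PROT_READ and MAP_PRIVATE, then trust it only if its trailing signature still matches the one on disk. This is not the code from the patch. The helper names map_readonly() and cache_is_fresh() are made up for illustration, and the sketch assumes the daemon's copy is reachable through a path and keeps the on-disk layout, which ends in a 20-byte SHA-1.

```c
#include <fcntl.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

#define SIG_LEN 20 /* an on-disk index ends with a 20-byte SHA-1 */

/*
 * Hypothetical helper: map the file at "path" read-only and privately.
 * Returns NULL on failure; on success stores the mapped size in *len.
 */
static const unsigned char *map_readonly(const char *path, size_t *len)
{
	struct stat st;
	void *p;
	int fd = open(path, O_RDONLY);

	if (fd < 0)
		return NULL;
	if (fstat(fd, &st) < 0 || st.st_size < SIG_LEN) {
		close(fd);
		return NULL;
	}
	p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
	close(fd); /* the mapping remains valid after the descriptor is closed */
	if (p == MAP_FAILED)
		return NULL;
	*len = (size_t)st.st_size;
	return p;
}

/*
 * Hypothetical helper: return 1 if the last SIG_LEN bytes of the cached
 * copy equal the signature read from the on-disk index, 0 otherwise.
 */
static int cache_is_fresh(const unsigned char *cached, size_t cached_len,
			  const unsigned char *disk_sig)
{
	return cached_len >= SIG_LEN &&
	       !memcmp(cached + cached_len - SIG_LEN, disk_sig, SIG_LEN);
}
```

A caller would fall back to the normal read-and-verify path whenever map_readonly() returns NULL or cache_is_fresh() returns 0.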