From: Junio C Hamano <gitster@pobox•com>
To: Jeff King <peff@peff•net>
Cc: Neal Kreitzinger <nkreitzinger@gmail•com>,
Bo Chen <chen@chenirvine•org>,
Sergio <sergio.callegari@gmail•com>,
git@vger•kernel.org
Subject: Re: GSoC - Some questions on the idea of
Date: Mon, 02 Apr 2012 15:19:35 -0700
Message-ID: <7vvclhdbew.fsf@alter.siamese.dyndns.org>
In-Reply-To: <20120402214049.GB28926@sigill.intra.peff.net> (Jeff King's message of "Mon, 2 Apr 2012 17:40:49 -0400")
Jeff King <peff@peff•net> writes:
> 1. You really have 100G of data in the current version that doesn't
> compress well (e.g., you are storing your music collection). You
> can't afford to store two copies on your laptop (because you have a
> fancy SSD, and 100G is expensive again). You need the working tree
> version, but it's OK to stream the repo version of a blob from the
> network when you actually need it (mostly "checkout", assuming you
> have marked the file as "-diff").
This feels like a good candidate for an independent project that allows
you to fuse-mount from a remote repository to give you the illusion that you
have a checkout of a specific version. Such a remote fuse-server would be
an application that is built using Git, but I do not think we have any
business on the client end in such a setup.
So I'll write it off as a "non-Git" issue for now.
The other parts of your message are much more interesting.
> Right. This is the same concept, except over the network. So people's
> working repositories are on their own workstations instead of a central
> server. You could even do it today by network-mounting a filesystem and
> pointing your alternates file at it. However, I think it's worth making
> git aware that the objects are on the network for a few reasons:
>
> 1. Git can be more careful about how it handles the objects, including
> when to fetch, when to stream, and when to cache. For example,
> you'd want to fetch the manifest of objects and cache it in your
> local repository, because you want fast lookups of "do I have this
> object".
>
> 2. Providing remote filesystems on an Internet scale is a management
> pain (and it's a pain for the user, too). My thought was that this
> would be implemented on top of http (the connection setup cost is
> negligible, since these objects would generally be large).
>
> 3. Usually alternate repositories are full repositories that meet the
> connectivity requirements (so you could run "git fsck" in them).
> But this is explicitly about taking just a few disconnected large
> blobs out of the repository and putting them elsewhere. So it needs
> a new set of tools for managing the upstream repository.
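Jeff's first point, keeping a cached manifest of the remotely stored objects so that "do I have this object" is answered locally without touching the network, might look roughly like this. It is only a sketch: the manifest format (one hex object ID per line) and the function names are hypothetical, and nothing here is an existing Git API.

```python
# Hypothetical manifest format: one hex object ID per line, fetched once from
# the large-object server and cached in the local repository.

def load_manifest(lines):
    """Build a set from manifest lines so membership checks are O(1)."""
    return {line.strip() for line in lines if line.strip()}


def locate_object(oid, local_objects, remote_manifest):
    """Return 'local', 'remote', or None for the given object ID."""
    if oid in local_objects:
        return "local"
    if oid in remote_manifest:
        # A real implementation would decide here whether to fetch, stream
        # (e.g. over HTTP), or cache the object.
        return "remote"
    return None
```

Reading the cached file is then just `with open(path) as f: manifest = load_manifest(f)`.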
Or you can move the really large write-only blobs out of SCM control.
Every time you introduce a new blob, throw it verbatim into an append-only
directory on a networked filesystem under some unique ID as its filename,
and maintain a symlink into that networked filesystem under SCM control.
I think git-annex already does something like that...
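A minimal sketch of that scheme, assuming the store directory is a networked filesystem mounted at the same path on every machine and using the SHA-256 of the content as the "unique ID" (both choices are illustrative assumptions; this is not how git-annex itself is implemented):

```python
import hashlib
import os
import shutil


def stash_large_file(path, store_dir):
    """Move `path` into an append-only store and leave a symlink behind.

    Hypothetical helper: store_dir is assumed to be a shared network mount,
    and the SHA-256 of the file content is used as the unique filename.
    """
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):  # read 1 MiB at a time
            h.update(chunk)
    target = os.path.join(store_dir, h.hexdigest())
    if not os.path.exists(target):
        # Only add new entries; an existing entry with the same ID is left
        # untouched. Copy under a temporary name, then rename into place.
        tmp = target + ".tmp"
        shutil.copyfile(path, tmp)
        os.replace(tmp, target)
    os.remove(path)
    os.symlink(target, path)  # the symlink, not the blob, goes under Git
    return target
```

Identical content maps to the same ID, so adding the same file twice stores it only once.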