From: Junio C Hamano <gitster@pobox•com>
To: Martin Koegler <martin.koegler@chello•at>,
Ramsay Jones <ramsay@ramsayjones•plus.com>
Cc: git@vger•kernel.org, Johannes.Schindelin@gmx•de
Subject: Re: [PATCH 1/9] Convert pack-objects to size_t
Date: Mon, 14 Aug 2017 10:08:05 -0700
Message-ID: <xmqqfucuw00a.fsf@gitster.mtv.corp.google.com>
In-Reply-To: <xmqqtw1bw1v6.fsf@gitster.mtv.corp.google.com> (Junio C. Hamano's message of "Sun, 13 Aug 2017 15:15:41 -0700")
Junio C Hamano <gitster@pobox•com> writes:
> One interesting question is which of these two types we should use
> for the size of objects Git uses.
>
> Most of the "interesting" operations done by Git require that the
> thing is in core as a whole before we can do anything (e.g. compare
> two such things to produce delta, have one in core and apply patch),
> so it is tempting that we deal with size_t, but at the lowest level
> to serve as a SCM, i.e. recording the state of a file at each
> version, we actually should be able to exceed the in-core
> limit---both "git add" of a huge file whose contents would not fit
> in-core and "git checkout" of a huge blob whose inflated contents
> would not fit in-core should (in theory, modulo bugs) be able to
> exercise the streaming interface to handle such case without holding
> everything in-core at once. So from that point of view, even size_t
> may not be the "correct" type to use.
A few additions to the above observations.

 - We have a varint that encodes the distance from the delta
   representation of an object to its base object in the packfile.
   Both the encoding and decoding sides in the current code use off_t
   to represent this offset, so we can already reference a base
   object that is far away in the same packfile.

 - I think it is OK in practice to limit the size of individual
   objects to size_t (i.e. on 32-bit arch, you cannot interact with
   a repository with an object whose size exceeds 4GB).  Using off_t
   would allow occasional ultra-huge objects that can only be added
   and checked in via the streaming API on such a platform, but I
   suspect that it may become too much of a hassle to maintain.

   It may help reduce the maintenance burden if we introduced
   obj_size_t that is defined to be size_t for now, so that we can
   later swap it to off_t or some larger type when we know we do need
   to support objects whose size cannot be expressed in size_t, but I
   do not offhand know what the pros and cons of such an approach
   would look like.
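
To make the two points above concrete, here is a minimal sketch in C.
It is not taken from Git's sources: obj_size_t, OBJ_SIZE_MAX,
obj_size_from_u64(), encode_base_offset() and decode_base_offset() are
names assumed purely for illustration, and uint64_t stands in for
off_t so that the example behaves the same on every platform. The
varint routines assume a scheme with 7 bits per byte, most significant
group first, and a bias of one on each continuation group; treat that
layout as an assumption of the sketch rather than a specification.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical alias: size_t for now, swappable for a wider type later. */
typedef size_t obj_size_t;
#define OBJ_SIZE_MAX ((obj_size_t)-1)

/*
 * Narrow a 64-bit size to obj_size_t.  Returns 0 on success, -1 when
 * the value does not fit (e.g. an object above 4GB where size_t is
 * 32 bits wide).
 */
static int obj_size_from_u64(uint64_t v, obj_size_t *out)
{
	if (v > (uint64_t)OBJ_SIZE_MAX)
		return -1;
	*out = (obj_size_t)v;
	return 0;
}

/*
 * Encode the distance to the base object.  The bytes are written at
 * the tail of buf; the return value is the index of the first byte.
 * Every byte except the last one has its high bit set.
 */
static size_t encode_base_offset(uint64_t ofs, unsigned char buf[10])
{
	size_t pos = 9;

	buf[pos] = ofs & 127;
	while (ofs >>= 7)
		buf[--pos] = 128 | (--ofs & 127);
	return pos;
}

/*
 * Decode what encode_base_offset() produced.  Returns 0 on success,
 * -1 on truncated input or when the value would overflow 64 bits.
 */
static int decode_base_offset(const unsigned char *p, size_t len,
			      uint64_t *out)
{
	size_t used = 0;
	unsigned char c;
	uint64_t ofs;

	if (!len)
		return -1;
	c = p[used++];
	ofs = c & 127;
	while (c & 128) {
		ofs += 1;
		/* the next shift by 7 must not lose any high bits */
		if (!ofs || (ofs >> 57) || used >= len)
			return -1;
		c = p[used++];
		ofs = (ofs << 7) + (c & 127);
	}
	*out = ofs;
	return 0;
}
```

With an alias like this, only the typedef and the range check in
obj_size_from_u64() would need to change if the type were widened
later; callers that hold an offset wider than size_t would go through
the checked conversion instead of an unchecked cast.
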
Thanks.