From: Junio C Hamano <gitster@pobox•com>
To: Martin Koegler <martin.koegler@chello•at>,
	Ramsay Jones <ramsay@ramsayjones•plus.com>
Cc: git@vger•kernel.org, Johannes.Schindelin@gmx•de
Subject: Re: [PATCH 1/9] Convert pack-objects to size_t
Date: Mon, 14 Aug 2017 10:08:05 -0700
Message-ID: <xmqqfucuw00a.fsf@gitster.mtv.corp.google.com>
In-Reply-To: <xmqqtw1bw1v6.fsf@gitster.mtv.corp.google.com> (Junio C. Hamano's message of "Sun, 13 Aug 2017 15:15:41 -0700")

Junio C Hamano <gitster@pobox•com> writes:

> One interesting question is which of these two types we should use
> for the size of objects Git uses.  
>
> Most of the "interesting" operations done by Git require that the
> thing is in core as a whole before we can do anything (e.g. compare
> two such things to produce delta, have one in core and apply patch),
> so it is tempting that we deal with size_t, but at the lowest level
> to serve as a SCM, i.e. recording the state of a file at each
> version, we actually should be able to exceed the in-core
> limit---both "git add" of a huge file whose contents would not fit
> in-core and "git checkout" of a huge blob whose inflated contents
> would not fit in-core should (in theory, modulo bugs) be able to
> exercise the streaming interface to handle such case without holding
> everything in-core at once.  So from that point of view, even size_t
> may not be the "correct" type to use.

A few additions to the above observations.

 - We have a varint that encodes how far the delta representation
   of an object is from its base object in the packfile.  Both the
   encoding and decoding sides in the current code use off_t to
   represent this offset, so a delta can already reference a base
   object that is far away in the same packfile.

 - I think it is OK in practice to limit the size of individual
   objects to size_t (i.e. on 32-bit arch, you cannot interact with
   a repository with an object whose size exceeds 4GB).  Using off_t
   would allow occasional ultra-huge objects that can only be added
   and checked out via the streaming API on such a platform, but I
   suspect that it may become too much of a hassle to maintain.

   It may help reduce the maintenance burden if we introduced
   obj_size_t, defined to be size_t for now, so that we can later
   swap it to off_t or some larger type when we know we do need to
   support objects whose size cannot be expressed in size_t, but I
   do not offhand know what the pros and cons of such an approach
   would look like.
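
To make the points above concrete, here is a rough sketch, not a
patch.  It assumes the obj_size_t alias is a plain typedef; the
OBJ_SIZE_MAX macro and all three helper names are made up for
illustration, and uint64_t stands in for off_t to keep the snippet
portable.  The offset encoding follows the OFS_DELTA varint scheme as
I understand it from the pack format.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical alias: size_t for now, could be widened later. */
typedef size_t obj_size_t;
#define OBJ_SIZE_MAX SIZE_MAX

/*
 * Narrow a size parsed from disk (held in the widest unsigned type)
 * to obj_size_t.  Returns 0 on success, -1 if it does not fit, which
 * is the "object too large for this platform" case on 32-bit.
 */
static int obj_size_from_uintmax(uintmax_t raw, obj_size_t *out)
{
	if (raw > OBJ_SIZE_MAX)
		return -1;
	*out = (obj_size_t)raw;
	return 0;
}

/*
 * Encode the backward distance from a delta to its base.
 * buf must have room for 10 bytes; returns the number of bytes used.
 */
static size_t encode_base_offset(uint64_t ofs, unsigned char *buf)
{
	unsigned char tmp[10];
	size_t pos = sizeof(tmp) - 1;

	tmp[pos] = ofs & 127;
	while (ofs >>= 7)
		tmp[--pos] = 128 | (--ofs & 127);
	memcpy(buf, tmp + pos, sizeof(tmp) - pos);
	return sizeof(tmp) - pos;
}

/* Decode what encode_base_offset() wrote; *used gets the byte count. */
static uint64_t decode_base_offset(const unsigned char *buf, size_t *used)
{
	size_t i = 0;
	unsigned char c = buf[i++];
	uint64_t ofs = c & 127;

	while (c & 128) {
		ofs += 1;
		c = buf[i++];
		ofs = (ofs << 7) + (c & 127);
	}
	*used = i;
	return ofs;
}
```

The offset helpers never touch obj_size_t, which is the point: the
distance to a base is independent of how large any single object may
be, so widening the alias later would only affect callers of
something like obj_size_from_uintmax().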

Thanks.

