public inbox for git@vger.kernel.org 
From: Junio C Hamano <gitster@pobox•com>
To: Atousa Duprat <atousa.p@gmail•com>
Cc: git@vger•kernel.org,
	"Rafael Espíndola" <rafael.espindola@gmail•com>,
	"Filipe Cabecinhas" <filcab@gmail•com>
Subject: Re: git fsck failure on OS X with files >= 4 GiB
Date: Thu, 29 Oct 2015 10:19:14 -0700
Message-ID: <xmqqlhalsict.fsf@gitster.mtv.corp.google.com>
In-Reply-To: <CA+izobtdwszVrYsnKU=_ytLuNbPGyRe_7kXqyrQO7u5Lo+OdPg@mail.gmail.com> (Atousa Duprat's message of "Thu, 29 Oct 2015 09:02:49 -0700")

Atousa Duprat <atousa.p@gmail•com> writes:

> [PATCH] Limit the size of the data block passed to SHA1_Update()
>
> This avoids issues where OS-specific implementations use
> a 32-bit integer to specify block size.  Limit currently
> set to 1GiB.
> ---
>  cache.h | 20 +++++++++++++++++++-
>  1 file changed, 19 insertions(+), 1 deletion(-)
>
> diff --git a/cache.h b/cache.h
> index 79066e5..c305985 100644
> --- a/cache.h
> +++ b/cache.h
> @@ -14,10 +14,28 @@
>  #ifndef git_SHA_CTX
>  #define git_SHA_CTX SHA_CTX
>  #define git_SHA1_Init SHA1_Init
> -#define git_SHA1_Update SHA1_Update
>  #define git_SHA1_Final SHA1_Final
>  #endif
>
> +#define SHA1_MAX_BLOCK_SIZE (1024*1024*1024)
> +
> +static inline int git_SHA1_Update(SHA_CTX *c, const void *data, size_t len)
> +{
> + size_t nr;
> + size_t total = 0;
> + char *cdata = (char*)data;
> + while(len > 0) {
> + nr = len;
> + if(nr > SHA1_MAX_BLOCK_SIZE)
> + nr = SHA1_MAX_BLOCK_SIZE;
> + SHA1_Update(c, cdata, nr);
> + total += nr;
> + cdata += nr;
> + len -= nr;
> + }
> + return total;
> +}
> +

I think the idea illustrated above is a good start, but there are
a few issues:

 * SHA1_Update() is used in fairly many places; it is unclear if
   inlining the wrapper at every call site is a good idea.

 * There is no need to punish implementations with working
   SHA1_Update by another level of wrapping.

 * What would you do when you find an implementation for which 1G is
   still too big?

Perhaps something like this in the header

#ifdef SHA1_MAX_BLOCK_SIZE
extern int SHA1_Update_Chunked(SHA_CTX *, const void *, size_t);
#define git_SHA1_Update SHA1_Update_Chunked
#endif

with compat/sha1_chunked.c that has

#ifdef SHA1_MAX_BLOCK_SIZE
int SHA1_Update_Chunked(SHA_CTX *c, const void *data, size_t len)
{
	... your looping implementation ...
}
#endif

in it, built only when a Makefile macro asks for it (e.g. as in the
patch below), might be a good workaround.

diff --git a/Makefile b/Makefile
index 8466333..83348b8 100644
--- a/Makefile
+++ b/Makefile
@@ -139,6 +139,10 @@ all::
 # Define PPC_SHA1 environment variable when running make to make use of
 # a bundled SHA1 routine optimized for PowerPC.
 #
+# Define SHA1_MAX_BLOCK_SIZE if your SHA1_Update() implementation can
+# hash only a limited amount of data in one call (e.g. APPLE_COMMON_CRYPTO
+# may want 'SHA1_MAX_BLOCK_SIZE=1024L*1024L*1024L' defined).
+#
 # Define NEEDS_CRYPTO_WITH_SSL if you need -lcrypto when using -lssl (Darwin).
 #
 # Define NEEDS_SSL_WITH_CRYPTO if you need -lssl when using -lcrypto (Darwin).
@@ -1002,6 +1006,7 @@ ifeq ($(uname_S),Darwin)
 	ifndef NO_APPLE_COMMON_CRYPTO
 		APPLE_COMMON_CRYPTO = YesPlease
 		COMPAT_CFLAGS += -DAPPLE_COMMON_CRYPTO
+		SHA1_MAX_BLOCK_SIZE=1024L*1024L*1024L
 	endif
 	NO_REGEX = YesPlease
 	PTHREAD_LIBS =
@@ -1350,6 +1355,11 @@ endif
 endif
 endif
 
+ifdef SHA1_MAX_BLOCK_SIZE
+LIB_OBJS += compat/sha1_chunked.o
+BASIC_CFLAGS += -DSHA1_MAX_BLOCK_SIZE="$(SHA1_MAX_BLOCK_SIZE)"
+endif
+
 ifdef NO_PERL_MAKEMAKER
 	export NO_PERL_MAKEMAKER
 endif
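
For illustration only, the looping body elided in the
compat/sha1_chunked.c outline above could look roughly like the
following.  This is a sketch, not the final patch: struct mock_ctx,
mock_init() and mock_update() are hypothetical stand-ins (an FNV-1a
hash, not SHA-1) for SHA_CTX and a platform SHA1_Update() that takes a
32-bit length, so that it builds without a crypto library.  The limit
of 4 is a demo-only default; a real build would get the value from the
Makefile.  Returning 1 on success is also an assumption, chosen to
mirror what OpenSSL's SHA1_Update() returns.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Demo-only default; a real build would pass the limit with -D. */
#ifndef SHA1_MAX_BLOCK_SIZE
#define SHA1_MAX_BLOCK_SIZE 4
#endif

/* Hypothetical stand-in for SHA_CTX: FNV-1a state plus bookkeeping. */
struct mock_ctx {
	uint32_t state;	/* running FNV-1a hash, not SHA-1 */
	size_t total;	/* bytes consumed so far */
	unsigned calls;	/* number of update calls made */
};

void mock_init(struct mock_ctx *c)
{
	c->state = 2166136261u;	/* FNV-1a 32-bit offset basis */
	c->total = 0;
	c->calls = 0;
}

/* Hypothetical stand-in for an update function with a 32-bit length. */
int mock_update(struct mock_ctx *c, const void *data, uint32_t len)
{
	const unsigned char *p = data;
	uint32_t i;

	/* The wrapper below must never hand us more than the limit. */
	assert(len <= (size_t)(SHA1_MAX_BLOCK_SIZE));
	for (i = 0; i < len; i++) {
		c->state ^= p[i];
		c->state *= 16777619u;	/* FNV-1a 32-bit prime */
	}
	c->total += len;
	c->calls++;
	return 1;
}

/* Feed "len" bytes in pieces no larger than SHA1_MAX_BLOCK_SIZE. */
int SHA1_Update_Chunked(struct mock_ctx *c, const void *data, size_t len)
{
	const size_t limit = (size_t)(SHA1_MAX_BLOCK_SIZE);
	const char *cdata = data;

	while (len) {
		size_t nr = len < limit ? len : limit;

		if (!mock_update(c, cdata, (uint32_t)nr))
			return 0;
		cdata += nr;
		len -= nr;
	}
	return 1;
}
```

With the demo limit of 4, a 10-byte buffer is fed in three calls of 4,
4 and 2 bytes, and the resulting state is the same as hashing the
buffer in one pass.  Keeping the pointer const and the counters in
size_t avoids the cast and the int truncation in the quoted version.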
