public inbox for git@vger.kernel.org 
From: Junio C Hamano <gitster@pobox•com>
To: Jeff King <peff@peff•net>
Cc: Steffen Prohaska <prohaska@zib•de>,
	Git Mailing List <git@vger•kernel.org>,
	pclouds@gmail•com, john@keeping•me.uk, schacon@gmail•com
Subject: Re: [PATCH v5 4/4] convert: Stream from fd to required clean filter instead of mmap
Date: Tue, 26 Aug 2014 12:32:17 -0700
Message-ID: <xmqqmwarhwse.fsf@gitster.dls.corp.google.com>
In-Reply-To: <20140826180018.GB17546@peff.net> (Jeff King's message of "Tue, 26 Aug 2014 14:00:18 -0400")

Jeff King <peff@peff•net> writes:

> On Mon, Aug 25, 2014 at 11:35:45AM -0700, Junio C Hamano wrote:
>
>> Steffen Prohaska <prohaska@zib•de> writes:
>> 
>> >> Couldn't we do that with an lseek (or even an mmap with offset 0)? That
>> >> obviously would not work for non-file inputs, but I think we address
>> >> that already in index_fd: we push non-seekable things off to index_pipe,
>> >> where we spool them to memory.
>> >
>> > It could be handled that way, but we would be back to the original problem
>> > that 32-bit git fails for large files.
>> 
>> Correct, and you are making an incremental improvement so that such
>> a large blob can be handled _when_ the filters can successfully
>> munge it back and forth.  If we fail due to out of memory when the
>> filters cannot, that would be the same as without your improvement,
>> so you are still making progress.
>
> I do not think my proposal makes anything worse than Steffen's patch.

I think we are saying the same thing, but perhaps I didn't phrase it
well.

> I think the main argument against going further is just that it is not
> worth the complexity. Tell people doing reduction filters they need to
> use "required", and that accomplishes the same thing.
>
>> >> So it seems like the ideal strategy would be:
>> >> 
>> >>  1. If it's seekable, try streaming. If not, fall back to lseek/mmap.
>> >> 
>> >>  2. If it's not seekable and the filter is required, try streaming. We
>> >>     die anyway if we fail.
>> 
>> Puzzled...  Is it assumed that any content the filters tell us to
>> use the contents from the db as-is by exiting with non-zero status
>> will always be large not to fit in-core?  For small contents, isn't
>> this "ideal" strategy a regression?
>
> I am not sure what you mean by regression here. We will try to stream
> more often, but I do not see that as a bad thing.

I thought the proposed flow I was commenting on was

    - try streaming and die if the filter fails

For an optional filter working on contents that would fit in core,
we currently do

    - slurp in memory, filter it, use the original if the filter fails

If we switched to 2., then... ahh, ok, I misread the "is required" part.
The "regression" does not apply to that case at all.
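
To make the two flows above concrete, here is a rough sketch in Python
rather than git's C. Everything in it is hypothetical illustration, not
code from convert.c: the names FilterFailed, clean_optional and
clean_required_streaming are made up, and the filter is modelled as a
function that can be applied chunk by chunk, which a real external
filter process need not allow.

```python
import io


class FilterFailed(Exception):
    """Hypothetical stand-in for a filter exiting with non-zero status."""


def clean_optional(data, filt):
    # Optional filter on contents that fit in core: slurp in memory,
    # filter it, and use the original if the filter fails.
    try:
        return filt(data)
    except FilterFailed:
        return data


def clean_required_streaming(src, sink, filt, chunk_size=4096):
    # Required filter: stream chunk by chunk without holding the whole
    # input in memory. No original is kept for a fallback, so a filter
    # failure simply propagates to the caller (the "die" case).
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        sink.write(filt(chunk))
```

The point of the sketch is only that the streaming path never needs the
unfiltered contents again, which is acceptable precisely because a
failing required filter is fatal anyway; the optional path has to keep
the original around to fall back on.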
