From: Junio C Hamano <gitster@pobox•com>
To: Jeff King <peff@peff•net>
Cc: Steffen Prohaska <prohaska@zib•de>,
Git Mailing List <git@vger•kernel.org>,
pclouds@gmail•com, john@keeping•me.uk, schacon@gmail•com
Subject: Re: [PATCH v5 4/4] convert: Stream from fd to required clean filter instead of mmap
Date: Tue, 26 Aug 2014 12:32:17 -0700
Message-ID: <xmqqmwarhwse.fsf@gitster.dls.corp.google.com>
In-Reply-To: <20140826180018.GB17546@peff.net> (Jeff King's message of "Tue, 26 Aug 2014 14:00:18 -0400")

Jeff King <peff@peff•net> writes:

> On Mon, Aug 25, 2014 at 11:35:45AM -0700, Junio C Hamano wrote:
>
>> Steffen Prohaska <prohaska@zib•de> writes:
>>
>> >> Couldn't we do that with an lseek (or even an mmap with offset 0)? That
>> >> obviously would not work for non-file inputs, but I think we address
>> >> that already in index_fd: we push non-seekable things off to index_pipe,
>> >> where we spool them to memory.
>> >
>> > It could be handled that way, but we would be back to the original problem
>> > that 32-bit git fails for large files.
>>
>> Correct, and you are making an incremental improvement so that such
>> a large blob can be handled _when_ the filters can successfully
>> munge it back and forth. If we fail due to out of memory when the
>> filters cannot, that would be the same as without your improvement,
>> so you are still making progress.
>
> I do not think my proposal makes anything worse than Steffen's patch.

I think we are saying the same thing, but perhaps I didn't phrase it
well.

> I think the main argument against going further is just that it is not
> worth the complexity. Tell people doing reduction filters they need to
> use "required", and that accomplishes the same thing.
>
>> >> So it seems like the ideal strategy would be:
>> >>
>> >> 1. If it's seekable, try streaming. If not, fall back to lseek/mmap.
>> >>
>> >> 2. If it's not seekable and the filter is required, try streaming. We
>> >> die anyway if we fail.
>>
>> Puzzled... Is it assumed that any content the filters tell us to
>> use the contents from the db as-is by exiting with non-zero status
>> will always be large not to fit in-core? For small contents, isn't
>> this "ideal" strategy a regression?
>
> I am not sure what you mean by regression here. We will try to stream
> more often, but I do not see that as a bad thing.

I thought the proposed flow I was commenting on was

 - try streaming and die if the filter fails

For an optional filter working on contents that would fit in core,
we currently do

 - slurp in memory, filter it, use the original if the filter fails

If we switched to 2., then... ahh, ok, I misread the "is required" part.
The "regression" does not apply to that case at all.
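
To make the two flows concrete, here is a rough sketch in Python. It
is not git's actual C code: run_filter, clean_optional and
clean_required are hypothetical names, and upper-casing merely stands
in for whatever a real clean filter would do.

```python
class FilterFailed(Exception):
    """Models a clean filter exiting with non-zero status."""


def run_filter(data: bytes, succeeds: bool) -> bytes:
    # Hypothetical stand-in for an external clean filter.
    if not succeeds:
        raise FilterFailed("filter exited with non-zero status")
    return data.upper()


def clean_optional(data: bytes, succeeds: bool) -> bytes:
    # Optional filter: slurp in memory, filter it, and use the
    # original contents if the filter fails.
    try:
        return run_filter(data, succeeds)
    except FilterFailed:
        return data


def clean_required(chunks, succeeds: bool) -> bytes:
    # Required filter: stream chunk by chunk; a failure is not
    # caught here, so it propagates to the caller ("die").
    return b"".join(run_filter(chunk, succeeds) for chunk in chunks)
```

With a failing filter, clean_optional(b"abc", False) hands back
b"abc" unchanged, while clean_required([b"ab", b"c"], False) raises.
Only the optional case needs the original kept around in core, which
is why the "regression" worry does not touch the required case.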