public inbox for git@vger.kernel.org 
 help / color / mirror / Atom feed
From: "Ævar Arnfjörð Bjarmason" <avarab@gmail•com>
To: Jeff King <peff@peff•net>
Cc: "Дилян Палаузов" <dilyan.palauzov@aegee•org>, git@vger•kernel.org
Subject: Re: git grep -P fatal: pcre_exec failed with error code -8
Date: Mon, 06 Nov 2017 12:50:45 +0100	[thread overview]
Message-ID: <878tfj62y2.fsf@evledraar.booking.com> (raw)
In-Reply-To: <20171106103125.fwtrxv6zycrbihcv@sigill.intra.peff.net>


On Mon, Nov 06 2017, Jeff King jotted:

> On Sun, Nov 05, 2017 at 10:41:17AM +0100, Дилян Палаузов wrote:
>
>> I understand that the PCRE's stack can get exhausted for some files, but in
>> such cases, git grep shall proceed with the other files, and print at the
>> end/stderr for which files the pattern was not applied.  Such behaviour
>> would be more usefull than the current one.
>
> Yes, I had a similar thought. It does feel a little funny for us to
> basically treat an error as "no match" for non-interactive use, but then
> the current behavior works out to be more or less the same (we return an
> error code which most shell scripts would interpret as failure).
>
> IOW, I think something like this is probably the right direction:
>
> diff --git a/grep.c b/grep.c
> index ce6a48e634..2c152e5908 100644
> --- a/grep.c
> +++ b/grep.c
> @@ -427,7 +427,7 @@ static int pcre1match(struct grep_pat *p, const char *line, const char *eol,
>  	}
>
>  	if (ret < 0 && ret != PCRE_ERROR_NOMATCH)
> -		die("pcre_exec failed with error code %d", ret);
> +		warning("pcre_exec failed with error code %d", ret);
>  	if (ret > 0) {
>  		ret = 0;
>  		match->rm_so = ovector[0];
>
> but possibly:
>
>   1. It would be nice to report the filename that we couldn't match on.
>      But we don't know it at this level of the code (and it might not be
>      a file at all that we are matching). So probably we'd want to pass
>      the error much further up the call stack. This is tricky as there
>      are multiple regex libraries we can use, and the return value gets
>      normalized to 1/0 for hit/not-hit long before we get as far as
>      something that knows the filename.
>
>      We might need to do something invasive like adding an extra
>      parameter to hold the error message, and passing it through the
>      whole stack.
>
>   2. We should still try to exit with an exit code other than "1" to
>      indicate we hit an error besides "no lines were found".
>
>   3. Other regex libraries might need similar treatment. Probably
>      pcre2match() needs it. It doesn't look like regexec() can ever
>      return an error besides REG_NOMATCH.
>
> -Peff

Some replies to the thread in general, didn't want to spread this out
into different replies.

 * Yes this sucks.

 * Just emitting a warning without an appropriate exit code would suck
   more, would break batch jobs & whatnot that expcept certain results
   from grep.

 * As you point out it would be nice to print out the file name we
   didn't match on, we'd need to pass the grep_source struct down
   further, it goes as far as grep_source_1 but stops there and isn't
   passed to e.g. look_ahead(), which calls patmatch() which calls the
   engine-specific matcher and would need to report the error. We could
   just do this, would slow down things a bit (probably trivally) but we
   could emit better error messages in genreal.

 * You can adjust these limts in PCRE in Git, although it doesn't help
   in this case, you just add (*LIMIT_MATCH=NUM) or
   (*LIMIT_RECURSION=NUM) (or both) to the start of the pattern. See
   pcresyntax(3) or pcre2syntax(3) man pages depending on what version
   you have installed.

 * While regexec() won't return an error its version of dealing with
   this is (at least under glibc) to balloon CPU/memory use until the
   OOMkiller kills git (although not on this particular pattern).

  reply	other threads:[~2017-11-06 11:50 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-11-05  0:06 git grep -P fatal: pcre_exec failed with error code -8 Дилян Палаузов
2017-11-05  2:16 ` Jeff King
2017-11-05  9:41   ` Дилян Палаузов
2017-11-06 10:31     ` Jeff King
2017-11-06 11:50       ` Ævar Arnfjörð Bjarmason [this message]
2017-11-06 12:24         ` Jeff King
2017-11-06 14:04           ` Ævar Arnfjörð Bjarmason

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=878tfj62y2.fsf@evledraar.booking.com \
    --to=avarab@gmail$(echo .)com \
    --cc=dilyan.palauzov@aegee$(echo .)org \
    --cc=git@vger$(echo .)kernel.org \
    --cc=peff@peff$(echo .)net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox