public inbox for git@vger.kernel.org 
 help / color / mirror / Atom feed
From: Jeff King <peff@peff•net>
To: El_Hoy <eloyesp@gmail•com>
Cc: Thomas Braun <thomas.braun@virtuell-zuhause•de>,
	"brian m. carlson" <sandals@crustytoothpaste•net>,
	Junio C Hamano <gitster@pobox•com>,
	git@vger•kernel.org
Subject: Re: Making git grep ignore binary the default
Date: Tue, 21 Oct 2025 03:27:40 -0400	[thread overview]
Message-ID: <20251021072740.GA259661@coredump.intra.peff.net> (raw)
In-Reply-To: <CAPapNH2UeRoKF9Tm5my59MXCxUQqEp+=4wzod8kYus_FQALwjQ@mail.gmail.com>

On Mon, Oct 20, 2025 at 02:20:06PM -0300, El_Hoy wrote:

> On point 2, as Thomas points, there are many factors that might break
> a script that rely on 'git grep' directly for a dangerous task, this
> makes me think that we could add a `--porcelain` option to `git grep`
> to be used on scripts and be reliable, and it might ignore the config.

Another option here is to provide a way for scripts to override the
ignore mechanism specifically (which would depend on how it is
implemented). For an example, see below.

> On point 3, the configuration could be made with more flexibility in
> mind, making it possible to ignore different files that are not binary
> (for example linguist-generated files). The downside of that approach
> is that it requires more configuration, while a single boolean for
> skipping binaries might be simpler. I'm ok with any approach.

One way to do this would be to provide a default pathspec for git-grep
when one is not defined. Something like:

diff --git a/builtin/grep.c b/builtin/grep.c
index 13841fbf00..7b6a6ba9c6 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -42,6 +42,7 @@ static char const * const grep_usage[] = {
 };
 
 static int recurse_submodules;
+static struct strvec default_pathspec = STRVEC_INIT;
 
 static int num_threads;
 
@@ -320,6 +321,15 @@ static int grep_cmd_config(const char *var, const char *value,
 	if (!strcmp(var, "submodule.recurse"))
 		recurse_submodules = git_config_bool(var, value);
 
+	if (!strcmp(var, "grep.defaultpathspec")) {
+		if (!value)
+			return config_error_nonbool(var);
+		else if (*value)
+			strvec_push(&default_pathspec, value);
+		else
+			strvec_clear(&default_pathspec);
+	}
+
 	return st;
 }
 
@@ -1169,7 +1179,7 @@ int cmd_grep(int argc,
 	parse_pathspec(&pathspec, 0,
 		       PATHSPEC_PREFER_CWD |
 		       (opt.max_depth != -1 ? PATHSPEC_MAXDEPTH_VALID : 0),
-		       prefix, argv + i);
+		       prefix, i < argc ? argv + i : default_pathspec.v);
 	pathspec.max_depth = opt.max_depth;
 	pathspec.recursive = 1;
 	pathspec.recurse_submodules = !!recurse_submodules;

Building with that lets you do something like this in git.git:

  $ ./git grep 'added by us:'
  po/bg.po:msgid "added by us:"
  po/ca.po:msgid "added by us:"
  po/de.po:msgid "added by us:"
  po/el.po:msgid "added by us:"
  po/es.po:msgid "added by us:"
  po/fr.po:msgid "added by us:"
  po/ga.po:msgid "added by us:"
  po/id.po:msgid "added by us:"
  po/it.po:msgid "added by us:"
  po/ko.po:msgid "added by us:"
  po/pl.po:msgid "added by us:"
  po/pt_PT.po:msgid "added by us:"
  po/ru.po:msgid "added by us:"
  po/sv.po:msgid "added by us:"
  po/tr.po:msgid "added by us:"
  po/uk.po:msgid "added by us:"
  po/vi.po:msgid "added by us:"
  po/zh_CN.po:msgid "added by us:"
  po/zh_TW.po:msgid "added by us:"
  t/t7060-wtstatus.sh:    added by us:     sub_second.txt
  wt-status.c:            return _("added by us:");

  $ git config grep.defaultPathspec :^po
  $ ./git grep 'added by us:'
  t/t7060-wtstatus.sh:    added by us:     sub_second.txt
  wt-status.c:            return _("added by us:");

And then scripts override it by providing a pathspec (like "." if they
want to see everything, which conveniently also works on old versions of
Git).

It isn't _quite_ the same as an option to ignore certain paths, as it's
a default replacement, and not additive (so as soon as I ask for
everything in "foo/", then "foo/bar" will be included even if I have
"^foo/bar" in my default pathspec). I'm not sure if that is a drawback
or a feature.

There may be other rough edges. It's not something I've thought that
carefully about yet. But it just gives an idea of a possible direction.

Of course you can already do the same thing with an alias right now[1].
You just need to remember to type the alias instead of "grep". That
requires some finger retraining, but it would eliminate any script /
compatibility questions.

-Peff

[1] The alias isn't quite trivial because we want to add our pathspecs
    at the _end_ of the command-line. But I think something like:

      [alias]
      gr = "!f() { exec git grep \"$@\" :^po; }; f"

    works.

  reply	other threads:[~2025-10-21  7:27 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-10-17 15:00 Making git grep ignore binary the default El_Hoy
2025-10-17 21:29 ` Junio C Hamano
2025-10-17 23:29   ` Thomas Braun
2025-10-18  0:52     ` brian m. carlson
2025-10-18 14:16       ` rsbecker
2025-10-20 15:24       ` Thomas Braun
2025-10-20 17:20         ` El_Hoy
2025-10-21  7:27           ` Jeff King [this message]
2025-10-18 10:22   ` Jeff King
2025-10-18 16:01     ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20251021072740.GA259661@coredump.intra.peff.net \
    --to=peff@peff$(echo .)net \
    --cc=eloyesp@gmail$(echo .)com \
    --cc=git@vger$(echo .)kernel.org \
    --cc=gitster@pobox$(echo .)com \
    --cc=sandals@crustytoothpaste$(echo .)net \
    --cc=thomas.braun@virtuell-zuhause$(echo .)de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox