From: Jeff King <peff@peff•net>
To: El_Hoy <eloyesp@gmail•com>
Cc: Thomas Braun <thomas.braun@virtuell-zuhause•de>,
"brian m. carlson" <sandals@crustytoothpaste•net>,
Junio C Hamano <gitster@pobox•com>,
git@vger•kernel.org
Subject: Re: Making git grep ignore binary the default
Date: Tue, 21 Oct 2025 03:27:40 -0400 [thread overview]
Message-ID: <20251021072740.GA259661@coredump.intra.peff.net> (raw)
In-Reply-To: <CAPapNH2UeRoKF9Tm5my59MXCxUQqEp+=4wzod8kYus_FQALwjQ@mail.gmail.com>
On Mon, Oct 20, 2025 at 02:20:06PM -0300, El_Hoy wrote:
> On point 2, as Thomas points, there are many factors that might break
> a script that rely on 'git grep' directly for a dangerous task, this
> makes me think that we could add a `--porcelain` option to `git grep`
> to be used on scripts and be reliable, and it might ignore the config.
Another option here is to provide a way for scripts to override the
ignore mechanism specifically (which would depend on how it is
implemented). For an example, see below.
> On point 3, the configuration could be made with more flexibility in
> mind, making it possible to ignore different files that are not binary
> (for example linguist-generated files). The downside of that approach
> is that it requires more configuration, while a single boolean for
> skipping binaries might be simpler. I'm ok with any approach.
One way to do this would be to provide a default pathspec for git-grep
when one is not defined. Something like:
diff --git a/builtin/grep.c b/builtin/grep.c
index 13841fbf00..7b6a6ba9c6 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -42,6 +42,7 @@ static char const * const grep_usage[] = {
};
static int recurse_submodules;
+static struct strvec default_pathspec = STRVEC_INIT;
static int num_threads;
@@ -320,6 +321,15 @@ static int grep_cmd_config(const char *var, const char *value,
if (!strcmp(var, "submodule.recurse"))
recurse_submodules = git_config_bool(var, value);
+ if (!strcmp(var, "grep.defaultpathspec")) {
+ if (!value)
+ return config_error_nonbool(var);
+ else if (*value)
+ strvec_push(&default_pathspec, value);
+ else
+ strvec_clear(&default_pathspec);
+ }
+
return st;
}
@@ -1169,7 +1179,7 @@ int cmd_grep(int argc,
parse_pathspec(&pathspec, 0,
PATHSPEC_PREFER_CWD |
(opt.max_depth != -1 ? PATHSPEC_MAXDEPTH_VALID : 0),
- prefix, argv + i);
+ prefix, i < argc ? argv + i : default_pathspec.v);
pathspec.max_depth = opt.max_depth;
pathspec.recursive = 1;
pathspec.recurse_submodules = !!recurse_submodules;
Building with that lets you do something like this in git.git:
$ ./git grep 'added by us:'
po/bg.po:msgid "added by us:"
po/ca.po:msgid "added by us:"
po/de.po:msgid "added by us:"
po/el.po:msgid "added by us:"
po/es.po:msgid "added by us:"
po/fr.po:msgid "added by us:"
po/ga.po:msgid "added by us:"
po/id.po:msgid "added by us:"
po/it.po:msgid "added by us:"
po/ko.po:msgid "added by us:"
po/pl.po:msgid "added by us:"
po/pt_PT.po:msgid "added by us:"
po/ru.po:msgid "added by us:"
po/sv.po:msgid "added by us:"
po/tr.po:msgid "added by us:"
po/uk.po:msgid "added by us:"
po/vi.po:msgid "added by us:"
po/zh_CN.po:msgid "added by us:"
po/zh_TW.po:msgid "added by us:"
t/t7060-wtstatus.sh: added by us: sub_second.txt
wt-status.c: return _("added by us:");
$ git config grep.defaultPathspec :^po
$ ./git grep 'added by us:'
t/t7060-wtstatus.sh: added by us: sub_second.txt
wt-status.c: return _("added by us:");
And then scripts override it by providing a pathspec (like "." if they
want to see everything, which conveniently also works on old versions of
Git).
It isn't _quite_ the same as an option to ignore certain paths, as it's
a default replacement, and not additive (so as soon as I ask for
everything in "foo/", then "foo/bar" will be included even if I have
"^foo/bar" in my default pathspec). I'm not sure if that is a drawback
or a feature.
There may be other rough edges. It's not something I've thought that
carefully about yet. But it just gives an idea of a possible direction.
Of course you can already do the same thing with an alias right now[1].
You just need to remember to type the alias instead of "grep". That
requires some finger retraining, but it would eliminate any script /
compatibility questions.
-Peff
[1] The alias isn't quite trivial because we want to add our pathspecs
at the _end_ of the command-line. But I think something like:
[alias]
gr = "!f() { exec git grep \"$@\" :^po; }; f"
works.
next prev parent reply other threads:[~2025-10-21 7:27 UTC|newest]
Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-10-17 15:00 Making git grep ignore binary the default El_Hoy
2025-10-17 21:29 ` Junio C Hamano
2025-10-17 23:29 ` Thomas Braun
2025-10-18 0:52 ` brian m. carlson
2025-10-18 14:16 ` rsbecker
2025-10-20 15:24 ` Thomas Braun
2025-10-20 17:20 ` El_Hoy
2025-10-21 7:27 ` Jeff King [this message]
2025-10-18 10:22 ` Jeff King
2025-10-18 16:01 ` Junio C Hamano
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20251021072740.GA259661@coredump.intra.peff.net \
--to=peff@peff$(echo .)net \
--cc=eloyesp@gmail$(echo .)com \
--cc=git@vger$(echo .)kernel.org \
--cc=gitster@pobox$(echo .)com \
--cc=sandals@crustytoothpaste$(echo .)net \
--cc=thomas.braun@virtuell-zuhause$(echo .)de \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox