From: Adrian Ratiu <adrian.ratiu@collabora•com>
To: git@vger•kernel.org
Cc: Junio C Hamano <gitster@pobox•com>,
Patrick Steinhardt <ps@pks•im>,
Emily Shaffer <emilyshaffer@google•com>,
Adrian Ratiu <adrian.ratiu@collabora•com>
Subject: [PATCH v2] ws: add new tab-between-non-ws check
Date: Wed, 7 Jan 2026 03:30:51 +0200 [thread overview]
Message-ID: <20260107013051.312291-1-adrian.ratiu@collabora.com> (raw)
This adds a new check to detect HT in the middle of sentences that
should have been a SP, as suggested by Junio in
https://public-inbox.org/git/xmqqy0mwsedz.fsf@gitster.g/
The check is a bit complex because we want to detect places where
a SP was intended (HT can expand to more than one display column),
so we need to count both the display columns (col) and the string
character columns (i) to determine if a HT looks identical to a SP
or can cause confusion.
Highlighting support for tools like git diff/show/log is added, as
well as git apply --whitespace=fix capability.
The middle section of the line used to be assumed non-highlighted,
which is obviously not true anymore, so we split its logic into a
separate function named emit_middle_section().
The new check is enabled for Documentation/**/*.adoc, where these
kinds of mistakes were seen in practice. It can also be enabled in
other locations where it can be useful, by adding to the relevant
attributes file.
Suggested-by: Junio C Hamano <gitster@pobox•com>
Signed-off-by: Adrian Ratiu <adrian.ratiu@collabora•com>
---
Changes in v2:
* Highlight the new error in ws_check_emit_1() output generation so
tools like color git diff can show the errors to the user (Junio)
* Expanded ws_fix_copy to detect and fix this new error (Junio)
* Added more test cases for the above new functionality (Junio)
* Improved commit message (Junio)
Based on the latest master branch.
Pushed to GitHub: https://github.com/10ne1/git/tree/dev/aratiu/whitespace-new-test-v2
Successful CI run: https://github.com/10ne1/git/actions/runs/20766920987
---
.gitattributes | 2 +-
t/t4015-diff-whitespace.sh | 143 +++++++++++++++++++++++++++++
ws.c | 179 ++++++++++++++++++++++++++++++++++---
ws.h | 1 +
4 files changed, 313 insertions(+), 12 deletions(-)
diff --git a/.gitattributes b/.gitattributes
index 700743c3f5..d3c40a038b 100644
--- a/.gitattributes
+++ b/.gitattributes
@@ -7,7 +7,7 @@
*.py text eol=lf diff=python
*.bat text eol=crlf
CODE_OF_CONDUCT.md -whitespace
-/Documentation/**/*.adoc text eol=lf whitespace=trail,space,incomplete
+/Documentation/**/*.adoc text eol=lf whitespace=trail,space,incomplete,tab-between-non-ws
/command-list.txt text eol=lf
/GIT-VERSION-GEN text eol=lf
/mergetools/* text eol=lf
diff --git a/t/t4015-diff-whitespace.sh b/t/t4015-diff-whitespace.sh
index 3c8eb02e4f..f5b6ceeed9 100755
--- a/t/t4015-diff-whitespace.sh
+++ b/t/t4015-diff-whitespace.sh
@@ -2440,4 +2440,147 @@ test_expect_success 'combine --ignore-blank-lines with --function-context 2' '
test_cmp expect actual
'
+test_expect_success 'check tab between non-whitespace (tab-between-non-ws: off)' '
+ git config core.whitespace "-tab-between-non-ws" &&
+
+ printf "1234567\tb" >x &&
+ git add x &&
+ git diff --cached --check &&
+
+ git diff --cached --color >raw &&
+ test_decode_color <raw >actual &&
+ ! test_grep "<GREEN>1234567<RESET><BLUE> <RESET><GREEN>b<RESET>" actual &&
+ test_grep "<GREEN>1234567 b<RESET>" actual &&
+
+ # should apply without error because tab-between-non-ws is off
+ git diff --cached >patch.diff &&
+ git checkout HEAD -- x &&
+ git apply --whitespace=error patch.diff
+'
+
+test_expect_success 'check tab between non-whitespace at tab stop (tab-between-non-ws: on)' '
+ git config core.whitespace "tab-between-non-ws,tabwidth=8" &&
+ printf "1234567\tb" >x &&
+ git add x &&
+ test_must_fail git diff --cached --check &&
+
+ git diff --cached --color >raw &&
+ test_decode_color <raw >actual &&
+ test_grep "<GREEN>1234567<RESET><BLUE> <RESET><GREEN>b<RESET>" actual &&
+ ! test_grep "<GREEN>1234567 b<RESET>" actual &&
+
+ git diff --cached >patch.diff &&
+ git checkout HEAD -- x &&
+ test_must_fail git apply --whitespace=error patch.diff &&
+ git apply --whitespace=fix patch.diff &&
+ printf "1234567 b" >expected &&
+ test_cmp expected x
+'
+
+test_expect_success 'check tab between non-whitespace not at tab stop (tab-between-non-ws: on)' '
+ git config core.whitespace "tab-between-non-ws,tabwidth=8" &&
+ printf "a\tb" >x &&
+ git add x &&
+ git diff --cached --check &&
+
+ git diff --cached --color >raw &&
+ test_decode_color <raw >actual &&
+ ! test_grep "<GREEN>a<RESET><BLUE> <RESET><GREEN>b<RESET>" actual &&
+ test_grep "<GREEN>a b<RESET>" actual &&
+
+ # should apply without error because the input is valid
+ git diff --cached >patch.diff &&
+ git checkout HEAD -- x &&
+ git apply --whitespace=error patch.diff
+'
+
+test_expect_success 'check tab between non-whitespace with tabwidth=4 (tab-between-non-ws: on)' '
+ git config core.whitespace "tab-between-non-ws,tabwidth=4" &&
+ printf "123\tb" >x &&
+ git add x &&
+ test_must_fail git diff --cached --check &&
+
+ git diff --cached --color >raw &&
+ test_decode_color <raw >actual &&
+ test_grep "<GREEN>123<RESET><BLUE> <RESET><GREEN>b<RESET>" actual &&
+ ! test_grep "<GREEN>123 b<RESET>" actual &&
+
+ git diff --cached >patch.diff &&
+ git checkout HEAD -- x &&
+ test_must_fail git apply --whitespace=error patch.diff &&
+ git apply --whitespace=fix patch.diff &&
+ printf "123 b" >expected &&
+ test_cmp expected x
+'
+
+test_expect_success 'check tab between non-whitespace with tabwidth=4 (tab-between-non-ws: on)' '
+ git config core.whitespace "tab-between-non-ws,tabwidth=4" &&
+ printf "1234\tb" >x &&
+ git add x &&
+ git diff --cached --check &&
+
+ git diff --cached --color >raw &&
+ test_decode_color <raw >actual &&
+ ! test_grep "<GREEN>1234<RESET><BLUE> <RESET><GREEN>b<RESET>" actual &&
+ test_grep "<GREEN>1234 b<RESET>" actual &&
+
+ # should apply without error because tab is at tab stop
+ git diff --cached >patch.diff &&
+ git checkout HEAD -- x &&
+ git apply --whitespace=error patch.diff
+'
+
+test_expect_success 'check multiple tabs with one error (tab-between-non-ws: on)' '
+ git config core.whitespace "tab-between-non-ws,tabwidth=8" &&
+ printf "a\t1234567\tb" >x &&
+ git add x &&
+ test_must_fail git diff --cached --check &&
+
+ git diff --cached --color >raw &&
+ test_decode_color <raw >actual &&
+ test_grep "<GREEN>a 1234567<RESET><BLUE> <RESET><GREEN>b<RESET>" actual &&
+ ! test_grep "<GREEN>a 1234567 b<RESET>" actual &&
+
+ git diff --cached >patch.diff &&
+ git checkout HEAD -- x &&
+ test_must_fail git apply --whitespace=error patch.diff &&
+ git apply --whitespace=fix patch.diff &&
+ printf "a\t1234567 b" >expected &&
+ test_cmp expected x
+'
+
+test_expect_success 'check tab at beginning of line (tab-between-non-ws: on)' '
+ git config core.whitespace "tab-between-non-ws,tabwidth=8" &&
+ printf "\ta" >x &&
+ git add x &&
+ git diff --cached --check &&
+
+ git diff --cached --color >raw &&
+ test_decode_color <raw >actual &&
+ ! test_grep "<BLUE> " actual &&
+ test_grep "<GREEN>+<RESET> <GREEN>a<RESET>" actual &&
+
+ # should apply without error because tab is a valid indentation
+ git diff --cached >patch.diff &&
+ git checkout HEAD -- x &&
+ git apply --whitespace=error patch.diff
+'
+
+test_expect_success 'check tab at end of line(tab-between-non-ws: on)' '
+ git config core.whitespace "tab-between-non-ws,-trailing-space,tabwidth=8" &&
+ printf "a\t" >x &&
+ git add x &&
+ git diff --cached --check &&
+
+ git diff --cached --color >raw &&
+ test_decode_color <raw >actual &&
+ ! test_grep "<GREEN>a<RESET><BLUE> " actual &&
+ test_grep "<GREEN>a <RESET>" actual &&
+
+ # should apply without error because tab is caught by another check (trailing-space)
+ git diff --cached >patch.diff &&
+ git checkout HEAD -- x &&
+ git apply --whitespace=error patch.diff
+'
+
test_done
diff --git a/ws.c b/ws.c
index 6cc2466c0c..633bc69418 100644
--- a/ws.c
+++ b/ws.c
@@ -26,6 +26,7 @@ static struct whitespace_rule {
{ "blank-at-eol", WS_BLANK_AT_EOL, 0 },
{ "blank-at-eof", WS_BLANK_AT_EOF, 0 },
{ "tab-in-indent", WS_TAB_IN_INDENT, 0, 1 },
+ { "tab-between-non-ws", WS_TAB_BETWEEN_NON_WS, 0 },
{ "incomplete-line", WS_INCOMPLETE_LINE, 0, 0 },
};
@@ -140,6 +141,11 @@ char *whitespace_error_string(unsigned ws)
strbuf_addstr(&err, ", ");
strbuf_addstr(&err, "tab in indent");
}
+ if (ws & WS_TAB_BETWEEN_NON_WS) {
+ if (err.len)
+ strbuf_addstr(&err, ", ");
+ strbuf_addstr(&err, "tab between non-whitespace characters");
+ }
if (ws & WS_INCOMPLETE_LINE) {
if (err.len)
strbuf_addstr(&err, ", ");
@@ -148,6 +154,80 @@ char *whitespace_error_string(unsigned ws)
return strbuf_detach(&err, NULL);
}
+static void emit_literal(const char *line, int len, FILE *stream)
+{
+ fwrite(line, len, 1, stream);
+}
+
+static void highlight_tabs(const char *line, int len,
+ int written, int trailing_whitespace,
+ unsigned ws_rule, FILE *stream,
+ const char *set, const char *reset, const char *ws)
+{
+ int tabwidth = ws_tab_width(ws_rule);
+ int start = written, col = 0;
+
+ if (!tabwidth)
+ BUG("a known tabwidth is required by WS_TAB_BETWEEN_NON_WS");
+
+ /*
+ * Calculate the visual column position (col) up to 'written'.
+ * Tabs expand based on 'tabwidth', so for example the 5th character in a
+ * string might be at the 12th visual column, if the line contains tabs.
+ */
+ for (int i = 0; i < written; i++) {
+ if (line[i] == '\t')
+ col += tabwidth - (col % tabwidth);
+ else
+ col++;
+ }
+
+ /* Iterate through the section of the line that needs potential highlighting. */
+ for (int i = written; i < trailing_whitespace; i++) {
+ if (line[i] == '\t') {
+ if (i > 0 && i < len - 1 &&
+ !isspace(line[i - 1]) && !isspace(line[i + 1]) &&
+ (col % tabwidth) == (tabwidth - 1)) {
+ /* Print unwritten content before the highlighted tab. */
+ if (start < i)
+ emit_literal(line + start, i - start, stream);
+
+ /* Apply highlight for the tab. */
+ fputs(reset, stream);
+ fputs(ws, stream);
+ fputc('\t', stream);
+ fputs(reset, stream);
+ fputs(set, stream);
+ start = i + 1;
+ }
+ col += tabwidth - (col % tabwidth);
+ } else {
+ col++; /* non-tab character. */
+ }
+ }
+
+ /* Print any remaining content in the current segment after the last highlight. */
+ if (start < trailing_whitespace)
+ emit_literal(line + start, trailing_whitespace - start, stream);
+}
+
+static void emit_middle_section(const char *line, int len,
+ int written, int trailing_whitespace,
+ unsigned ws_rule, unsigned result,
+ FILE *stream, const char *set,
+ const char *reset, const char *ws)
+{
+ fputs(set, stream);
+
+ if (result & WS_TAB_BETWEEN_NON_WS)
+ highlight_tabs(line, len, written, trailing_whitespace,
+ ws_rule, stream, set, reset, ws);
+ else
+ emit_literal(line + written, trailing_whitespace - written, stream);
+
+ fputs(reset, stream);
+}
+
/* If stream is non-NULL, emits the line after checking. */
static unsigned ws_check_emit_1(const char *line, int len, unsigned ws_rule,
FILE *stream, const char *set,
@@ -228,19 +308,41 @@ static unsigned ws_check_emit_1(const char *line, int len, unsigned ws_rule,
written = i;
}
- if (stream) {
+ if (ws_rule & WS_TAB_BETWEEN_NON_WS) {
/*
- * Now the rest of the line starts at "written".
- * The non-highlighted part ends at "trailing_whitespace".
+ * A tab surrounded by non-whitespace characters is a typo candidate
+ * (a space might have been intended). This checks for a tab that
+ * would be expanded to a single space, which is when it appears at
+ * a column that is one less than a multiple of the tabwidth.
*/
-
- /* Emit non-highlighted (middle) segment. */
- if (trailing_whitespace - written > 0) {
- fputs(set, stream);
- fwrite(line + written,
- trailing_whitespace - written, 1, stream);
- fputs(reset, stream);
+ int col = 0;
+ int tabwidth = ws_tab_width(ws_rule);
+
+ if (!tabwidth)
+ BUG("a known tabwidth is required by WS_TAB_BETWEEN_NON_WS");
+
+ for (i = 0; i < len; i++) {
+ if (line[i] == '\t') {
+ if (i > 0 && i < len - 1 &&
+ !isspace(line[i - 1]) && !isspace(line[i + 1]) &&
+ (col % tabwidth) == (tabwidth - 1))
+ result |= WS_TAB_BETWEEN_NON_WS;
+ col += tabwidth - (col % tabwidth);
+ } else {
+ col++;
+ }
}
+ }
+
+ if (stream) {
+ /*
+ * The middle section of the line starts at "written" and ends at
+ * "trailing_whitespace".
+ */
+ if (trailing_whitespace - written > 0)
+ emit_middle_section(line, len, written, trailing_whitespace,
+ ws_rule, result,
+ stream, set, reset, ws);
/* Highlight errors in trailing whitespace. */
if (trailing_whitespace != len) {
@@ -299,6 +401,7 @@ void ws_fix_copy(struct strbuf *dst, const char *src, int len, unsigned ws_rule,
int last_tab_in_indent = -1;
int last_space_in_indent = -1;
int need_fix_leading_space = 0;
+ size_t pre_indent_len = dst->len;
/*
* Remembering that we need to add '\n' at the end
@@ -401,7 +504,61 @@ void ws_fix_copy(struct strbuf *dst, const char *src, int len, unsigned ws_rule,
fixed = 1;
}
- strbuf_add(dst, src, len);
+ if (!(ws_rule & WS_TAB_BETWEEN_NON_WS)) {
+ /*
+ * Middle section does not need fixing, so just append src to
+ * dst because it already contains the previous ws fixes.
+ */
+ strbuf_add(dst, src, len);
+ } else {
+ /* Fix middle section HTs which expand to a single display space */
+ const char *indent_part = dst->buf + pre_indent_len;
+ int indent_len = dst->len - pre_indent_len;
+ int tabwidth = ws_tab_width(ws_rule);
+ int col = 0;
+
+ if (!tabwidth)
+ BUG("a known tabwidth is required by WS_TAB_BETWEEN_NON_WS");
+
+ /*
+ * Compute indentation display column length because it might not
+ * end at a fixed tab stop position.
+ */
+ for (i = 0; i < indent_len; i++) {
+ if (indent_part[i] == '\t')
+ col += tabwidth - (col % tabwidth);
+ else
+ col++;
+ }
+
+ /* Go through all chars in src, fix if necessary, and append to dst */
+ for (i = 0; i < len; i++) {
+ char prev_ch = i > 0 ? src[i - 1] :
+ (indent_len > 0 ? indent_part[indent_len - 1] : '\0');
+ char next_ch = i < len - 1 ? src[i + 1] : '\0';
+ bool needs_fixing = prev_ch && next_ch &&
+ !isspace(prev_ch) && !isspace(next_ch) &&
+ (col % tabwidth) == (tabwidth - 1);
+
+ /* non HT chars are added as is */
+ if (src[i] != '\t') {
+ strbuf_addch(dst, src[i]);
+ col++;
+ continue;
+ }
+
+ /* fix HT or add them as is */
+ if (needs_fixing) {
+ strbuf_addch(dst, ' ');
+ col++;
+ fixed = 1;
+ } else {
+ strbuf_addch(dst, '\t');
+ col += tabwidth - (col % tabwidth);
+ }
+ }
+ }
+
if (add_cr_to_tail)
strbuf_addch(dst, '\r');
if (add_nl_to_tail)
diff --git a/ws.h b/ws.h
index 06d5cb73f8..35475fd320 100644
--- a/ws.h
+++ b/ws.h
@@ -16,6 +16,7 @@ struct strbuf;
#define WS_BLANK_AT_EOF (1<<10)
#define WS_TAB_IN_INDENT (1<<11)
#define WS_INCOMPLETE_LINE (1<<12)
+#define WS_TAB_BETWEEN_NON_WS (1<<13)
#define WS_TRAILING_SPACE (WS_BLANK_AT_EOL|WS_BLANK_AT_EOF)
#define WS_DEFAULT_RULE (WS_TRAILING_SPACE|WS_SPACE_BEFORE_TAB|8)
--
2.51.2
next reply other threads:[~2026-01-07 1:31 UTC|newest]
Thread overview: 8+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-01-07 1:30 Adrian Ratiu [this message]
2026-01-07 2:12 ` [PATCH v2] ws: add new tab-between-non-ws check Junio C Hamano
2026-01-07 11:34 ` Adrian Ratiu
2026-01-07 17:33 ` Johannes Sixt
2026-01-07 18:11 ` Adrian Ratiu
2026-01-08 8:36 ` Johannes Sixt
2026-01-08 9:01 ` Johannes Sixt
2026-01-09 13:33 ` Adrian Ratiu
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20260107013051.312291-1-adrian.ratiu@collabora.com \
--to=adrian.ratiu@collabora$(echo .)com \
--cc=emilyshaffer@google$(echo .)com \
--cc=git@vger$(echo .)kernel.org \
--cc=gitster@pobox$(echo .)com \
--cc=ps@pks$(echo .)im \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox