public inbox for git@vger.kernel.org 
 help / color / mirror / Atom feed
From: Adrian Ratiu <adrian.ratiu@collabora•com>
To: git@vger•kernel.org
Cc: Junio C Hamano <gitster@pobox•com>,
	Patrick Steinhardt <ps@pks•im>,
	Emily Shaffer <emilyshaffer@google•com>,
	Adrian Ratiu <adrian.ratiu@collabora•com>
Subject: [PATCH v2] ws: add new tab-between-non-ws check
Date: Wed,  7 Jan 2026 03:30:51 +0200	[thread overview]
Message-ID: <20260107013051.312291-1-adrian.ratiu@collabora.com> (raw)

This adds a new check to detect HT in the middle of sentences that
should have been a SP, as suggested by Junio in
https://public-inbox.org/git/xmqqy0mwsedz.fsf@gitster.g/

The check is a bit complex because we want to detect places where
a SP was intended (HT can expand to more than one display column),
so we need to count both the display columns (col) and the string
character columns (i) to determine if a HT looks identical to a SP
or can cause confusion.

Highlighting support for tools like git diff/show/log is added, as
well as git apply --whitespace=fix capability.

The middle section of the line used to be assumed non-highlighted,
which is obviously not true anymore, so we split its logic into a
separate function named emit_middle_section().

The new check is enabled for Documentation/**/*.adoc, where these
kinds of mistakes were seen in practice. It can also be enabled in
other locations where it can be useful, by adding to the relevant
attributes file.

Suggested-by: Junio C Hamano <gitster@pobox•com>
Signed-off-by: Adrian Ratiu <adrian.ratiu@collabora•com>
---
Changes in v2:
* Highlight the new error in ws_check_emit_1() output generation so
  tools like color git diff can show the errors to the user (Junio)
* Expanded ws_fix_copy to detect and fix this new error (Junio)
* Added more test cases for the above new functionality (Junio)
* Improved commit message (Junio)

Based on the latest master branch.
Pushed to GitHub: https://github.com/10ne1/git/tree/dev/aratiu/whitespace-new-test-v2
Successful CI run: https://github.com/10ne1/git/actions/runs/20766920987
---
 .gitattributes             |   2 +-
 t/t4015-diff-whitespace.sh | 143 +++++++++++++++++++++++++++++
 ws.c                       | 179 ++++++++++++++++++++++++++++++++++---
 ws.h                       |   1 +
 4 files changed, 313 insertions(+), 12 deletions(-)

diff --git a/.gitattributes b/.gitattributes
index 700743c3f5..d3c40a038b 100644
--- a/.gitattributes
+++ b/.gitattributes
@@ -7,7 +7,7 @@
 *.py text eol=lf diff=python
 *.bat text eol=crlf
 CODE_OF_CONDUCT.md -whitespace
-/Documentation/**/*.adoc text eol=lf whitespace=trail,space,incomplete
+/Documentation/**/*.adoc text eol=lf whitespace=trail,space,incomplete,tab-between-non-ws
 /command-list.txt text eol=lf
 /GIT-VERSION-GEN text eol=lf
 /mergetools/* text eol=lf
diff --git a/t/t4015-diff-whitespace.sh b/t/t4015-diff-whitespace.sh
index 3c8eb02e4f..f5b6ceeed9 100755
--- a/t/t4015-diff-whitespace.sh
+++ b/t/t4015-diff-whitespace.sh
@@ -2440,4 +2440,147 @@ test_expect_success 'combine --ignore-blank-lines with --function-context 2' '
 	test_cmp expect actual
 '
 
+test_expect_success 'check tab between non-whitespace (tab-between-non-ws: off)' '
+	git config core.whitespace "-tab-between-non-ws" &&
+
+	printf "1234567\tb" >x &&
+	git add x &&
+	git diff --cached --check &&
+
+	git diff --cached --color >raw &&
+	test_decode_color <raw >actual &&
+	! test_grep "<GREEN>1234567<RESET><BLUE>	<RESET><GREEN>b<RESET>" actual &&
+	test_grep "<GREEN>1234567	b<RESET>" actual &&
+
+	# should apply without error because tab-between-non-ws is off
+	git diff --cached >patch.diff &&
+	git checkout HEAD -- x &&
+	git apply --whitespace=error patch.diff
+'
+
+test_expect_success 'check tab between non-whitespace at tab stop (tab-between-non-ws: on)' '
+	git config core.whitespace "tab-between-non-ws,tabwidth=8" &&
+	printf "1234567\tb" >x &&
+	git add x &&
+	test_must_fail git diff --cached --check &&
+
+	git diff --cached --color >raw &&
+	test_decode_color <raw >actual &&
+	test_grep "<GREEN>1234567<RESET><BLUE>	<RESET><GREEN>b<RESET>" actual &&
+	! test_grep "<GREEN>1234567	b<RESET>" actual &&
+
+	git diff --cached >patch.diff &&
+	git checkout HEAD -- x &&
+	test_must_fail git apply --whitespace=error patch.diff &&
+	git apply --whitespace=fix patch.diff &&
+	printf "1234567 b" >expected &&
+	test_cmp expected x
+'
+
+test_expect_success 'check tab between non-whitespace not at tab stop (tab-between-non-ws: on)' '
+	git config core.whitespace "tab-between-non-ws,tabwidth=8" &&
+	printf "a\tb" >x &&
+	git add x &&
+	git diff --cached --check &&
+
+	git diff --cached --color >raw &&
+	test_decode_color <raw >actual &&
+	! test_grep "<GREEN>a<RESET><BLUE>	<RESET><GREEN>b<RESET>" actual &&
+	test_grep "<GREEN>a	b<RESET>" actual &&
+
+	# should apply without error because the input is valid
+	git diff --cached >patch.diff &&
+	git checkout HEAD -- x &&
+	git apply --whitespace=error patch.diff
+'
+
+test_expect_success 'check tab between non-whitespace with tabwidth=4 (tab-between-non-ws: on)' '
+	git config core.whitespace "tab-between-non-ws,tabwidth=4" &&
+	printf "123\tb" >x &&
+	git add x &&
+	test_must_fail git diff --cached --check &&
+
+	git diff --cached --color >raw &&
+	test_decode_color <raw >actual &&
+	test_grep "<GREEN>123<RESET><BLUE>	<RESET><GREEN>b<RESET>" actual &&
+	! test_grep "<GREEN>123	b<RESET>" actual &&
+
+	git diff --cached >patch.diff &&
+	git checkout HEAD -- x &&
+	test_must_fail git apply --whitespace=error patch.diff &&
+	git apply --whitespace=fix patch.diff &&
+	printf "123 b" >expected &&
+	test_cmp expected x
+'
+
+test_expect_success 'check tab between non-whitespace with tabwidth=4 (tab-between-non-ws: on)' '
+	git config core.whitespace "tab-between-non-ws,tabwidth=4" &&
+	printf "1234\tb" >x &&
+	git add x &&
+	git diff --cached --check &&
+
+	git diff --cached --color >raw &&
+	test_decode_color <raw >actual &&
+	! test_grep "<GREEN>1234<RESET><BLUE>	<RESET><GREEN>b<RESET>" actual &&
+	test_grep "<GREEN>1234	b<RESET>" actual &&
+
+	# should apply without error because tab is at tab stop
+	git diff --cached >patch.diff &&
+	git checkout HEAD -- x &&
+	git apply --whitespace=error patch.diff
+'
+
+test_expect_success 'check multiple tabs with one error (tab-between-non-ws: on)' '
+	git config core.whitespace "tab-between-non-ws,tabwidth=8" &&
+	printf "a\t1234567\tb" >x &&
+	git add x &&
+	test_must_fail git diff --cached --check &&
+
+	git diff --cached --color >raw &&
+	test_decode_color <raw >actual &&
+	test_grep "<GREEN>a	1234567<RESET><BLUE>	<RESET><GREEN>b<RESET>" actual &&
+	! test_grep "<GREEN>a	1234567	b<RESET>" actual &&
+
+	git diff --cached >patch.diff &&
+	git checkout HEAD -- x &&
+	test_must_fail git apply --whitespace=error patch.diff &&
+	git apply --whitespace=fix patch.diff &&
+	printf "a\t1234567 b" >expected &&
+	test_cmp expected x
+'
+
+test_expect_success 'check tab at beginning of line (tab-between-non-ws: on)' '
+	git config core.whitespace "tab-between-non-ws,tabwidth=8" &&
+	printf "\ta" >x &&
+	git add x &&
+	git diff --cached --check &&
+
+	git diff --cached --color >raw &&
+	test_decode_color <raw >actual &&
+	! test_grep "<BLUE>	" actual &&
+	test_grep "<GREEN>+<RESET>	<GREEN>a<RESET>" actual &&
+
+	# should apply without error because tab is a valid indentation
+	git diff --cached >patch.diff &&
+	git checkout HEAD -- x &&
+	git apply --whitespace=error patch.diff
+'
+
+test_expect_success 'check tab at end of line(tab-between-non-ws: on)' '
+	git config core.whitespace "tab-between-non-ws,-trailing-space,tabwidth=8" &&
+	printf "a\t" >x &&
+	git add x &&
+	git diff --cached --check &&
+
+	git diff --cached --color >raw &&
+	test_decode_color <raw >actual &&
+	! test_grep "<GREEN>a<RESET><BLUE>	" actual &&
+	test_grep "<GREEN>a	<RESET>" actual &&
+
+	# should apply without error because tab is caught by another check (trailing-space)
+	git diff --cached >patch.diff &&
+	git checkout HEAD -- x &&
+	git apply --whitespace=error patch.diff
+'
+
 test_done
diff --git a/ws.c b/ws.c
index 6cc2466c0c..633bc69418 100644
--- a/ws.c
+++ b/ws.c
@@ -26,6 +26,7 @@ static struct whitespace_rule {
 	{ "blank-at-eol", WS_BLANK_AT_EOL, 0 },
 	{ "blank-at-eof", WS_BLANK_AT_EOF, 0 },
 	{ "tab-in-indent", WS_TAB_IN_INDENT, 0, 1 },
+	{ "tab-between-non-ws", WS_TAB_BETWEEN_NON_WS, 0 },
 	{ "incomplete-line", WS_INCOMPLETE_LINE, 0, 0 },
 };
 
@@ -140,6 +141,11 @@ char *whitespace_error_string(unsigned ws)
 			strbuf_addstr(&err, ", ");
 		strbuf_addstr(&err, "tab in indent");
 	}
+	if (ws & WS_TAB_BETWEEN_NON_WS) {
+		if (err.len)
+			strbuf_addstr(&err, ", ");
+		strbuf_addstr(&err, "tab between non-whitespace characters");
+	}
 	if (ws & WS_INCOMPLETE_LINE) {
 		if (err.len)
 			strbuf_addstr(&err, ", ");
@@ -148,6 +154,80 @@ char *whitespace_error_string(unsigned ws)
 	return strbuf_detach(&err, NULL);
 }
 
+static void emit_literal(const char *line, int len, FILE *stream)
+{
+	fwrite(line, len, 1, stream);
+}
+
+static void highlight_tabs(const char *line, int len,
+			   int written, int trailing_whitespace,
+			   unsigned ws_rule, FILE *stream,
+			   const char *set, const char *reset, const char *ws)
+{
+	int tabwidth = ws_tab_width(ws_rule);
+	int start = written, col = 0;
+
+	if (!tabwidth)
+		BUG("a known tabwidth is required by WS_TAB_BETWEEN_NON_WS");
+
+	/*
+	 * Calculate the visual column position (col) up to 'written'.
+	 * Tabs expand based on 'tabwidth', so for example the 5th character in a
+	 * string might be at the 12th visual column, if the line contains tabs.
+	 */
+	for (int i = 0; i < written; i++) {
+		if (line[i] == '\t')
+			col += tabwidth - (col % tabwidth);
+		else
+			col++;
+	}
+
+	/* Iterate through the section of the line that needs potential highlighting. */
+	for (int i = written; i < trailing_whitespace; i++) {
+		if (line[i] == '\t') {
+			if (i > 0 && i < len - 1 &&
+			    !isspace(line[i - 1]) && !isspace(line[i + 1]) &&
+			    (col % tabwidth) == (tabwidth - 1)) {
+				/* Print unwritten content before the highlighted tab. */
+				if (start < i)
+					emit_literal(line + start, i - start, stream);
+
+				/* Apply highlight for the tab. */
+				fputs(reset, stream);
+				fputs(ws, stream);
+				fputc('\t', stream);
+				fputs(reset, stream);
+				fputs(set, stream);
+				start = i + 1;
+			}
+			col += tabwidth - (col % tabwidth);
+		} else {
+			col++; /* non-tab character. */
+		}
+	}
+
+	/* Print any remaining content in the current segment after the last highlight. */
+	if (start < trailing_whitespace)
+		emit_literal(line + start, trailing_whitespace - start, stream);
+}
+
+static void emit_middle_section(const char *line, int len,
+			       int written, int trailing_whitespace,
+			       unsigned ws_rule, unsigned result,
+			       FILE *stream, const char *set,
+			       const char *reset, const char *ws)
+{
+	fputs(set, stream);
+
+	if (result & WS_TAB_BETWEEN_NON_WS)
+		highlight_tabs(line, len, written, trailing_whitespace,
+			       ws_rule, stream, set, reset, ws);
+	else
+		emit_literal(line + written, trailing_whitespace - written, stream);
+
+	fputs(reset, stream);
+}
+
 /* If stream is non-NULL, emits the line after checking. */
 static unsigned ws_check_emit_1(const char *line, int len, unsigned ws_rule,
 				FILE *stream, const char *set,
@@ -228,19 +308,41 @@ static unsigned ws_check_emit_1(const char *line, int len, unsigned ws_rule,
 		written = i;
 	}
 
-	if (stream) {
+	if (ws_rule & WS_TAB_BETWEEN_NON_WS) {
 		/*
-		 * Now the rest of the line starts at "written".
-		 * The non-highlighted part ends at "trailing_whitespace".
+		 * A tab surrounded by non-whitespace characters is a typo candidate
+		 * (a space might have been intended). This checks for a tab that
+		 * would be expanded to a single space, which is when it appears at
+		 * a column that is one less than a multiple of the tabwidth.
 		 */
-
-		/* Emit non-highlighted (middle) segment. */
-		if (trailing_whitespace - written > 0) {
-			fputs(set, stream);
-			fwrite(line + written,
-			    trailing_whitespace - written, 1, stream);
-			fputs(reset, stream);
+		int col = 0;
+		int tabwidth = ws_tab_width(ws_rule);
+
+		if (!tabwidth)
+			BUG("a known tabwidth is required by WS_TAB_BETWEEN_NON_WS");
+
+		for (i = 0; i < len; i++) {
+			if (line[i] == '\t') {
+				if (i > 0 && i < len - 1 &&
+				    !isspace(line[i - 1]) && !isspace(line[i + 1]) &&
+				    (col % tabwidth) == (tabwidth - 1))
+					result |= WS_TAB_BETWEEN_NON_WS;
+				col += tabwidth - (col % tabwidth);
+			} else {
+				col++;
+			}
 		}
+	}
+
+	if (stream) {
+		/*
+		 * The middle section of the line starts at "written" and ends at
+		 * "trailing_whitespace".
+		 */
+		if (trailing_whitespace - written > 0)
+			emit_middle_section(line, len, written, trailing_whitespace,
+					    ws_rule, result,
+					    stream, set, reset, ws);
 
 		/* Highlight errors in trailing whitespace. */
 		if (trailing_whitespace != len) {
@@ -299,6 +401,7 @@ void ws_fix_copy(struct strbuf *dst, const char *src, int len, unsigned ws_rule,
 	int last_tab_in_indent = -1;
 	int last_space_in_indent = -1;
 	int need_fix_leading_space = 0;
+	size_t pre_indent_len = dst->len;
 
 	/*
 	 * Remembering that we need to add '\n' at the end
@@ -401,7 +504,61 @@ void ws_fix_copy(struct strbuf *dst, const char *src, int len, unsigned ws_rule,
 		fixed = 1;
 	}
 
-	strbuf_add(dst, src, len);
+	if (!(ws_rule & WS_TAB_BETWEEN_NON_WS)) {
+		/*
+		 * Middle section does not need fixing, so just append src to
+		 * dst because it already contains the previous ws fixes.
+		 */
+		strbuf_add(dst, src, len);
+	} else {
+		/* Fix middle section HTs which expand to a single display space */
+		const char *indent_part = dst->buf + pre_indent_len;
+		int indent_len = dst->len - pre_indent_len;
+		int tabwidth = ws_tab_width(ws_rule);
+		int col = 0;
+
+		if (!tabwidth)
+			BUG("a known tabwidth is required by WS_TAB_BETWEEN_NON_WS");
+
+		/*
+		 * Compute indentation display column length because it might not
+		 * end at a fixed tab stop position.
+		 */
+		for (i = 0; i < indent_len; i++) {
+			if (indent_part[i] == '\t')
+				col += tabwidth - (col % tabwidth);
+			else
+				col++;
+		}
+
+		/* Go through all chars in src, fix if necessary, and append to dst */
+		for (i = 0; i < len; i++) {
+			char prev_ch = i > 0 ? src[i - 1] :
+				(indent_len > 0 ? indent_part[indent_len - 1] : '\0');
+			char next_ch = i < len - 1 ? src[i + 1] : '\0';
+			bool needs_fixing = prev_ch && next_ch &&
+				!isspace(prev_ch) && !isspace(next_ch) &&
+				(col % tabwidth) == (tabwidth - 1);
+
+			/* non HT chars are added as is */
+			if (src[i] != '\t') {
+				strbuf_addch(dst, src[i]);
+				col++;
+				continue;
+			}
+
+			/* fix HT or add them as is */
+			if (needs_fixing) {
+				strbuf_addch(dst, ' ');
+				col++;
+				fixed = 1;
+			} else {
+				strbuf_addch(dst, '\t');
+				col += tabwidth - (col % tabwidth);
+			}
+		}
+	}
+
 	if (add_cr_to_tail)
 		strbuf_addch(dst, '\r');
 	if (add_nl_to_tail)
diff --git a/ws.h b/ws.h
index 06d5cb73f8..35475fd320 100644
--- a/ws.h
+++ b/ws.h
@@ -16,6 +16,7 @@ struct strbuf;
 #define WS_BLANK_AT_EOF         (1<<10)
 #define WS_TAB_IN_INDENT        (1<<11)
 #define WS_INCOMPLETE_LINE      (1<<12)
+#define WS_TAB_BETWEEN_NON_WS   (1<<13)
 
 #define WS_TRAILING_SPACE       (WS_BLANK_AT_EOL|WS_BLANK_AT_EOF)
 #define WS_DEFAULT_RULE (WS_TRAILING_SPACE|WS_SPACE_BEFORE_TAB|8)
-- 
2.51.2


             reply	other threads:[~2026-01-07  1:31 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-01-07  1:30 Adrian Ratiu [this message]
2026-01-07  2:12 ` [PATCH v2] ws: add new tab-between-non-ws check Junio C Hamano
2026-01-07 11:34   ` Adrian Ratiu
2026-01-07 17:33 ` Johannes Sixt
2026-01-07 18:11   ` Adrian Ratiu
2026-01-08  8:36     ` Johannes Sixt
2026-01-08  9:01   ` Johannes Sixt
2026-01-09 13:33     ` Adrian Ratiu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260107013051.312291-1-adrian.ratiu@collabora.com \
    --to=adrian.ratiu@collabora$(echo .)com \
    --cc=emilyshaffer@google$(echo .)com \
    --cc=git@vger$(echo .)kernel.org \
    --cc=gitster@pobox$(echo .)com \
    --cc=ps@pks$(echo .)im \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox