From: LorenzoPegorari <lorenzo.pegorari2002@gmail•com>
To: git@vger•kernel.org
Cc: Junio C Hamano <gitster@pobox•com>, Jeff King <peff@peff•net>,
Toon Claes <toon@iotcl•com>, Justin Tobler <jltobler@gmail•com>,
Niels Glodny <n.glodny@campus•lmu.de>,
Patrick Steinhardt <ps@pks•im>
Subject: [GSoC PATCH v2 1/2] diff: improve scaling of filenames in diffstat to handle UTF-8 chars
Date: Fri, 16 Jan 2026 01:05:03 +0100
Message-ID: <abeb8d3439de6569fd73617de580fa510e19466b.1768520441.git.lorenzo.pegorari2002@gmail.com>
In-Reply-To: <cover.1768520441.git.lorenzo.pegorari2002@gmail.com>

The `show_stats()` function tries to scale the filenames in the diffstat to
ensure they don't exceed the given `name-width`. It does so by calculating
the "display width" of the characters to be dropped, but then advances the
filename pointer by that number of bytes.

However, the "display width" of a character is not always equal to its byte
count. As a result, filenames containing multi-byte UTF-8 characters can
still exceed the given `name-width`, and the pointer often lands in the
middle of a character, so the output starts with stray continuation bytes.

The following is an example of the issue, where the two files are "HelloHi"
and "Hello你好", and `name-width=6`:

    ...oHi | 0
    ...<BD><A0>好 | 0
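
To make the mismatch concrete, here is a minimal standalone sketch of how
display width and byte length diverge. `demo_char_width()` and
`demo_display_width()` are hypothetical helpers written for this
illustration, not git's `utf8_width()` or `utf8_strwidth()`; they assume
valid UTF-8 input and treat only U+4E00..U+9FFF as double-width.

```c
#include <stddef.h>

/*
 * Hypothetical helper, NOT git's utf8_width(): decode one UTF-8
 * character at *s, advance *s past all of its bytes, and return a
 * rough column count. Assumes valid UTF-8; only the CJK Unified
 * Ideographs block (U+4E00..U+9FFF) is counted as two columns.
 */
static int demo_char_width(const char **s)
{
	const unsigned char *p = (const unsigned char *)*s;
	unsigned int cp;
	int n;

	if (p[0] < 0x80) {
		cp = p[0];
		n = 1;
	} else if ((p[0] & 0xe0) == 0xc0) {
		cp = p[0] & 0x1f;
		n = 2;
	} else if ((p[0] & 0xf0) == 0xe0) {
		cp = p[0] & 0x0f;
		n = 3;
	} else {
		cp = p[0] & 0x07;
		n = 4;
	}
	for (int i = 1; i < n; i++)
		cp = (cp << 6) | (p[i] & 0x3f);

	*s += n;
	return (cp >= 0x4e00 && cp <= 0x9fff) ? 2 : 1;
}

/* Sum the per-character column counts of a NUL-terminated string. */
static int demo_display_width(const char *s)
{
	int width = 0;

	while (*s)
		width += demo_char_width(&s);
	return width;
}
```

Under these assumptions "Hello你好" is 11 bytes long but 9 columns wide.
With `name-width=6`, `len` becomes 3 after reserving room for "...", so the
old code skips 9 - 3 = 6 bytes and stops after the first byte of "你",
which matches the `<BD><A0>` debris shown above.
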
Make the filename pointer advance one whole character at a time using the
`utf8_width()` function, which moves the pointer past the bytes of the
character and returns its display width, until enough columns have been
dropped from the filename.

Clamp `len` to 0 when it would otherwise be negative (this happens if the
given `name-width` is 2 or less); without the clamp the loop never
terminates.
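
The truncation logic described above can be sketched outside of git as
follows. This is only an illustration under stated assumptions:
`demo_char_width()` and `demo_skip_to_fit()` are hypothetical names, the
width table is a crude stand-in for git's, and the extra `*name` check is
a safety net for the standalone version that is not part of this patch.

```c
#include <stddef.h>

/*
 * Hypothetical stand-in for git's utf8_width(): advance *s past one
 * valid UTF-8 character and return its column count (2 for
 * U+4E00..U+9FFF, 1 for everything else).
 */
static int demo_char_width(const char **s)
{
	const unsigned char *p = (const unsigned char *)*s;
	unsigned int cp;
	int n;

	if (p[0] < 0x80) {
		cp = p[0];
		n = 1;
	} else if ((p[0] & 0xe0) == 0xc0) {
		cp = p[0] & 0x1f;
		n = 2;
	} else if ((p[0] & 0xf0) == 0xe0) {
		cp = p[0] & 0x0f;
		n = 3;
	} else {
		cp = p[0] & 0x07;
		n = 4;
	}
	for (int i = 1; i < n; i++)
		cp = (cp << 6) | (p[i] & 0x3f);

	*s += n;
	return (cp >= 0x4e00 && cp <= 0x9fff) ? 2 : 1;
}

/*
 * Drop leading characters from "name" (whose display width is
 * name_len) until what remains fits in "len" columns, and return a
 * pointer to the remainder. The pointer only ever moves by whole
 * characters, so it cannot land inside a multi-byte sequence.
 */
static const char *demo_skip_to_fit(const char *name, int name_len, int len)
{
	/* A negative budget could never be reached; clamp it to 0. */
	if (len < 0)
		len = 0;

	/* "*name" guard: assumption of this sketch, not in the patch. */
	while (name_len > len && *name)
		name_len -= demo_char_width(&name);

	return name;
}
```

For the two example files and a budget of 3 columns, this returns "oHi"
and "好" respectively. The second result is only 2 columns wide because
dropping "你" removes two columns at once, so the output may come out
slightly narrower than `name-width`, never wider.
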
Signed-off-by: LorenzoPegorari <lorenzo.pegorari2002@gmail•com>
---
diff.c | 17 ++++++-----------
1 file changed, 6 insertions(+), 11 deletions(-)
diff --git a/diff.c b/diff.c
index a68ddd2168..452fc69775 100644
--- a/diff.c
+++ b/diff.c
@@ -2859,17 +2859,12 @@ static void show_stats(struct diffstat_t *data, struct diff_options *options)
 			char *slash;
 			prefix = "...";
 			len -= 3;
-			/*
-			 * NEEDSWORK: (name_len - len) counts the display
-			 * width, which would be shorter than the byte
-			 * length of the corresponding substring.
-			 * Advancing "name" by that number of bytes does
-			 * *NOT* skip over that many columns, so it is
-			 * very likely that chomping the pathname at the
-			 * slash we will find starting from "name" will
-			 * leave the resulting string still too long.
-			 */
-			name += name_len - len;
+			if (len < 0)
+				len = 0;
+
+			while (name_len > len)
+				name_len -= utf8_width((const char**)&name, NULL);
+
 			slash = strchr(name, '/');
 			if (slash)
 				name = slash;
--
2.43.0