public inbox for git@vger.kernel.org 
 help / color / mirror / Atom feed
From: Jeff King <peff@peff•net>
To: Junio C Hamano <gitster@pobox•com>
Cc: Wink Saville <wink@saville•com>, Git List <git@vger•kernel.org>
Subject: Re: [BUGREPORT] git diff-tree --cc SEGFAUTs
Date: Fri, 3 Jan 2025 22:32:10 -0500	[thread overview]
Message-ID: <20250104033210.GA892381@coredump.intra.peff.net> (raw)
In-Reply-To: <xmqq4j2fnv8p.fsf@gitster.g>

On Fri, Jan 03, 2025 at 06:55:18PM -0800, Junio C Hamano wrote:

> Jeff King <peff@peff•net> writes:
> 
> > ... OTOH it is not really
> > solving the more fundamental problem, which is that p->parent[i].path is
> > only sometimes useful (we do not fill it in if it would just be the same
> > as p->path, so the patch only changes it from uninitialized memory into
> > an empty strbuf).
> >
> > And that is probably not something we want to change, as allocating
> > duplicates of each path may be expensive.
> 
> Nicely said.  I reached the same conclusion after looking at the
> existing code, even though I have to admit that I am not a huge fan
> of the more recent part of combine-diff.c and its data structures.

I poked at this a little bit more, so here are a few tidbits:

  - the patch I showed earlier is not sufficient! There are lots of
    other spots that create combine_diff_path structs but don't bother
    to put anything in the parent paths at all. It works now because
    they also don't set a status that triggers filename_changed(). But
    what I showed earlier was wrong, because it was assuming in the
    cleanup functions that the strbufs were always initialized.

  - there's really no need for a strbuf at all here. It is always
    uninitialized/empty, or contains a direct copy of a path string. So
    a raw pointer with xstrdup() is plenty. And then we can use NULL to
    mean "it was not set".

    Which would Just Work for all those other spots if they bothered to
    zero the memory they allocated, but they don't. So we have to update
    them to set it to NULL anyway. That patch is below.

  - it is not at all clear to me that we need to be allocating at all.
    We always copy a string from the diff_queue. Do our
    combine_diff_path structs persist beyond then? I'm not sure. It is
    probably asking for trouble to just point to them directly without
    copying, as it creates a dependency (that even if it is not needed
    now, is a trap for somebody later). But it would drop some
    allocation/cleanup code, and we could just have p->parent[i].path
    fall back to p->path naturally.

diff --git a/combine-diff.c b/combine-diff.c
index 641bc92dbd..0d9d344c4e 100644
--- a/combine-diff.c
+++ b/combine-diff.c
@@ -66,13 +66,9 @@ static struct combine_diff_path *intersect_paths(
 			oidcpy(&p->parent[n].oid, &q->queue[i]->one->oid);
 			p->parent[n].mode = q->queue[i]->one->mode;
 			p->parent[n].status = q->queue[i]->status;
-
-			if (combined_all_paths &&
-			    filename_changed(p->parent[n].status)) {
-				strbuf_init(&p->parent[n].path, 0);
-				strbuf_addstr(&p->parent[n].path,
-					      q->queue[i]->one->path);
-			}
+			p->parent[n].path = combined_all_paths &&
+					    filename_changed(p->parent[n].status) ?
+					    xstrdup(q->queue[i]->one->path) : NULL;
 			*tail = p;
 			tail = &p->next;
 		}
@@ -92,9 +88,7 @@ static struct combine_diff_path *intersect_paths(
 			/* p->path not in q->queue[]; drop it */
 			*tail = p->next;
 			for (j = 0; j < num_parent; j++)
-				if (combined_all_paths &&
-				    filename_changed(p->parent[j].status))
-					strbuf_release(&p->parent[j].path);
+				free(p->parent[j].path);
 			free(p);
 			continue;
 		}
@@ -108,10 +102,9 @@ static struct combine_diff_path *intersect_paths(
 		oidcpy(&p->parent[n].oid, &q->queue[i]->one->oid);
 		p->parent[n].mode = q->queue[i]->one->mode;
 		p->parent[n].status = q->queue[i]->status;
-		if (combined_all_paths &&
-		    filename_changed(p->parent[n].status))
-			strbuf_addstr(&p->parent[n].path,
-				      q->queue[i]->one->path);
+		p->parent[n].path = combined_all_paths &&
+				    filename_changed(p->parent[n].status) ?
+				    xstrdup(q->queue[i]->one->path) : NULL;
 
 		tail = &p->next;
 		i++;
@@ -996,8 +989,9 @@ static void show_combined_header(struct combine_diff_path *elem,
 
 	if (rev->combined_all_paths) {
 		for (i = 0; i < num_parent; i++) {
-			char *path = filename_changed(elem->parent[i].status)
-				? elem->parent[i].path.buf : elem->path;
+			const char *path = elem->parent[i].path ?
+					   elem->parent[i].path :
+					   elem->path;
 			if (elem->parent[i].status == DIFF_STATUS_ADDED)
 				dump_quoted_path("--- ", "", "/dev/null",
 						 line_prefix, c_meta, c_reset);
@@ -1278,12 +1272,10 @@ static void show_raw_diff(struct combine_diff_path *p, int num_parent, struct re
 
 	for (i = 0; i < num_parent; i++)
 		if (rev->combined_all_paths) {
-			if (filename_changed(p->parent[i].status))
-				write_name_quoted(p->parent[i].path.buf, stdout,
-						  inter_name_termination);
-			else
-				write_name_quoted(p->path, stdout,
-						  inter_name_termination);
+			const char *path = p->parent[i].path ?
+					   p->parent[i].path :
+					   p->path;
+			write_name_quoted(path, stdout, inter_name_termination);
 		}
 	write_name_quoted(p->path, stdout, line_termination);
 }
@@ -1645,9 +1637,7 @@ void diff_tree_combined(const struct object_id *oid,
 		struct combine_diff_path *tmp = paths;
 		paths = paths->next;
 		for (i = 0; i < num_parent; i++)
-			if (rev->combined_all_paths &&
-			    filename_changed(tmp->parent[i].status))
-				strbuf_release(&tmp->parent[i].path);
+			free(tmp->parent[i].path);
 		free(tmp);
 	}
 
diff --git a/diff-lib.c b/diff-lib.c
index c6d3bc4d37..88a5aed736 100644
--- a/diff-lib.c
+++ b/diff-lib.c
@@ -417,9 +417,11 @@ static int show_modified(struct rev_info *revs,
 		memset(p->parent, 0, 2 * sizeof(struct combine_diff_parent));
 		p->parent[0].status = DIFF_STATUS_MODIFIED;
 		p->parent[0].mode = new_entry->ce_mode;
+		p->parent[0].path = NULL;
 		oidcpy(&p->parent[0].oid, &new_entry->oid);
 		p->parent[1].status = DIFF_STATUS_MODIFIED;
 		p->parent[1].mode = old_entry->ce_mode;
+		p->parent[1].path = NULL;
 		oidcpy(&p->parent[1].oid, &old_entry->oid);
 		show_combined_diff(p, 2, revs);
 		free(p);
diff --git a/diff.h b/diff.h
index 6e6007c17b..3157faeabb 100644
--- a/diff.h
+++ b/diff.h
@@ -480,7 +480,7 @@ struct combine_diff_path {
 		char status;
 		unsigned int mode;
 		struct object_id oid;
-		struct strbuf path;
+		char *path;
 	} parent[FLEX_ARRAY];
 };
 #define combine_diff_path_size(n, l) \
diff --git a/tree-diff.c b/tree-diff.c
index d9237ffd9b..57af377c2b 100644
--- a/tree-diff.c
+++ b/tree-diff.c
@@ -272,6 +272,7 @@ static struct combine_diff_path *emit_path(struct combine_diff_path *p,
 			}
 
 			p->parent[i].mode = mode_i;
+			p->parent[i].path = NULL;
 			oidcpy(&p->parent[i].oid, oid_i);
 		}
 

  reply	other threads:[~2025-01-04  3:32 UTC|newest]

Thread overview: 38+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-01-03 19:28 [BUGREPORT] git diff-tree --cc SEGFAUTs Wink Saville
2025-01-03 20:46 ` Jeff King
2025-01-03 23:34   ` Wink Saville
2025-01-04  0:31     ` Jeff King
2025-01-04  2:55       ` Junio C Hamano
2025-01-04  3:32         ` Jeff King [this message]
2025-01-04 18:09           ` Wink Saville
2025-01-05 22:13             ` Wink Saville
2025-01-09  8:27           ` [PATCH 0/14] combine-diff cleanups Jeff King
2025-01-09  8:28             ` [PATCH 01/14] run_diff_files(): delay allocation of combine_diff_path Jeff King
2025-01-09 17:57               ` Junio C Hamano
2025-01-09  8:32             ` [PATCH 02/14] combine-diff: add combine_diff_path_new() Jeff King
2025-01-09 18:05               ` Junio C Hamano
2025-01-13 15:40               ` Patrick Steinhardt
2025-01-14  9:29                 ` Jeff King
2025-01-09  8:33             ` [PATCH 03/14] tree-diff: clear parent array in path_appendnew() Jeff King
2025-01-09 18:28               ` Junio C Hamano
2025-01-10 10:54                 ` Jeff King
2025-01-09  8:42             ` [PATCH 04/14] combine-diff: use pointer for parent paths Jeff King
2025-01-09 18:49               ` Junio C Hamano
2025-01-09  8:42             ` [PATCH 05/14] diff: add a comment about combine_diff_path.parent.path Jeff King
2025-01-13 15:40               ` Patrick Steinhardt
2025-01-09  8:44             ` [PATCH 06/14] run_diff_files(): de-mystify the size of combine_diff_path struct Jeff King
2025-01-10 16:40               ` Junio C Hamano
2025-01-09  8:46             ` [PATCH 07/14] tree-diff: drop path_appendnew() alloc optimization Jeff King
2025-01-13 15:40               ` Patrick Steinhardt
2025-01-14 10:30                 ` Jeff King
2025-01-09  8:49             ` [PATCH 08/14] tree-diff: pass whole path string to path_appendnew() Jeff King
2025-01-13 15:40               ` Patrick Steinhardt
2025-01-14  9:26                 ` Jeff King
2025-01-09  8:49             ` [PATCH 09/14] tree-diff: inline path_appendnew() Jeff King
2025-01-11  0:41               ` Junio C Hamano
2025-01-09  8:50             ` [PATCH 10/14] combine-diff: drop public declaration of combine_diff_path_size() Jeff King
2025-01-09  8:51             ` [PATCH 11/14] tree-diff: drop list-tail argument to diff_tree_paths() Jeff King
2025-01-18  0:33               ` Junio C Hamano
2025-01-09  8:53             ` [PATCH 12/14] tree-diff: use the name "tail" to refer to list tail Jeff King
2025-01-09  8:54             ` [PATCH 13/14] tree-diff: simplify emit_path() list management Jeff King
2025-01-09  8:57             ` [PATCH 14/14] tree-diff: make list tail-passing more explicit Jeff King

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20250104033210.GA892381@coredump.intra.peff.net \
    --to=peff@peff$(echo .)net \
    --cc=git@vger$(echo .)kernel.org \
    --cc=gitster@pobox$(echo .)com \
    --cc=wink@saville$(echo .)com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox