* [PATCH v2] reftable/iter: fix undefined behavior in indexed_table_ref_iter_next
@ 2026-01-04 10:46 Tsahi Elkayam
2026-01-05 14:59 ` Patrick Steinhardt
0 siblings, 1 reply; 4+ messages in thread
From: Tsahi Elkayam @ 2026-01-04 10:46 UTC (permalink / raw)
To: git@vger•kernel.org
The indexed_table_ref_iter_next() function accesses ref->value.val2
without first checking the ref's value_type. This is undefined behavior
when the ref is not of type REFTABLE_REF_VAL2.
The correct pattern is already used in filtering_ref_iterator_next()
which checks value_type before accessing the appropriate union member.
Apply the same pattern here:
- Check for REFTABLE_REF_VAL2 before accessing val2 members
- Add missing check for REFTABLE_REF_VAL1 to handle single-value refs
This was marked with a "/* BUG */" comment indicating the issue was
known but not yet fixed.
Signed-off-by: Tsahi Elkayam <Tsahi.Elkayam@protonmail•com>
---
reftable/iter.c | 13 ++++++++-----
1 file changed, 8 insertions(+), 5 deletions(-)
diff --git a/reftable/iter.c b/reftable/iter.c
index 2ecc52b336..2eee65bb1e 100644
--- a/reftable/iter.c
+++ b/reftable/iter.c
@@ -171,12 +171,15 @@ static int indexed_table_ref_iter_next(void *p, struct reftable_record *rec)
}
continue;
}
- /* BUG */
- if (!memcmp(it->oid.buf, ref->value.val2.target_value,
- it->oid.len) ||
- !memcmp(it->oid.buf, ref->value.val2.value, it->oid.len)) {
+ if (ref->value_type == REFTABLE_REF_VAL2 &&
+ (!memcmp(it->oid.buf, ref->value.val2.target_value,
+ it->oid.len) ||
+ !memcmp(it->oid.buf, ref->value.val2.value, it->oid.len)))
+ return 0;
+
+ if (ref->value_type == REFTABLE_REF_VAL1 &&
+ !memcmp(it->oid.buf, ref->value.val1, it->oid.len))
return 0;
- }
}
}
--
2.37.1 (Apple Git-137.1)
Sent with Proton Mail secure email.
^ permalink raw reply related [flat|nested] 4+ messages in thread
* Re: [PATCH v2] reftable/iter: fix undefined behavior in indexed_table_ref_iter_next
2026-01-04 10:46 [PATCH v2] reftable/iter: fix undefined behavior in indexed_table_ref_iter_next Tsahi Elkayam
@ 2026-01-05 14:59 ` Patrick Steinhardt
2026-01-08 16:52 ` [PATCH v2] reftable/iter: fix UB " Tsahi Elkayam
0 siblings, 1 reply; 4+ messages in thread
From: Patrick Steinhardt @ 2026-01-05 14:59 UTC (permalink / raw)
To: Tsahi Elkayam; +Cc: git@vger•kernel.org
On Sun, Jan 04, 2026 at 10:46:40AM +0000, Tsahi Elkayam wrote:
> The indexed_table_ref_iter_next() function accesses ref->value.val2
> without first checking the ref's value_type. This is undefined behavior
> when the ref is not of type REFTABLE_REF_VAL2.
>
> The correct pattern is already used in filtering_ref_iterator_next()
> which checks value_type before accessing the appropriate union member.
> Apply the same pattern here:
>
> - Check for REFTABLE_REF_VAL2 before accessing val2 members
> - Add missing check for REFTABLE_REF_VAL1 to handle single-value refs
One missing bit is to explain what this is actually supposed to do. That
is, why do we even compare the data?
> This was marked with a "/* BUG */" comment indicating the issue was
> known but not yet fixed.
That's an indicator that something was wrong, true. But it doesn't
really say what the bug was. What puzzles me is that if the bug was so
easy to fix, then why didn't the original author already do it?
Unfortunately, blaming the line points to 46bc0e731a (reftable: read
reftable files, 2021-10-07), and that commit doesn't really provide much
context either.
> diff --git a/reftable/iter.c b/reftable/iter.c
> index 2ecc52b336..2eee65bb1e 100644
> --- a/reftable/iter.c
> +++ b/reftable/iter.c
> @@ -171,12 +171,15 @@ static int indexed_table_ref_iter_next(void *p, struct reftable_record *rec)
> }
> continue;
> }
> - /* BUG */
> - if (!memcmp(it->oid.buf, ref->value.val2.target_value,
> - it->oid.len) ||
> - !memcmp(it->oid.buf, ref->value.val2.value, it->oid.len)) {
> + if (ref->value_type == REFTABLE_REF_VAL2 &&
> + (!memcmp(it->oid.buf, ref->value.val2.target_value,
> + it->oid.len) ||
> + !memcmp(it->oid.buf, ref->value.val2.value, it->oid.len)))
> + return 0;
> +
> + if (ref->value_type == REFTABLE_REF_VAL1 &&
> + !memcmp(it->oid.buf, ref->value.val1, it->oid.len))
> return 0;
> - }
> }
> }
So let's take a step back -- what are we even trying to do here?
The indexed table is basically a table that provides reverse mappings.
Given an object ID, it allows us to quickly look up any reference that
points to this object ID. We don't make any use of that feature in Git
right now, but historically it was designed to speed up
"uploadpack.allowTipSHA1InWant". This setting is a lot less relevant
nowadays, as most forges set "uploadpack.allowAnySHA1InWant" to support
partial clones.
So this interface is somewhat confusingly named, as the term "index" is
overloaded: we have the "ref" and "log" indices that enable fast lookup
of those record types. But what this here refers to is the "obj" table.
Oh, well.
The iterator for those objects takes as input the object ID we're
searching for. Given that object ID, it is expected to yield only those
ref records that reference this object ID, either peeled or unpeeled in
case it is a tag.
In the `next()` function we essentially have a nested loop:
- The outer loop iterates through the "obj" blocks. This gives us the
offsets of the ref records that we need to look up and that contain.
- The inner loop iterates through the records in the "ref" block that
was referenced by the "obj" block.
Honestly, the whole logic doesn't really make any sense though, as we
never filter by the caller-provided object ID at all! So we still end up
churning through all references, which kind of destroys the purpose of
this whole "obj" reverse index. And that's also why we have the check:
we verify that the object ID of the reference we have looked up matches
the caller's query.
So the fix you have here is correct: we may end up with a "ref" record
that is unpeeled, and in that case it's wrong to treat it as a peeled
one. And if we fix that, the result should at least be correct and void
of any kind of undefined behaviour. But the whole infrastructure is
still quite broken. What we should be doing is to:
1. Seek to the obj record that has the desired object ID prefix.
2. For each obj record starting with the desired object ID prefix:
1. Look up the respective "ref" block indicated by the offset.
2. Iterate through all "ref" records and yield all those whose value
or peeled value match.
3. Abort once there are no more obj records matching the given prefix.
All of this is naturally outside the scope of this patch series. I'd
argue though that we shouldn't just remove the BUG comment, but instead
add some TODO comment that explains why the current logic is still very
suboptimal.
Thanks!
Patrick
^ permalink raw reply [flat|nested] 4+ messages in thread
* [PATCH v2] reftable/iter: fix UB in indexed_table_ref_iter_next
2026-01-05 14:59 ` Patrick Steinhardt
@ 2026-01-08 16:52 ` Tsahi Elkayam
2026-01-09 14:19 ` Patrick Steinhardt
0 siblings, 1 reply; 4+ messages in thread
From: Tsahi Elkayam @ 2026-01-08 16:52 UTC (permalink / raw)
To: Patrick Steinhardt; +Cc: git@vger•kernel.org
The indexed_table_ref_iter_next() function provides reverse mappings from
object IDs to references. It currently accesses ref->value.val2 without
checking the reference's value_type, leading to undefined behavior when
encountering unpeeled references (REFTABLE_REF_VAL1).
While the current "obj" table implementation is suboptimal—it yields all
reference records within a block and relies on manual filtering—this
manual comparison is necessary to ensure the yielded record actually
matches the target OID prefix requested by the caller.
Fix the undefined behavior by checking the value_type before performing
the memory comparison. Additionally, replace the "/* BUG */" comment
with a TODO explaining the current implementation's inefficiency, as
suggested by the maintainer.
Signed-off-by: Tsahi Elkayam <Tsahi.Elkayam@Protonmail•com>
---
reftable/iter.c | 13 ++++++++++---
1 file changed, 10 insertions(+), 3 deletions(-)
diff --git a/reftable/iter.c b/reftable/iter.c
index 2ecc52b336..2eee65bb1e 100644
--- a/reftable/iter.c
+++ b/reftable/iter.c
@@ -171,12 +171,19 @@ static int indexed_table_ref_iter_next(void *p, struct reftable_record rec)
}
continue;
}
- /* BUG */
- if (!memcmp(it->oid.buf, ref->value.val2.target_value,
- it->oid.len) ||
- !memcmp(it->oid.buf, ref->value.val2.value, it->oid.len)) {
+
+ /*
+ * TODO: The current implementation is suboptimal as it yields
+ * all ref records in the block rather than filtering by the
+ * OID prefix. This manual comparison is still necessary.
+ */
+ if (ref->value_type == REFTABLE_REF_VAL2 &&
+ (!memcmp(it->oid.buf, ref->value.val2.target_value,
+ it->oid.len) ||
+ !memcmp(it->oid.buf, ref->value.val2.value, it->oid.len)))
+ return 0;
+
+ if (ref->value_type == REFTABLE_REF_VAL1 &&
+ !memcmp(it->oid.buf, ref->value.val1, it->oid.len))
return 0;
- }
}
}
--
2.47.1
^ permalink raw reply related [flat|nested] 4+ messages in thread
* Re: [PATCH v2] reftable/iter: fix UB in indexed_table_ref_iter_next
2026-01-08 16:52 ` [PATCH v2] reftable/iter: fix UB " Tsahi Elkayam
@ 2026-01-09 14:19 ` Patrick Steinhardt
0 siblings, 0 replies; 4+ messages in thread
From: Patrick Steinhardt @ 2026-01-09 14:19 UTC (permalink / raw)
To: Tsahi Elkayam; +Cc: git@vger•kernel.org
On Thu, Jan 08, 2026 at 04:52:05PM +0000, Tsahi Elkayam wrote:
> The indexed_table_ref_iter_next() function provides reverse mappings from
> object IDs to references. It currently accesses ref->value.val2 without
> checking the reference's value_type, leading to undefined behavior when
> encountering unpeeled references (REFTABLE_REF_VAL1).
>
> While the current "obj" table implementation is suboptimal—it yields all
> reference records within a block and relies on manual filtering—this
> manual comparison is necessary to ensure the yielded record actually
> matches the target OID prefix requested by the caller.
It's correct to yield all ref records of an indexed ref block, as any of
its refs may point to the object ID. What's incorrect is that we:
- Don't seek to the correct obj index block when creating the
iterator. This means that we'll also seek into ref blocks that won't
even contain any ref with the desired object ID.
- Don't abort iterating over the obj index blocks once we see that its
object IDs no longer match.
So this needs a bit of rephrasing. Please feel free to copy these two
bullet points as-is.
> Fix the undefined behavior by checking the value_type before performing
> the memory comparison. Additionally, replace the "/* BUG */" comment
> with a TODO explaining the current implementation's inefficiency, as
> suggested by the maintainer.
I wouldn't refer to myself as maintainer, I'm very happy to let Junio
have that role :) You can for example simply add a "Helped-by:" trailer
that refers to me.
> diff --git a/reftable/iter.c b/reftable/iter.c
> index 2ecc52b336..2eee65bb1e 100644
> --- a/reftable/iter.c
> +++ b/reftable/iter.c
> @@ -171,12 +171,19 @@ static int indexed_table_ref_iter_next(void *p, struct reftable_record rec)
> }
> continue;
> }
> - /* BUG */
> - if (!memcmp(it->oid.buf, ref->value.val2.target_value,
> - it->oid.len) ||
> - !memcmp(it->oid.buf, ref->value.val2.value, it->oid.len)) {
> +
> + /*
> + * TODO: The current implementation is suboptimal as it yields
> + * all ref records in the block rather than filtering by the
> + * OID prefix. This manual comparison is still necessary.
> + */
And this needs a bit of rephrasing to represent the above. For example:
/*
* TODO: The current implementation is suboptimal as:
*
* - We don't seek to the first obj record that matches our OID
* prefix.
*
* - We don't abort iteration once the OID prefix doesn't match
* anymore.
*
* We don't have any users of this interface in-tree, but once we
* add any we should probably try to fix this interface.
*/
Thanks!
Patrick
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2026-01-09 14:19 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-01-04 10:46 [PATCH v2] reftable/iter: fix undefined behavior in indexed_table_ref_iter_next Tsahi Elkayam
2026-01-05 14:59 ` Patrick Steinhardt
2026-01-08 16:52 ` [PATCH v2] reftable/iter: fix UB " Tsahi Elkayam
2026-01-09 14:19 ` Patrick Steinhardt
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox