* [PATCH v2] reftable/iter: fix undefined behavior in indexed_table_ref_iter_next @ 2026-01-04 10:46 Tsahi Elkayam 2026-01-05 14:59 ` Patrick Steinhardt 0 siblings, 1 reply; 4+ messages in thread From: Tsahi Elkayam @ 2026-01-04 10:46 UTC (permalink / raw) To: git@vger•kernel.org The indexed_table_ref_iter_next() function accesses ref->value.val2 without first checking the ref's value_type. This is undefined behavior when the ref is not of type REFTABLE_REF_VAL2. The correct pattern is already used in filtering_ref_iterator_next() which checks value_type before accessing the appropriate union member. Apply the same pattern here: - Check for REFTABLE_REF_VAL2 before accessing val2 members - Add missing check for REFTABLE_REF_VAL1 to handle single-value refs This was marked with a "/* BUG */" comment indicating the issue was known but not yet fixed. Signed-off-by: Tsahi Elkayam <Tsahi.Elkayam@protonmail•com> --- reftable/iter.c | 13 ++++++++----- 1 file changed, 8 insertions(+), 5 deletions(-) diff --git a/reftable/iter.c b/reftable/iter.c index 2ecc52b336..2eee65bb1e 100644 --- a/reftable/iter.c +++ b/reftable/iter.c @@ -171,12 +171,15 @@ static int indexed_table_ref_iter_next(void *p, struct reftable_record *rec) } continue; } - /* BUG */ - if (!memcmp(it->oid.buf, ref->value.val2.target_value, - it->oid.len) || - !memcmp(it->oid.buf, ref->value.val2.value, it->oid.len)) { + if (ref->value_type == REFTABLE_REF_VAL2 && + (!memcmp(it->oid.buf, ref->value.val2.target_value, + it->oid.len) || + !memcmp(it->oid.buf, ref->value.val2.value, it->oid.len))) + return 0; + + if (ref->value_type == REFTABLE_REF_VAL1 && + !memcmp(it->oid.buf, ref->value.val1, it->oid.len)) return 0; - } } } -- 2.37.1 (Apple Git-137.1) Sent with Proton Mail secure email. ^ permalink raw reply related [flat|nested] 4+ messages in thread
* Re: [PATCH v2] reftable/iter: fix undefined behavior in indexed_table_ref_iter_next 2026-01-04 10:46 [PATCH v2] reftable/iter: fix undefined behavior in indexed_table_ref_iter_next Tsahi Elkayam @ 2026-01-05 14:59 ` Patrick Steinhardt 2026-01-08 16:52 ` [PATCH v2] reftable/iter: fix UB " Tsahi Elkayam 0 siblings, 1 reply; 4+ messages in thread From: Patrick Steinhardt @ 2026-01-05 14:59 UTC (permalink / raw) To: Tsahi Elkayam; +Cc: git@vger•kernel.org On Sun, Jan 04, 2026 at 10:46:40AM +0000, Tsahi Elkayam wrote: > The indexed_table_ref_iter_next() function accesses ref->value.val2 > without first checking the ref's value_type. This is undefined behavior > when the ref is not of type REFTABLE_REF_VAL2. > > The correct pattern is already used in filtering_ref_iterator_next() > which checks value_type before accessing the appropriate union member. > Apply the same pattern here: > > - Check for REFTABLE_REF_VAL2 before accessing val2 members > - Add missing check for REFTABLE_REF_VAL1 to handle single-value refs One missing bit is to explain what this is actually supposed to do. That is, why do we even compare the data? > This was marked with a "/* BUG */" comment indicating the issue was > known but not yet fixed. That's an indicator that something was wrong, true. But it doesn't really say what the bug was. What puzzles me is that if the bug was so easy to fix, then why didn't the original author already do it? Unfortunately, blaming the line points to 46bc0e731a (reftable: read reftable files, 2021-10-07), and that commit doesn't really provide much context either. > diff --git a/reftable/iter.c b/reftable/iter.c > index 2ecc52b336..2eee65bb1e 100644 > --- a/reftable/iter.c > +++ b/reftable/iter.c > @@ -171,12 +171,15 @@ static int indexed_table_ref_iter_next(void *p, struct reftable_record *rec) > } > continue; > } > - /* BUG */ > - if (!memcmp(it->oid.buf, ref->value.val2.target_value, > - it->oid.len) || > - !memcmp(it->oid.buf, ref->value.val2.value, it->oid.len)) { > + if (ref->value_type == REFTABLE_REF_VAL2 && > + (!memcmp(it->oid.buf, ref->value.val2.target_value, > + it->oid.len) || > + !memcmp(it->oid.buf, ref->value.val2.value, it->oid.len))) > + return 0; > + > + if (ref->value_type == REFTABLE_REF_VAL1 && > + !memcmp(it->oid.buf, ref->value.val1, it->oid.len)) > return 0; > - } > } > } So let's take a step back -- what are we even trying to do here? The indexed table is basically a table that provides reverse mappings. Given an object ID, it allows us to quickly look up any reference that points to this object ID. We don't make any use of that feature in Git right now, but historically it was designed to speed up "uploadpack.allowTipSHA1InWant". This setting is a lot less relevant nowadays, as most forges set "uploadpack.allowAnySHA1InWant" to support partial clones. So this interface is somewhat confusingly named, as the term "index" is overloaded: we have the "ref" and "log" indices that enable fast lookup of those record types. But what this here refers to is the "obj" table. Oh, well. The iterator for those objects takes as input the object ID we're searching for. Given that object ID, it is expected to yield only those ref records that reference this object ID, either peeled or unpeeled in case it is a tag. In the `next()` function we essentially have a nested loop: - The outer loop iterates through the "obj" blocks. This gives us the offsets of the ref records that we need to look up and that contain. - The inner loop iterates through the records in the "ref" block that was referenced by the "obj" block. Honestly, the whole logic doesn't really make any sense though, as we never filter by the caller-provided object ID at all! So we still end up churning through all references, which kind of destroys the purpose of this whole "obj" reverse index. And that's also why we have the check: we verify that the object ID of the reference we have looked up matches the caller's query. So the fix you have here is correct: we may end up with a "ref" record that is unpeeled, and in that case it's wrong to treat it as a peeled one. And if we fix that, the result should at least be correct and void of any kind of undefined behaviour. But the whole infrastructure is still quite broken. What we should be doing is to: 1. Seek to the obj record that has the desired object ID prefix. 2. For each obj record starting with the desired object ID prefix: 1. Look up the respective "ref" block indicated by the offset. 2. Iterate through all "ref" records and yield all those whose value or peeled value match. 3. Abort once there are no more obj records matching the given prefix. All of this is naturally outside the scope of this patch series. I'd argue though that we shouldn't just remove the BUG comment, but instead add some TODO comment that explains why the current logic is still very suboptimal. Thanks! Patrick ^ permalink raw reply [flat|nested] 4+ messages in thread
* [PATCH v2] reftable/iter: fix UB in indexed_table_ref_iter_next 2026-01-05 14:59 ` Patrick Steinhardt @ 2026-01-08 16:52 ` Tsahi Elkayam 2026-01-09 14:19 ` Patrick Steinhardt 0 siblings, 1 reply; 4+ messages in thread From: Tsahi Elkayam @ 2026-01-08 16:52 UTC (permalink / raw) To: Patrick Steinhardt; +Cc: git@vger•kernel.org The indexed_table_ref_iter_next() function provides reverse mappings from object IDs to references. It currently accesses ref->value.val2 without checking the reference's value_type, leading to undefined behavior when encountering unpeeled references (REFTABLE_REF_VAL1). While the current "obj" table implementation is suboptimal—it yields all reference records within a block and relies on manual filtering—this manual comparison is necessary to ensure the yielded record actually matches the target OID prefix requested by the caller. Fix the undefined behavior by checking the value_type before performing the memory comparison. Additionally, replace the "/* BUG */" comment with a TODO explaining the current implementation's inefficiency, as suggested by the maintainer. Signed-off-by: Tsahi Elkayam <Tsahi.Elkayam@Protonmail•com> --- reftable/iter.c | 13 ++++++++++--- 1 file changed, 10 insertions(+), 3 deletions(-) diff --git a/reftable/iter.c b/reftable/iter.c index 2ecc52b336..2eee65bb1e 100644 --- a/reftable/iter.c +++ b/reftable/iter.c @@ -171,12 +171,19 @@ static int indexed_table_ref_iter_next(void *p, struct reftable_record rec) } continue; } - /* BUG */ - if (!memcmp(it->oid.buf, ref->value.val2.target_value, - it->oid.len) || - !memcmp(it->oid.buf, ref->value.val2.value, it->oid.len)) { + + /* + * TODO: The current implementation is suboptimal as it yields + * all ref records in the block rather than filtering by the + * OID prefix. This manual comparison is still necessary. + */ + if (ref->value_type == REFTABLE_REF_VAL2 && + (!memcmp(it->oid.buf, ref->value.val2.target_value, + it->oid.len) || + !memcmp(it->oid.buf, ref->value.val2.value, it->oid.len))) + return 0; + + if (ref->value_type == REFTABLE_REF_VAL1 && + !memcmp(it->oid.buf, ref->value.val1, it->oid.len)) return 0; - } } } -- 2.47.1 ^ permalink raw reply related [flat|nested] 4+ messages in thread
* Re: [PATCH v2] reftable/iter: fix UB in indexed_table_ref_iter_next 2026-01-08 16:52 ` [PATCH v2] reftable/iter: fix UB " Tsahi Elkayam @ 2026-01-09 14:19 ` Patrick Steinhardt 0 siblings, 0 replies; 4+ messages in thread From: Patrick Steinhardt @ 2026-01-09 14:19 UTC (permalink / raw) To: Tsahi Elkayam; +Cc: git@vger•kernel.org On Thu, Jan 08, 2026 at 04:52:05PM +0000, Tsahi Elkayam wrote: > The indexed_table_ref_iter_next() function provides reverse mappings from > object IDs to references. It currently accesses ref->value.val2 without > checking the reference's value_type, leading to undefined behavior when > encountering unpeeled references (REFTABLE_REF_VAL1). > > While the current "obj" table implementation is suboptimal—it yields all > reference records within a block and relies on manual filtering—this > manual comparison is necessary to ensure the yielded record actually > matches the target OID prefix requested by the caller. It's correct to yield all ref records of an indexed ref block, as any of its refs may point to the object ID. What's incorrect is that we: - Don't seek to the correct obj index block when creating the iterator. This means that we'll also seek into ref blocks that won't even contain any ref with the desired object ID. - Don't abort iterating over the obj index blocks once we see that its object IDs no longer match. So this needs a bit of rephrasing. Please feel free to copy these two bullet points as-is. > Fix the undefined behavior by checking the value_type before performing > the memory comparison. Additionally, replace the "/* BUG */" comment > with a TODO explaining the current implementation's inefficiency, as > suggested by the maintainer. I wouldn't refer to myself as maintainer, I'm very happy to let Junio have that role :) You can for example simply add a "Helped-by:" trailer that refers to me. > diff --git a/reftable/iter.c b/reftable/iter.c > index 2ecc52b336..2eee65bb1e 100644 > --- a/reftable/iter.c > +++ b/reftable/iter.c > @@ -171,12 +171,19 @@ static int indexed_table_ref_iter_next(void *p, struct reftable_record rec) > } > continue; > } > - /* BUG */ > - if (!memcmp(it->oid.buf, ref->value.val2.target_value, > - it->oid.len) || > - !memcmp(it->oid.buf, ref->value.val2.value, it->oid.len)) { > + > + /* > + * TODO: The current implementation is suboptimal as it yields > + * all ref records in the block rather than filtering by the > + * OID prefix. This manual comparison is still necessary. > + */ And this needs a bit of rephrasing to represent the above. For example: /* * TODO: The current implementation is suboptimal as: * * - We don't seek to the first obj record that matches our OID * prefix. * * - We don't abort iteration once the OID prefix doesn't match * anymore. * * We don't have any users of this interface in-tree, but once we * add any we should probably try to fix this interface. */ Thanks! Patrick ^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2026-01-09 14:19 UTC | newest] Thread overview: 4+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2026-01-04 10:46 [PATCH v2] reftable/iter: fix undefined behavior in indexed_table_ref_iter_next Tsahi Elkayam 2026-01-05 14:59 ` Patrick Steinhardt 2026-01-08 16:52 ` [PATCH v2] reftable/iter: fix UB " Tsahi Elkayam 2026-01-09 14:19 ` Patrick Steinhardt
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox