From: Jeff King <peff@peff•net>
To: Patrick Steinhardt <ps@pks•im>
Cc: git@vger•kernel.org,
"brian m. carlson" <sandals@crustytoothpaste•net>,
Junio C Hamano <gitster@pobox•com>
Subject: Re: [PATCH 3/4] packfile: expose function to read object stream for an offset
Date: Mon, 23 Feb 2026 06:07:22 -0500 [thread overview]
Message-ID: <20260223110722.GB215364@coredump.intra.peff.net> (raw)
In-Reply-To: <20260223-pks-fsck-fix-v1-3-c29036832b6e@pks.im>
On Mon, Feb 23, 2026 at 10:50:42AM +0100, Patrick Steinhardt wrote:
> The function `packfile_store_read_object_stream()` takes as input an
> object ID and then constructs a `struct odb_read_stream` from it. In a
> subsequent commit we'll want to create an object stream for a given
> combination of packfile and offset though, which is not something that
> can currently be done.
>
> Extract a new function `packfile_read_object_stream()` that makes this
> functionality available.
Yup, makes sense. There's one part that puzzled me at first (but I
figured out), and one part I'm not quite sure of.
> -int packfile_store_read_object_stream(struct odb_read_stream **out,
> - struct packfile_store *store,
> - const struct object_id *oid)
> +int packfile_read_object_stream(struct odb_read_stream **out,
> + struct packed_git *pack,
> + off_t offset)
> {
> struct odb_packed_read_stream *stream;
> struct pack_window *window = NULL;
> - struct object_info oi = OBJECT_INFO_INIT;
> enum object_type in_pack_type;
> unsigned long size;
>
> - oi.sizep = &size;
> + in_pack_type = unpack_object_header(pack, &window, &offset, &size);
> + unuse_pack(&window);
>
> - if (packfile_store_read_object_info(store, oid, &oi, 0) ||
> - oi.u.packed.type == PACKED_OBJECT_TYPE_REF_DELTA ||
> - oi.u.packed.type == PACKED_OBJECT_TYPE_OFS_DELTA ||
> - repo_settings_get_big_file_threshold(store->source->odb->repo) >= size)
> + if (repo_settings_get_big_file_threshold(pack->repo) >= size)
> return -1;
>
> - in_pack_type = unpack_object_header(oi.u.packed.pack,
> - &window,
> - &oi.u.packed.offset,
> - &size);
> - unuse_pack(&window);
Before we were checking big_file_threshold up front, and now we must
call unpack_object_header() first. But that's because we got the size
for "free" as part of the object_info call that found our pack entry.
Now our caller is responsible for finding the entry. Our wrapper _could_
continue to provide us with the size, but I don't think there is any
efficiency to be gained. Once we have the pack/offset pair, both code
paths will call unpack_object_header() to find it cheaply. And the new
code is even a little more efficient.
But what about the checks for deltas? We've dropped them completely. I
think that's OK, though, because later we have:
> switch (in_pack_type) {
> default:
> return -1; /* we do not do deltas for now */
So they were somewhat redundant in the first place, and just avoided
calling unpack_object_header() for cases where we knew we could not use
the result (which again, was already filled by packed_object_info() in
the same way).
Good.
> +int packfile_store_read_object_stream(struct odb_read_stream **out,
> + struct packfile_store *store,
> + const struct object_id *oid)
> +{
> + struct pack_entry e;
> +
> + if (!find_pack_entry(store, oid, &e))
> + return -1;
> +
> + return packfile_read_object_stream(out, e.p, e.offset);
> +}
OK. The original read via packfile_store_read_object_info(), which does
a bit more work. It called packed_object_info() and if necessary would
trigger mark_bad_packed_object(). But now that we are leaving it to
packfile_read_object_stream() to look at the header, we don't need to
load any object info, and we have no error code to check.
It does make me wonder, though, if we are missing out on marking bad
objects here. The idea is that we'd usually do something like:
1. some code wants to access $OID
2. we find $OID in pack $P
3. that turns out to be broken for some reason, so we mark it as bad
4. we try again, skipping $P and finding it in some other pack
But now I wonder if code that tries to stream will skip step 3, and then
in step 4 we'll find the same broken $P over and over.
But I suspect if that is possible, it was already true. We were only
asking for the type and size, so any content-level corruption wouldn't
be caught here and we'd have the same issue. I think the right thing is
probably for the streaming code to know about the pack/oid pair it's
trying to read, and to mark it as bad if it hits an error.
So your patch here might be making the problem a tiny bit worse, but not
in a material way. I think we can ignore it for now.
-Peff
next prev parent reply other threads:[~2026-02-23 11:07 UTC|newest]
Thread overview: 26+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-02-23 9:50 [PATCH 0/4] pack-check: fix verification of large objects Patrick Steinhardt
2026-02-23 9:50 ` [PATCH 1/4] t/helper: improve "genrandom" test helper Patrick Steinhardt
2026-02-23 11:13 ` Jeff King
2026-02-23 12:20 ` Patrick Steinhardt
2026-02-23 14:01 ` Eric Sunshine
2026-02-23 9:50 ` [PATCH 2/4] object-file: adapt `stream_object_signature()` to take a stream Patrick Steinhardt
2026-02-23 10:49 ` Jeff King
2026-02-23 12:21 ` Patrick Steinhardt
2026-02-23 12:59 ` Jeff King
2026-02-23 9:50 ` [PATCH 3/4] packfile: expose function to read object stream for an offset Patrick Steinhardt
2026-02-23 11:07 ` Jeff King [this message]
2026-02-23 12:21 ` Patrick Steinhardt
2026-02-23 13:12 ` Jeff King
2026-02-23 15:59 ` Patrick Steinhardt
2026-02-23 9:50 ` [PATCH 4/4] pack-check: fix verification of large objects Patrick Steinhardt
2026-02-23 11:11 ` Jeff King
2026-02-23 11:30 ` Patrick Steinhardt
2026-02-23 12:58 ` Jeff King
2026-02-23 15:48 ` Patrick Steinhardt
2026-02-23 20:35 ` Junio C Hamano
2026-02-24 6:26 ` Patrick Steinhardt
2026-02-23 16:00 ` [PATCH v2 0/4] " Patrick Steinhardt
2026-02-23 16:00 ` [PATCH v2 1/4] t/helper: improve "genrandom" test helper Patrick Steinhardt
2026-02-23 16:00 ` [PATCH v2 2/4] object-file: adapt `stream_object_signature()` to take a stream Patrick Steinhardt
2026-02-23 16:00 ` [PATCH v2 3/4] packfile: expose function to read object stream for an offset Patrick Steinhardt
2026-02-23 16:00 ` [PATCH v2 4/4] pack-check: fix verification of large objects Patrick Steinhardt
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20260223110722.GB215364@coredump.intra.peff.net \
--to=peff@peff$(echo .)net \
--cc=git@vger$(echo .)kernel.org \
--cc=gitster@pobox$(echo .)com \
--cc=ps@pks$(echo .)im \
--cc=sandals@crustytoothpaste$(echo .)net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox