From: Taylor Blau <me@ttaylorr•com>
To: Patrick Steinhardt <ps@pks•im>
Cc: git@vger•kernel.org
Subject: Re: [PATCH 5/8] packfile: refactor `get_multi_pack_index()` to work on sources
Date: Thu, 10 Jul 2025 19:56:28 -0400 [thread overview]
Message-ID: <aHBTLFvL+eQcw6J0@nand.local> (raw)
In-Reply-To: <20250709-b4-pks-midx-via-odb-alternate-v1-5-f31150d21331@pks.im>
On Wed, Jul 09, 2025 at 09:54:53AM +0200, Patrick Steinhardt wrote:
> The function `get_multi_pack_index()` loads multi-pack indices via
> `prepare_packed_git()` and then returns the linked list of multi-pack
> indices that is stored in `struct object_database`. That list is in the
> process of being removed though in favor of storing the MIDX as part of
> the object database source it belongs to.
>
> Refactor `get_multi_pack_index()` so that it returns the multi-pack
> index for a single object source. Callers are now expected to call this
> function for each source they are interested in. This requires them to
> iterate through alternates, so we have to prepare alternate object
> sources before doing so.
>
> Signed-off-by: Patrick Steinhardt <ps@pks•im>
> ---
> builtin/pack-objects.c | 9 ++++++---
> builtin/repack.c | 4 ++--
> midx-write.c | 22 ++--------------------
> object-name.c | 21 ++++++++++++++-------
> pack-bitmap.c | 20 ++++++++++++++------
> packfile.c | 30 +++++++++++-------------------
> packfile.h | 3 +--
> 7 files changed, 50 insertions(+), 59 deletions(-)
>
> diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
> index 5781dec9808..f889e69e07d 100644
> --- a/builtin/pack-objects.c
> +++ b/builtin/pack-objects.c
> @@ -1701,7 +1701,6 @@ static int want_object_in_pack_mtime(const struct object_id *oid,
> {
> int want;
> struct list_head *pos;
> - struct multi_pack_index *m;
>
> if (!exclude && local && has_loose_object_nonlocal(oid))
> return 0;
> @@ -1721,9 +1720,13 @@ static int want_object_in_pack_mtime(const struct object_id *oid,
> *found_offset = 0;
> }
>
> - for (m = get_multi_pack_index(the_repository); m; m = m->next) {
> + odb_prepare_alternates(the_repository->objects);
> +
> + for (struct odb_source *source = the_repository->objects->sources; source; source = source->next) {
> + struct multi_pack_index *m = get_multi_pack_index(source);
All makes sense up to this point. I would suggest adding the following
to the top of this function:
struct repository *r = the_repository;
struct odb_source *source;
to shorten up these lines a bit, but I'm nit-picking so if you feel
strongly that the existing patch is clearer, I won't object.
> struct pack_entry e;
> - if (fill_midx_entry(the_repository, oid, &e, m)) {
> +
> + if (m && fill_midx_entry(the_repository, oid, &e, m)) {
Good.
> want = want_object_in_pack_one(e.p, oid, exclude, found_pack, found_offset, found_mtime);
> if (want != -1)
> return want;
> diff --git a/builtin/repack.c b/builtin/repack.c
> index 9bbf032b6dd..5956df5d927 100644
> --- a/builtin/repack.c
> +++ b/builtin/repack.c
> @@ -218,9 +218,9 @@ static void mark_packs_for_deletion(struct existing_packs *existing,
> static void remove_redundant_pack(const char *dir_name, const char *base_name)
> {
> struct strbuf buf = STRBUF_INIT;
> - struct multi_pack_index *m = get_local_multi_pack_index(the_repository);
> + struct multi_pack_index *m = get_multi_pack_index(the_repository->objects->sources);
Is the first source always guaranteed to be the local one? I assume that the
answer here is "yes", and there are certainly other places in the code
where we make a similar assumption. But just wanted to make sure as
this popped into my head while reading.
> strbuf_addf(&buf, "%s.pack", base_name);
> - if (m && midx_contains_pack(m, buf.buf))
> + if (m && m->local && midx_contains_pack(m, buf.buf))
...hmm, maybe not?
> clear_midx_file(the_repository);
> strbuf_insertf(&buf, 0, "%s/", dir_name);
> unlink_pack_path(buf.buf, 1);
> diff --git a/midx-write.c b/midx-write.c
> index f2cfb85476e..c1ae62d3549 100644
> --- a/midx-write.c
> +++ b/midx-write.c
> @@ -916,26 +916,8 @@ static int write_midx_bitmap(struct write_midx_context *ctx,
> static struct multi_pack_index *lookup_multi_pack_index(struct repository *r,
> const char *object_dir)
> {
> - struct multi_pack_index *result = NULL;
> - struct multi_pack_index *cur;
> - char *obj_dir_real = real_pathdup(object_dir, 1);
> - struct strbuf cur_path_real = STRBUF_INIT;
> -
> - /* Ensure the given object_dir is local, or a known alternate. */
> - odb_find_source(r->objects, obj_dir_real);
> -
> - for (cur = get_multi_pack_index(r); cur; cur = cur->next) {
> - strbuf_realpath(&cur_path_real, cur->object_dir, 1);
> - if (!strcmp(obj_dir_real, cur_path_real.buf)) {
> - result = cur;
> - goto cleanup;
> - }
> - }
> -
> -cleanup:
> - free(obj_dir_real);
> - strbuf_release(&cur_path_real);
> - return result;
> + struct odb_source *source = odb_find_source(r->objects, object_dir);
> + return get_multi_pack_index(source);
When I first read this I wondered what would happen if we passed in an
unknown object_dir such that odb_find_source() returned NULL. But that
function will never return NULL, and instead will die() if given a bogus
object_dir.
So this is fine, though I would have imagined that we'd return NULL
within odb_find_source() and let the caller die() (or not).
> }
>
> static int fill_packs_from_midx(struct write_midx_context *ctx,
> diff --git a/object-name.c b/object-name.c
> index ddafe7f9b13..1e7fdcb90a8 100644
> --- a/object-name.c
> +++ b/object-name.c
> @@ -198,16 +198,19 @@ static void unique_in_pack(struct packed_git *p,
>
> static void find_short_packed_object(struct disambiguate_state *ds)
> {
> - struct multi_pack_index *m;
> struct packed_git *p;
>
> /* Skip, unless oids from the storage hash algorithm are wanted */
> if (ds->bin_pfx.algo && (&hash_algos[ds->bin_pfx.algo] != ds->repo->hash_algo))
> return;
>
> - for (m = get_multi_pack_index(ds->repo); m && !ds->ambiguous;
> - m = m->next)
> - unique_in_midx(m, ds);
> + odb_prepare_alternates(ds->repo->objects);
> + for (struct odb_source *source = ds->repo->objects->sources; source && !ds->ambiguous; source = source->next) {
> + struct multi_pack_index *m = get_multi_pack_index(source);
> + if (m)
> + unique_in_midx(m, ds);
> + }
> +
Makes sense, though now having seen this pattern a few times, I am
wondering if it would be worth it to add a utility function that takes a
callback and iterates over the various MIDXs. But perhaps that is taking
DRY-ing things up a little too far ;-).
For what it's worth, I do think that what you wrote here makes more
logical sense: MIDXs are tied to individual alternate object DBs, which
here means that there is one MIDX per "struct odb_source". It just is a
little more verbose to type out.
> diff --git a/pack-bitmap.c b/pack-bitmap.c
> index 0a4af199c05..7b51d381837 100644
> --- a/pack-bitmap.c
> +++ b/pack-bitmap.c
> @@ -692,14 +692,16 @@ static int open_midx_bitmap(struct repository *r,
> struct bitmap_index *bitmap_git)
> {
> int ret = -1;
> - struct multi_pack_index *midx;
>
> assert(!bitmap_git->map);
>
> - for (midx = get_multi_pack_index(r); midx; midx = midx->next) {
> - if (!open_midx_bitmap_1(bitmap_git, midx))
> + odb_prepare_alternates(r->objects);
> + for (struct odb_source *source = r->objects->sources; source; source = source->next) {
> + struct multi_pack_index *midx = get_multi_pack_index(source);
> + if (midx && !open_midx_bitmap_1(bitmap_git, midx))
> ret = 0;
> }
Makes sense.
> +
Stray diff?
> @@ -3307,9 +3309,15 @@ int verify_bitmap_files(struct repository *r)
> {
> int res = 0;
>
> - for (struct multi_pack_index *m = get_multi_pack_index(r);
> - m; m = m->next) {
> - char *midx_bitmap_name = midx_bitmap_filename(m);
> + odb_prepare_alternates(r->objects);
> + for (struct odb_source *source = r->objects->sources; source; source = source->next) {
> + struct multi_pack_index *m = get_multi_pack_index(source);
> + char *midx_bitmap_name;
> +
> + if (!m)
> + continue;
> +
> + midx_bitmap_name = midx_bitmap_filename(m);
This one looks good as well.
> diff --git a/packfile.c b/packfile.c
> index e5d9d7ac8bc..e1ced050451 100644
> --- a/packfile.c
> +++ b/packfile.c
> @@ -963,14 +963,17 @@ static void prepare_packed_git(struct repository *r);
> unsigned long repo_approximate_object_count(struct repository *r)
> {
> if (!r->objects->approximate_object_count_valid) {
> - unsigned long count;
> - struct multi_pack_index *m;
> + unsigned long count = 0;
> struct packed_git *p;
>
> prepare_packed_git(r);
I was wondering where the odb_prepare_alternates() call was in this one,
but it is a byproduct of calling prepare_packed_git(). So this spot
looks OK as well.
> - count = 0;
> - for (m = get_multi_pack_index(r); m; m = m->next)
> - count += m->num_objects;
> +
> + for (struct odb_source *source = r->objects->sources; source; source = source->next) {
> + struct multi_pack_index *m = get_multi_pack_index(source);
> + if (m)
> + count += m->num_objects;
Unrelated to your patch, I wonder if it makes sense to check for
overflow here. I think so, though presumably if we are overflowing
"unsigned long" then likely we have much bigger problems to worry about
;-).
The rest looks good to me.
Thanks,
Taylor
next prev parent reply other threads:[~2025-07-10 23:56 UTC|newest]
Thread overview: 43+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-07-09 7:54 [PATCH 0/8] odb: track multi-pack-indices via their object sources Patrick Steinhardt
2025-07-09 7:54 ` [PATCH 1/8] midx: start tracking per object database source Patrick Steinhardt
2025-07-10 20:56 ` Justin Tobler
2025-07-10 23:10 ` Taylor Blau
2025-07-10 23:19 ` Taylor Blau
2025-07-15 8:26 ` Patrick Steinhardt
2025-07-15 8:26 ` Patrick Steinhardt
2025-07-09 7:54 ` [PATCH 2/8] packfile: refactor `prepare_packed_git_one()` to work on sources Patrick Steinhardt
2025-07-10 21:07 ` Justin Tobler
2025-07-10 23:14 ` Taylor Blau
2025-07-15 8:26 ` Patrick Steinhardt
2025-07-09 7:54 ` [PATCH 3/8] midx: stop using linked list when closing MIDX Patrick Steinhardt
2025-07-10 21:31 ` Justin Tobler
2025-07-15 8:26 ` Patrick Steinhardt
2025-07-10 23:22 ` Taylor Blau
2025-07-15 8:26 ` Patrick Steinhardt
2025-07-09 7:54 ` [PATCH 4/8] midx: track whether we have loaded the MIDX Patrick Steinhardt
2025-07-10 22:16 ` Justin Tobler
2025-07-10 23:26 ` Taylor Blau
2025-07-15 8:27 ` Patrick Steinhardt
2025-07-09 7:54 ` [PATCH 5/8] packfile: refactor `get_multi_pack_index()` to work on sources Patrick Steinhardt
2025-07-10 22:35 ` Justin Tobler
2025-07-10 23:56 ` Taylor Blau [this message]
2025-07-15 8:27 ` Patrick Steinhardt
2025-07-09 7:54 ` [PATCH 6/8] packfile: stop using linked MIDX list in `find_pack_entry()` Patrick Steinhardt
2025-07-09 7:54 ` [PATCH 7/8] packfile: stop using linked MIDX list in `get_all_packs()` Patrick Steinhardt
2025-07-09 7:54 ` [PATCH 8/8] midx: remove now-unused linked list of multi-pack indices Patrick Steinhardt
2025-07-10 22:48 ` Justin Tobler
2025-07-09 22:04 ` [PATCH 0/8] odb: track multi-pack-indices via their object sources Junio C Hamano
2025-07-10 23:58 ` Taylor Blau
2025-07-15 8:27 ` Patrick Steinhardt
2025-07-15 11:29 ` [PATCH v2 0/7] " Patrick Steinhardt
2025-07-15 11:29 ` [PATCH v2 1/7] midx: start tracking per object database source Patrick Steinhardt
2025-07-15 11:29 ` [PATCH v2 2/7] packfile: refactor `prepare_packed_git_one()` to work on sources Patrick Steinhardt
2025-07-15 11:29 ` [PATCH v2 3/7] midx: stop using linked list when closing MIDX Patrick Steinhardt
2025-07-15 11:29 ` [PATCH v2 4/7] packfile: refactor `get_multi_pack_index()` to work on sources Patrick Steinhardt
2025-07-15 11:29 ` [PATCH v2 5/7] packfile: stop using linked MIDX list in `find_pack_entry()` Patrick Steinhardt
2025-07-15 11:29 ` [PATCH v2 6/7] packfile: stop using linked MIDX list in `get_all_packs()` Patrick Steinhardt
2025-07-15 11:29 ` [PATCH v2 7/7] midx: remove now-unused linked list of multi-pack indices Patrick Steinhardt
2025-07-15 21:59 ` [PATCH v2 0/7] odb: track multi-pack-indices via their object sources Justin Tobler
2025-07-23 21:22 ` Junio C Hamano
2025-07-24 8:00 ` Patrick Steinhardt
2025-08-12 21:58 ` Taylor Blau
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aHBTLFvL+eQcw6J0@nand.local \
--to=me@ttaylorr$(echo .)com \
--cc=git@vger$(echo .)kernel.org \
--cc=ps@pks$(echo .)im \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox