public inbox for git@vger.kernel.org 
 help / color / mirror / Atom feed
From: Junio C Hamano <gitster@pobox•com>
To: Taylor Blau <me@ttaylorr•com>
Cc: git@vger•kernel.org,  Jeff King <peff@peff•net>,
	 Elijah Newren <newren@gmail•com>
Subject: Re: [PATCH v5] builtin/pack-objects.c: freshen objects from existing cruft packs
Date: Thu, 13 Mar 2025 11:41:09 -0700	[thread overview]
Message-ID: <xmqq8qp8ojsq.fsf@gitster.g> (raw)
In-Reply-To: <1563552bbda0bc910c9f41b0fabc3225c4d778fc.1741889018.git.me@ttaylorr.com> (Taylor Blau's message of "Thu, 13 Mar 2025 14:09:47 -0400")

Taylor Blau <me@ttaylorr•com> writes:

> Once an object is written into a cruft pack, we can only freshen it by
> writing a new loose or packed copy of that object with a more recent
> mtime.
>
> Prior to 61568efa95 (builtin/pack-objects.c: support `--max-pack-size`
> with `--cruft`, 2023-08-28), we typically had at most one cruft pack in
> a repository at any given time. So freshening unreachable objects was
> straightforward when already rewriting the cruft pack (and its *.mtimes
> file).
>
> But 61568efa95 changes things: 'pack-objects' now supports writing
> multiple cruft packs when invoked with `--cruft` and the
> `--max-pack-size` flag. Cruft packs are rewritten until they reach some
> size threshold, at which point they are considered "frozen", and will
> only be modified in a pruning GC, or if the threshold itself is
> adjusted.
>
> Prior to this patch, however, this process breaks down when we attempt
> to freshen an object packed in an earlier cruft pack, and that cruft
> pack is larger than the threshold and thus will survive the repack.
>
> When this is the case, it is impossible to freshen objects in cruft
> pack(s) when those cruft packs are larger than the threshold. This is
> because we would avoid writing them in the new cruft pack entirely, for
> a couple of reasons.
>
>  1. When enumerating packed objects via 'add_objects_in_unpacked_packs()'
>     we pass the SKIP_IN_CORE_KEPT_PACKS, which is used to avoid looping
>     over the packs we're going to retain (which are marked as kept
>     in-core by 'read_cruft_objects()').
>
>     This means that we will avoid enumerating additional packed copies
>     of objects found in any cruft packs which are larger than the given
>     size threshold. Thus there is no opportunity to call
>     'create_object_entry()' whatsoever.
>
>  2. We likewise will discard the loose copy (if one exists) of any
>     unreachable object packed in a cruft pack that is larger than the
>     threshold. Here our call path is 'add_unreachable_loose_objects()',
>     which uses the 'add_loose_object()' callback.
>
>     That function will eventually land us in 'want_object_in_pack()'
>     (via 'add_cruft_object_entry()'), and we'll discard the object as it
>     appears in one of the packs which we marked as kept in-core.
>
> This means in effect that it is impossible to freshen an unreachable
> object once it appears in a cruft pack larger than the given threshold.
>
> Instead, we should pack an additional copy of an unreachable object we
> want to freshen even if it appears in a cruft pack, provided that the
> cruft copy has an mtime which is before the mtime of the copy we are
> trying to pack/freshen. This is sub-optimal in the sense that it
> requires keeping an additional copy of unreachable objects upon
> freshening, but we don't have a better alternative without the ability
> to make in-place modifications to existing *.mtimes files.
>
> In order to implement this, we have to adjust the behavior of
> 'want_found_object()'. When 'pack-objects' is told that we're *not*
> going to retain any cruft packs (i.e. the set of packs marked as kept
> in-core does not contain a cruft pack), the behavior is unchanged.
>
> But when there *is* at least one cruft pack that we're holding onto, it
> is no longer sufficient to reject a copy of an object found in that
> cruft pack for that reason alone. In this case, we only want to reject a
> candidate object when copies of that object either:
>
>  - exists in a non-cruft pack that we are retaining, regardless of that
>    pack's mtime, or
>
>  - exists in a cruft pack with an mtime at least as recent as the copy
>    we are debating whether or not to pack, in which case freshening
>    would be redundant.
>
> To do this, keep track of whether or not we have any cruft packs in our
> in-core kept list with a new 'ignore_packed_keep_in_core_has_cruft'
> flag. When we end up in this new special case, we replace a call to
> 'has_object_kept_pack()' to 'want_cruft_object_mtime()', and only reject
> objects when we have a copy in an existing cruft pack with at least as
> recent an mtime as our candidate (in which case "freshening" would be
> redundant).
>
> Signed-off-by: Taylor Blau <me@ttaylorr•com>
> ---
>  builtin/pack-objects.c  | 118 ++++++++++++++++++++++++++++++++++------
>  packfile.c              |   3 +-
>  packfile.h              |   2 +
>  t/t7704-repack-cruft.sh |  66 ++++++++++++++++++++++
>  4 files changed, 171 insertions(+), 18 deletions(-)

Thanks.  Will queue.

      reply	other threads:[~2025-03-13 18:41 UTC|newest]

Thread overview: 54+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-02-27 18:29 [PATCH 0/2] pack-objects: freshen objects with multi-cruft packs Taylor Blau
2025-02-27 18:29 ` [PATCH 1/2] builtin/repack.c: simplify cruft pack aggregation Taylor Blau
2025-02-27 19:23   ` Elijah Newren
2025-02-27 22:53     ` Taylor Blau
2025-02-28  7:52   ` Patrick Steinhardt
2025-03-04 21:52     ` Elijah Newren
2025-03-05  2:04       ` Junio C Hamano
2025-03-05  0:09     ` Taylor Blau
2025-02-27 18:29 ` [PATCH 2/2] builtin/pack-objects.c: freshen objects from existing cruft packs Taylor Blau
2025-02-27 19:26   ` Elijah Newren
2025-02-27 23:03     ` Taylor Blau
2025-02-27 19:28 ` [PATCH 0/2] pack-objects: freshen objects with multi-cruft packs Elijah Newren
2025-02-27 23:05   ` Taylor Blau
2025-03-04 21:35 ` [PATCH v2 " Taylor Blau
2025-03-04 21:35   ` [PATCH v2 1/2] builtin/repack.c: simplify cruft pack aggregation Taylor Blau
2025-03-04 21:35   ` [PATCH v2 2/2] builtin/pack-objects.c: freshen objects from existing cruft packs Taylor Blau
2025-03-04 22:55   ` [PATCH v2 0/2] pack-objects: freshen objects with multi-cruft packs Elijah Newren
2025-03-05  0:06     ` Taylor Blau
2025-03-05  0:13       ` Taylor Blau
2025-03-05  0:15 ` [PATCH v3 0/1] " Taylor Blau
2025-03-05  0:15   ` [PATCH v3 1/1] builtin/pack-objects.c: freshen objects from existing cruft packs Taylor Blau
2025-03-06 10:31     ` Patrick Steinhardt
2025-03-13 17:32       ` Taylor Blau
2025-03-06 10:31   ` [PATCH v3 0/1] pack-objects: freshen objects with multi-cruft packs Patrick Steinhardt
2025-03-11  0:21 ` [PATCH v4 0/6] " Taylor Blau
2025-03-11  0:21   ` [PATCH v4 1/6] t/t5329-pack-objects-cruft.sh: evict 'repack'-related tests Taylor Blau
2025-03-11  0:21   ` [PATCH v4 2/6] t7704-repack-cruft.sh: consolidate `write_blob()` Taylor Blau
2025-03-11  0:21   ` [PATCH v4 3/6] t/lib-cruft.sh: extract some cruft-related helpers Taylor Blau
2025-03-11  0:21   ` [PATCH v4 4/6] pack-objects: generate cruft packs at most one object over threshold Taylor Blau
2025-03-11 21:59     ` Junio C Hamano
2025-03-12 15:22       ` Taylor Blau
2025-03-12 18:26         ` Junio C Hamano
2025-03-12 19:02           ` Taylor Blau
2025-03-12 19:13             ` Elijah Newren
2025-03-12 19:33               ` Taylor Blau
2025-03-12 20:43               ` Junio C Hamano
2025-03-12 20:49                 ` Elijah Newren
2025-03-13 12:16                   ` Junio C Hamano
2025-03-13 16:23                     ` Elijah Newren
2025-03-13 17:06                       ` Junio C Hamano
2025-03-11  0:21   ` [PATCH v4 5/6] builtin/repack.c: simplify cruft pack aggregation Taylor Blau
2025-03-11  0:21   ` [PATCH v4 6/6] builtin/pack-objects.c: freshen objects from existing cruft packs Taylor Blau
2025-03-11 20:13   ` [PATCH v4 0/6] pack-objects: freshen objects with multi-cruft packs Junio C Hamano
2025-03-12 15:33     ` Taylor Blau
2025-03-12 18:28       ` Junio C Hamano
2025-03-12 19:04         ` Taylor Blau
2025-03-12 19:46           ` Junio C Hamano
2025-03-12 19:52             ` Taylor Blau
2025-03-13 17:17               ` Junio C Hamano
2025-03-13 17:35                 ` Taylor Blau
2025-03-13  6:29           ` Jeff King
2025-03-13 15:12             ` Junio C Hamano
2025-03-13 18:09 ` [PATCH v5] builtin/pack-objects.c: freshen objects from existing cruft packs Taylor Blau
2025-03-13 18:41   ` Junio C Hamano [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=xmqq8qp8ojsq.fsf@gitster.g \
    --to=gitster@pobox$(echo .)com \
    --cc=git@vger$(echo .)kernel.org \
    --cc=me@ttaylorr$(echo .)com \
    --cc=newren@gmail$(echo .)com \
    --cc=peff@peff$(echo .)net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox