From: Jeff King <peff@peff•net>
To: Arijit Banerjee via GitGitGadget <gitgitgadget@gmail•com>
Cc: git@vger•kernel.org, "Ævar Arnfjörð Bjarmason" <avarab@gmail•com>,
	"Junio C Hamano" <gitster@pobox•com>,
	"Derrick Stolee" <stolee@gmail•com>,
	"Arijit Banerjee" <arijit91@gmail•com>,
	"Arijit Banerjee" <arijit@effectiveailabs•com>
Subject: Re: [PATCH v2] index-pack: retain child bases in delta cache
Date: Tue, 2 Jun 2026 02:45:19 -0400
Message-ID: <20260602064519.GD695568@coredump.intra.peff.net>
In-Reply-To: <pull.2131.v2.git.1780330402264.gitgitgadget@gmail.com>

On Mon, Jun 01, 2026 at 04:13:21PM +0000, Arijit Banerjee via GitGitGadget wrote:

> When resolving a delta whose result has children of its own,
> index-pack adds the result to work_head, accounts its data in
> base_cache_used, and calls prune_base_data(). It then immediately frees
> that same data.
> 
> This bypasses the existing delta base cache policy and can force later
> descendants to reconstruct the queued base again. Let the existing
> delta_base_cache_limit pruning policy decide whether to keep or evict
> the data instead.
> 
> This does not add a new cache or increase the cache limit. The object
> data is already accounted in base_cache_used before prune_base_data()
> runs, and the existing pruning and base cleanup paths still release it.

That explanation makes sense, but I'm left with one question/concern.
Dropping the data for a base makes sense when we are "done" with it,
because we know we won't need it anymore and it leaves more room in the
cache for things we do care about.

The problem here is that the current notion of "done" is not correct.
Imagine we have delta chains "A -> B -> C" and "A -> D -> F". We are
totally done with A when we have resolved both B and D, but if I
understand correctly, we currently throw it away after just resolving B.

Your patch never throws it away, and just waits for it to get evicted
from the cache due to memory pressure. But could we realize the moment
when B and D have both finished using it, and evict it then? That makes
it more likely for us to keep something useful in the cache when there
is pressure.
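
As a rough illustration of that "evict when the last child is done" idea, here is a minimal sketch in C. It is not taken from index-pack: struct base_entry, its pending counter, cache_used, and the helper names are all hypothetical stand-ins, and it assumes the number of direct children of each base is known up front. Eviction under memory pressure is left out; the sketch only shows the "done" signal.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-base bookkeeping; not index-pack's real struct. */
struct base_entry {
	void *data;        /* inflated contents, NULL once released */
	size_t size;
	unsigned pending;  /* direct children not yet resolved */
};

/* Hypothetical counter, similar in spirit to base_cache_used. */
static size_t cache_used;

/* Cache a copy of a base and record how many children still need it. */
static void base_init(struct base_entry *b, const void *src,
		      size_t size, unsigned nr_children)
{
	b->data = malloc(size ? size : 1);
	assert(b->data);
	memcpy(b->data, src, size);
	b->size = size;
	b->pending = nr_children;
	cache_used += size;
}

/* Drop the cached data; safe to call if it is already gone. */
static void base_release(struct base_entry *b)
{
	if (!b->data)
		return;
	free(b->data);
	b->data = NULL;
	cache_used -= b->size;
}

/*
 * Call once per resolved direct child. Returns 1 when this was the
 * last child, at which point the base data is released.
 */
static int child_resolved(struct base_entry *b)
{
	assert(b->pending > 0);
	if (--b->pending)
		return 0;
	base_release(b);
	return 1;
}
```

With the chains above, A would start with pending == 2: resolving B leaves A's data in place, and resolving D releases it. Because base_release() tolerates already-freed data, a separate pressure-based pruner could still drop A earlier without confusing the counter.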

I'm not sure how hard that would be in practice, or how much it would
help (the base cache works in list order, so I think it might naturally
be a sort of LRU?).

-Peff


Thread overview: 7+ messages
2026-05-29 16:06 [PATCH] index-pack: retain child bases in delta cache Arijit Banerjee via GitGitGadget
2026-06-01 12:50 ` Derrick Stolee
2026-06-01 16:13 ` [PATCH v2] " Arijit Banerjee via GitGitGadget
2026-06-02  6:45   ` Jeff King [this message]
2026-06-03  0:05   ` [PATCH v3] " Arijit Banerjee via GitGitGadget
2026-06-03 12:24     ` Derrick Stolee
2026-06-04  7:12     ` Jeff King
