public inbox for git@vger.kernel.org 
 help / color / mirror / Atom feed
From: Derrick Stolee <stolee@gmail•com>
To: Patrick Steinhardt <ps@pks•im>,
	Derrick Stolee via GitGitGadget <gitgitgadget@gmail•com>
Cc: git@vger•kernel.org, christian.couder@gmail•com,
	gitster@pobox•com, johannes.schindelin@gmx•de,
	johncai86@gmail•com, jonathantanmy@google•com,
	karthik.188@gmail•com, kristofferhaugsbakk@fastmail•com,
	me@ttaylorr•com, newren@gmail•com, peff@peff•net
Subject: Re: [PATCH 2/3] path-walk: fix setup of pending objects
Date: Thu, 21 Aug 2025 16:33:21 -0400	[thread overview]
Message-ID: <b37e2690-fe6a-4fde-ab7e-58368c2914c3@gmail.com> (raw)
In-Reply-To: <aKbSRQJCPh3Lsew8@pks.im>

On 8/21/2025 4:01 AM, Patrick Steinhardt wrote:
> On Wed, Aug 20, 2025 at 06:39:55PM +0000, Derrick Stolee via GitGitGadget wrote:
>> From: Derrick Stolee <stolee@gmail•com>
>>
>> The previous change established a buggy instance of 'git repack -adf
>> --path-walk' when there exist paths that are tracked in the index and
>> that is the only instance of those paths in the history of the
>> repository. This change fixes that bug.
>>
>> The core problem here is that the "maybe_interesting" member of 'struct
>> type_and_oid_list' is not initialized to '1'. This member was added in
>> 6333e7ae0b (path-walk: mark trees and blobs as UNINTERESTING,
>> 2024-12-20) in a way to help when creating packfiles for a small commit
>> range using the sparse path algorithm (enabled by pack.useSparse=true).
>>
>> The idea here is that the list is marked as "maybe_interesting" if an
>> object is added that does not have the UNINITERSTING flag on it. Later,
> 
> s/UNINITERSTING/UNINTERESTING/

Thanks!

>> this is checked again in case all objects in the list were marked
>> UNINTERESTING after that point in time. In this case, the algorithm
>> skips the list as there is no reason to visit it.
>>
>> This leads to the problem where the "maybe_interesting" member was not
>> appropriately initialized when the list is created from pending objects.
>> This is the fix for now.
>>
>> To help avoid this from happening in the future, a follow-up change will
>> make initializing lists use a shared method instead of allowing for an
>> update to this initialization process to miss some existing copies.
> 
> Yeah, I wanted to say that this feels quite fragile to me and very easy
> to miss. Does this mechanism buy us a lot of performance in the first
> place? Because if not we might as well just remove it entirely.

The details for the space savings and moderate time cost are listed in
e5394794a5 (pack-objects: thread the path-based compression, 2025-05-16).
These improvements are on top of those from --name-hash-version=2 by
using a different way to focus delta calculations.

A larger, internal example for this can be seen in this table (based on
testing today):

Mode      |  Size    | Time
----------+----------+------
version 1 |  16.0 GB | 83min
version 2 |   9.9 GB | 77min
path walk |   6.4 GB | 74min

> But if the answer is "yes" then adding APIs around it feels like a good
> alternative.
Making the code less brittle to changes is good, but also I'm interested
in ways to improve our test infrastructure or adding defensive features.
I mentioned a couple of ideas in an earlier message.

Thanks,
-Stolee

  reply	other threads:[~2025-08-21 20:33 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-08-20 18:39 [PATCH 0/3] [2.51.0 Bug] Missing singleton objects in 'git repack -adf --path-walk' Derrick Stolee via GitGitGadget
2025-08-20 18:39 ` [PATCH 1/3] t7700: add failing --path-walk test Derrick Stolee via GitGitGadget
2025-08-21  8:00   ` Patrick Steinhardt
2025-08-21 12:42     ` Derrick Stolee
2025-08-21 16:22       ` Junio C Hamano
2025-08-21 23:21   ` Elijah Newren
2025-08-20 18:39 ` [PATCH 2/3] path-walk: fix setup of pending objects Derrick Stolee via GitGitGadget
2025-08-20 19:02   ` Junio C Hamano
2025-08-20 19:42     ` Derrick Stolee
2025-08-21  8:01       ` Patrick Steinhardt
2025-08-21 12:55         ` Derrick Stolee
2025-08-21  8:01   ` Patrick Steinhardt
2025-08-21 20:33     ` Derrick Stolee [this message]
2025-08-21 23:21   ` Elijah Newren
2025-08-20 18:39 ` [PATCH 3/3] path-walk: create initializer for path lists Derrick Stolee via GitGitGadget
2025-08-21 23:22   ` Elijah Newren
2025-08-25 12:49 ` [PATCH v2 0/2] [2.51.0 Bug] Missing singleton objects in 'git repack -adf --path-walk' Derrick Stolee via GitGitGadget
2025-08-25 12:49   ` [PATCH v2 1/2] path-walk: fix setup of pending objects Derrick Stolee via GitGitGadget
2025-08-25 12:49   ` [PATCH v2 2/2] path-walk: create initializer for path lists Derrick Stolee via GitGitGadget
2025-08-26 15:03   ` [PATCH v2 0/2] [2.51.0 Bug] Missing singleton objects in 'git repack -adf --path-walk' Elijah Newren
2025-08-26 15:58     ` Junio C Hamano
2025-09-02 11:19       ` Patrick Steinhardt

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=b37e2690-fe6a-4fde-ab7e-58368c2914c3@gmail.com \
    --to=stolee@gmail$(echo .)com \
    --cc=christian.couder@gmail$(echo .)com \
    --cc=git@vger$(echo .)kernel.org \
    --cc=gitgitgadget@gmail$(echo .)com \
    --cc=gitster@pobox$(echo .)com \
    --cc=johannes.schindelin@gmx$(echo .)de \
    --cc=johncai86@gmail$(echo .)com \
    --cc=jonathantanmy@google$(echo .)com \
    --cc=karthik.188@gmail$(echo .)com \
    --cc=kristofferhaugsbakk@fastmail$(echo .)com \
    --cc=me@ttaylorr$(echo .)com \
    --cc=newren@gmail$(echo .)com \
    --cc=peff@peff$(echo .)net \
    --cc=ps@pks$(echo .)im \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox