From: Toon Claes <toon@iotcl•com>
To: Karthik Nayak <karthik.188@gmail•com>, git@vger•kernel.org
Cc: jltobler@gmail•com, ps@pks•im, Karthik Nayak <karthik.188@gmail•com>
Subject: Re: [PATCH v2 2/2] bundle: fix non-linear performance scaling with refs
Date: Thu, 10 Apr 2025 10:57:29 +0200
Message-ID: <871pu0fl6u.fsf@iotcl.com>
In-Reply-To: <20250408-488-generating-bundles-with-many-references-has-non-linear-performance-v2-2-0802fc36a23d@gmail.com>
Karthik Nayak <karthik.188@gmail•com> writes:
> The 'git bundle create' command has non-linear performance with the
> number of refs in the repository. Benchmarking the command shows that
> a large portion of the time (~75%) is spent in the
> `object_array_remove_duplicates()` function.
>
> The `object_array_remove_duplicates()` function was added in
> b2a6d1c686 (bundle: allow the same ref to be given more than once,
> 2009-01-17) to skip duplicate refs provided by the user from being
> written to the bundle. Since this is an O(N^2) algorithm, in repos with
> a large number of references, this can take up a large amount of time.
>
> Let's instead use a 'strset' to skip duplicates inside
> `write_bundle_refs()`. This improves the performance by around 6 times
> when tested against a repository with 100000 refs:
>
> Benchmark 1: bundle (refcount = 100000, revision = master)
> Time (mean ± σ): 14.653 s ± 0.203 s [User: 13.940 s, System: 0.762 s]
> Range (min … max): 14.237 s … 14.920 s 10 runs
>
> Benchmark 2: bundle (refcount = 100000, revision = HEAD)
> Time (mean ± σ): 2.394 s ± 0.023 s [User: 1.684 s, System: 0.798 s]
> Range (min … max): 2.364 s … 2.425 s 10 runs
>
> Summary
> bundle (refcount = 100000, revision = HEAD) ran
> 6.12 ± 0.10 times faster than bundle (refcount = 100000, revision = master)
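To illustrate why the change scales better, here is a minimal, self-contained sketch of the two strategies. It is not the code from the patch: `dedup_quadratic()`, `dedup_with_set()` and `hash_str()` are hypothetical names, and the small open-addressing table only stands in for a string set such as 'strset'. The first function compares each name against everything kept so far, the second does one expected constant-time lookup per name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Quadratic approach: compare every name against all names kept so far. */
static size_t dedup_quadratic(const char **names, size_t n)
{
	size_t kept = 0;

	for (size_t i = 0; i < n; i++) {
		size_t j;

		for (j = 0; j < kept; j++)
			if (!strcmp(names[i], names[j]))
				break;
		if (j == kept)
			names[kept++] = names[i]; /* first occurrence: keep it */
	}
	return kept;
}

/* FNV-1a string hash; any reasonable string hash would do for this sketch. */
static uint64_t hash_str(const char *s)
{
	uint64_t h = 14695981039346656037ULL;

	for (; *s; s++) {
		h ^= (unsigned char)*s;
		h *= 1099511628211ULL;
	}
	return h;
}

/*
 * Set-based approach: remember seen names in a hash table (a hypothetical
 * stand-in for a string set) and skip any name that is already present.
 */
static size_t dedup_with_set(const char **names, size_t n)
{
	size_t cap = 16, kept = 0;
	const char **slots;

	while (cap < 2 * n) /* power-of-two capacity, load factor <= 0.5 */
		cap *= 2;
	slots = calloc(cap, sizeof(*slots));
	assert(slots);

	for (size_t i = 0; i < n; i++) {
		size_t pos = (size_t)hash_str(names[i]) & (cap - 1);

		/* Linear probing until we hit an empty slot or the same name. */
		while (slots[pos] && strcmp(slots[pos], names[i]))
			pos = (pos + 1) & (cap - 1);
		if (slots[pos])
			continue; /* already seen: skip the duplicate */
		slots[pos] = names[i];
		names[kept++] = names[i];
	}
	free(slots);
	return kept;
}
```

Both functions compact the array in place, keep the first occurrence of each name in its original order, and return the number of unique names. Only the lookup cost per name differs.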
I've done some benchmarking with some "real life" repositories, which
only have a couple of thousand refs, and there the difference is
(as expected) barely noticeable. It's good to know there also isn't
any regression.
This version looks good to me, I approve.
--
Toon
Thread overview: 10+ messages
2025-04-01 17:00 [PATCH 0/2] bundle: fix non-linear performance scaling with refs Karthik Nayak
2025-04-01 17:00 ` [PATCH 1/2] t6020: test for duplicate refnames in bundle creation Karthik Nayak
2025-04-01 17:00 ` [PATCH 2/2] bundle: fix non-linear performance scaling with refs Karthik Nayak
2025-04-03 19:07 ` Toon Claes
2025-04-06 20:48 ` Karthik Nayak
2025-04-08 9:00 ` [PATCH v2 0/2] " Karthik Nayak
2025-04-08 9:00 ` [PATCH v2 1/2] t6020: test for duplicate refnames in bundle creation Karthik Nayak
2025-04-08 9:00 ` [PATCH v2 2/2] bundle: fix non-linear performance scaling with refs Karthik Nayak
2025-04-10 8:57 ` Toon Claes [this message]
2025-04-10 9:04 ` Karthik Nayak