From: Toon Claes <toon@iotcl•com>
To: Karthik Nayak <karthik.188@gmail•com>, git@vger•kernel.org
Cc: jltobler@gmail•com, ps@pks•im, Karthik Nayak <karthik.188@gmail•com>
Subject: Re: [PATCH v2 2/2] bundle: fix non-linear performance scaling with refs
Date: Thu, 10 Apr 2025 10:57:29 +0200
Message-ID: <871pu0fl6u.fsf@iotcl.com>
In-Reply-To: <20250408-488-generating-bundles-with-many-references-has-non-linear-performance-v2-2-0802fc36a23d@gmail.com>

Karthik Nayak <karthik.188@gmail•com> writes:

> The 'git bundle create' command has non-linear performance with the
> number of refs in the repository. Benchmarking the command shows that
> a large portion of the time (~75%) is spent in the
> `object_array_remove_duplicates()` function.
>
> The `object_array_remove_duplicates()` function was added in
> b2a6d1c686 (bundle: allow the same ref to be given more than once,
> 2009-01-17) to skip duplicate refs provided by the user from being
> written to the bundle. Since this is an O(N^2) algorithm, in repos with
> a large number of references, this can take up a large amount of time.
>
> Let's instead use a 'strset' to skip duplicates inside
> `write_bundle_refs()`. This improves the performance by around 6 times
> when tested in a repository with 100000 refs:
>
> Benchmark 1: bundle (refcount = 100000, revision = master)
>   Time (mean ± σ):     14.653 s ±  0.203 s    [User: 13.940 s, System: 0.762 s]
>   Range (min … max):   14.237 s … 14.920 s    10 runs
>
> Benchmark 2: bundle (refcount = 100000, revision = HEAD)
>   Time (mean ± σ):      2.394 s ±  0.023 s    [User: 1.684 s, System: 0.798 s]
>   Range (min … max):    2.364 s …  2.425 s    10 runs
>
> Summary
>   bundle (refcount = 100000, revision = HEAD) ran
>     6.12 ± 0.10 times faster than bundle (refcount = 100000, revision = master)
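
For readers following along, the idea in the quoted commit message can be
sketched roughly as below. This is only an illustration in Python, not the
C code from the patch: the function names `dedup_quadratic` and
`dedup_with_set` are made up for this example, and a Python `set` stands in
for git's 'strset'. The first function rescans the list of kept names for
every ref, which is the quadratic pattern; the second remembers the names it
has already seen, so each ref costs one average constant-time lookup.

```python
def dedup_quadratic(refnames):
    """Keep first occurrences by rescanning the kept list for every ref.

    Each membership test on a list is a linear scan, so the whole
    loop is O(N^2) in the number of refs.
    """
    kept = []
    for name in refnames:
        if name not in kept:
            kept.append(name)
    return kept


def dedup_with_set(refnames):
    """Keep first occurrences using a set of already-seen names.

    Set lookups are O(1) on average, so the loop is O(N) overall.
    """
    seen = set()
    kept = []
    for name in refnames:
        if name in seen:
            continue  # duplicate ref given by the user, skip it
        seen.add(name)
        kept.append(name)
    return kept


# Hypothetical input: the same ref given more than once.
refs = ["refs/heads/master", "refs/tags/v1.0", "refs/heads/master"]
print(dedup_with_set(refs))
```

Both functions return the same list in the same order; only the cost of
detecting a duplicate differs, which is why the gap only becomes visible
with a very large number of refs.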

I've done some benchmarking with some "real life" repositories, which
only have a couple of thousand refs, and there the difference is
(as expected) barely noticeable. That is good to know, because it means
there isn't any regression either.

This version looks good to me, I approve.

--
Toon
