Re: [PATCH 2/2] parseopt: check for duplicate long names and numerical options

public inbox for git@vger.kernel.org 

From: "René Scharfe" <l.s.r@web•de>
To: Jeff King <peff@peff•net>
Cc: Junio C Hamano <gitster@pobox•com>, git@vger•kernel.org
Subject: Re: [PATCH 2/2] parseopt: check for duplicate long names and numerical options
Date: Sat, 28 Feb 2026 10:19:11 +0100
Message-ID: <dc882c28-0846-41e3-a9e8-1a4bc44a1ebc@web.de>
In-Reply-To: <20260227230822.GA2965111@coredump.intra.peff.net>

On 2/28/26 12:08 AM, Jeff King wrote:
> On Fri, Feb 27, 2026 at 05:50:56PM -0500, Jeff King wrote:
> 
>> On Fri, Feb 27, 2026 at 08:27:02PM +0100, René Scharfe wrote:
>>
>>> The check clearly has a cost, but I have a hard time measuring it.
>>> We already do lots of (kinda cheap) checks.  Turning them on only
>>> in DEVELOPER builds (and ideally demonstrating a speedup) is left as
>>> an exercise for interested readers (with stronger benchmark-fu).
>>
>> I agree it is probably not introducing a measurable slowdown. If we were
>> to make it conditional, I'd suggest a run-time toggle (so we could turn
>> it on for all test scripts, but not regular use).

Good idea.  We could piggy-back on -h.

> Just for fun, I was going to write a script that generated a test-tool
> parse-options list with 100k entries. But then I realized we already
> have something like that!
> 
> If you do this:
> 
>   (
>     echo usage
>     echo --
>     for i in $(seq 100000); do
>       echo "opt$i option $i"
>     done
>   ) >input
> 
> then hyperfine reports (before and after your patches):
> 
>   Benchmark 1: ./git.old rev-parse --parseopt -- --opt42 <input
>     Time (mean ± σ):      22.2 ms ±   0.4 ms    [User: 16.6 ms, System: 5.6 ms]
>     Range (min … max):    21.5 ms …  23.9 ms    127 runs
>   
>   Benchmark 2: ./git.new rev-parse --parseopt -- --opt42 <input
>     Time (mean ± σ):      32.5 ms ±   0.5 ms    [User: 23.8 ms, System: 8.6 ms]
>     Range (min … max):    31.7 ms …  34.8 ms    89 runs
>   
>   Summary
>     ./git.old rev-parse --parseopt -- --opt42 <input ran
>       1.46 ± 0.03 times faster than ./git.new rev-parse --parseopt -- --opt42 <input
> 
> So it is measurable (even with the extra per-option costs to generate
> the option structs in the first place). Looks like on the order of 10ms
> for 100k options, or about 100ns per option. If you imagine that most
> option lists are smaller than 100, we're talking about probably the
> equivalent of 50-100 syscalls. If we are really looking to
> micro-optimize startup time, I suspect there's pretty low-hanging fruit
> to be found of that magnitude.

Interesting.  I don't like this percentage.  We won't have that many
options, ever, but we'd pay that small cost on every git invocation,
which adds up.  The beneficiaries are just a handful of developers who
duplicate options, which seems like a bad deal.
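
For reference, the per-option figure quoted above follows directly from
the two hyperfine means.  A minimal sketch of that arithmetic (the
function name is made up for illustration and is not part of git):

```c
/*
 * Extra time per option in nanoseconds, given the mean run time
 * before and after a change (both in milliseconds) and the number
 * of options that were parsed.
 */
static double per_option_overhead_ns(double old_ms, double new_ms,
				     long n_options)
{
	/* 1 ms = 1e6 ns */
	return (new_ms - old_ms) * 1e6 / (double)n_options;
}
```

With the numbers from the benchmark (22.2 ms, 32.5 ms, 100000 options)
this gives about 103 ns per option, so a command with 100 options would
pay roughly 10 µs per invocation.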

>>> +		if (opts->long_name) {
>>> +			if (strset_contains(&long_names, opts->long_name))
>>> +				optbug(opts, "long name already used");
>>> +			strset_add(&long_names, opts->long_name);
>>> +		}
>>
>> ...if you want to micro-optimize, note that the return value of
>> strset_add() tells you whether the item was already in the set. That can
>> save one hash of the string.

Makes sense, good call.
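
A self-contained sketch of that single-lookup pattern.  It does not use
git's strset; toy_set and its functions are hypothetical stand-ins whose
add function is assumed to follow the convention described above
(return 1 if the string was newly added, 0 if it was already present):

```c
#include <stddef.h>
#include <string.h>

#define TOY_SET_SLOTS 64	/* must exceed the number of names stored */

/* Hypothetical stand-in for a string set; not git's strset. */
struct toy_set {
	const char *slot[TOY_SET_SLOTS];
};

static size_t toy_hash(const char *s)
{
	size_t h = 5381;
	while (*s)
		h = h * 33 + (unsigned char)*s++;
	return h % TOY_SET_SLOTS;
}

/* Returns 1 if str was newly added, 0 if it was already present. */
static int toy_set_add(struct toy_set *set, const char *str)
{
	size_t i = toy_hash(str);

	/* Linear probing; terminates as long as the table is not full. */
	while (set->slot[i]) {
		if (!strcmp(set->slot[i], str))
			return 0;
		i = (i + 1) % TOY_SET_SLOTS;
	}
	set->slot[i] = str;
	return 1;
}

/* Counts names that repeat an earlier entry; NULL entries are skipped. */
static size_t count_duplicate_long_names(const char **names, size_t n)
{
	struct toy_set seen = { { NULL } };
	size_t dups = 0;

	for (size_t i = 0; i < n; i++)
		/* One add per name; no separate "contains" lookup. */
		if (names[i] && !toy_set_add(&seen, names[i]))
			dups++;
	return dups;
}
```

Each name is hashed once, where a contains-then-add sequence would hash
it twice.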

>> Probably the allocation for each element is the dominating cost, though,
>> and it doesn't help with that.

My knee-jerk reaction is to use a fixed-size array and sort.  That gets
rid of allocations, but needs some more CPU cycles and memory accesses.
It would then either bug out on experiments like yours or detect
duplicates only in the first N long name options.  Not sure if it's
worth the limitations.
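
A rough sketch of the fixed-size array variant, taking the "first N
only" route.  The cap MAX_CHECKED_NAMES and both function names are
hypothetical, chosen only for illustration; this is not code from the
patch:

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define MAX_CHECKED_NAMES 128	/* hypothetical cap (the "N" above) */

static int cmp_names(const void *a, const void *b)
{
	return strcmp(*(const char *const *)a, *(const char *const *)b);
}

/*
 * Returns a long name that occurs more than once among the first
 * MAX_CHECKED_NAMES non-NULL entries, or NULL if there is none.
 * Names beyond the cap are silently ignored.
 */
static const char *find_duplicate_name(const char **names, size_t n)
{
	const char *sorted[MAX_CHECKED_NAMES];	/* on the stack, no malloc */
	size_t used = 0;

	for (size_t i = 0; i < n && used < MAX_CHECKED_NAMES; i++)
		if (names[i])
			sorted[used++] = names[i];

	/* After sorting, equal names end up next to each other. */
	qsort(sorted, used, sizeof(*sorted), cmp_names);

	for (size_t i = 1; i < used; i++)
		if (!strcmp(sorted[i - 1], sorted[i]))
			return sorted[i];
	return NULL;
}
```

The trade-off is the one described above: no per-element allocation, at
the price of an O(n log n) sort and a hard limit on how many names are
checked.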

René

Thread overview: 12+ messages
2026-02-27  0:13 [Bug] duplicated long-form options go unnoticed Junio C Hamano
2026-02-27 19:27 ` [PATCH 1/2] pack-objects: remove duplicate --stdin-packs definition René Scharfe
2026-02-27 19:27 ` [PATCH 2/2] parseopt: check for duplicate long names and numerical options René Scharfe
2026-02-27 22:50   ` Jeff King
2026-02-27 23:08     ` Jeff King
2026-02-27 23:28       ` Junio C Hamano
2026-02-28  9:19       ` René Scharfe [this message]
2026-02-28  9:19   ` [PATCH v2 " René Scharfe
2026-02-28 10:58     ` Jeff King
2026-02-28 11:28       ` René Scharfe
2026-03-02 18:24         ` Jeff King
2026-03-01 14:33       ` Junio C Hamano
