From: Phillip Wood <phillip.wood123@gmail•com>
To: Alexander Monakov <amonakov@ispras•ru>,
Phillip Wood <phillip.wood@dunelm•org.uk>
Cc: git@vger•kernel.org
Subject: Re: [PATCH 2/2] xdiff: optimize xdl_hash_record_verbatim
Date: Wed, 13 Aug 2025 14:10:56 +0100 [thread overview]
Message-ID: <b118903c-a50a-4ae4-b41e-1c47c37218c4@gmail.com> (raw)
In-Reply-To: <c2fe3b69-8436-af46-c47d-dde5bb037227@ispras.ru>
Hi Alexander
On 11/08/2025 15:14, Alexander Monakov wrote:
>
> On Mon, 11 Aug 2025, Phillip Wood wrote:
>
>>> That's what the 'cycles' column in the table gives (6.21/5.8 = 1.070...)
>>
>> It would be helpful to add a column with those calculations in it rather than
>> forcing the reader to calculate the speed up for themselves.
>
> Ok, will change it to
>
> version | speedup over (A) | cycles, bn | instructions, bn
> ----------------------------------------------------------
> A 6.38 11.3
> B 1.027 6.21 10.89
> C 1.1 5.80 9.95
> D 1.094 5.83 8.74
> ----------------------------------------------------------
That looks good, thanks
>> Also what is the cycles column measuring? What is it that takes 6.21 cycles
>> for B and only 5.8 cycles for C?
>
> Billions of cycles, e.g. in C the entire command completes in 5.8e9 CPU cycles.
Ah, for some reason I'd not realized than bn was short for billion
>>> Then you get 9% from the inlining patch and only 2% from the faster hash
>>> function? That's a bit surprising, which compiler and CPU you used? Is it
>>> with default optimization (-O2)?
>>
>> I used gcc with -O2 -march=native on an i5-8500. I saw a similar improvement
>> from the inlining when I was playing with xxhash.
>
> Thanks, I'll see if I can benchmark it on a Skylake in the coming days. That
> said, I think most users will get Git from their distro, without -march=native,
> right? So I'd suggest looking at plain -O2, especially for xxhash, which
> selects hashing primitives based on CPU-indicating predefined macros.
For xxhash I was using the system library rather than compiling it myself
>>> I'd say under reasonable assumptions (e.g. a not too ancient CPU with
>>> 3-cycle integer multiplication) the new scheme is generally faster even
>>> without asm.
>>
>> Thanks, fwiw I don't see a measurable difference in the timings with and
>> without the asm on my machine -
>
> To be clear, by "without asm" you mean forcing the !__GNUC__ branch where
> REASSOC_FENCE macro is empty?
Exactly
>> sometimes one is faster, sometimes the other, any difference is within the
>> noise.
>
> Would you mind showing your 'gcc --version'?
gcc (Debian 12.2.0-14+deb12u1) 12.2.0
> Also, I prefer 'perf stat' for
> such measurements, because its measurements are not so sensitive to frequency
> scaling (plus, you can compare my cycles/instructions counts with yours if you
> run 'perf stat', but I cannot compare your seconds from hyperfine with mine
> because of course my CPU runs at a different frequency than yours).
>
> 'perf stat -r 5' runs the workload 5 times and prints averages and deviation.
I'll try and take a look at that though I'm off line next week and I'm
not sure I'll have time before then.
>>> No, what we need to do here is outside of the abstract machine's view,
>>> standard functions are not going to help.
>>
>> That's a shame. I'd hoped that stopping the compiler reorder the code would do
>> the same thing - what is the asm doing that's different?
>
> atomic_signal_fence only blocks reordering of references to memory that can be
> observed from a signal handler interrupting the current thread. It has no effect
> on variables whose addresses do not escape (let alone never taken in the first
> place). Here we want to force a particular evaluation order for variables that
> end up on registers and are not supposed to appear in memory at all.
Ah, that makes sense
Thanks
Phillip
next prev parent reply other threads:[~2025-08-13 13:11 UTC|newest]
Thread overview: 22+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-07-28 19:05 [PATCH 0/2] optimize string hashing in xdiff Alexander Monakov
2025-07-28 19:05 ` [PATCH 1/2] xdiff: refactor xdl_hash_record() Alexander Monakov
2025-07-28 19:05 ` [PATCH 2/2] xdiff: optimize xdl_hash_record_verbatim Alexander Monakov
2025-07-28 20:50 ` Junio C Hamano
2025-07-28 20:57 ` Alexander Monakov
2025-08-04 13:49 ` Phillip Wood
2025-08-04 14:39 ` Alexander Monakov
2025-08-11 13:13 ` Phillip Wood
2025-08-11 14:14 ` Alexander Monakov
2025-08-12 17:56 ` Alexander Monakov
2025-08-20 21:34 ` Junio C Hamano
2025-09-08 19:06 ` Alexander Monakov
2025-09-08 21:04 ` Junio C Hamano
2025-08-13 13:10 ` Phillip Wood [this message]
2025-07-28 19:32 ` [PATCH 0/2] optimize string hashing in xdiff Junio C Hamano
2025-07-28 19:56 ` Eli Schwartz
2025-07-28 20:54 ` Junio C Hamano
2025-07-28 20:25 ` Alexander Monakov
2025-08-14 15:01 ` Junio C Hamano
2025-08-28 23:40 ` Junio C Hamano
2025-08-29 1:13 ` Jacob Keller
2025-08-29 3:09 ` Elijah Newren
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=b118903c-a50a-4ae4-b41e-1c47c37218c4@gmail.com \
--to=phillip.wood123@gmail$(echo .)com \
--cc=amonakov@ispras$(echo .)ru \
--cc=git@vger$(echo .)kernel.org \
--cc=phillip.wood@dunelm$(echo .)org.uk \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox