Re: [PATCH 2/2] xdiff: optimize xdl_hash_record_verbatim

public inbox for git@vger.kernel.org 
 help / color / mirror / Atom feed

From: Phillip Wood <phillip.wood123@gmail•com>
To: Alexander Monakov <amonakov@ispras•ru>,
	Phillip Wood <phillip.wood@dunelm•org.uk>
Cc: git@vger•kernel.org
Subject: Re: [PATCH 2/2] xdiff: optimize xdl_hash_record_verbatim
Date: Wed, 13 Aug 2025 14:10:56 +0100	[thread overview]
Message-ID: <b118903c-a50a-4ae4-b41e-1c47c37218c4@gmail.com> (raw)
In-Reply-To: <c2fe3b69-8436-af46-c47d-dde5bb037227@ispras.ru>

Hi Alexander

On 11/08/2025 15:14, Alexander Monakov wrote:
> 
> On Mon, 11 Aug 2025, Phillip Wood wrote:
> 
>>> That's what the 'cycles' column in the table gives (6.21/5.8 = 1.070...)
>>
>> It would be helpful to add a column with those calculations in it rather than
>> forcing the reader to calculate the speed up for themselves.
> 
> Ok, will change it to
> 
> version | speedup over (A) | cycles, bn | instructions, bn
> ----------------------------------------------------------
> A                            6.38         11.3
> B         1.027              6.21         10.89
> C         1.1                5.80          9.95
> D         1.094              5.83          8.74
> ----------------------------------------------------------

That looks good, thanks

>> Also what is the cycles column measuring? What is it that takes 6.21 cycles
>> for B and only 5.8 cycles for C?
> 
> Billions of cycles, e.g. in C the entire command completes in 5.8e9 CPU cycles.

Ah, for some reason I'd not realized than bn was short for billion
>>> Then you get 9% from the inlining patch and only 2% from the faster hash
>>> function? That's a bit surprising, which compiler and CPU you used? Is it
>>> with default optimization (-O2)?
>>
>> I used gcc with -O2 -march=native on an i5-8500. I saw a similar improvement
>> from the inlining when I was playing with xxhash.
> 
> Thanks, I'll see if I can benchmark it on a Skylake in the coming days. That
> said, I think most users will get Git from their distro, without -march=native,
> right? So I'd suggest looking at plain -O2, especially for xxhash, which
> selects hashing primitives based on CPU-indicating predefined macros.

For xxhash I was using the system library rather than compiling it myself
>>> I'd say under reasonable assumptions (e.g. a not too ancient CPU with
>>> 3-cycle integer multiplication) the new scheme is generally faster even
>>> without asm.
>>
>> Thanks, fwiw I don't see a measurable difference in the timings with and
>> without the asm on my machine -
> 
> To be clear, by "without asm" you mean forcing the !__GNUC__ branch where
> REASSOC_FENCE macro is empty?

Exactly

>> sometimes one is faster, sometimes the other, any difference is within the
>> noise.
> 
> Would you mind showing your 'gcc --version'?

gcc (Debian 12.2.0-14+deb12u1) 12.2.0

> Also, I prefer 'perf stat' for
> such measurements, because its measurements are not so sensitive to frequency
> scaling (plus, you can compare my cycles/instructions counts with yours if you
> run 'perf stat', but I cannot compare your seconds from hyperfine with mine
> because of course my CPU runs at a different frequency than yours).
> 
> 'perf stat -r 5' runs the workload 5 times and prints averages and deviation.

I'll try and take a look at that though I'm off line next week and I'm 
not sure I'll have time before then.
>>> No, what we need to do here is outside of the abstract machine's view,
>>> standard functions are not going to help.
>>
>> That's a shame. I'd hoped that stopping the compiler reorder the code would do
>> the same thing - what is the asm doing that's different?
> 
> atomic_signal_fence only blocks reordering of references to memory that can be
> observed from a signal handler interrupting the current thread. It has no effect
> on variables whose addresses do not escape (let alone never taken in the first
> place). Here we want to force a particular evaluation order for variables that
> end up on registers and are not supposed to appear in memory at all.

Ah, that makes sense

Thanks

Phillip

next prev parent reply	other threads:[~2025-08-13 13:11 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-07-28 19:05 [PATCH 0/2] optimize string hashing in xdiff Alexander Monakov
2025-07-28 19:05 ` [PATCH 1/2] xdiff: refactor xdl_hash_record() Alexander Monakov
2025-07-28 19:05 ` [PATCH 2/2] xdiff: optimize xdl_hash_record_verbatim Alexander Monakov
2025-07-28 20:50   ` Junio C Hamano
2025-07-28 20:57     ` Alexander Monakov
2025-08-04 13:49   ` Phillip Wood
2025-08-04 14:39     ` Alexander Monakov
2025-08-11 13:13       ` Phillip Wood
2025-08-11 14:14         ` Alexander Monakov
2025-08-12 17:56           ` Alexander Monakov
2025-08-20 21:34             ` Junio C Hamano
2025-09-08 19:06               ` Alexander Monakov
2025-09-08 21:04                 ` Junio C Hamano
2025-08-13 13:10           ` Phillip Wood [this message]
2025-07-28 19:32 ` [PATCH 0/2] optimize string hashing in xdiff Junio C Hamano
2025-07-28 19:56   ` Eli Schwartz
2025-07-28 20:54     ` Junio C Hamano
2025-07-28 20:25   ` Alexander Monakov
2025-08-14 15:01     ` Junio C Hamano
2025-08-28 23:40     ` Junio C Hamano
2025-08-29  1:13       ` Jacob Keller
2025-08-29  3:09       ` Elijah Newren

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=b118903c-a50a-4ae4-b41e-1c47c37218c4@gmail.com \
    --to=phillip.wood123@gmail$(echo .)com \
    --cc=amonakov@ispras$(echo .)ru \
    --cc=git@vger$(echo .)kernel.org \
    --cc=phillip.wood@dunelm$(echo .)org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox