public inbox for git@vger.kernel.org 
From: Gusted <gusted@codeberg•org>
To: Toon Claes <toon@iotcl•com>, git@vger•kernel.org
Subject: Re: git-last-modified weirdness
Date: Mon, 5 Jan 2026 12:52:01 +0100
Message-ID: <4b6fe686-bb3d-4d10-8a4d-7542b4c93e45@codeberg.org>
In-Reply-To: <87v7hgpbrk.fsf@iotcl.com>

On 1/5/26 11:57 AM, Toon Claes wrote:

 > Gusted <gusted@codeberg•org> writes:
 >
 >> Hi,
 >>
 >> Resending this mail as it looks like it might not have arrived (couldn't
 >> find it in the mailing list archive).
 > Thanks for following up. I didn't see it yet.
 >
 >> For Forgejo, I wanted to look into using git-last-modified to gain extra
 >> performance for larger repositories, where this is often (one of) the
 >> slowest Git operations. However, I noticed some problems that look like
 >> bugs.
 >>
 >> I ran all the following commands on the following Git repository, on
 >> Git v2.52.0 (Arch Linux), and my git config does not enable or disable
 >> any feature that should have affected any of the following observations.
 >>
 >> $ tmp=$(mktemp -d)
 >> $ git clone https://codeberg.org/forgejo/forgejo $tmp
 >> $ cd "$tmp"
 >>
 >> During some experiments I noticed it being slower for some files. An
 >> example:
 >>
 >> $ hyperfine --warmup 5 'git log --max-count=1 DCO' 'git last-modified DCO'
 >> Benchmark 1: git log --max-count=1 DCO
 >>     Time (mean ± σ):      86.9 ms ±   0.8 ms    [User: 70.1 ms, System: 15.6 ms]
 >>     Range (min … max):    85.5 ms …  88.3 ms    34 runs
 >>
 >> Benchmark 2: git last-modified DCO
 >>     Time (mean ± σ):     151.3 ms ±   4.3 ms    [User: 133.4 ms, System: 15.9 ms]
 >>     Range (min … max):   145.4 ms … 167.1 ms    19 runs
 > In my local benchmarks I see similar results.
 >
 > I agree this isn't great, but git-log(1) is just very good at logging a
 > single path. git-last-modified(1) is mostly designed to give commits
 > for a bunch of paths. For example:
 >
 >      $ hyperfine --warmup 5 'git ls-tree HEAD --name-only | xargs --max-args=1 git log --max-count=1 --format=oneline --' 'git last-modified'
 >      Benchmark 1: git ls-tree HEAD --name-only | xargs --max-args=1 git log --max-count=1 --format=oneline --
 >        Time (mean ± σ):     852.5 ms ±   9.2 ms    [User: 703.8 ms, System: 141.9 ms]
 >        Range (min … max):   841.9 ms … 869.4 ms    10 runs
 >
 >      Benchmark 2: git last-modified
 >        Time (mean ± σ):     141.2 ms ±   2.0 ms    [User: 133.0 ms, System: 7.9 ms]
 >        Range (min … max):   137.7 ms … 146.0 ms    21 runs
 >
 >      Summary
 >        git last-modified ran
 >          6.04 ± 0.11 times faster than git ls-tree HEAD --name-only | xargs --max-args=1 git log --max-count=1 --format=oneline --

Only using git-last-modified when there are more than a few paths is
okay for how I want to use it. I was not really able to deduce this
from the manual, though. After reading the GitHub blog, the GitLab blog
and the v2.52.0 release notes, my impression was that it would be a good
replacement for `git log -n1` in all cases.

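On the calling side, that choice could look roughly like the sketch
below. It is only a sketch: `last_commit_for` is a made-up helper name,
the one-path cutoff comes from nothing more than the benchmarks in this
thread, and `-r --max-depth=0` is taken from your example further down.

```shell
# Hypothetical helper: print "<commit>\t<path>" for each given path.
# Assumption: a single path is cheaper with git-log(1), several paths are
# cheaper with one git-last-modified(1) call (based only on the timings above).
last_commit_for() {
    if [ "$#" -eq 0 ]; then
        echo "usage: last_commit_for <path>..." >&2
        return 1
    fi
    if [ "$#" -eq 1 ]; then
        # One path: ask git-log(1) for the newest commit touching it.
        printf '%s\t%s\n' "$(git log --max-count=1 --format=%H -- "$1")" "$1"
    else
        # Several paths: a single git-last-modified(1) invocation for all of them.
        git last-modified -r --max-depth=0 -- "$@"
    fi
}
```

For example, `last_commit_for DCO` would go through git-log(1), while
`last_commit_for DCO web_src/svg` would go through git-last-modified(1).
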
 >> This might be me misunderstanding the feature, but it looks to me like
 >> this cannot be used for paths that are inside a directory. The following
 >> two commands yield the same output:
 >>
 >> $ git last-modified -- web_src
 >> 24019ef5e83fd7bed7f31ad09dd8d5f26b4bdc69        web_src
 >> $ git last-modified -- web_src/svg
 >> 24019ef5e83fd7bed7f31ad09dd8d5f26b4bdc69        web_src
 >>
 >> Where I expected the latter command to return the last commit of
 >> web_src/svg.
 > I agree this is confusing. And I plan to propose a change to this
 > behavior. But at the moment what you're supposed to do in this
 > situation:
 >
 >      $ git last-modified -- web_src
 >      28e0af23faf6c8e8f353ba2ae818ee0f83fd3e5c        web_src
 >      $ git last-modified -r --max-depth=0 -- web_src/svg
 >      b8f15e4ea09c6571872607874ae099269ea4b201        web_src/svg
 >
 > I plan to change the default behavior to basically behave like `-r
 > --max-depth=0`. But I'm happy to hear your input if you think it should
 > be something else?
 > There's some context here[1], but as said, I might shift direction a bit
 > toward making the default more intuitive.
 >
 > [1]: https://lore.kernel.org/git/20251126-toon-last-modified-zzzz-v1-0-608350df0caa@iotcl.com/

Oh, there's a whole new option! That's exactly what I was looking for
to get that behavior. Only returning the root-level information by
default looks and feels silly, and it reminds me of git-diff-tree's
default, so I agree with making `-r --max-depth=0` the default.
Returning the information for exactly the paths given sounds most
reasonable.

Although, given that you mention this command works best for multiple
paths, I can also imagine `-r --max-depth=1` as the default, to nudge
people to use it for that purpose.

 >> I'm not sure why I tried this, but I can trigger a BUG when giving it
 >> some nonsense input:
 >>
 >> $ git last-modified fb06ce04173d47aaaa498385621cba8b8dfd7584
 >> BUG: builtin/last-modified.c:456: paths remaining beyond boundary in last-modified
 >> [1]    690163 IOT instruction (core dumped)  git last-modified
 >>
 >> `fb06ce04173d47aaaa498385621cba8b8dfd7584` is the tree commit id of
 >> web_src. I suppose this should've returned a nice error message or
 >> blank output. It does give a blank output when you specify a valid path:
 >>
 >> $ git last-modified fb06ce04173d47aaaa498385621cba8b8dfd7584 web_src
 >>
 > Hah, that sounds like a real bug. Thanks for reporting, I will look into
 > it.
 >
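
Until that is fixed, a caller can probably avoid the crash by checking
the revision first. A minimal sketch, assuming the first argument is
meant to be a revision; `safe_last_modified` is a hypothetical wrapper
name, not something Git provides:

```shell
# Hypothetical guard: only hand a revision to git-last-modified(1) if it
# peels to a commit. A tree ID like the one above fails this check.
safe_last_modified() {
    rev=$1
    shift
    if ! git rev-parse --verify --quiet "${rev}^{commit}" >/dev/null; then
        echo "error: '$rev' does not name a commit" >&2
        return 1
    fi
    git last-modified "$rev" -- "$@"
}
```

With this, `safe_last_modified fb06ce04173d47aaaa498385621cba8b8dfd7584`
should print an error and return non-zero instead of reaching the BUG.
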
 >> Kind regards,
 >> Gusted
 >>
 >>

Kind Regards
Gusted
