public inbox for git@vger.kernel.org 
From: Eric Wong <e@yhbt•net>
To: Ivan Baldo <ibaldo@gmail•com>
Cc: git@vger•kernel.org
Subject: Re: Fastest way to set files date and time to latest commit time of each one
Date: Sat, 29 Aug 2020 04:48:42 +0000
Message-ID: <20200829044842.GA5732@dcvr>
In-Reply-To: <CAEbcw=3mOoYuJo2mQgqB2aJgn-D2i_7ZRmhfPvYNVHD1Kp8wuA@mail.gmail.com>

Ivan Baldo <ibaldo@gmail•com> wrote:
>   Hello.
>   I know this is not standard usage of git, but I need a way to have
> more stable dates and times in the files in order to avoid rsync
> checksumming.
>   So I found this
> https://stackoverflow.com/questions/2179722/checking-out-old-file-with-original-create-modified-timestamps/2179876#2179876
> and modified it a bit to run in CentOS 7:
> 
> IFS="
> "
> for FILE in $(git ls-files -z | tr '\0' '\n')
> do
>     TIME=$(git log --pretty=format:%cd -n 1 --date=iso -- "$FILE")
>     touch -c -m -d "$TIME" "$FILE"
> done
> 
>   Unfortunately it takes ages for a 84k files repo.
>   I see the CPU usage is dominated by the git log command.

Running `git log' once per file isn't necessary.

On Debian, the rsync package ships a `git-set-file-times' script
in /usr/share/doc/rsync/scripts/; it runs `git log' only once
and parses the output.

You can also get my (original) version from:
https://yhbt.net/git-set-file-times
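
To illustrate the one-pass idea (this is not the script above, only a
rough Python sketch of the same approach): ask `git log' once for every
commit's committer timestamp and changed paths, keep the first
timestamp seen for each path, then call utime() in-process.  The names
parse_log and set_file_times and the %x01 marker are made up for this
example.  It assumes newest-first log order, ignores paths changed only
in merge commits, and will mangle a path that begins with a newline.

```python
import os
import subprocess

MARK = b"\x01"  # hypothetical marker (emitted via %x01) that tags each commit header


def parse_log(data: bytes) -> dict:
    """Map each path to the timestamp of the first (newest) commit that lists it."""
    newest = {}
    stamp = None
    for token in data.split(b"\0"):
        token = token.lstrip(b"\n")  # tolerate a newline between header and paths
        if token.startswith(MARK):
            header, _, token = token[1:].partition(b"\n")
            stamp = int(header)
        if token and stamp is not None:
            newest.setdefault(token, stamp)  # keep only the first sighting
    return newest


def set_file_times(repo: str = ".") -> int:
    """Run `git log` once, then set mtimes without spawning a process per file."""
    out = subprocess.run(
        ["git", "-C", repo, "log", "-z", "--name-only", "--pretty=format:%x01%ct"],
        check=True,
        stdout=subprocess.PIPE,
    ).stdout
    changed = 0
    for path, stamp in parse_log(out).items():
        full = os.path.join(os.fsencode(repo), path)
        if os.path.exists(full):  # like `touch -c`: skip paths not in the checkout
            atime = os.stat(full).st_atime  # like `touch -m`: leave atime alone
            os.utime(full, (atime, stamp))
            changed += 1
    return changed
```

Calling set_file_times("/path/to/checkout") would then spawn a single
git process for the whole tree.  Like the Perl version, it holds every
path in memory at once.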

>   I know a way I could use to split the work for all the CPU threads
> but anyway, I would like to know if you guys and girls know of a
> faster way to do this.

Much of your overhead is going to be from process spawning: the
loop starts a `git log' and a `touch' for every file.
My Perl version reduces that significantly.

I haven't tried it with 84K files, but it'll have to keep all
those filenames in memory.  I'm not sure if parallelizing
utime() syscalls is worth it, either; maybe it helps on SSD
more than HDD.
