public inbox for git@vger.kernel.org 
 help / color / mirror / Atom feed
From: Junio C Hamano <gitster@pobox•com>
To: Jonathan Tan <jonathantanmy@google•com>
Cc: Burke Libbey <burke.libbey@shopify•com>,  git@vger•kernel.org
Subject: Re: git-blame extremely slow in partial clones due to serial object fetching
Date: Thu, 21 Nov 2024 07:55:24 +0900	[thread overview]
Message-ID: <xmqqikshikgz.fsf@gitster.g> (raw)
In-Reply-To: <20241120185228.3204236-1-jonathantanmy@google.com> (Jonathan Tan's message of "Wed, 20 Nov 2024 10:52:28 -0800")

Jonathan Tan <jonathantanmy@google•com> writes:

> Technically, we do need the contents, ...
> There are other ways:
>
>  - If we can teach the client to collect object IDs for prefetching,
>    perhaps it would be just as easy to teach the server. We could
>    instead make filter-by-path an acceptable argument to pass to "fetch
>    --filter", then teach the lazy fetch to use that argument. This also
>    opens the door to future performance improvements - since the server
>    has all the objects, it can give us precisely the objects that we
>    need, and not just give us a quantity of objects based on a heuristic
>    (so the client does not need to say "give me 10, and if I need more,
>    I'll ask you again", but can say "give me all I need to complete
>    the blame). This, however, relies on server implementers to implement
>    and turn on such a feature.

This is an interesting half-way point, but I have a suspicion that
in order for the server side to give you all you need, the server
side has to do something close to computing the full blame.  Start
from a child commit plus the entire file as input, find blocks of
text in that entire file that are different in its parent (these are
the lines that are "blamed" to the child commit), pass control to
the same algorithm but using the parent commit plus the remainder of
the file (excluding the lines of text that have already "blamed") as
the input, rinse and repeat, until the "remainder of the file"
shrinks to empty.  When everything is "blamed", you know you can
stop.

So, a server that can give you something better than a heuristic
would have spent enough cycles to know the final result of "blame"
by the time it knows where it should/can stop, wouldn't it?

>  - We could also teach the server to "blame" a file for us and then
>    teach the client to stitch together the server's result with the
>    local findings, but this is more complicated.

Your local lazy repository, if you have anything you have to "stitch
together", would have your locally modified contents, and for you to
be able to make such modifications, it would also have at least the
blobs from HEAD, which you based your modifications on.  So you
should be able to locally run "git blame @{u}.." to find lines that
your locally modified contents are to be blamed, ask the other side
to give you a blame for @{u}, and overlay the former on top of the
latter.  

  reply	other threads:[~2024-11-20 22:55 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-11-19 20:16 git-blame extremely slow in partial clones due to serial object fetching Burke Libbey
2024-11-20 11:59 ` Manoraj K
2024-11-20 18:52 ` Jonathan Tan
2024-11-20 22:55   ` Junio C Hamano [this message]
2024-11-21  3:12     ` [External] " Han Young
2024-11-22  3:32       ` Shubham Kanodia
2024-11-22  8:29         ` Junio C Hamano
2024-11-22  8:51           ` Shubham Kanodia
2024-11-22 17:55             ` Jonathan Tan
2024-11-25  0:22               ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=xmqqikshikgz.fsf@gitster.g \
    --to=gitster@pobox$(echo .)com \
    --cc=burke.libbey@shopify$(echo .)com \
    --cc=git@vger$(echo .)kernel.org \
    --cc=jonathantanmy@google$(echo .)com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox