From: Dirk Gouders <dirk@gouders•net>
To: Junio C Hamano <gitster@pobox•com>
Cc: git list <git@vger•kernel.org>
Subject: Re: [PATCH 1/1] Documentation/user-manual.txt: example for generating object hashes
Date: Thu, 29 Feb 2024 23:35:35 +0100
Message-ID: <gha5nigaq0.fsf@gouders.net>
In-Reply-To: <xmqqil27c5p1.fsf@gitster.g> (Junio C. Hamano's message of "Thu, 29 Feb 2024 13:37:46 -0800")
Junio C Hamano <gitster@pobox•com> writes:
> Dirk Gouders <dirk@gouders•net> writes:
>
>> If someone spends the time to work through the documentation, the
>> subject "hashes" can lead to contradictions:
>>
>> The README of the initial commit states hashes are generated from
>> compressed data (which changed very soon), whereas
>> Documentation/user-manual.txt says they are generated from original
>> data.
>>
>> Don't give doubts a chance: clarify this and present a simple example
>> on how object hashes can be generated manually.
>
> I'd rather not waste readers' attention on a historical wart.

Yes, but (I should have mentioned it) the document itself suggests
reading the initial commit.

But I don't mean to argue about that; perhaps I dug too deep into the
details.

>> @@ -4095,6 +4095,39 @@ that is used to name the object is the hash of the original data
>> plus this header, so `sha1sum` 'file' does not match the object name
>> for 'file'.
>
> The paragraph above (part of it is hidden before the hunk) clearly
> states what the naming rules are. We hash the original and then
> compress. If I use an implementation of Git that drives the zlib at
> compression level 1, and if you clone from my repository with
> another implementation of Git whose zlib is driven at compression
> level 9, our .git/objects/01/2345...90 files may not be identical,
> but when uncompressed they should store the same contents, so "hash
> then compress" is the only sensible choice that is not affected by
> the compression to give stable names to objects.

Thank you for that detail.

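
To convince myself of the "hash then compress" argument, here is a
minimal sketch. It assumes GNU gzip and sha1sum are available and uses
gzip only as a stand-in for the zlib compression Git applies, so it
illustrates the principle rather than Git's actual object files:

```shell
#!/bin/sh
# Sketch: a name derived from the uncompressed bytes cannot depend on
# the compression level. gzip stands in for Git's zlib (assumption).
set -e
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
printf 'Hello git.\n' > "$tmp/hello.txt"

# Compress the same content at two different levels.
gzip -n -1 -c "$tmp/hello.txt" > "$tmp/fast.gz"
gzip -n -9 -c "$tmp/hello.txt" > "$tmp/best.gz"

# Hashes of the compressed files depend on the compressor and its settings.
sha1sum "$tmp/fast.gz" "$tmp/best.gz"

# Hashes of the uncompressed content are the same for both files.
hash_fast=$(gzip -dc "$tmp/fast.gz" | sha1sum | cut -d' ' -f1)
hash_best=$(gzip -dc "$tmp/best.gz" | sha1sum | cut -d' ' -f1)
echo "$hash_fast"
echo "$hash_best"
```

The first two lines printed may or may not differ, depending on the
gzip implementation; the last two always match.
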
>> +Starting with the initial commit, hashing was done on the compressed
>> +data and the file README of that commit explicitly states this:
>> +
>> +"The SHA1 hash is always the hash of the _compressed_ object, not the
>> +original one."
>> +
>> +This changed soon after that with commit
>> +d98b46f8d9a3 (Do SHA1 hash _before_ compression.). Unfortunately, the
>> +commit message doesn't provide the detailed reasoning.
>
> These three are about Git development history, which by itself may
> be of interest for some people, but the main target audience of the
> user-manual is probably different from them. They may be interested
> to learn how Git works, but it is only to feel that they understand
> how the "magic" things Git does, like "a cryptographic hash of
> contents is enough to uniquely identify the contents being tracked",
> works well to trust their precious contents [*].
>
> Side note:
> https://lore.kernel.org/git/Pine.LNX.4.58.0504200144260.6467@ppc970.osdl.org/
> explains the reason behind the change to those who did not find
> it obvious.
>
> FYI, another "breaking" change we did earlier in the history of the
> project was to update the sort order of paths in tree objects. We
> do not need to confuse readers by talking about the original and
> updated sort order. The only thing they need, when they want to get
> the feeling that they understand how things work, is the description
> of how things work in the version of Git they have ready access to.
> Historical mistakes we made, corrections we made and why, are
> certainly of interest but not for the target audience of this
> document.

Again, thank you; very interesting reading.

> On the other hand, ...
>
>> +The following is a short example that demonstrates how hashes can be
>> +generated manually:
>> +
>> +Let's assume a small text file with the content "Hello git.\n":
>> +-------------------------------------------------
>> +$ cat > hello.txt <<EOF
>> +Hello git.
>> +EOF
>> +-------------------------------------------------
>> +
>> +We can now manually generate the hash `git` would use for this file:
>> +
>> +- The object we want the hash for is of type "blob" and its size is
>> + 11 bytes.
>> +
>> +- Prepend the object header to the file content and feed this to
>> + sha1sum(1):
>> +
>> +-------------------------------------------------
>> +$ printf "blob 11\0" | cat - hello.txt | sha1sum
>> +7217614ba6e5f4e7db2edaa2cdf5fb5ee4358b57  -
>> +-------------------------------------------------
>> +
>
> ... something like the above (modulo coding style) would be a useful
> addition to help those who want to convince themselves they
> understand how (some parts of) Git works under the hood, and I think
> it would be a welcome addition to some subset of such readers (the
> rest of the world may feel it is way too much detail, though).
>
> I would draw the line between this one and a similar description and
> demonstration of historical mistakes, which is not as relevant as
> how things work in the current system. In other words, to me, it is
> OK to dig a bit deep to show how the current scheme works but it is
> way too much to do the same for versions of the system that do not
> exist anymore.
>
> But others may draw the line differently and consider even the above
> a bit too much detail, which is a position I would also accept.
>
> Thanks.
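
For readers who want to repeat the quoted example without counting
bytes by hand, the steps can be sketched as a small shell function.
`blob_hash` is a hypothetical helper name, not a Git command, and the
sketch assumes sha1sum is installed and that the repository uses SHA-1
object names:

```shell
#!/bin/sh
# blob_hash FILE: hypothetical helper (not part of Git) that prints the
# SHA-1 object name Git would give FILE when stored as a blob.
set -e
blob_hash() {
    # Size of the content in bytes; tr strips padding some wc versions add.
    size=$(wc -c < "$1" | tr -d ' ')
    # Header "blob <size>" plus a NUL byte, followed by the raw content.
    { printf 'blob %s\0' "$size"; cat "$1"; } | sha1sum | cut -d' ' -f1
}

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
printf 'Hello git.\n' > "$tmp/hello.txt"
blob_hash "$tmp/hello.txt"
```

If Git is installed, `git hash-object hello.txt` should print the same
name as `blob_hash hello.txt` for the same file.
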