public inbox for git@vger.kernel.org 
 help / color / mirror / Atom feed
From: Phillip Wood <phillip.wood123@gmail•com>
To: Ezekiel Newren <ezekielnewren@gmail•com>, phillip.wood@dunelm•org.uk
Cc: Ezekiel Newren via GitGitGadget <gitgitgadget@gmail•com>,
	git@vger•kernel.org,
	Kristoffer Haugsbakk <kristofferhaugsbakk@fastmail•com>,
	Patrick Steinhardt <ps@pks•im>,
	Chris Torek <chris.torek@gmail•com>
Subject: Re: [PATCH v2 01/10] doc: define unambiguous type mappings across C and Rust
Date: Sun, 9 Nov 2025 14:14:11 +0000	[thread overview]
Message-ID: <fa95b29a-077c-4df5-9c59-34e0c1447e70@gmail.com> (raw)
In-Reply-To: <CAH=ZcbA25eyMhQpvK7eh=ydZkg5RdzbdRFEdj-22T+d1VuTazA@mail.gmail.com>

On 06/11/2025 22:52, Ezekiel Newren wrote:
> On Thu, Nov 6, 2025 at 2:55 AM Phillip Wood <phillip.wood123@gmail•com> wrote:
>> On 29/10/2025 22:19, Ezekiel Newren via GitGitGadget wrote:
>>> From: Ezekiel Newren <ezekielnewren@gmail•com>
>>>
>>> Document other nuances with crossing the FFI boundary. Other language
>>> mappings may be added in the future.
>>
>> Thanks for adding this, I've left a few comments below. Overall I
>> thought it was very well written.
> 
> Thanks.
> 
> I felt it was necessary since C vs Rust types keep coming up over and
> over again. I'm flexible with the wording of this document. I was just
> trying to convey a firm and clear stance on what is and isn't proper
> in Git.

That will definitely be useful as we add more rust code. In the future 
we may want to add a summary of which types to use to 
Documentation/CodingGuidelines but that doesn't need to be done in this 
series.

>> I tried building an html version of
>> this but even after adding it to the list of TECH_DOCS in
>> Documentation/Makefile with
>>
>> diff --git a/Documentation/Makefile b/Documentation/Makefile
>> index 47208269a2e..2699f0b24af 100644
>> --- a/Documentation/Makefile
>> +++ b/Documentation/Makefile
>> @@ -143,6 +143,7 @@ TECH_DOCS += technical/shallow
>>    TECH_DOCS += technical/sparse-checkout
>>    TECH_DOCS += technical/sparse-index
>>    TECH_DOCS += technical/trivial-merge
>> +TECH_DOCS += technical/unambiguous-types
>>    TECH_DOCS += technical/unit-tests
>>    SP_ARTICLES += $(TECH_DOCS)
>>    SP_ARTICLES += technical/api-index
>>
>> it fails with
>>
>> $ make -C Documentation/ technical/unambiguous-types.html
>>                                         Merge branch
>> 'ps/object-source-loose' into seen
>> make: Entering directory '/home/phil/src/git/Documentation'
>>       GEN asciidoc.conf
>>       * new asciidoc flags
>>       ASCIIDOC technical/unambiguous-types.html
>> asciidoc: ERROR: unambiguous-types.adoc: line 139: undefined filter
>> attribute in command: source-highlight --gen-version -f xhtml -s
>> {language} {src_numbered?--line-number=' '} {src_tab?--tab={src_tab}}
>> {args=}
>> asciidoc: ERROR: unambiguous-types.adoc: line 162: undefined filter
>> attribute in command: source-highlight --gen-version -f xhtml -s
>> {language} {src_numbered?--line-number=' '} {src_tab?--tab={src_tab}}
>> {args=}
>> asciidoc: ERROR: unambiguous-types.adoc: line 177: undefined filter
>> attribute in command: source-highlight --gen-version -f xhtml -s
>> {language} {src_numbered?--line-number=' '} {src_tab?--tab={src_tab}}
>> {args=}
>> asciidoc: ERROR: unambiguous-types.adoc: line 187: undefined filter
>> attribute in command: source-highlight --gen-version -f xhtml -s
>> {language} {src_numbered?--line-number=' '} {src_tab?--tab={src_tab}}
>> {args=}
>> asciidoc: ERROR: unambiguous-types.adoc: line 199: undefined filter
>> attribute in command: source-highlight --gen-version -f xhtml -s
>> {language} {src_numbered?--line-number=' '} {src_tab?--tab={src_tab}}
>> {args=}
>> asciidoc: ERROR: unambiguous-types.adoc: line 213: undefined filter
>> attribute in command: source-highlight --gen-version -f xhtml -s
>> {language} {src_numbered?--line-number=' '} {src_tab?--tab={src_tab}}
>> {args=}
>> asciidoc: ERROR: unambiguous-types.adoc: line 224: undefined filter
>> attribute in command: source-highlight --gen-version -f xhtml -s
>> {language} {src_numbered?--line-number=' '} {src_tab?--tab={src_tab}}
>> {args=}
>> make: *** [Makefile:396: technical/unambiguous-types.html] Error 1
>> make: *** Deleting file 'technical/unambiguous-types.html'
>> make: Leaving directory '/home/phil/src/git/Documentation'
> 
> I've never created documentation for Git before, so this helps. I'll
> incorporate your suggestions.

We should also add this file to Documentation/technical/meson.build. It 
seems those errors above are due to some incompatibility between 
asciidoc and asciidoctor as I just tried running

     make -C Documentation/ USE_ASCIIDOCTOR=1 
technical/unambiguous-types.html

and it worked just fine. I'm afraid I don't know enough asciidoc to make 
any helpful suggestions on how to fix it.

>>> +== Character types
>>> +
>>> +This is where C and Rust don't have a clean one-to-one mapping. A C `char` is
>>> +an 8-bit type that is signless (neither signed nor unsigned)
>>
>> I found this a bit confusing. Isn't the signedness of "char"
>> implementation defined rather than it being "signless"
>>
>>> which causes
>>> +problems with e.g. `make DEVELOPER=1`.
>>
>> I'm not sure what this is referring to - maybe -Wsign-compare?
> 
> When I build Git with `make DEVELOPER=1` and I compare uint8_t with
> char it complains about a difference in signedness. When I compare
> int8_t with char it also complains about a difference in signedness.
> So it is implementation defined, but it's also neither signed nor
> unsigned according to DEVELOPER=1 since it complains either way.

Oh, I see - this is saying mixing "char" and "uint8_t" causes problems. 
I agree, perhaps we could expand this slightly to mention comparison 
with uint8_t to make it clearer.

>>> Rust's `char` type is an unsigned 32-bit
>>> +integer that is used to describe Unicode code points. Even though a C `char`
>>> +is the same width as `u8`, `char` should be converted to u8 where it is
>>> +describing bytes in memory.
>>
>> I'm dreading the point where we start sharing "struct strbuf" with rust
>> and have to change the "buf" member from "char*" to "uint8_t*". While it
>> is not used in the xdiff code it is ubiquitous everywhere else and there
>> are lots of places where be pass the "buf" member to functions expecting
>> a "char*".
>>
>>          git grep -E '(\.|->)buf\W'
>>
>> has over 4000 matches
> 
> This is why I started in Xdiff since its code is mostly isolated.

Good plan!

> I
> think that we might have to bite the bullet and deal with the ugly
> mapping of char on the C side and u8 on the Rust side when dealing
> with strbuf. Maybe as we translate more of C into Rust someone will
> have a better suggestion. I think my ivec type would be better since
> strbuf is almost a special case of my ivec type, but dealing with
> strbuf is outside the scope of this patch series.

Yes, hopefully it will become clearer what the least painful route 
forward is as we get more experience with rust <=> C iterop.

>>> +While you could specify `char` in the C code and `u8` in Rust code, it's not as
>>> +clear what the appropriate type is, but it would work across the FFI boundary.
>>> +However the bigger problem comes from code generation tools like cbindgen and
>>> +bindgen. When cbindgen see u8 in Rust it will generate uint8_t on the C side
>>> +which will cause differ in signedness warnings/errors. Similarly if bindgen
>>> +see `char` on the C side it will generate `std::ffi::c_char` which has its own
>>> +problems.
>>
>> Yeah, we definitely don't want to be using "std::ffi::c_char" in our
>> rust implementations. I do wonder if we might want to use it (or CStr)
>> judiciously in function parameters and immediately convert it to u8 in
>> the function body where the function is called from C though.
> 
> That's basically the design pattern I've been using.
> 
> In many of my translations from C to Rust I create a Rust stub
> function that takes pointer types and wraps them into safe types which
> then get handed off to a safe Rust function. I think that in the cases
> where CString/CStr is required the Rust stub function would create a
> &[u8] slice for the safe function to operate on.

That sounds like a good pattern - we get a nice interface for the C code 
and the rust implementation uses the idiomatic rust types.

Thanks

Phillip

>>> +=== Notes
>>> +^1^ This is only true if stdbool.h (or equivalent) is used. +
>>> +^2^ C does not enforce IEEE-754 compatibility, but Rust expects it. If the
>>> +platform/arch for C does not follow IEEE-754 then this equivalence does not
>>> +hold. Also, it's assumed that `float` is 32 bits and `double` is 64, but
>>> +there may be a strange platform/arch where even this isn't true. +
>>> +^3^ C also defines uintptr_t, but this should not be used in Git. +
>>> +^4^ C also defines ssize_t and intptr_t, but these should not be used in Git. +
>>
>> [u]intptr_t and ssize_t are used in git already. As Junio has pointed
>> out there are sane uses for these types but we don't want to use them in
>> structs or function parameters where the struct or function is shared
>> with rust.
> 
> You're right, I should update the phrasing. Something like: "These
> types shouldn't be used if their explicit purpose is for FFI. Whether
> as a field in a struct or part of a function signature." I'll update
> the wording.
> 
>>> +
>>> +== Problems with std::ffi::c_* types in Rust
>>> +TL;DR: They're not guaranteed to match C types for all possible C
>>> +compilers/platforms/architectures.
>>
>> Is this official policy of the rust project?
> 
> No, this is a personal inference based on logical deduction. The c_*
> definitions have changed over time with new Rust version releases, and
> Git targets more platforms/architectures than what Rust officially
> supports. While it's not guaranteed that it won't work everywhere.
> It's also not guaranteed to work everywhere either. On top of that
> we're targeting 1.63.0 who's c_* definitions are different in 1.89.0
> which I show an example of with c_long_definition. Can anyone say with
> certainty that Rust got these mappings right or wrong for all possible
> C compilers/architectures/platforms? If so (which I highly doubt)
> could someone provide a link?
> 


  reply	other threads:[~2025-11-09 14:14 UTC|newest]

Thread overview: 118+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-10-15 21:18 [PATCH 0/9] Xdiff cleanup part2 Ezekiel Newren via GitGitGadget
2025-10-15 21:18 ` [PATCH 1/9] xdiff: use ssize_t for dstart/dend, make them last in xdfile_t Ezekiel Newren via GitGitGadget
2025-10-21 11:32   ` Phillip Wood
2025-10-21 17:18     ` Junio C Hamano
2025-10-22 21:07       ` Ezekiel Newren
2025-10-22 21:38         ` Junio C Hamano
2025-10-22 21:51           ` Ezekiel Newren
2025-10-15 21:18 ` [PATCH 2/9] xdiff: make xrecord_t.ptr a uint8_t instead of char Ezekiel Newren via GitGitGadget
2025-10-16 21:51   ` Kristoffer Haugsbakk
2025-10-21  8:33   ` Patrick Steinhardt
2025-10-22 21:12     ` Ezekiel Newren
2025-10-21 13:13   ` Phillip Wood
2025-10-21 18:15     ` Junio C Hamano
2025-10-22 13:27       ` Phillip Wood
2025-10-22 20:55         ` Ezekiel Newren
2025-10-15 21:18 ` [PATCH 3/9] xdiff: use size_t for xrecord_t.size Ezekiel Newren via GitGitGadget
2025-10-15 21:18 ` [PATCH 4/9] xdiff: use unambiguous types in xdl_hash_record() Ezekiel Newren via GitGitGadget
2025-10-21  8:33   ` Patrick Steinhardt
2025-10-22 21:20     ` Ezekiel Newren
2025-10-23  5:49       ` Patrick Steinhardt
2025-10-15 21:18 ` [PATCH 5/9] xdiff: split xrecord_t.ha into line_hash and minimal_perfect_hash Ezekiel Newren via GitGitGadget
2025-10-20 23:29   ` Ezekiel Newren
2025-10-21  5:10     ` Junio C Hamano
2025-10-21  8:33     ` Patrick Steinhardt
2025-10-21 10:03     ` Phillip Wood
2025-10-21 11:16       ` Chris Torek
2025-10-22 21:31       ` Ezekiel Newren
2025-10-15 21:18 ` [PATCH 6/9] xdiff: make xdfile_t.nrec a size_t instead of long Ezekiel Newren via GitGitGadget
2025-10-15 21:18 ` [PATCH 7/9] xdiff: make xdfile_t.nreff " Ezekiel Newren via GitGitGadget
2025-10-15 21:18 ` [PATCH 8/9] xdiff: change rindex from long to size_t in xdfile_t Ezekiel Newren via GitGitGadget
2025-10-21  8:34   ` Patrick Steinhardt
2025-10-22 22:14     ` Ezekiel Newren
2025-10-23  5:49       ` Patrick Steinhardt
2025-10-15 21:18 ` [PATCH 9/9] xdiff: rename rindex -> reference_index Ezekiel Newren via GitGitGadget
2025-10-15 21:28 ` [PATCH 0/9] Xdiff cleanup part2 Junio C Hamano
2025-10-21 13:28 ` Phillip Wood
2025-10-21 13:41   ` Junio C Hamano
2025-10-29 22:19 ` [PATCH v2 00/10] " Ezekiel Newren via GitGitGadget
2025-10-29 22:19   ` [PATCH v2 01/10] doc: define unambiguous type mappings across C and Rust Ezekiel Newren via GitGitGadget
2025-11-06  9:55     ` Phillip Wood
2025-11-06 22:52       ` Ezekiel Newren
2025-11-09 14:14         ` Phillip Wood [this message]
2025-10-29 22:19   ` [PATCH v2 02/10] xdiff: use ssize_t for dstart/dend, make them last in xdfile_t Ezekiel Newren via GitGitGadget
2025-11-06  9:55     ` Phillip Wood
2025-11-06 22:56       ` Ezekiel Newren
2025-10-29 22:19   ` [PATCH v2 03/10] xdiff: make xrecord_t.ptr a uint8_t instead of char Ezekiel Newren via GitGitGadget
2025-11-06 10:49     ` Phillip Wood
2025-11-06 23:13       ` Ezekiel Newren
2025-11-06 10:55     ` Phillip Wood
2025-11-06 23:14       ` Ezekiel Newren
2025-10-29 22:19   ` [PATCH v2 04/10] xdiff: use size_t for xrecord_t.size Ezekiel Newren via GitGitGadget
2025-10-29 22:19   ` [PATCH v2 05/10] xdiff: use unambiguous types in xdl_hash_record() Ezekiel Newren via GitGitGadget
2025-10-29 22:19   ` [PATCH v2 06/10] xdiff: split xrecord_t.ha into line_hash and minimal_perfect_hash Ezekiel Newren via GitGitGadget
2025-11-06 11:00     ` Phillip Wood
2025-11-06 23:20       ` Ezekiel Newren
2025-10-29 22:19   ` [PATCH v2 07/10] xdiff: make xdfile_t.nrec a size_t instead of long Ezekiel Newren via GitGitGadget
2025-10-29 22:19   ` [PATCH v2 08/10] xdiff: make xdfile_t.nreff " Ezekiel Newren via GitGitGadget
2025-10-29 22:19   ` [PATCH v2 09/10] xdiff: change rindex from long to size_t in xdfile_t Ezekiel Newren via GitGitGadget
2025-10-29 22:19   ` [PATCH v2 10/10] xdiff: rename rindex -> reference_index Ezekiel Newren via GitGitGadget
2025-10-30 14:26   ` [PATCH v2 00/10] Xdiff cleanup part2 Junio C Hamano
2025-11-11 19:42   ` [PATCH v3 " Ezekiel Newren via GitGitGadget
2025-11-11 19:42     ` [PATCH v3 01/10] doc: define unambiguous type mappings across C and Rust Ezekiel Newren via GitGitGadget
2025-11-11 20:52       ` Junio C Hamano
2025-11-11 21:05       ` Junio C Hamano
2025-11-11 19:42     ` [PATCH v3 02/10] xdiff: use ptrdiff_t for dstart/dend Ezekiel Newren via GitGitGadget
2025-11-11 22:23       ` Junio C Hamano
2025-11-11 19:42     ` [PATCH v3 03/10] xdiff: make xrecord_t.ptr a uint8_t instead of char Ezekiel Newren via GitGitGadget
2025-11-11 22:53       ` Junio C Hamano
2025-11-11 19:42     ` [PATCH v3 04/10] xdiff: use size_t for xrecord_t.size Ezekiel Newren via GitGitGadget
2025-11-11 23:08       ` Junio C Hamano
2025-11-14  6:02         ` Ezekiel Newren
2025-11-14 16:31           ` Junio C Hamano
2025-11-11 19:42     ` [PATCH v3 05/10] xdiff: use unambiguous types in xdl_hash_record() Ezekiel Newren via GitGitGadget
2025-11-11 19:42     ` [PATCH v3 06/10] xdiff: split xrecord_t.ha into line_hash and minimal_perfect_hash Ezekiel Newren via GitGitGadget
2025-11-11 23:21       ` Junio C Hamano
2025-11-14  5:41         ` Ezekiel Newren
2025-11-14 20:06           ` Junio C Hamano
2025-11-11 19:42     ` [PATCH v3 07/10] xdiff: make xdfile_t.nrec a size_t instead of long Ezekiel Newren via GitGitGadget
2025-11-11 19:42     ` [PATCH v3 08/10] xdiff: make xdfile_t.nreff " Ezekiel Newren via GitGitGadget
2025-11-11 19:42     ` [PATCH v3 09/10] xdiff: change rindex from long to size_t in xdfile_t Ezekiel Newren via GitGitGadget
2025-11-11 19:42     ` [PATCH v3 10/10] xdiff: rename rindex -> reference_index Ezekiel Newren via GitGitGadget
2025-11-11 23:40     ` [PATCH v3 00/10] Xdiff cleanup part2 Junio C Hamano
2025-11-14  5:52       ` Ezekiel Newren
2025-11-14 22:36     ` [PATCH v4 " Ezekiel Newren via GitGitGadget
2025-11-14 22:36       ` [PATCH v4 01/10] doc: define unambiguous type mappings across C and Rust Ezekiel Newren via GitGitGadget
2025-11-15  3:06         ` Ramsay Jones
2025-11-15  3:41           ` Ben Knoble
2025-11-15 14:55             ` Ramsay Jones
2025-11-15 16:42               ` Junio C Hamano
2025-11-15 16:59                 ` D. Ben Knoble
2025-11-15 20:03                   ` Junio C Hamano
2025-11-17  1:20                 ` Junio C Hamano
2025-11-17  2:08                   ` Ramsay Jones
2025-11-14 22:36       ` [PATCH v4 02/10] xdiff: use ptrdiff_t for dstart/dend Ezekiel Newren via GitGitGadget
2025-11-14 22:36       ` [PATCH v4 03/10] xdiff: make xrecord_t.ptr a uint8_t instead of char Ezekiel Newren via GitGitGadget
2025-11-15  8:26         ` Junio C Hamano
2025-11-18 20:55           ` Ezekiel Newren
2025-11-14 22:36       ` [PATCH v4 04/10] xdiff: use size_t for xrecord_t.size Ezekiel Newren via GitGitGadget
2025-11-14 22:36       ` [PATCH v4 05/10] xdiff: use unambiguous types in xdl_hash_record() Ezekiel Newren via GitGitGadget
2025-11-14 22:36       ` [PATCH v4 06/10] xdiff: split xrecord_t.ha into line_hash and minimal_perfect_hash Ezekiel Newren via GitGitGadget
2025-11-14 22:36       ` [PATCH v4 07/10] xdiff: make xdfile_t.nrec a size_t instead of long Ezekiel Newren via GitGitGadget
2025-11-14 22:36       ` [PATCH v4 08/10] xdiff: make xdfile_t.nreff " Ezekiel Newren via GitGitGadget
2025-11-14 22:36       ` [PATCH v4 09/10] xdiff: change rindex from long to size_t in xdfile_t Ezekiel Newren via GitGitGadget
2025-11-14 22:36       ` [PATCH v4 10/10] xdiff: rename rindex -> reference_index Ezekiel Newren via GitGitGadget
2025-11-18 22:34       ` [PATCH v5 00/10] Xdiff cleanup part2 Ezekiel Newren via GitGitGadget
2025-11-18 22:34         ` [PATCH v5 01/10] doc: define unambiguous type mappings across C and Rust Ezekiel Newren via GitGitGadget
2025-11-18 23:46           ` Ramsay Jones
2025-11-19  4:14             ` Junio C Hamano
2025-11-18 22:34         ` [PATCH v5 02/10] xdiff: use ptrdiff_t for dstart/dend Ezekiel Newren via GitGitGadget
2025-11-18 22:34         ` [PATCH v5 03/10] xdiff: make xrecord_t.ptr a uint8_t instead of char Ezekiel Newren via GitGitGadget
2025-11-18 22:34         ` [PATCH v5 04/10] xdiff: use size_t for xrecord_t.size Ezekiel Newren via GitGitGadget
2025-11-18 22:34         ` [PATCH v5 05/10] xdiff: use unambiguous types in xdl_hash_record() Ezekiel Newren via GitGitGadget
2025-11-18 22:34         ` [PATCH v5 06/10] xdiff: split xrecord_t.ha into line_hash and minimal_perfect_hash Ezekiel Newren via GitGitGadget
2025-11-18 22:34         ` [PATCH v5 07/10] xdiff: make xdfile_t.nrec a size_t instead of long Ezekiel Newren via GitGitGadget
2025-11-18 22:34         ` [PATCH v5 08/10] xdiff: make xdfile_t.nreff " Ezekiel Newren via GitGitGadget
2025-11-18 22:34         ` [PATCH v5 09/10] xdiff: change rindex from long to size_t in xdfile_t Ezekiel Newren via GitGitGadget
2025-11-18 22:34         ` [PATCH v5 10/10] xdiff: rename rindex -> reference_index Ezekiel Newren via GitGitGadget
2025-11-18 23:11         ` [PATCH v5 00/10] Xdiff cleanup part2 Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=fa95b29a-077c-4df5-9c59-34e0c1447e70@gmail.com \
    --to=phillip.wood123@gmail$(echo .)com \
    --cc=chris.torek@gmail$(echo .)com \
    --cc=ezekielnewren@gmail$(echo .)com \
    --cc=git@vger$(echo .)kernel.org \
    --cc=gitgitgadget@gmail$(echo .)com \
    --cc=kristofferhaugsbakk@fastmail$(echo .)com \
    --cc=phillip.wood@dunelm$(echo .)org.uk \
    --cc=ps@pks$(echo .)im \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox