public inbox for git@vger.kernel.org 
 help / color / mirror / Atom feed
From: Phillip Wood <phillip.wood123@gmail•com>
To: Ezekiel Newren via GitGitGadget <gitgitgadget@gmail•com>,
	git@vger•kernel.org
Cc: Kristoffer Haugsbakk <kristofferhaugsbakk@fastmail•com>,
	Patrick Steinhardt <ps@pks•im>,
	Chris Torek <chris.torek@gmail•com>,
	Ezekiel Newren <ezekielnewren@gmail•com>
Subject: Re: [PATCH v2 01/10] doc: define unambiguous type mappings across C and Rust
Date: Thu, 6 Nov 2025 09:55:31 +0000	[thread overview]
Message-ID: <995f77a3-b94c-46df-87d3-22c7b2a3c762@gmail.com> (raw)
In-Reply-To: <88133848d1a317f8a95c19ee5482b828a3f8705f.1761776388.git.gitgitgadget@gmail.com>

Hi Ezekiel

On 29/10/2025 22:19, Ezekiel Newren via GitGitGadget wrote:
> From: Ezekiel Newren <ezekielnewren@gmail•com>
> 
> Document other nuances with crossing the FFI boundary. Other language
> mappings may be added in the future.

Thanks for adding this, I've left a few comments below. Overall I 
thought it was very well written. I tried building an html version of 
this but even after adding it to the list of TECH_DOCS in 
Documentation/Makefile with

diff --git a/Documentation/Makefile b/Documentation/Makefile
index 47208269a2e..2699f0b24af 100644
--- a/Documentation/Makefile
+++ b/Documentation/Makefile
@@ -143,6 +143,7 @@ TECH_DOCS += technical/shallow
  TECH_DOCS += technical/sparse-checkout
  TECH_DOCS += technical/sparse-index
  TECH_DOCS += technical/trivial-merge
+TECH_DOCS += technical/unambiguous-types
  TECH_DOCS += technical/unit-tests
  SP_ARTICLES += $(TECH_DOCS)
  SP_ARTICLES += technical/api-index

it fails with

$ make -C Documentation/ technical/unambiguous-types.html 
                                       Merge branch 
'ps/object-source-loose' into seen
make: Entering directory '/home/phil/src/git/Documentation'
     GEN asciidoc.conf
     * new asciidoc flags
     ASCIIDOC technical/unambiguous-types.html
asciidoc: ERROR: unambiguous-types.adoc: line 139: undefined filter 
attribute in command: source-highlight --gen-version -f xhtml -s 
{language} {src_numbered?--line-number=' '} {src_tab?--tab={src_tab}} 
{args=}
asciidoc: ERROR: unambiguous-types.adoc: line 162: undefined filter 
attribute in command: source-highlight --gen-version -f xhtml -s 
{language} {src_numbered?--line-number=' '} {src_tab?--tab={src_tab}} 
{args=}
asciidoc: ERROR: unambiguous-types.adoc: line 177: undefined filter 
attribute in command: source-highlight --gen-version -f xhtml -s 
{language} {src_numbered?--line-number=' '} {src_tab?--tab={src_tab}} 
{args=}
asciidoc: ERROR: unambiguous-types.adoc: line 187: undefined filter 
attribute in command: source-highlight --gen-version -f xhtml -s 
{language} {src_numbered?--line-number=' '} {src_tab?--tab={src_tab}} 
{args=}
asciidoc: ERROR: unambiguous-types.adoc: line 199: undefined filter 
attribute in command: source-highlight --gen-version -f xhtml -s 
{language} {src_numbered?--line-number=' '} {src_tab?--tab={src_tab}} 
{args=}
asciidoc: ERROR: unambiguous-types.adoc: line 213: undefined filter 
attribute in command: source-highlight --gen-version -f xhtml -s 
{language} {src_numbered?--line-number=' '} {src_tab?--tab={src_tab}} 
{args=}
asciidoc: ERROR: unambiguous-types.adoc: line 224: undefined filter 
attribute in command: source-highlight --gen-version -f xhtml -s 
{language} {src_numbered?--line-number=' '} {src_tab?--tab={src_tab}} 
{args=}
make: *** [Makefile:396: technical/unambiguous-types.html] Error 1
make: *** Deleting file 'technical/unambiguous-types.html'
make: Leaving directory '/home/phil/src/git/Documentation'

> +== Character types
> +
> +This is where C and Rust don't have a clean one-to-one mapping. A C `char` is
> +an 8-bit type that is signless (neither signed nor unsigned) 

I found this a bit confusing. Isn't the signedness of "char" 
implementation defined rather than it being "signless"

> which causes
> +problems with e.g. `make DEVELOPER=1`.

I'm not sure what this is referring to - maybe -Wsign-compare?

> Rust's `char` type is an unsigned 32-bit
> +integer that is used to describe Unicode code points. Even though a C `char`
> +is the same width as `u8`, `char` should be converted to u8 where it is
> +describing bytes in memory. 

I'm dreading the point where we start sharing "struct strbuf" with rust 
and have to change the "buf" member from "char*" to "uint8_t*". While it 
is not used in the xdiff code it is ubiquitous everywhere else and there 
are lots of places where be pass the "buf" member to functions expecting 
a "char*".

	git grep -E '(\.|->)buf\W'

has over 4000 matches

> If a C `char` is not describing bytes, then it
> +should be converted to a more accurate unambiguous type.

That's a good point.

> +While you could specify `char` in the C code and `u8` in Rust code, it's not as
> +clear what the appropriate type is, but it would work across the FFI boundary.
> +However the bigger problem comes from code generation tools like cbindgen and
> +bindgen. When cbindgen see u8 in Rust it will generate uint8_t on the C side
> +which will cause differ in signedness warnings/errors. Similaraly if bindgen
> +see `char` on the C side it will generate `std::ffi::c_char` which has its own
> +problems.

Yeah, we definitely don't want to be using "std::ffi::c_char" in our 
rust implementations. I do wonder if we might want to use it (or CStr) 
judiciously in function parameters and immediately convert it to u8 in 
the function body where the function is called from C though.

> +=== Notes
> +^1^ This is only true if stdbool.h (or equivalent) is used. +
> +^2^ C does not enforce IEEE-754 compatibility, but Rust expects it. If the
> +platform/arch for C does not follow IEEE-754 then this equivalence does not
> +hold. Also, it's assumed that `float` is 32 bits and `double` is 64, but
> +there may be a strange platform/arch where even this isn't true. +
> +^3^ C also defines uintptr_t, but this should not be used in Git. +
> +^4^ C also defines ssize_t and intptr_t, but these should not be used in Git. +

[u]intptr_t and ssize_t are used in git already. As Junio has pointed 
out there are sane uses for these types but we don't want to use them in 
structs or function parameters where the struct or function is shared 
with rust.

> +
> +== Problems with std::ffi::c_* types in Rust
> +TL;DR: They're not guaranteed to match C types for all possible C
> +compilers/platforms/architectures.

Is this official policy of the rust project?

Thanks

Phillip

> +Only a few of Rust's C FFI types are considered safe and semantically clear to
> +use: +
> +
> +* `c_void`
> +* `CStr`
> +* `CString`
> +
> +Even then, they should be used sparingly, and only where the semantics match
> +exactly.
> +
> +The std::os::raw::c_* (which is deprecated) directly inherits the problems of
> +core::ffi, which changes over time and seems to make a best guess at the
> +correct definition for a given platform/target. This probably isn't a problem
> +for all platforms that Rust supports currently, but can anyone say that Rust
> +got it right for all C compilers of all platforms/targets?
> +
> +On top of all of that we're targeting an older version of Rust which doesn't
> +have the latest mappings.
> +
> +To give an example: c_long is defined in
> +footnote:[https://doc.rust-lang.org/1.63.0/src/core/ffi/mod.rs.html#175-189[c_long in 1.63.0]]
> +footnote:[https://doc.rust-lang.org/1.89.0/src/core/ffi/primitives.rs.html#135-151[c_long in 1.89.0]]
> +
> +=== Rust version 1.63.0
> +
> +[source]
> +----
> +mod c_long_definition {
> +    cfg_if! {
> +        if #[cfg(all(target_pointer_width = "64", not(windows)))] {
> +            pub type c_long = i64;
> +            pub type NonZero_c_long = crate::num::NonZeroI64;
> +            pub type c_ulong = u64;
> +            pub type NonZero_c_ulong = crate::num::NonZeroU64;
> +        } else {
> +            // The minimal size of `long` in the C standard is 32 bits
> +            pub type c_long = i32;
> +            pub type NonZero_c_long = crate::num::NonZeroI32;
> +            pub type c_ulong = u32;
> +            pub type NonZero_c_ulong = crate::num::NonZeroU32;
> +        }
> +    }
> +}
> +----
> +
> +=== Rust version 1.89.0
> +
> +[source]
> +----
> +mod c_long_definition {
> +    crate::cfg_select! {
> +        any(
> +            all(target_pointer_width = "64", not(windows)),
> +            // wasm32 Linux ABI uses 64-bit long
> +            all(target_arch = "wasm32", target_os = "linux")
> +        ) => {
> +            pub(super) type c_long = i64;
> +            pub(super) type c_ulong = u64;
> +        }
> +        _ => {
> +            // The minimal size of `long` in the C standard is 32 bits
> +            pub(super) type c_long = i32;
> +            pub(super) type c_ulong = u32;
> +        }
> +    }
> +}
> +----
> +
> +Even for the cases where C types are correctly mapped to Rust types via
> +std::ffi::c_* there are still problems. Let's take c_char for example. On some
> +platforms it's u8 on others it's i8.
> +
> +=== Subtraction underflow in debug mode
> +
> +The following code will panic in debug on platforms that define c_char as u8,
> +but won't if it's an i8.
> +
> +[source]
> +----
> +let mut x: std::ffi::c_char = 0;
> +x -= 1;
> +----
> +
> +=== Inconsistent shift behavior
> +
> +`x` will be 0xC0 for platforms that use i8, but will be 0x40 where it's u8.
> +
> +[source]
> +----
> +let mut x: std::ffi::c_char = 0x80;
> +x >>= 1;
> +----
> +
> +=== Equality fails to compile on some platforms
> +
> +The following will not compile on platforms that define c_char as i8, but will
> +if it's u8. You can cast x e.g. `assert_eq!(x as u8, b'a');`, but then you get
> +a warning on platforms that use u8 and a clean compilation where i8 is used.
> +
> +[source]
> +----
> +let mut x: std::ffi::c_char = 0x61;
> +assert_eq!(x, b'a');
> +----
> +
> +== Enum types
> +Rust enum types should not be used as FFI types. Rust enum types are more like
> +C union types than C enum's. For something like:
> +
> +[source]
> +----
> +#[repr(C, u8)]
> +enum Fruit {
> +    Apple,
> +    Banana,
> +    Cherry,
> +}
> +----
> +
> +It's easy enough to make sure the Rust enum matches what C would expect, but a
> +more complex type like.
> +
> +[source]
> +----
> +enum HashResult {
> +    SHA1([u8; 20]),
> +    SHA256([u8; 32]),
> +}
> +----
> +
> +The Rust compiler has to add a discriminant to the enum to distinguish between
> +the variants. The width, location, and values for that discriminant is up to
> +the Rust compiler and is not ABI stable.


  reply	other threads:[~2025-11-06  9:55 UTC|newest]

Thread overview: 118+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-10-15 21:18 [PATCH 0/9] Xdiff cleanup part2 Ezekiel Newren via GitGitGadget
2025-10-15 21:18 ` [PATCH 1/9] xdiff: use ssize_t for dstart/dend, make them last in xdfile_t Ezekiel Newren via GitGitGadget
2025-10-21 11:32   ` Phillip Wood
2025-10-21 17:18     ` Junio C Hamano
2025-10-22 21:07       ` Ezekiel Newren
2025-10-22 21:38         ` Junio C Hamano
2025-10-22 21:51           ` Ezekiel Newren
2025-10-15 21:18 ` [PATCH 2/9] xdiff: make xrecord_t.ptr a uint8_t instead of char Ezekiel Newren via GitGitGadget
2025-10-16 21:51   ` Kristoffer Haugsbakk
2025-10-21  8:33   ` Patrick Steinhardt
2025-10-22 21:12     ` Ezekiel Newren
2025-10-21 13:13   ` Phillip Wood
2025-10-21 18:15     ` Junio C Hamano
2025-10-22 13:27       ` Phillip Wood
2025-10-22 20:55         ` Ezekiel Newren
2025-10-15 21:18 ` [PATCH 3/9] xdiff: use size_t for xrecord_t.size Ezekiel Newren via GitGitGadget
2025-10-15 21:18 ` [PATCH 4/9] xdiff: use unambiguous types in xdl_hash_record() Ezekiel Newren via GitGitGadget
2025-10-21  8:33   ` Patrick Steinhardt
2025-10-22 21:20     ` Ezekiel Newren
2025-10-23  5:49       ` Patrick Steinhardt
2025-10-15 21:18 ` [PATCH 5/9] xdiff: split xrecord_t.ha into line_hash and minimal_perfect_hash Ezekiel Newren via GitGitGadget
2025-10-20 23:29   ` Ezekiel Newren
2025-10-21  5:10     ` Junio C Hamano
2025-10-21  8:33     ` Patrick Steinhardt
2025-10-21 10:03     ` Phillip Wood
2025-10-21 11:16       ` Chris Torek
2025-10-22 21:31       ` Ezekiel Newren
2025-10-15 21:18 ` [PATCH 6/9] xdiff: make xdfile_t.nrec a size_t instead of long Ezekiel Newren via GitGitGadget
2025-10-15 21:18 ` [PATCH 7/9] xdiff: make xdfile_t.nreff " Ezekiel Newren via GitGitGadget
2025-10-15 21:18 ` [PATCH 8/9] xdiff: change rindex from long to size_t in xdfile_t Ezekiel Newren via GitGitGadget
2025-10-21  8:34   ` Patrick Steinhardt
2025-10-22 22:14     ` Ezekiel Newren
2025-10-23  5:49       ` Patrick Steinhardt
2025-10-15 21:18 ` [PATCH 9/9] xdiff: rename rindex -> reference_index Ezekiel Newren via GitGitGadget
2025-10-15 21:28 ` [PATCH 0/9] Xdiff cleanup part2 Junio C Hamano
2025-10-21 13:28 ` Phillip Wood
2025-10-21 13:41   ` Junio C Hamano
2025-10-29 22:19 ` [PATCH v2 00/10] " Ezekiel Newren via GitGitGadget
2025-10-29 22:19   ` [PATCH v2 01/10] doc: define unambiguous type mappings across C and Rust Ezekiel Newren via GitGitGadget
2025-11-06  9:55     ` Phillip Wood [this message]
2025-11-06 22:52       ` Ezekiel Newren
2025-11-09 14:14         ` Phillip Wood
2025-10-29 22:19   ` [PATCH v2 02/10] xdiff: use ssize_t for dstart/dend, make them last in xdfile_t Ezekiel Newren via GitGitGadget
2025-11-06  9:55     ` Phillip Wood
2025-11-06 22:56       ` Ezekiel Newren
2025-10-29 22:19   ` [PATCH v2 03/10] xdiff: make xrecord_t.ptr a uint8_t instead of char Ezekiel Newren via GitGitGadget
2025-11-06 10:49     ` Phillip Wood
2025-11-06 23:13       ` Ezekiel Newren
2025-11-06 10:55     ` Phillip Wood
2025-11-06 23:14       ` Ezekiel Newren
2025-10-29 22:19   ` [PATCH v2 04/10] xdiff: use size_t for xrecord_t.size Ezekiel Newren via GitGitGadget
2025-10-29 22:19   ` [PATCH v2 05/10] xdiff: use unambiguous types in xdl_hash_record() Ezekiel Newren via GitGitGadget
2025-10-29 22:19   ` [PATCH v2 06/10] xdiff: split xrecord_t.ha into line_hash and minimal_perfect_hash Ezekiel Newren via GitGitGadget
2025-11-06 11:00     ` Phillip Wood
2025-11-06 23:20       ` Ezekiel Newren
2025-10-29 22:19   ` [PATCH v2 07/10] xdiff: make xdfile_t.nrec a size_t instead of long Ezekiel Newren via GitGitGadget
2025-10-29 22:19   ` [PATCH v2 08/10] xdiff: make xdfile_t.nreff " Ezekiel Newren via GitGitGadget
2025-10-29 22:19   ` [PATCH v2 09/10] xdiff: change rindex from long to size_t in xdfile_t Ezekiel Newren via GitGitGadget
2025-10-29 22:19   ` [PATCH v2 10/10] xdiff: rename rindex -> reference_index Ezekiel Newren via GitGitGadget
2025-10-30 14:26   ` [PATCH v2 00/10] Xdiff cleanup part2 Junio C Hamano
2025-11-11 19:42   ` [PATCH v3 " Ezekiel Newren via GitGitGadget
2025-11-11 19:42     ` [PATCH v3 01/10] doc: define unambiguous type mappings across C and Rust Ezekiel Newren via GitGitGadget
2025-11-11 20:52       ` Junio C Hamano
2025-11-11 21:05       ` Junio C Hamano
2025-11-11 19:42     ` [PATCH v3 02/10] xdiff: use ptrdiff_t for dstart/dend Ezekiel Newren via GitGitGadget
2025-11-11 22:23       ` Junio C Hamano
2025-11-11 19:42     ` [PATCH v3 03/10] xdiff: make xrecord_t.ptr a uint8_t instead of char Ezekiel Newren via GitGitGadget
2025-11-11 22:53       ` Junio C Hamano
2025-11-11 19:42     ` [PATCH v3 04/10] xdiff: use size_t for xrecord_t.size Ezekiel Newren via GitGitGadget
2025-11-11 23:08       ` Junio C Hamano
2025-11-14  6:02         ` Ezekiel Newren
2025-11-14 16:31           ` Junio C Hamano
2025-11-11 19:42     ` [PATCH v3 05/10] xdiff: use unambiguous types in xdl_hash_record() Ezekiel Newren via GitGitGadget
2025-11-11 19:42     ` [PATCH v3 06/10] xdiff: split xrecord_t.ha into line_hash and minimal_perfect_hash Ezekiel Newren via GitGitGadget
2025-11-11 23:21       ` Junio C Hamano
2025-11-14  5:41         ` Ezekiel Newren
2025-11-14 20:06           ` Junio C Hamano
2025-11-11 19:42     ` [PATCH v3 07/10] xdiff: make xdfile_t.nrec a size_t instead of long Ezekiel Newren via GitGitGadget
2025-11-11 19:42     ` [PATCH v3 08/10] xdiff: make xdfile_t.nreff " Ezekiel Newren via GitGitGadget
2025-11-11 19:42     ` [PATCH v3 09/10] xdiff: change rindex from long to size_t in xdfile_t Ezekiel Newren via GitGitGadget
2025-11-11 19:42     ` [PATCH v3 10/10] xdiff: rename rindex -> reference_index Ezekiel Newren via GitGitGadget
2025-11-11 23:40     ` [PATCH v3 00/10] Xdiff cleanup part2 Junio C Hamano
2025-11-14  5:52       ` Ezekiel Newren
2025-11-14 22:36     ` [PATCH v4 " Ezekiel Newren via GitGitGadget
2025-11-14 22:36       ` [PATCH v4 01/10] doc: define unambiguous type mappings across C and Rust Ezekiel Newren via GitGitGadget
2025-11-15  3:06         ` Ramsay Jones
2025-11-15  3:41           ` Ben Knoble
2025-11-15 14:55             ` Ramsay Jones
2025-11-15 16:42               ` Junio C Hamano
2025-11-15 16:59                 ` D. Ben Knoble
2025-11-15 20:03                   ` Junio C Hamano
2025-11-17  1:20                 ` Junio C Hamano
2025-11-17  2:08                   ` Ramsay Jones
2025-11-14 22:36       ` [PATCH v4 02/10] xdiff: use ptrdiff_t for dstart/dend Ezekiel Newren via GitGitGadget
2025-11-14 22:36       ` [PATCH v4 03/10] xdiff: make xrecord_t.ptr a uint8_t instead of char Ezekiel Newren via GitGitGadget
2025-11-15  8:26         ` Junio C Hamano
2025-11-18 20:55           ` Ezekiel Newren
2025-11-14 22:36       ` [PATCH v4 04/10] xdiff: use size_t for xrecord_t.size Ezekiel Newren via GitGitGadget
2025-11-14 22:36       ` [PATCH v4 05/10] xdiff: use unambiguous types in xdl_hash_record() Ezekiel Newren via GitGitGadget
2025-11-14 22:36       ` [PATCH v4 06/10] xdiff: split xrecord_t.ha into line_hash and minimal_perfect_hash Ezekiel Newren via GitGitGadget
2025-11-14 22:36       ` [PATCH v4 07/10] xdiff: make xdfile_t.nrec a size_t instead of long Ezekiel Newren via GitGitGadget
2025-11-14 22:36       ` [PATCH v4 08/10] xdiff: make xdfile_t.nreff " Ezekiel Newren via GitGitGadget
2025-11-14 22:36       ` [PATCH v4 09/10] xdiff: change rindex from long to size_t in xdfile_t Ezekiel Newren via GitGitGadget
2025-11-14 22:36       ` [PATCH v4 10/10] xdiff: rename rindex -> reference_index Ezekiel Newren via GitGitGadget
2025-11-18 22:34       ` [PATCH v5 00/10] Xdiff cleanup part2 Ezekiel Newren via GitGitGadget
2025-11-18 22:34         ` [PATCH v5 01/10] doc: define unambiguous type mappings across C and Rust Ezekiel Newren via GitGitGadget
2025-11-18 23:46           ` Ramsay Jones
2025-11-19  4:14             ` Junio C Hamano
2025-11-18 22:34         ` [PATCH v5 02/10] xdiff: use ptrdiff_t for dstart/dend Ezekiel Newren via GitGitGadget
2025-11-18 22:34         ` [PATCH v5 03/10] xdiff: make xrecord_t.ptr a uint8_t instead of char Ezekiel Newren via GitGitGadget
2025-11-18 22:34         ` [PATCH v5 04/10] xdiff: use size_t for xrecord_t.size Ezekiel Newren via GitGitGadget
2025-11-18 22:34         ` [PATCH v5 05/10] xdiff: use unambiguous types in xdl_hash_record() Ezekiel Newren via GitGitGadget
2025-11-18 22:34         ` [PATCH v5 06/10] xdiff: split xrecord_t.ha into line_hash and minimal_perfect_hash Ezekiel Newren via GitGitGadget
2025-11-18 22:34         ` [PATCH v5 07/10] xdiff: make xdfile_t.nrec a size_t instead of long Ezekiel Newren via GitGitGadget
2025-11-18 22:34         ` [PATCH v5 08/10] xdiff: make xdfile_t.nreff " Ezekiel Newren via GitGitGadget
2025-11-18 22:34         ` [PATCH v5 09/10] xdiff: change rindex from long to size_t in xdfile_t Ezekiel Newren via GitGitGadget
2025-11-18 22:34         ` [PATCH v5 10/10] xdiff: rename rindex -> reference_index Ezekiel Newren via GitGitGadget
2025-11-18 23:11         ` [PATCH v5 00/10] Xdiff cleanup part2 Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=995f77a3-b94c-46df-87d3-22c7b2a3c762@gmail.com \
    --to=phillip.wood123@gmail$(echo .)com \
    --cc=chris.torek@gmail$(echo .)com \
    --cc=ezekielnewren@gmail$(echo .)com \
    --cc=git@vger$(echo .)kernel.org \
    --cc=gitgitgadget@gmail$(echo .)com \
    --cc=kristofferhaugsbakk@fastmail$(echo .)com \
    --cc=phillip.wood@dunelm$(echo .)org.uk \
    --cc=ps@pks$(echo .)im \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox