On 2025-10-28 at 09:18:26, Patrick Steinhardt wrote: > On Mon, Oct 27, 2025 at 12:44:01AM +0000, brian m. carlson wrote: > > In a future commit, we'll want to hash some data when dealing with a > > loose object map. Let's make this easy by creating a structure to hash > > objects and calling into the C functions as necessary to perform the > > hashing. For now, we only implement safe hashing, but in the future we > > could add unsafe hashing if we want. Implement Clone and Drop to > > appropriately manage our memory. Additionally implement Write to make > > it easy to use with other formats that implement this trait. > > What exactly do you mean with "safe" and "unsafe" hashing? Also, can't > we drop this distinction for now until we have a need for it? It's from the series that Taylor introduced. For SHA-1, safe hashing (the default) uses SHA-1-DC, but unsafe hashing, which does not operate on untrusted data (say, when we're writing a packfile we've created), may use a faster algorithm. See `git_hash_sha1_init_unsafe`. I can omit the `safe` attribute until we need it, sure. > > diff --git a/src/hash.rs b/src/hash.rs > > index a5b9493bd8..8798a50aef 100644 > > --- a/src/hash.rs > > +++ b/src/hash.rs > > @@ -39,6 +40,81 @@ impl ObjectID { > > } > > } > > > > +pub struct Hasher { > > + algo: HashAlgorithm, > > + safe: bool, > > + ctx: *mut c_void, > > +} > > Nit: missing documentation. Will fix in v2. > > +impl Hasher { > > + /// Create a new safe hasher. > > + pub fn new(algo: HashAlgorithm) -> Hasher { > > + let ctx = unsafe { c::git_hash_alloc() }; > > + unsafe { c::git_hash_init(ctx, algo.hash_algo_ptr()) }; > > I already noticed this in the patch that introduced this, but wouldn't > it make sense to expose `git_hash_new()` instead of the combination of > `alloc() + init()`? The benefit to this approach is that it allows us to reset a state in the future if we want. If we don't think that's necessary, I can certainly switch to `git_hash_new` if we prefer. -- brian m. carlson (they/them) Toronto, Ontario, CA