public inbox for git@vger.kernel.org 
 help / color / mirror / Atom feed
From: Justin Tobler <jltobler@gmail•com>
To: git@vger•kernel.org
Cc: ps@pks•im, gitster@pobox•com, me@ttaylorr•com,
	karthik.188@gmail•com, Justin Tobler <jltobler@gmail•com>
Subject: [PATCH v3 0/6] odb: add transaction interfaces to ODB subsystem
Date: Tue, 16 Sep 2025 13:29:32 -0500	[thread overview]
Message-ID: <20250916182938.2193476-1-jltobler@gmail.com> (raw)
In-Reply-To: <20250915202956.3784935-1-jltobler@gmail.com>

Greetings,

This series is a followup to [1] and continues iterating on the ODB
transaction interfaces.

The bulk-checkin subsystem provides an interface to manage ODB
transactions. Apart from {begin,end}_odb_transaction(), these functions
are only used by the object-file subsystem to manage aspects of a
transaction implementation specific to the files object source.

In a pluggable object database future where we could have different
types of object database sources, transaction handling will have to be
implemented separately per source. Thus, the primary focus of this
series is to simplify the existing ODB transaction interface and provide
a means to manage transactions via the ODB subsystem in an object source
agnostic manner eventually.

This series is built on top of 4975ec3473b (The seventh batch,
2025-09-08) with jt/de-global-bulk-checkin merged into it at ddc0b56ad77
(bulk-checkin: use repository variable from transaction, 2025-08-22).

Changes since V1:

- end_odb_transaction() is now a no-op when no transaction is specified
  and also guards against the ending a transaction not set in the object
  database.
- The object_file_transaction_begin() now accepts a `struct odb_source`
  instead of `struct odbject_database`. Which is more in line with it
  being a transaction implementation specific to an object source.
- Further clarified some commit messages.
- Renamed object_file_transaction_end() to
  object_file_transaction_commit() to match the corresponding function
  in the ODB section.
- Add some missing documentation for odb_transaction_{begin,commit}().

Change since V2:

- begin_odb_transaction() now behaves as a no-op and returns a NULL
  transaction value when the ODB already has an active transaction.
  Callers are no longer required to manually check if there is an active
  transaction beforehand.
- Reworded several commit messages.

Thanks,
-Justin

[1]: <20250820225531.1212935-1-jltobler@gmail•com>

Justin Tobler (6):
  bulk-checkin: remove ODB transaction nesting
  builtin/update-index: end ODB transaction when --verbose is specified
  bulk-checkin: drop flush_odb_transaction()
  object-file: relocate ODB transaction code
  object-file: update naming from bulk-checkin
  odb: add transaction interface

 Makefile                 |   1 -
 builtin/add.c            |   7 +-
 builtin/unpack-objects.c |   5 +-
 builtin/update-index.c   |  29 +--
 bulk-checkin.c           | 403 --------------------------------------
 bulk-checkin.h           |  61 ------
 cache-tree.c             |   5 +-
 meson.build              |   1 -
 object-file.c            | 404 ++++++++++++++++++++++++++++++++++++++-
 object-file.h            |  16 ++
 odb.c                    |  10 +
 odb.h                    |  13 ++
 read-cache.c             |   5 +-
 13 files changed, 462 insertions(+), 498 deletions(-)
 delete mode 100644 bulk-checkin.c
 delete mode 100644 bulk-checkin.h

Range-diff against v2:
1:  217f3eb6d2 ! 1:  9fe2ab4198 bulk-checkin: remove ODB transaction nesting
    @@ Commit message
         bulk-checkin: remove ODB transaction nesting
     
         ODB transactions support being nested. Only the outermost
    -    {begin,end}_odb_transaction() start and finish a transaction. This is
    -    done so that certain object write codepaths that occur internally can be
    -    optimized via ODB transactions without having to worry if a transaction
    -    has already been started or not. This can make the interface a bit
    -    awkward to use, as calling {begin,end}_odb_transaction() does not
    -    guarantee that a transaction is actually started or ended. Thus, in
    -    situations where a transaction must be explicitly flushed,
    -    flush_odb_transaction() must be used.
    +    {begin,end}_odb_transaction() start and finish a transaction. This
    +    allows internal object write codepaths to be optimized with ODB
    +    transactions without worrying about whether a transaction is already
    +    active. When {begin,end}_odb_transaction() is invoked during an active
    +    transaction, these operations are essentially treated as no-ops. This
    +    can make the interface a bit awkward to use, as calling
    +    end_odb_transaction() does not guarantee that a transaction is actually
    +    ended. Thus, in situations where a transaction needs to be explicitly
    +    flushed, flush_odb_transaction() must be used.
     
    -    To better clarify ownership sematics around a transaction and further
    -    remove the need for flush_odb_transaction() as part of the transaction
    -    interface, instead be more explicit and require callers who use ODB
    -    transactions internally to ensure there is not already a pending
    -    transaction before beginning or ending a transaction.
    +    To remove the need for an explicit transaction flush operation via
    +    flush_odb_transaction() and better clarify transaction semantics, drop
    +    the transaction nesting mechanism in favor of begin_odb_transaction()
    +    returning a NULL transaction value to signal it was a no-op, and
    +    end_odb_transaction() behaving as a no-op when a NULL transaction value
    +    is passed. This is safe for existing callers as the transaction value
    +    wired to end_odb_transaction() already comes from
    +    begin_odb_transaction() and thus continues the same no-op behavior when
    +    a transaction is already pending. With this model, passing a pending
    +    transaction to end_odb_transaction() ensures it is committed at that
    +    point in time.
     
         Signed-off-by: Justin Tobler <jltobler@gmail•com>
     
    @@ bulk-checkin.c: void fsync_loose_object_bulk_checkin(struct odb_transaction *tra
     -		odb->transaction->odb = odb;
     -	}
     +	if (odb->transaction)
    -+		BUG("ODB transaction already started");
    ++		return NULL;
      
     -	odb->transaction->nesting += 1;
     +	CALLOC_ARRAY(odb->transaction, 1);
    @@ bulk-checkin.h: int index_blob_bulk_checkin(struct odb_transaction *transaction,
     - * to make new objects visible. Transactions can be nested,
     - * and objects are only visible after the outermost transaction
     - * is complete or the transaction is flushed.
    -+ * to make new objects visible. Only a single transaction
    -+ * can be pending at a time and must be ended before
    -+ * beginning another.
    ++ * to make new objects visible. If a transaction is already
    ++ * pending, NULL is returned.
       */
      struct odb_transaction *begin_odb_transaction(struct object_database *odb);
      
    @@ bulk-checkin.h: void flush_odb_transaction(struct odb_transaction *transaction);
      void end_odb_transaction(struct odb_transaction *transaction);
      
     
    - ## cache-tree.c ##
    -@@ cache-tree.c: static int update_one(struct cache_tree *it,
    - 
    - int cache_tree_update(struct index_state *istate, int flags)
    - {
    --	struct odb_transaction *transaction;
    -+	struct odb_transaction *transaction = NULL;
    - 	int skip, i;
    - 
    - 	i = verify_cache(istate, flags);
    -@@ cache-tree.c: int cache_tree_update(struct index_state *istate, int flags)
    - 
    - 	trace_performance_enter();
    - 	trace2_region_enter("cache_tree", "update", the_repository);
    --	transaction = begin_odb_transaction(the_repository->objects);
    -+
    -+	if (!the_repository->objects->transaction)
    -+		transaction = begin_odb_transaction(the_repository->objects);
    -+
    - 	i = update_one(istate->cache_tree, istate->cache, istate->cache_nr,
    - 		       "", 0, &skip, flags);
    -+
    - 	end_odb_transaction(transaction);
    -+
    - 	trace2_region_leave("cache_tree", "update", the_repository);
    - 	trace_performance_leave("cache_tree_update");
    - 	if (i < 0)
    -
      ## object-file.c ##
     @@ object-file.c: int index_fd(struct index_state *istate, struct object_id *oid,
    - 		ret = index_core(istate, oid, fd, xsize_t(st->st_size),
    - 				 type, path, flags);
    - 	} else {
    --		struct odb_transaction *transaction;
    -+		struct odb_transaction *transaction = NULL;
    + 		struct odb_transaction *transaction;
      
    --		transaction = begin_odb_transaction(the_repository->objects);
    + 		transaction = begin_odb_transaction(the_repository->objects);
     -		ret = index_blob_bulk_checkin(transaction,
    -+		if (!the_repository->objects->transaction)
    -+			transaction = begin_odb_transaction(the_repository->objects);
    -+
     +		ret = index_blob_bulk_checkin(the_repository->objects->transaction,
      					      oid, fd, xsize_t(st->st_size),
      					      path, flags);
    -+
      		end_odb_transaction(transaction);
    - 	}
    - 
    -
    - ## read-cache.c ##
    -@@ read-cache.c: int add_files_to_cache(struct repository *repo, const char *prefix,
    - 		       const struct pathspec *pathspec, char *ps_matched,
    - 		       int include_sparse, int flags)
    - {
    --	struct odb_transaction *transaction;
    -+	struct odb_transaction *transaction = NULL;
    - 	struct update_callback_data data;
    - 	struct rev_info rev;
    - 
    -@@ read-cache.c: int add_files_to_cache(struct repository *repo, const char *prefix,
    - 	 * This function is invoked from commands other than 'add', which
    - 	 * may not have their own transaction active.
    - 	 */
    --	transaction = begin_odb_transaction(repo->objects);
    -+	if (!repo->objects->transaction)
    -+		transaction = begin_odb_transaction(repo->objects);
    -+
    - 	run_diff_files(&rev, DIFF_RACY_IS_MODIFIED);
    -+
    - 	end_odb_transaction(transaction);
    - 
    - 	release_revisions(&rev);
2:  16af9f1169 ! 2:  f24c28ec56 builtin/update-index: end ODB transaction when --verbose is specified
    @@ Commit message
         object directory and migrated to the primary object database upon
         transaction commit.
     
    -    When the --verbose option is specified, each of the following objects is
    -    explicitly flushed via flush_odb_transaction() prior to reporting the
    -    update. Flushing the object database transaction migrates pending
    -    objects to the primary object database without marking the transaction
    -    as complete. This is done so objects are immediately visible to
    -    git-update-index(1) callers using the --verbose option and that rely on
    -    parsing verbose output to know when objects are written.
    +    When the --verbose option is specified, the subsequent set of objects
    +    written are explicitly flushed via flush_odb_transaction() prior to
    +    reporting the update. Flushing the object database transaction migrates
    +    pending objects to the primary object database without marking the
    +    transaction as complete. This is done so objects are immediately visible
    +    to git-update-index(1) callers using the --verbose option and that rely
    +    on parsing verbose output to know when objects are written.
     
    -    Due to how git-update-index(1) parses options, each filename argument is
    -    evaluated with only the set of options that precede it. Therefore, it is
    -    possible for an initial set of objects to be written in a transaction
    -    before a --verbose option is encountered.
    +    Due to how git-update-index(1) parses arguments, options that come after
    +    a filename are not considered during the object update. Therefore, it
    +    may not be known ahead of time whether the --verbose option is present
    +    and thus object writes are considered transactional by default until a
    +    --verbose option is parsed.
     
    -    As soon as the --verbose option is parsed in git-update-index(1), all
    -    subsequent object writes are flushed prior to being reported and thus no
    -    longer benefit from being transactional. Furthermore, the mechanism to
    -    flush a transaction without committing is rather awkward. Drop the call
    -    to flush_odb_transaction() in favor of ending the transaction early when
    -    the --verbose flag is encountered.
    +    Flushing a transaction after individual object writes negates the
    +    benefit of writing objects to a transaction in the first place.
    +    Furthermore, the mechanism to flush a transaction without actually
    +    committing is rather awkward. Drop the call to flush_odb_transaction()
    +    in favor of ending the transaction altogether when the --verbose flag is
    +    encountered. Subsequent object writes occur outside of a transaction and
    +    are therefore immediately visible which matches the current behavior.
     
         Signed-off-by: Justin Tobler <jltobler@gmail•com>
     
3:  ae6199a3c8 = 3:  6268b2934a bulk-checkin: drop flush_odb_transaction()
4:  01f5485441 ! 4:  e5f1d93797 object-file: relocate ODB transaction code
    @@ Commit message
         are only used by the object-file subsystem to manage aspects of a
         transaction implementation specific to the files object source.
     
    -    Relocate all the transaction code in in bulk-checkin to object-file.
    -    This simplifies the exposed transaction interface by reducing it to only
    +    Relocate all the transaction code in bulk-checkin to object-file. This
    +    simplifies the exposed transaction interface by reducing it to only
         {begin,end}_odb_transaction(). Function and type names are adjusted in
         the subsequent commit to better fit the new location.
     
    @@ bulk-checkin.c (deleted)
     -struct odb_transaction *begin_odb_transaction(struct object_database *odb)
     -{
     -	if (odb->transaction)
    --		BUG("ODB transaction already started");
    +-		return NULL;
     -
     -	CALLOC_ARRAY(odb->transaction, 1);
     -	odb->transaction->odb = odb;
    @@ bulk-checkin.h (deleted)
     -/*
     - * Tell the object database to optimize for adding
     - * multiple objects. end_odb_transaction must be called
    -- * to make new objects visible. Only a single transaction
    -- * can be pending at a time and must be ended before
    -- * beginning another.
    +- * to make new objects visible. If a transaction is already
    +- * pending, NULL is returned.
     - */
     -struct odb_transaction *begin_odb_transaction(struct object_database *odb);
     -
    @@ object-file.c: int read_loose_object(struct repository *repo,
     +struct odb_transaction *begin_odb_transaction(struct object_database *odb)
     +{
     +	if (odb->transaction)
    -+		BUG("ODB transaction already started");
    ++		return NULL;
     +
     +	CALLOC_ARRAY(odb->transaction, 1);
     +	odb->transaction->odb = odb;
    @@ object-file.h: int read_loose_object(struct repository *repo,
     +/*
     + * Tell the object database to optimize for adding
     + * multiple objects. end_odb_transaction must be called
    -+ * to make new objects visible. Only a single transaction
    -+ * can be pending at a time and must be ended before
    -+ * beginning another.
    ++ * to make new objects visible. If a transaction is already
    ++ * pending, NULL is returned.
     + */
     +struct odb_transaction *begin_odb_transaction(struct object_database *odb);
     +
5:  333319a63d ! 5:  76eabfb6a1 object-file: update naming from bulk-checkin
    @@ object-file.c: static int index_blob_bulk_checkin(struct odb_transaction *transa
      			return error("cannot seek back");
      	}
     @@ object-file.c: int index_fd(struct index_state *istate, struct object_id *oid,
    - 		if (!the_repository->objects->transaction)
    - 			transaction = begin_odb_transaction(the_repository->objects);
    + 		struct odb_transaction *transaction;
      
    + 		transaction = begin_odb_transaction(the_repository->objects);
     -		ret = index_blob_bulk_checkin(the_repository->objects->transaction,
     -					      oid, fd, xsize_t(st->st_size),
     -					      path, flags);
    @@ object-file.c: int index_fd(struct index_state *istate, struct object_id *oid,
     +						      oid, fd,
     +						      xsize_t(st->st_size),
     +						      path, flags);
    - 
      		end_odb_transaction(transaction);
      	}
    + 
     @@ object-file.c: void end_odb_transaction(struct odb_transaction *transaction)
      	 */
      	ASSERT(transaction == transaction->odb->transaction);
6:  30759bbd0d ! 6:  9accbfdbf0 odb: add transaction interface
    @@ builtin/update-index.c: int cmd_update_index(int argc,
     
      ## cache-tree.c ##
     @@ cache-tree.c: int cache_tree_update(struct index_state *istate, int flags)
    - 	trace2_region_enter("cache_tree", "update", the_repository);
    - 
    - 	if (!the_repository->objects->transaction)
    --		transaction = begin_odb_transaction(the_repository->objects);
    -+		transaction = odb_transaction_begin(the_repository->objects);
      
    + 	trace_performance_enter();
    + 	trace2_region_enter("cache_tree", "update", the_repository);
    +-	transaction = begin_odb_transaction(the_repository->objects);
    ++	transaction = odb_transaction_begin(the_repository->objects);
      	i = update_one(istate->cache_tree, istate->cache, istate->cache_nr,
      		       "", 0, &skip, flags);
    - 
     -	end_odb_transaction(transaction);
     +	odb_transaction_commit(transaction);
    - 
      	trace2_region_leave("cache_tree", "update", the_repository);
      	trace_performance_leave("cache_tree_update");
    + 	if (i < 0)
     
      ## object-file.c ##
     @@ object-file.c: static void prepare_loose_object_transaction(struct odb_transaction *transaction
    @@ object-file.c: static void prepare_loose_object_transaction(struct odb_transacti
      	if (!transaction || transaction->objdir)
      		return;
     @@ object-file.c: int index_fd(struct index_state *istate, struct object_id *oid,
    - 		struct odb_transaction *transaction = NULL;
    - 
    - 		if (!the_repository->objects->transaction)
    --			transaction = begin_odb_transaction(the_repository->objects);
    -+			transaction = odb_transaction_begin(the_repository->objects);
    + 	} else {
    + 		struct odb_transaction *transaction;
      
    +-		transaction = begin_odb_transaction(the_repository->objects);
    ++		transaction = odb_transaction_begin(the_repository->objects);
      		ret = index_blob_packfile_transaction(the_repository->objects->transaction,
      						      oid, fd,
      						      xsize_t(st->st_size),
      						      path, flags);
    - 
     -		end_odb_transaction(transaction);
     +		odb_transaction_commit(transaction);
      	}
    @@ object-file.c: int read_loose_object(struct repository *repo,
     +	struct object_database *odb = source->odb;
     +
      	if (odb->transaction)
    - 		BUG("ODB transaction already started");
    + 		return NULL;
      
     @@ object-file.c: struct odb_transaction *begin_odb_transaction(struct object_database *odb)
      	return odb->transaction;
    @@ object-file.h: struct odb_transaction;
       * Tell the object database to optimize for adding
     - * multiple objects. end_odb_transaction must be called
     + * multiple objects. object_file_transaction_commit must be called
    -  * to make new objects visible. Only a single transaction
    -  * can be pending at a time and must be ended before
    -  * beginning another.
    +  * to make new objects visible. If a transaction is already
    +  * pending, NULL is returned.
       */
     -struct odb_transaction *begin_odb_transaction(struct object_database *odb);
     +struct odb_transaction *object_file_transaction_begin(struct odb_source *source);
    @@ odb.h: struct object_database {
     +/*
     + * Starts an ODB transaction. Subsequent objects are written to the transaction
     + * and not committed until odb_transaction_commit() is invoked on the
    -+ * transaction. Caller are responsible to ensure there is only a single ODB
    -+ * transaction pending at a time.
    ++ * transaction. If the ODB already has a pending transaction, NULL is returned.
     + */
     +struct odb_transaction *odb_transaction_begin(struct object_database *odb);
     +
    @@ odb.h: struct object_database {
     
      ## read-cache.c ##
     @@ read-cache.c: int add_files_to_cache(struct repository *repo, const char *prefix,
    + 	 * This function is invoked from commands other than 'add', which
      	 * may not have their own transaction active.
      	 */
    - 	if (!repo->objects->transaction)
    --		transaction = begin_odb_transaction(repo->objects);
    -+		transaction = odb_transaction_begin(repo->objects);
    - 
    +-	transaction = begin_odb_transaction(repo->objects);
    ++	transaction = odb_transaction_begin(repo->objects);
      	run_diff_files(&rev, DIFF_RACY_IS_MODIFIED);
    - 
     -	end_odb_transaction(transaction);
     +	odb_transaction_commit(transaction);
      
-- 
2.51.0.193.g4975ec3473b


  parent reply	other threads:[~2025-09-16 18:29 UTC|newest]

Thread overview: 38+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-09-09 19:11 [PATCH 0/6] odb: add transaction interfaces to ODB subsystem Justin Tobler
2025-09-09 19:11 ` [PATCH 1/6] bulk-checkin: remove ODB transaction nesting Justin Tobler
2025-09-11  6:40   ` Patrick Steinhardt
2025-09-11 15:17     ` Justin Tobler
2025-09-15 23:36     ` Taylor Blau
2025-09-16  2:55       ` Justin Tobler
2025-09-16 16:44         ` Junio C Hamano
2025-09-16 17:47           ` Justin Tobler
2025-09-09 19:11 ` [PATCH 2/6] builtin/update-index: end ODB transaction when --verbose is specified Justin Tobler
2025-09-11  6:40   ` Patrick Steinhardt
2025-09-11 15:34     ` Justin Tobler
2025-09-15  6:08       ` Patrick Steinhardt
2025-09-15 17:08         ` Justin Tobler
2025-09-15 22:03           ` Justin Tobler
2025-09-09 19:11 ` [PATCH 3/6] bulk-checkin: drop flush_odb_transaction() Justin Tobler
2025-09-09 19:11 ` [PATCH 4/6] object-file: relocate ODB transaction code Justin Tobler
2025-09-09 19:11 ` [PATCH 5/6] object-file: update naming from bulk-checkin Justin Tobler
2025-09-09 19:11 ` [PATCH 6/6] odb: add transaction interface Justin Tobler
2025-09-11  6:40   ` Patrick Steinhardt
2025-09-11 16:31     ` Justin Tobler
2025-09-15 20:29 ` [PATCH v2 0/6] odb: add transaction interfaces to ODB subsystem Justin Tobler
2025-09-15 20:29   ` [PATCH v2 1/6] bulk-checkin: remove ODB transaction nesting Justin Tobler
2025-09-16  7:57     ` Karthik Nayak
2025-09-16 15:00       ` Justin Tobler
2025-09-15 20:29   ` [PATCH v2 2/6] builtin/update-index: end ODB transaction when --verbose is specified Justin Tobler
2025-09-16  9:07     ` Karthik Nayak
2025-09-16 15:17       ` Justin Tobler
2025-09-15 20:29   ` [PATCH v2 3/6] bulk-checkin: drop flush_odb_transaction() Justin Tobler
2025-09-15 20:29   ` [PATCH v2 4/6] object-file: relocate ODB transaction code Justin Tobler
2025-09-15 20:29   ` [PATCH v2 5/6] object-file: update naming from bulk-checkin Justin Tobler
2025-09-15 20:29   ` [PATCH v2 6/6] odb: add transaction interface Justin Tobler
2025-09-16 18:29   ` Justin Tobler [this message]
2025-09-16 18:29     ` [PATCH v3 1/6] bulk-checkin: remove ODB transaction nesting Justin Tobler
2025-09-16 18:29     ` [PATCH v3 2/6] builtin/update-index: end ODB transaction when --verbose is specified Justin Tobler
2025-09-16 18:29     ` [PATCH v3 3/6] bulk-checkin: drop flush_odb_transaction() Justin Tobler
2025-09-16 18:29     ` [PATCH v3 4/6] object-file: relocate ODB transaction code Justin Tobler
2025-09-16 18:29     ` [PATCH v3 5/6] object-file: update naming from bulk-checkin Justin Tobler
2025-09-16 18:29     ` [PATCH v3 6/6] odb: add transaction interface Justin Tobler

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20250916182938.2193476-1-jltobler@gmail.com \
    --to=jltobler@gmail$(echo .)com \
    --cc=git@vger$(echo .)kernel.org \
    --cc=gitster@pobox$(echo .)com \
    --cc=karthik.188@gmail$(echo .)com \
    --cc=me@ttaylorr$(echo .)com \
    --cc=ps@pks$(echo .)im \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox