public inbox for git@vger.kernel.org 
 help / color / mirror / Atom feed
From: Patrick Steinhardt <ps@pks•im>
To: Karthik Nayak <karthik.188@gmail•com>
Cc: git@vger•kernel.org, gitster@pobox•com, shejialuo@gmail•com
Subject: Re: [PATCH v3 7/8] reftable: add code to facilitate consistency checks
Date: Wed, 24 Sep 2025 07:54:48 +0200	[thread overview]
Message-ID: <aNOHqEq5qxXrOCX7@pks.im> (raw)
In-Reply-To: <20250918-228-reftable-introduce-consistency-checks-v3-7-271af03eb34d@gmail.com>

On Thu, Sep 18, 2025 at 10:11:48AM +0200, Karthik Nayak wrote:
> diff --git a/reftable/fsck.c b/reftable/fsck.c
> new file mode 100644
> index 0000000000..785e4b43e8
> --- /dev/null
> +++ b/reftable/fsck.c
> @@ -0,0 +1,112 @@
> +#include "basics.h"
> +#include "reftable-fsck.h"
> +#include "stack.h"
> +
> +static bool valid_table_name(const char *name, uint64_t *min_update_index,
> +			     uint64_t *max_update_index)
> +{
> +	const char *ptr = name;
> +	char *endptr;
> +
> +	/* strtoull doesn't set errno on success */
> +	errno = 0;
> +
> +	*min_update_index = strtoull(ptr, &endptr, 16);
> +	if (errno == EINVAL)
> +		return false;

strtoull may also return ERANGE. In general, shouldn't we abort whenever
errno is non-zero here?

> +	ptr = endptr;
> +
> +	if (strncmp(ptr, "-", 1))
> +		return false;

Better:

    if (*ptr != '-')
        return false;

> +	ptr++;
> +
> +	*max_update_index = strtoull(ptr, &endptr, 16);
> +	if (errno == EINVAL)
> +		return false;
> +	ptr = endptr;
> +
> +	if (*ptr != '-')
> +		return false;
> +	ptr++;
> +
> +	strtoul(ptr, &endptr, 16);
> +	if (errno == EINVAL)
> +		return false;
> +	ptr = endptr;
> +
> +	if (strcmp(ptr, ".ref") && strcmp(ptr, ".log"))
> +		return false;

Yup, makes sense. We don't do so ourselves, but in theory it is possible
for tables to have a ".log" suffix. If so, they are expected to only
contain reflog records.

> +	return true;
> +}
> +
> +static int stack_check_all_files_in_dir(struct reftable_stack *stack,
> +					reftable_fsck_report_fn report_fn,
> +					void *cb_data)
> +{
> +	DIR *dir = opendir(stack->reftable_dir);

I think it would make sense to move this function call close to the
conditional.

> +	struct reftable_fsck_info info;
> +	struct dirent *d = NULL;
> +	uint64_t min, max;
> +	int err = 0;
> +
> +	if (!dir)
> +		return 0;
> +
> +	while ((d = readdir(dir))) {
> +		if (!strcmp(d->d_name, "tables.list"))
> +			continue;
> +
> +		if ((d->d_name[0] == '.' &&
> +		     (d->d_name[1] == '\0' ||
> +		      (d->d_name[1] == '.' && d->d_name[2] == '\0'))))
> +			continue;
> +
> +		if (d->d_type == DT_REG) {
> +			if (!valid_table_name(d->d_name, &min, &max)) {
> +				info.error = REFTABLE_FSCK_ERROR_TABLE_NAME;
> +				info.msg = "file with invalid table name";
> +				info.path = d->d_name;
> +
> +				err |= report_fn(&info, cb_data);
> +			}

One problem with this is that this is racy with concurrent writers. We
don't recognize the "tables.list.lock" file, and neither do we recognize
"0x*-0x*.{ref,log}.temp.XXXXXX"-style files.

Would it be a better approach be to instead go through table names as
loaded by the stack? The reftable code already knows to prune unknown
files anyway, so I don't think we should scan for any other files.

> +		} else {
> +			info.error = REFTABLE_FSCK_ERROR_INVALID_FILE_TYPE;
> +			info.msg = "file with unexpected type";
> +			info.path = d->d_name;
> +
> +			err |= report_fn(&info, cb_data);
> +		}
> +	}
> +
> +	closedir(dir);
> +	return err;
> +}
> +
> +static int stack_checks(struct reftable_stack *stack,
> +			reftable_fsck_report_fn report_fn,
> +			void *cb_data)
> +{
> +	struct reftable_buf msg = REFTABLE_BUF_INIT;
> +	char **names = NULL;

This variable is unused.

> +	int err = 0;
> +
> +	if (stack == NULL)
> +		goto out;

Why should someone ever pass a `NULL` stack?

> +	err |= stack_check_all_files_in_dir(stack, report_fn, cb_data);
> +
> +out:
> +	free_names(names);
> +	reftable_buf_release(&msg);
> +	return err;
> +}
> +
> +int reftable_fsck_check(struct reftable_stack *stack,
> +			reftable_fsck_report_fn report_fn,
> +			reftable_fsck_verbose_fn verbose_fn,
> +			void *cb_data)
> +{
> +	verbose_fn("Checking reftable: stack checks", cb_data);
> +	return stack_checks(stack, report_fn, cb_data);

Nit: having this extra function call to `stack_checks()` feels a bit
weird as it could just as well be inlined. Is this preparing for a
future change?

> +}
> diff --git a/reftable/reftable-fsck.h b/reftable/reftable-fsck.h
> new file mode 100644
> index 0000000000..5e13ac9f02
> --- /dev/null
> +++ b/reftable/reftable-fsck.h
> @@ -0,0 +1,42 @@
> +#ifndef REFTABLE_FSCK_H
> +#define REFTABLE_FSCK_H
> +
> +#include "reftable-stack.h"
> +
> +enum reftable_fsck_error {
> +	/* Non regular file in the reftable directory */
> +	REFTABLE_FSCK_ERROR_INVALID_FILE_TYPE = 0,
> +	/* Invalid table name */
> +	REFTABLE_FSCK_ERROR_TABLE_NAME,
> +	/* Used for bounds checking, must be last */
> +	REFTABLE_FSCK_MAX_VALUE

Let's add a trailing comma here.

Patrick

  reply	other threads:[~2025-09-24  5:54 UTC|newest]

Thread overview: 96+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-08-19 12:20 [PATCH 0/5] refs/reftable: add fsck checks Karthik Nayak
2025-08-19 12:21 ` [PATCH 1/5] fsck: order 'fsck_msg_type' alphabetically Karthik Nayak
2025-08-19 12:21 ` [PATCH 2/5] refs/reftable: add fsck check for checking the table name Karthik Nayak
2025-08-26 16:21   ` shejialuo
2025-09-01 13:33     ` Karthik Nayak
2025-09-03 13:39       ` shejialuo
2025-08-19 12:21 ` [PATCH 3/5] refs/reftable: add fsck check for number of tables Karthik Nayak
2025-08-26 16:33   ` shejialuo
2025-09-01 13:40     ` Karthik Nayak
2025-08-26 16:44   ` shejialuo
2025-09-01 13:52     ` Karthik Nayak
2025-08-19 12:21 ` [PATCH 4/5] refs/reftable: add fsck check for trailing newline Karthik Nayak
2025-08-19 12:21 ` [PATCH 5/5] refs/reftable: add fsck check for incorrect update index Karthik Nayak
2025-08-26 16:39 ` [PATCH 0/5] refs/reftable: add fsck checks shejialuo
2025-09-01 13:52   ` Karthik Nayak
2025-09-02  7:05 ` [PATCH v2 " Karthik Nayak
2025-09-02  7:05   ` [PATCH v2 1/5] fsck: order 'fsck_msg_type' alphabetically Karthik Nayak
2025-09-02 22:25     ` Junio C Hamano
2025-09-08 13:00       ` Karthik Nayak
2025-09-02  7:05   ` [PATCH v2 2/5] refs/reftable: add fsck check for checking the table name Karthik Nayak
2025-09-03  8:07     ` Patrick Steinhardt
2025-09-03 16:51       ` shejialuo
2025-09-09 13:49         ` Karthik Nayak
2025-09-09  8:42       ` Karthik Nayak
2025-09-02  7:05   ` [PATCH v2 3/5] refs/reftable: add fsck check for number of tables Karthik Nayak
2025-09-03  8:07     ` Patrick Steinhardt
2025-09-15  9:27       ` Karthik Nayak
2025-09-02  7:05   ` [PATCH v2 4/5] refs/reftable: add fsck check for trailing newline Karthik Nayak
2025-09-02 22:38     ` Junio C Hamano
2025-09-03  8:07       ` Patrick Steinhardt
2025-09-02  7:05   ` [PATCH v2 5/5] refs/reftable: add fsck check for incorrect update index Karthik Nayak
2025-09-02 22:42     ` Junio C Hamano
2025-09-18  8:11       ` Karthik Nayak
2025-09-18  8:11 ` [PATCH v3 0/8] refs/reftable: add consistency checks Karthik Nayak
2025-09-18  8:11   ` [PATCH v3 1/8] refs: remove unused headers Karthik Nayak
2025-09-18  8:11   ` [PATCH v3 2/8] refs: move consistency check msg to generic layer Karthik Nayak
2025-09-18  8:11   ` [PATCH v3 3/8] reftable: check for trailing newline in 'tables.list' Karthik Nayak
2025-09-18 15:36     ` Junio C Hamano
2025-09-23 15:42       ` Karthik Nayak
2025-09-24  5:54     ` Patrick Steinhardt
2025-09-24 10:02       ` Karthik Nayak
2025-09-24  7:24     ` Kristoffer Haugsbakk
2025-09-24 11:06       ` Karthik Nayak
2025-09-18  8:11   ` [PATCH v3 4/8] reftable: ensure tables in a stack use sequential update indices Karthik Nayak
2025-09-24  5:54     ` Patrick Steinhardt
2025-09-24 11:20       ` Karthik Nayak
2025-09-24 18:04         ` Junio C Hamano
2025-09-24 20:13           ` Karthik Nayak
2025-09-25  6:12             ` Patrick Steinhardt
2025-09-25 16:22               ` Junio C Hamano
2025-09-18  8:11   ` [PATCH v3 5/8] Documentation/fsck-msgids: remove duplicate msg id Karthik Nayak
2025-09-18  8:11   ` [PATCH v3 6/8] fsck: order 'fsck_msg_type' alphabetically Karthik Nayak
2025-09-18  8:11   ` [PATCH v3 7/8] reftable: add code to facilitate consistency checks Karthik Nayak
2025-09-24  5:54     ` Patrick Steinhardt [this message]
2025-09-24 18:40       ` Karthik Nayak
2025-09-25  6:14         ` Patrick Steinhardt
2025-09-18  8:11   ` [PATCH v3 8/8] refs/reftable: add fsck check for checking the table name Karthik Nayak
2025-09-24  5:54     ` Patrick Steinhardt
2025-09-24 18:44       ` Karthik Nayak
2025-09-26  7:25 ` [PATCH v4 0/7] refs/reftable: add consistency checks Karthik Nayak
2025-09-26  7:25   ` [PATCH v4 1/7] refs: remove unused headers Karthik Nayak
2025-09-26  7:25   ` [PATCH v4 2/7] refs: move consistency check msg to generic layer Karthik Nayak
2025-09-26  7:25   ` [PATCH v4 3/7] reftable: check for trailing newline in 'tables.list' Karthik Nayak
2025-10-02 11:44     ` Patrick Steinhardt
2025-10-06 12:02       ` Karthik Nayak
2025-09-26  7:25   ` [PATCH v4 4/7] Documentation/fsck-msgids: remove duplicate msg id Karthik Nayak
2025-09-26  7:25   ` [PATCH v4 5/7] fsck: order 'fsck_msg_type' alphabetically Karthik Nayak
2025-09-26  7:25   ` [PATCH v4 6/7] reftable: add code to facilitate consistency checks Karthik Nayak
2025-10-02 11:44     ` Patrick Steinhardt
2025-09-26  7:25   ` [PATCH v4 7/7] refs/reftable: add fsck check for checking the table name Karthik Nayak
2025-10-02 11:44     ` Patrick Steinhardt
2025-10-06 12:05       ` Karthik Nayak
2025-09-26 21:08   ` [PATCH v4 0/7] refs/reftable: add consistency checks Junio C Hamano
2025-10-06 14:22 ` [PATCH v5 " Karthik Nayak
2025-10-06 14:22   ` [PATCH v5 1/7] refs: remove unused headers Karthik Nayak
2025-10-06 14:23   ` [PATCH v5 2/7] refs: move consistency check msg to generic layer Karthik Nayak
2025-10-06 14:23   ` [PATCH v5 3/7] reftable: check for trailing newline in 'tables.list' Karthik Nayak
2025-10-06 14:23   ` [PATCH v5 4/7] Documentation/fsck-msgids: remove duplicate msg id Karthik Nayak
2025-10-06 14:23   ` [PATCH v5 5/7] fsck: order 'fsck_msg_type' alphabetically Karthik Nayak
2025-10-06 14:23   ` [PATCH v5 6/7] reftable: add code to facilitate consistency checks Karthik Nayak
2025-10-06 14:23   ` [PATCH v5 7/7] refs/reftable: add fsck check for checking the table name Karthik Nayak
2025-10-07  2:32     ` Jeff King
2025-10-07  8:45       ` Karthik Nayak
2025-10-06 22:08   ` [PATCH v5 0/7] refs/reftable: add consistency checks Junio C Hamano
2025-10-07  8:47     ` Karthik Nayak
2025-10-07 15:11       ` Junio C Hamano
2025-10-07 12:11 ` [PATCH v6 " Karthik Nayak
2025-10-07 12:11   ` [PATCH v6 1/7] refs: remove unused headers Karthik Nayak
2025-10-07 12:11   ` [PATCH v6 2/7] refs: move consistency check msg to generic layer Karthik Nayak
2025-10-07 12:11   ` [PATCH v6 3/7] reftable: check for trailing newline in 'tables.list' Karthik Nayak
2025-10-07 12:11   ` [PATCH v6 4/7] Documentation/fsck-msgids: remove duplicate msg id Karthik Nayak
2025-10-07 12:11   ` [PATCH v6 5/7] fsck: order 'fsck_msg_type' alphabetically Karthik Nayak
2025-10-07 12:11   ` [PATCH v6 6/7] reftable: add code to facilitate consistency checks Karthik Nayak
2025-10-07 12:11   ` [PATCH v6 7/7] refs/reftable: add fsck check for checking the table name Karthik Nayak
2025-10-07 13:26   ` [PATCH v6 0/7] refs/reftable: add consistency checks Patrick Steinhardt
2025-10-07 16:25     ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aNOHqEq5qxXrOCX7@pks.im \
    --to=ps@pks$(echo .)im \
    --cc=git@vger$(echo .)kernel.org \
    --cc=gitster@pobox$(echo .)com \
    --cc=karthik.188@gmail$(echo .)com \
    --cc=shejialuo@gmail$(echo .)com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox