public inbox for git@vger.kernel.org 
From: Jeff King <peff@peff•net>
To: Taylor Blau <me@ttaylorr•com>
Cc: git@vger•kernel.org, dstolee@microsoft•com
Subject: Re: [PATCH 4/4] midx: report checksum mismatches during 'verify'
Date: Thu, 24 Jun 2021 16:10:26 -0400
Message-ID: <YNTmsoOKMsaC+cYV@coredump.intra.peff.net>
In-Reply-To: <94e9de44e3b52513c5ab48aecd74f809dc34cbe3.1624473543.git.me@ttaylorr.com>

On Wed, Jun 23, 2021 at 02:39:15PM -0400, Taylor Blau wrote:

> 'git multi-pack-index verify' inspects the data in an existing MIDX for
> correctness by checking that the recorded object offsets are correct,
> and so on.
> 
> But it does not check that the file's trailing checksum matches the data
> that it records. So, if an on-disk corruption happened to occur in the
> final few bytes (and all other data was recorded correctly), we would:
> 
>   - get a clean result from 'git multi-pack-index verify', but
>   - be unable to reuse the existing MIDX when writing a new one (since
>     we now check for checksum mismatches before reusing a MIDX)
> 
> Teach the 'verify' sub-command to recognize corruption in the checksum
> by calling midx_checksum_valid().

Makes sense. I was a little surprised we didn't do this already, but I
guess the midx verifier does not do the same "regenerate the file and
make sure hashfile produces the same checksum" trick that the pack idx
verifier does. (As an aside, I think what the midx code is doing here is
much _better_: it looks for semantic problems in the file, so it is more
robust against irrelevant changes in the format.)
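
To make the distinction concrete, here is a minimal sketch of a direct
trailing-checksum check: hash everything except the trailer and compare
the result with the trailer bytes. This is roughly what the
checksum_valid() helper earlier in this series is meant to do for the
mapped file. As an assumption for self-containment, it uses a
hypothetical 8-byte FNV-1a trailer instead of Git's real SHA-1/SHA-256
hashfile trailer, and all names here are illustrative, not Git's API.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Width of the stand-in trailer. Git's real trailer is a full SHA-1 or
 * SHA-256 hash (20 or 32 bytes), not 8. */
#define TRAILER_LEN 8

/* 64-bit FNV-1a: a hypothetical stand-in for the real hash, chosen only
 * so this sketch has no dependencies. It is NOT collision resistant. */
static uint64_t fnv1a64(const unsigned char *p, size_t n)
{
	uint64_t h = 14695981039346656037ULL;

	for (size_t i = 0; i < n; i++) {
		h ^= p[i];
		h *= 1099511628211ULL;
	}
	return h;
}

/* Serialize h big-endian into out[0..TRAILER_LEN). */
static void encode_trailer(uint64_t h, unsigned char *out)
{
	for (int i = TRAILER_LEN - 1; i >= 0; i--) {
		out[i] = (unsigned char)(h & 0xff);
		h >>= 8;
	}
}

/*
 * Hash buf[0..len - TRAILER_LEN) and compare it with the last
 * TRAILER_LEN bytes of buf. Returns 1 on a match, 0 otherwise
 * (including buffers too short to hold a trailer).
 */
static int trailing_checksum_valid(const unsigned char *buf, size_t len)
{
	unsigned char expect[TRAILER_LEN];

	if (len < TRAILER_LEN)
		return 0;
	encode_trailer(fnv1a64(buf, len - TRAILER_LEN), expect);
	return memcmp(expect, buf + len - TRAILER_LEN, TRAILER_LEN) == 0;
}
```

Appending bytes to the file, as in the `echo foo >>` reproduction, also
fails this check, because the "trailer" is then read from the wrong
offset and the hashed range no longer matches what was written.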

> @@ -1228,6 +1228,9 @@ int verify_midx_file(struct repository *r, const char *object_dir, unsigned flag
>  		return result;
>  	}
>  
> +	if (!midx_checksum_valid(m))
> +		midx_report(_("incorrect checksum"));

This "midx_report()" function doesn't provide much context on stderr. I
get:

  $ echo foo >>.git/objects/pack/multi-pack-index
  $ git multi-pack-index verify
  incorrect checksum
  Verifying OID order in multi-pack-index: 100% (282/282), done.
  Sorting objects by packfile: 100% (283/283), done.
  Verifying object offsets: 100% (283/283), done.

I think we should at least say "error:", but ideally something more
descriptive, along the lines of "midx file at %s does not match its
trailing checksum (possible corruption?)".

I think all of the existing calls to midx_report() share this issue,
though, so it may be better to fix it in midx_report() itself, with
something like:

diff --git a/midx.c b/midx.c
index 9a35b0255d..e464907a7c 100644
--- a/midx.c
+++ b/midx.c
@@ -1172,10 +1172,12 @@ void clear_midx_file(struct repository *r)
 
 static int verify_midx_error;
 
-static void midx_report(const char *fmt, ...)
+static void midx_report(struct multi_pack_index *m, const char *fmt, ...)
 {
 	va_list ap;
 	verify_midx_error = 1;
+	/* do we need to care about the "next" pointer here? */
+	fprintf(stderr, ("error: %s/multi-pack-index: "), m->object_dir);
 	va_start(ap, fmt);
 	vfprintf(stderr, fmt, ap);
 	fprintf(stderr, "\n");

Also, a side note: we should use __attribute__((format)) on this
function to get compile-time checks of our format strings.
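
A self-contained sketch of the reporter with both suggestions applied:
the "error: <dir>/multi-pack-index: " prefix from the diff above, plus a
GCC/Clang format attribute. `struct midx_stub` is a hypothetical
stand-in for `struct multi_pack_index` that carries only an
`object_dir` string, and the `(printf, 2, 3)` indices assume the
`(m, fmt, ...)` signature from the diff.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical stand-in for struct multi_pack_index. */
struct midx_stub {
	const char *object_dir;
};

static int verify_midx_error;

/*
 * Argument 2 is a printf-style format consumed by the arguments from 3
 * onward, so the compiler can check every call site under -Wformat
 * (e.g. passing an int where the format says %s).
 */
__attribute__((format(printf, 2, 3)))
static void midx_report(const struct midx_stub *m, const char *fmt, ...)
{
	va_list ap;

	verify_midx_error = 1;
	/* Say that this is an error, and which file it is about. */
	fprintf(stderr, "error: %s/multi-pack-index: ", m->object_dir);
	va_start(ap, fmt);
	vfprintf(stderr, fmt, ap);
	va_end(ap);
	fprintf(stderr, "\n");
}
```

With `m.object_dir` set to `.git/objects`, calling
`midx_report(&m, "incorrect checksum")` writes
`error: .git/objects/multi-pack-index: incorrect checksum` to stderr.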

-Peff

Thread overview: 16+ messages
2021-06-23 18:39 [PATCH 0/4] midx: verify MIDX checksum before reusing Taylor Blau
2021-06-23 18:39 ` [PATCH 1/4] csum-file: introduce checksum_valid() Taylor Blau
2021-06-24 19:42   ` Jeff King
2021-06-23 18:39 ` [PATCH 2/4] commit-graph: rewrite to use checksum_valid() Taylor Blau
2021-06-24 19:42   ` Jeff King
2021-06-23 18:39 ` [PATCH 3/4] midx: don't reuse corrupt MIDXs when writing Taylor Blau
2021-06-24 20:00   ` Jeff King
2021-06-23 18:39 ` [PATCH 4/4] midx: report checksum mismatches during 'verify' Taylor Blau
2021-06-24  4:22   ` Bagas Sanjaya
2021-06-24 20:10   ` Jeff King [this message]
2021-11-10 23:11   ` SZEDER Gábor
2021-11-11 10:05     ` Jeff King
2021-11-16 21:10       ` Taylor Blau
2021-11-16 21:38         ` [PATCH] t5319: corrupt more bytes of the midx checksum Jeff King
2021-11-16 21:43           ` Taylor Blau
2021-11-16 22:12           ` Derrick Stolee