From: Jeff King <peff@peff•net>
To: Taylor Blau <me@ttaylorr•com>
Cc: git@vger•kernel.org, dstolee@microsoft•com
Subject: Re: [PATCH 4/4] midx: report checksum mismatches during 'verify'
Date: Thu, 24 Jun 2021 16:10:26 -0400
Message-ID: <YNTmsoOKMsaC+cYV@coredump.intra.peff.net>
In-Reply-To: <94e9de44e3b52513c5ab48aecd74f809dc34cbe3.1624473543.git.me@ttaylorr.com>
On Wed, Jun 23, 2021 at 02:39:15PM -0400, Taylor Blau wrote:
> 'git multi-pack-index verify' inspects the data in an existing MIDX for
> correctness by checking that the recorded object offsets are correct,
> and so on.
>
> But it does not check that the file's trailing checksum matches the data
> that it records. So, if an on-disk corruption happened to occur in the
> final few bytes (and all other data was recorded correctly), we would:
>
> - get a clean result from 'git multi-pack-index verify', but
> - be unable to reuse the existing MIDX when writing a new one (since
> we now check for checksum mismatches before reusing a MIDX)
>
> Teach the 'verify' sub-command to recognize corruption in the checksum
> by calling midx_checksum_valid().
Makes sense. I was a little surprised we didn't do this already, but I
guess the midx verifier does not use the same "regenerate and make sure
hashfile produces the same checksum" trick that the pack idx verifier
does. (As an aside, I think what the midx code is doing here is much
_better_, because it is looking at semantic problems in the file, and is
more robust against irrelevant changes in the format.)
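To make the "trailing checksum" idea concrete, here is a minimal sketch of checking a trailer against the bytes that precede it. It is not Git's code: FNV-1a stands in for the real object hash, and TRAILER_LEN, fnv1a(), write_trailer() and trailer_valid() are hypothetical names made up for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumption: a 4-byte trailer; a real MIDX trailer is a full object hash. */
#define TRAILER_LEN 4

/* FNV-1a, used only as a small stand-in for the real hash function. */
static uint32_t fnv1a(const unsigned char *buf, size_t len)
{
	uint32_t h = 2166136261u;
	size_t i;

	for (i = 0; i < len; i++) {
		h ^= buf[i];
		h *= 16777619u;
	}
	return h;
}

/* Encode a 32-bit value big-endian into out[0..3]. */
static void put_be32(unsigned char *out, uint32_t v)
{
	out[0] = (v >> 24) & 0xff;
	out[1] = (v >> 16) & 0xff;
	out[2] = (v >> 8) & 0xff;
	out[3] = v & 0xff;
}

/* Write the checksum of buf[0..len) into the TRAILER_LEN bytes after it. */
static void write_trailer(unsigned char *buf, size_t len)
{
	assert(buf != NULL);
	put_be32(buf + len, fnv1a(buf, len));
}

/*
 * Return 1 if the final TRAILER_LEN bytes equal the checksum of
 * everything before them, 0 otherwise (including too-short input).
 */
static int trailer_valid(const unsigned char *buf, size_t total_len)
{
	unsigned char expect[TRAILER_LEN];

	if (total_len < TRAILER_LEN)
		return 0;
	put_be32(expect, fnv1a(buf, total_len - TRAILER_LEN));
	return !memcmp(expect, buf + total_len - TRAILER_LEN, TRAILER_LEN);
}
```

Note that appending bytes to the file shifts which bytes count as the trailer, so the check covers truncation and appended garbage as well as flipped bits inside the trailer itself.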
> @@ -1228,6 +1228,9 @@ int verify_midx_file(struct repository *r, const char *object_dir, unsigned flag
>  		return result;
>  	}
>  
> +	if (!midx_checksum_valid(m))
> +		midx_report(_("incorrect checksum"));
This "midx_report()" function doesn't provide much context on stderr. I
get:
  $ echo foo >>.git/objects/pack/multi-pack-index
  $ git multi-pack-index verify
  incorrect checksum
  Verifying OID order in multi-pack-index: 100% (282/282), done.
  Sorting objects by packfile: 100% (283/283), done.
  Verifying object offsets: 100% (283/283), done.
I think we should at least say "error:", and ideally something along the
lines of "midx file at %s does not match its trailing checksum (possibly
corruption?)". Or something like that.
All of the existing calls to midx_report() share this issue, though, so
rather than fixing only this message we could add the "error:" prefix
and the file path inside midx_report() itself, with something like:
diff --git a/midx.c b/midx.c
index 9a35b0255d..e464907a7c 100644
--- a/midx.c
+++ b/midx.c
@@ -1172,10 +1172,12 @@ void clear_midx_file(struct repository *r)
 static int verify_midx_error;
-static void midx_report(const char *fmt, ...)
+static void midx_report(struct multi_pack_index *m, const char *fmt, ...)
 {
 	va_list ap;
 	verify_midx_error = 1;
+	/* do we need to care about the "next" pointer here? */
+	fprintf(stderr, ("error: %s/multi-pack-index: "), m->object_dir);
 	va_start(ap, fmt);
 	vfprintf(stderr, fmt, ap);
 	fprintf(stderr, "\n");
Also, a side note: we should use __attribute__((format)) on this
function to get compile-time checks of our format strings.
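As a rough illustration of both suggestions (a path prefix plus the format attribute), here is a standalone sketch. It is not the midx.c code: report(), report_error, last_report and the PRINTF_LIKE macro are hypothetical names, and the message is also copied into a buffer only so the result can be inspected.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Only GCC-compatible compilers understand the format attribute. */
#if defined(__GNUC__)
#define PRINTF_LIKE(fmt_idx, first_arg) \
	__attribute__((format(printf, fmt_idx, first_arg)))
#else
#define PRINTF_LIKE(fmt_idx, first_arg)
#endif

static int report_error;      /* set once any problem has been reported */
static char last_report[256]; /* hypothetical: last message, kept for inspection */

/*
 * Argument 2 is the format string and the variadic arguments start at
 * position 3, so the compiler can check them against each other.
 */
PRINTF_LIKE(2, 3)
static void report(const char *object_dir, const char *fmt, ...)
{
	va_list ap;
	int n;

	assert(object_dir && fmt);
	report_error = 1;

	/* Prefix every message with "error:" and the file it refers to. */
	n = snprintf(last_report, sizeof(last_report),
		     "error: %s/multi-pack-index: ", object_dir);
	if (n < 0)
		n = 0;
	if ((size_t)n >= sizeof(last_report))
		n = sizeof(last_report) - 1;

	va_start(ap, fmt);
	vsnprintf(last_report + n, sizeof(last_report) - n, fmt, ap);
	va_end(ap);

	fprintf(stderr, "%s\n", last_report);
}
```

With the attribute in place, a call such as report(dir, "bad offset %s", 42) should draw a -Wformat warning at compile time on GCC or Clang instead of misbehaving at run time.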
-Peff