From: Samuel Bronson <naesten@gmail•com>
To: git@vger•kernel.org
Subject: Re: [BUG] parse_object() does not behave as documented
Date: Sat, 09 Aug 2014 01:43:38 -0400 [thread overview]
Message-ID: <87sil6w73p.fsf@naesten.mooo.com> (raw)
In-Reply-To: <87ha1zzhwu.fsf_-_@naesten.mooo.com> (Samuel Bronson's message of "Tue, 29 Jul 2014 22:42:25 -0400")
Ping?
Samuel Bronson <naesten@gmail•com> writes:
> [Hmm, nobody seems ot have commented on this analysis; maybe reposting
> it with a subject containing [BUG] will help?]
>
> Samuel Bronson <naesten@gmail•com> writes:
>
>> The following message is a courtesy copy of an article
>> that has been posted to gmane.comp.version-control.git as well.
>>
>> Oh, I forgot to provide any analysis of the problem. Oops.
>>
>> It may be just as well, though; I was tired enough that I might have
>> botched it in any case. So, have an analysis:
>>
>> While inflate errors are obviously NOT GOOD, and should perhaps be
>> instantly fatal for most commands, git fsck is something of a special
>> case because it is useful to have *it* report as many corrupt objects as
>> possible in one run.
>>
>> Unfortunately, this is not currently the case, as shown by the provided
>> testcase.
>>
>> The output for this testcase is:
>>
>> ,----
>> | checking known breakage:
>> | hash1=ffffffffffffffffffffffffffffffffffffffff &&
>> | hash2=fffffffffffffffffffffffffffffffffffffffe &&
>> | mkdir -p .git/objects/ff &&
>> | echo not-zlib >$(sha1_file $hash1) &&
>> | test_when_finished "remove_object $hash1" &&
>> | echo not-zlib >$(sha1_file $hash2) &&
>> | test_when_finished "remove_object $hash2" &&
>> |
>> | # Return value is not documented
>> | test_might_fail git fsck 2>out &&
>> | cat out && echo ====== &&
>> | grep "$hash1.*corrupt" out &&
>> | grep "$hash2.*corrupt" out
>> |
>> | error: inflate: data stream error (incorrect header check)
>> | error: unable to unpack ffffffffffffffffffffffffffffffffffffffff header
>> | error: inflate: data stream error (incorrect header check)
>> | fatal: loose object ffffffffffffffffffffffffffffffffffffffff (stored
>> | in .git/objects/ff/ffffffffffffffffffffffffffffffffffffff) is
>> | corrupt
>> | ======
>> | fatal: loose object ffffffffffffffffffffffffffffffffffffffff (stored
>> | in .git/objects/ff/ffffffffffffffffffffffffffffffffffffff) is
>> | corrupt
>> | not ok 5 - fsck survives inflate errors # TODO known breakage
>> `----
>>
>> If I flip it from expect_failure to expect_success and run the test with
>> -i, then go into the trash directory and run "gdb ../../git-fsck", I can
>> obtain this (thoroughly rehearsed & trimmed) gdb transcript:
>>
>> ,----
>> | % gdb ../../git-fsck
>> | GNU gdb (Debian 7.7.1-3) 7.7.1
>> ...
>> | Reading symbols from ../../git-fsck...done.
>> | (gdb) break error
>> | Breakpoint 1 at 0x813d24c: file usage.c, line 143.
>> | (gdb) break die
>> | Breakpoint 2 at 0x813d152: file usage.c, line 94.
>> | (gdb) run
>> | Starting program: /home/naesten/hacking/git/git-fsck
>> | [Thread debugging using libthread_db enabled]
>> | Using host libthread_db library
>> | "/lib/i386-linux-gnu/i686/cmov/libthread_db.so.1".
>> | Checking object directories: 100% (256/256), done.
>> |
>> | Breakpoint 1, error (err=0x8182f7a "inflate: %s (%s)") at usage.c:143
>> | 143 {
>> | (gdb) bt
>> | #0 error (err=0x8182f7a "inflate: %s (%s)") at usage.c:143
>> | #1 0x081452ff in git_inflate (strm=0xbfffe6b8, flush=0)
>> | at zlib.c:144
>> | #2 0x08125367 in unpack_sha1_header (stream=<optimized out>,
>> | map=<optimized out>, mapsize=<optimized out>,
>> | buffer=<optimized out>, bufsiz=<optimized out>)
>> | at sha1_file.c:1515
>> | #3 0x08125546 in sha1_loose_object_info (
>> | sha1=sha1@entry=0x82659d4 '\377' <repeats 20 times>,
>> | oi=oi@entry=0xbfffe788) at sha1_file.c:2528
>> | #4 0x08126b2d in sha1_object_info_extended (
>> | sha1=0x82659d4 '\377' <repeats 20 times>, oi=0xbfffe788, flags=1)
>> | at sha1_file.c:2565
>> | #5 0x0812666f in sha1_object_info (
>> | sha1=0x82659d4 '\377' <repeats 20 times>, sizep=0x0)
>> | at sha1_file.c:2601
>> | #6 0x080f6941 in parse_object (
>> | sha1=0x82659d4 '\377' <repeats 20 times>) at object.c:247
>> | #7 0x080758ac in fsck_sha1 (
>> | sha1=sha1@entry=0x82659d4 '\377' <repeats 20 times>)
>> | at builtin/fsck.c:333
>> ...
>> | (gdb) c
>> | Continuing.
>> | error: inflate: data stream error (incorrect header check)
>> |
>> | Breakpoint 1, error (err=0x817c525 "unable to unpack %s header")
>> | at usage.c:143
>> | 143 {
>> | (gdb) bt
>> | #0 error (err=0x817c525 "unable to unpack %s header") at usage.c:143
>> | #1 0x08125564 in sha1_loose_object_info (
>> | sha1=sha1@entry=0x82659d4 '\377' <repeats 20 times>,
>> | oi=oi@entry=0xbfffe788) at sha1_file.c:2529
>> | #2 0x08126b2d in sha1_object_info_extended (
>> | sha1=0x82659d4 '\377' <repeats 20 times>, oi=0xbfffe788, flags=1)
>> | at sha1_file.c:2565
>> | #3 0x0812666f in sha1_object_info (
>> | sha1=0x82659d4 '\377' <repeats 20 times>, sizep=0x0)
>> | at sha1_file.c:2601
>> | #4 0x080f6941 in parse_object (
>> | sha1=0x82659d4 '\377' <repeats 20 times>) at object.c:247
>> ...
>> | (gdb) frame 4
>> | #4 0x080f6941 in parse_object (
>> | sha1=0x82659d4 '\377' <repeats 20 times>) at object.c:247
>> | warning: Source file is more recent than executable.
>> | 247 sha1_object_info(sha1, NULL) == OBJ_BLOB)) { // <-- first error
>> | (gdb) down
>> | #3 0x0812666f in sha1_object_info (
>> | sha1=0x82659d4 '\377' <repeats 20 times>, sizep=0x0)
>> | at sha1_file.c:2601
>> | 2601 if (sha1_object_info_extended(sha1, &oi, LOOKUP_REPLACE_OBJECT)
>> | < 0)
>> | (gdb) fin
>> | Run till exit from #3 0x0812666f in sha1_object_info (
>> | sha1=0x82659d4 '\377' <repeats 20 times>, sizep=0x0)
>> | at sha1_file.c:2601
>> | error: unable to unpack ffffffffffffffffffffffffffffffffffffffff header
>> | parse_object (sha1=0x82659d4 '\377' <repeats 20 times>)
>> | at object.c:246
>> | 246 (!obj && has_sha1_file(sha1) &&
>> | Value returned is $1 = -1
>> | (gdb) c
>> | Continuing.
>> |
>> | Breakpoint 1, error (err=0x8182f7a "inflate: %s (%s)") at usage.c:143
>> | 143 {
>> | (gdb) bt
>> | #0 error (err=0x8182f7a "inflate: %s (%s)") at usage.c:143
>> | #1 0x081452ff in git_inflate (strm=0xbfffe710, flush=0)
>> | at zlib.c:144
>> | #2 0x08125367 in unpack_sha1_header (stream=<optimized out>,
>> | map=<optimized out>, mapsize=<optimized out>,
>> | buffer=<optimized out>, bufsiz=<optimized out>)
>> | at sha1_file.c:1515
>> | #3 0x08125429 in unpack_sha1_file (map=map@entry=0xb7fdc000,
>> | mapsize=<optimized out>, type=type@entry=0xbfffe7d8,
>> | size=0xbfffe7dc, sha1=0x82659d4 '\377' <repeats 20 times>)
>> | at sha1_file.c:1620
>> | #4 0x08126024 in read_object (
>> | sha1=sha1@entry=0x82659d4 '\377' <repeats 20 times>,
>> | type=type@entry=0xbfffe7d8, size=size@entry=0xbfffe7dc)
>> | at sha1_file.c:2667
>> | #5 0x08126c33 in read_sha1_file_extended (
>> | sha1=0x82659d4 '\377' <repeats 20 times>, type=0xbfffe7d8,
>> | size=0xbfffe7dc, flag=1) at sha1_file.c:2690
>> | #6 0x080f69b8 in read_sha1_file (size=0xbfffe7dc, type=0xbfffe7d8,
>> | sha1=0x82659d4 '\377' <repeats 20 times>) at cache.h:853
>> | #7 parse_object (sha1=0x82659d4 '\377' <repeats 20 times>)
>> | at object.c:256
>> ...
>> | (gdb) c
>> | Continuing.
>> | error: inflate: data stream error (incorrect header check)
>> |
>> | Breakpoint 2, die (
>> | err=0x817cdbc "loose object %s (stored in %s) is corrupt")
>> | at usage.c:94
>> | 94 {
>> | (gdb) bt
>> | #0 die (err=0x817cdbc "loose object %s (stored in %s) is corrupt")
>> | at usage.c:94
>> | #1 0x08126cc1 in read_sha1_file_extended (
>> | sha1=0x82659d4 '\377' <repeats 20 times>, type=0xbfffe7d8,
>> | size=0xbfffe7dc, flag=1) at sha1_file.c:2705
>> | #2 0x080f69b8 in read_sha1_file (size=0xbfffe7dc, type=0xbfffe7d8,
>> | sha1=0x82659d4 '\377' <repeats 20 times>) at cache.h:853
>> | #3 parse_object (sha1=0x82659d4 '\377' <repeats 20 times>)
>> | at object.c:256
>> ...
>> | (gdb) frame 3
>> | #3 parse_object (sha1=0x82659d4 '\377' <repeats 20 times>)
>> | at object.c:256
>> | 256 buffer = read_sha1_file(sha1, &type, &size); // <-- death
>> | (gdb)
>> `----
>>
>> It's probably clearest what's happened here if I just show you this ...
>>
>> From object.c:
>>
>> struct object *parse_object(const unsigned char *sha1)
>> {
>> unsigned long size;
>> enum object_type type;
>> int eaten;
>> const unsigned char *repl = lookup_replace_object(sha1);
>> void *buffer;
>> struct object *obj;
>>
>> obj = lookup_object(sha1);
>> if (obj && obj->parsed)
>> return obj;
>>
>> if ((obj && obj->type == OBJ_BLOB) ||
>> (!obj && has_sha1_file(sha1) &&
>> sha1_object_info(sha1, NULL) == OBJ_BLOB)) { // <-- first error
>> if (check_sha1_signature(repl, NULL, 0, NULL) < 0) {
>> error("sha1 mismatch %s", sha1_to_hex(repl));
>> return NULL;
>> }
>> parse_blob_buffer(lookup_blob(sha1), NULL, 0);
>> return lookup_object(sha1);
>> }
>>
>> buffer = read_sha1_file(sha1, &type, &size); // <-- death
>> if (buffer) {
>> if (check_sha1_signature(repl, buffer, size, typename(type)) < 0) {
>> free(buffer);
>> error("sha1 mismatch %s", sha1_to_hex(repl));
>> return NULL;
>> }
>>
>> obj = parse_object_buffer(sha1, type, size, buffer, &eaten);
>> if (!eaten)
>> free(buffer);
>> return obj;
>> }
>> return NULL;
>> }
>>
>> (I've clearly added some marginal comments to my copy. ;-)
>>
>> The first two lines of output
>>
>>> error: inflate: data stream error (incorrect header check)
>>> error: unable to unpack ffffffffffffffffffffffffffffffffffffffff header
>>
>> come from deep within the call "sha1_object_info(sha1, NULL)".
>>
>> The next two lines
>>
>>> error: inflate: data stream error (incorrect header check)
>>> fatal: loose object ffffffffffffffffffffffffffffffffffffffff (stored
>>> in .git/objects/ff/ffffffffffffffffffffffffffffffffffffff) is
>>> corrupt
>>
>> and death come from the call "read_sha1_file(sha1, &type, &size)", which
>> is fair because that's exactly what the documentation comment above
>> read_sha1_file_extended() *says* will happen:
>>
>> /*
>> * This function dies on corrupt objects; the callers who want to
>> * deal with them should arrange to call read_object() and give error
>> * messages themselves.
>> */
>>
>> So, given that parse_object()'s documentation is:
>>
>> /*
>> * Returns the object, having parsed it to find out what it is.
>> *
>> * Returns NULL if the object is missing or corrupt.
>> */
>>
>> it probably should not call read_sha1_file() on a corrupt object.
>>
>> Options for fixing this would appear to include:
>>
>> 1. Saving the result of sha1_object_info(sha1, NULL) to a variable and
>> returning early if the object is corrupt. (But what happens if there
>> is corruption far enough in that it isn't seen when trying to grab
>> the object header?)
>>
>> 2. Calling read_object() and giving our own error messages.
>>
>> 3. Making read_sha1_file_extended only *optionally* die; since it's
>> calling die() directly.
>>
>> I'm also a bit nervous about this, though:
>>
>> static inline void *read_sha1_file(const unsigned char *sha1, enum object_type *type, unsigned long *size)
>> {
>> return read_sha1_file_extended(sha1, type, size, LOOKUP_REPLACE_OBJECT);
>> }
>>
>> Do we really want that happening while scanning the loose objects?
--
Hi! I'm a .signature virus! Copy me into your ~/.signature to help me spread!
next prev parent reply other threads:[~2014-08-09 5:43 UTC|newest]
Thread overview: 5+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-07-20 4:43 [PATCH] Add failing test: "fsck survives inflate errors" Samuel Bronson
2014-07-20 20:43 ` Samuel Bronson
2014-07-30 2:42 ` [BUG] parse_object() does not behave as documented Samuel Bronson
2014-08-09 5:43 ` Samuel Bronson [this message]
2014-08-09 6:53 ` [PATCH] Add failing test: "fsck survives inflate errors" Duy Nguyen
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87sil6w73p.fsf@naesten.mooo.com \
--to=naesten@gmail$(echo .)com \
--cc=git@vger$(echo .)kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox