From: Jeff King <peff@peff•net>
To: "brian m. carlson" <sandals@crustytoothpaste•net>,
git@vger•kernel.org, Stefan Beller <sbeller@google•com>,
Michael Haggerty <mhagger@alum•mit.edu>,
Duy Nguyen <pclouds@gmail•com>
Subject: Re: Pack files, standards compliance, and efficiency
Date: Fri, 5 Jun 2015 05:45:45 -0400 [thread overview]
Message-ID: <20150605094545.GB11855@peff.net> (raw)
In-Reply-To: <20150605014120.GE305479@vauxhall.crustytoothpaste.net>
On Fri, Jun 05, 2015 at 01:41:21AM +0000, brian m. carlson wrote:
> However, with the object_id conversion, we run into a problem: casting
> those unsigned char * values into struct object_id * values is not
> allowed by the C standard. There are two possible solutions: copying;
> and the just-do-it solution, where we cast and hope for the best.
I'm not sure if it does violate the standard. The address of the first
element of a struct is guaranteed to match the address of the struct
itself. The object_id.hash member is an array of unsigned char, so there
are no alignment issues. It might run afoul of rules about casting
between pointer types (i.e., pointers for all types are not guaranteed
to be the same size). The standard dictates that "char *" and "void *"
are the same size, and big enough to hold a pointer to anything, so I
think it might even be OK.
But I'm not even sure that line of thinking is all that interesting.
Even if we are violating some dark corner of the standard, this
definitely falls into the "it's useful and works on all sane machines"
category. We also do much worse things with struct-casting mmap'd data
elsewhere (e.g., see the use of "struct pack_header"). It works fine in
practice as long as you are careful about alignment and padding issues.
So my vote would be to retain the cast. This is very low-level,
performance-sensitive code. I did some very naive timings and didn't see
any measurable change from your patch, but I also don't think we are
seeing a real portability benefit to moving to the copy, so I'd prefer
to keep the status quo.
> It looks like we use the latter in nth_packed_object_offset, where we
> cast an unsigned char * value to uint32_t *, which is clearly not
> allowed.
Yes, this one is definitely dubious by the standard. However, it works
in practice because the index format is designed to be 4-byte aligned.
By contrast, the .bitmap format is not, and we have to use get_be32, etc
(which is really not the end of the world, but I do not think there is
any real reason to change the .idx code at this point).
> I'm to the point where I need to make a decision on how to
> proceed, and I've included a patch with the copying conversion of
> nth_packed_object_sha1 below for comparison. The casting solution is
> obviously more straightforward. I haven't tested either implementation
> for performance.
The test I did was just running "git rev-list --use-bitmap-index --count
HEAD" on a bitmapped linux.git repo. That's where I'd expect it to show
the most, because we are not doing much other work. But I think even
still, the timing is dominated by loading the bitmap file.
-Peff
next prev parent reply other threads:[~2015-06-05 9:45 UTC|newest]
Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-06-05 1:41 Pack files, standards compliance, and efficiency brian m. carlson
2015-06-05 9:45 ` Jeff King [this message]
2015-06-05 10:14 ` Duy Nguyen
2015-06-05 10:36 ` Jeff King
2015-06-05 13:24 ` Duy Nguyen
2015-06-05 19:59 ` brian m. carlson
2015-06-05 16:43 ` Junio C Hamano
2015-06-05 15:22 ` Michael Haggerty
2015-06-05 19:42 ` brian m. carlson
2015-06-05 20:03 ` Michael Haggerty
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20150605094545.GB11855@peff.net \
--to=peff@peff$(echo .)net \
--cc=git@vger$(echo .)kernel.org \
--cc=mhagger@alum$(echo .)mit.edu \
--cc=pclouds@gmail$(echo .)com \
--cc=sandals@crustytoothpaste$(echo .)net \
--cc=sbeller@google$(echo .)com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox