From: Michael Haggerty <mhagger@alum•mit.edu>
To: Jeff King <peff@peff•net>
Cc: Johan Herland <johan@herland•net>,
Junio C Hamano <gitster@pobox•com>,
git@vger•kernel.org
Subject: Re: [RFC 16/17] object_array_entry: copy name before storing in name field
Date: Mon, 20 May 2013 23:34:13 +0200
Message-ID: <519A96D5.2070608@alum.mit.edu>
In-Reply-To: <20130520164458.GA27788@sigill.intra.peff.net>
On 05/20/2013 06:44 PM, Jeff King wrote:
> On Mon, May 20, 2013 at 04:42:38PM +0200, Michael Haggerty wrote:
>
>>>> * Many callers store the empty string ("") as the name; for example,
>>>> most of the entries created during a run of rev-list have "" as
>>>> their name. This means that lots of needless copies of "" are being
>>>> made. I think that the best solution to this problem would be to
>>>> store NULL rather than "" for such entries, but I haven't figured
>>>> out all of the places where the name is used.
>>>
>>> Use strbufs?
>>>
>>> No allocation (except for the strbuf object itself) is needed for
>>> empty strings, and string ownership can be transferred to and from it
>>> to prevent extra copies.
>>
>> That would cost two extra size_t per object_array_entry. I have the
>> feeling that this structure is used often enough that the extra overhead
>> would be a disadvantage, but I'm not sure.
>>
>> The obvious alternative would be to teach users to deal with NULL and
>> either add another constructor alternative that transfers string
>> ownership or *always* transfer string ownership and change the callers
>> to call xstrdup() if they don't already own the name string. I think I
>> will try that approach first.
>
> You could use the same trick that strbuf does: instead of NULL, point to
> a well-known empty string literal. Readers do not have to care about
> this optimization at all; only writers need to recognize the well-known
> pointer value. And since we do not update in place but only eventually
> free, it really is just that anyone calling free() would do "if (name !=
> well_known_empty_string)".

Yes, that sounds like the best solution.  Ultimately there is only one
writer, add_object_array_with_mode(), and it can do

    if (!name)
        entry->name = NULL;
    else if (!*name)
        entry->name = well_known_empty_string;
    else
        entry->name = xstrdup(name);

This should be a lot less intrusive than what I was trying, which was to
change callers who wrote name="" to write name=NULL instead.  With that
approach it is a nightmare to find all of the code that reads name and
decide whether it needs to do

    entry->name ? entry->name : ""

because that in turn depends on whether the code that wrote into the
same object_array always/never/sometimes wrote strings vs. NULL to the
name field.  Blech, encapsulation is tough in C.
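
A minimal, self-contained sketch of the sentinel approach discussed above, including the matching check on the free() side. The struct and the helper names entry_set_name() and entry_release_name() are hypothetical stand-ins rather than git's actual code, and plain malloc() is used in place of xstrdup() so that the example compiles on its own:

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for object_array_entry; only the name field matters. */
struct entry {
    char *name;
};

/* Shared sentinel: every empty name points here, so it must never be freed. */
static char well_known_empty_string[] = "";

/* Store a private copy of name without allocating for NULL or "". */
static void entry_set_name(struct entry *e, const char *name)
{
    if (!name) {
        e->name = NULL;
    } else if (!*name) {
        e->name = well_known_empty_string;
    } else {
        size_t len = strlen(name) + 1;
        char *copy = malloc(len);
        if (!copy) {
            perror("malloc");
            exit(1);
        }
        memcpy(copy, name, len);
        e->name = copy;
    }
}

/* Release the name; only heap copies are passed to free(). */
static void entry_release_name(struct entry *e)
{
    if (e->name != well_known_empty_string)
        free(e->name); /* free(NULL) is a no-op */
    e->name = NULL;
}
```

Readers of the name field can treat it like any other string (apart from the NULL case); only the code that frees an entry has to compare against the sentinel.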

While I was chasing down callers, I came across other gems like

builtin/checkout.c:

    add_pending_object(&revs, object, sha1_to_hex(object->sha1));

revision.c:

    add_pending_object(revs, object, sha1_to_hex(object->sha1));

submodule.c:

    add_pending_object(rev, &list->item->object,
                       sha1_to_hex(list->item->object.sha1));

so apparently I wasn't the only one befuddled by the lifetime and
ownership of the name field of object_array_entry.
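
The message does not spell out why these calls are a problem. Assuming the concern is that sha1_to_hex() returns a pointer into a static buffer that later calls overwrite, the hazard can be sketched as follows. Here to_hex_static(), add_borrowed() and add_copied() are hypothetical illustrations, not git functions:

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper that formats into a static buffer and returns it. */
static char *to_hex_static(unsigned char byte)
{
    static char buf[3];
    snprintf(buf, sizeof(buf), "%02x", byte);
    return buf;
}

/* Hypothetical stand-in for object_array_entry. */
struct entry {
    char *name;
};

/* Stores the caller's pointer as-is: the name silently changes when the
 * static buffer is reused by a later to_hex_static() call. */
static void add_borrowed(struct entry *e, char *name)
{
    e->name = name;
}

/* Stores a private heap copy: the name stays stable, but the entry now
 * owns memory that must eventually be freed. */
static void add_copied(struct entry *e, const char *name)
{
    size_t len = strlen(name) + 1;
    e->name = malloc(len);
    if (!e->name) {
        perror("malloc");
        exit(1);
    }
    memcpy(e->name, name, len);
}
```

With add_borrowed(), an entry created from to_hex_static(0xab) reads "cd" after a later to_hex_static(0xcd) call, whereas an entry created with add_copied() still reads "ab".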
Michael
--
Michael Haggerty
mhagger@alum•mit.edu
http://softwareswirl.blogspot.com/