From: Michael Haggerty <mhagger@alum•mit.edu>
To: Junio C Hamano <gitster@pobox•com>
Cc: git@vger•kernel.org
Subject: Re: [PATCH 1/4] Add a new function, string_list_split_in_place()
Date: Mon, 10 Sep 2012 13:48:53 +0200 [thread overview]
Message-ID: <504DD3A5.8000201@alum.mit.edu> (raw)
In-Reply-To: <7vhar6pgxs.fsf@alter.siamese.dyndns.org>
On 09/10/2012 07:47 AM, Junio C Hamano wrote:
> Michael Haggerty <mhagger@alum•mit.edu> writes:
>
>> ... Consider something like
>>
>> struct string_list *split_file_into_words(FILE *f)
>> {
>> char buf[1024];
>> struct string_list *list = new string list;
>> list->strdup_strings = 1;
>> while (not EOF) {
>> read_line_into_buf();
>> string_list_split_in_place(list, buf, ' ', -1);
>> }
>> return list;
>> }
>
> That is a prime example to argue that string_list_split() would make
> more sense, no? The caller does _not_ mind if the function mucks
> with buf, but the resulting list is not allowed to point into buf.
>
> In such a case, the caller shouldn't have to _care_ if it wants to
> allow buf to be mucked with; it is already asking that the resulting
> list _not_ point into buf by setting strdup_strings (by the way,
> that is part of the function input, so think of it like various *opt
> variables passed into functions to tweak their behaviour). If the
> implementation can do so without sacrificing performance (and in
> this case, as you said, it can), it should take "const char *buf".
You're right, I was thinking that a caller of
string_list_split_in_place() could choose to remain ignorant of whether
strdup_strings is set, but this is incorrect because it needs to know
whether to do its own memory management of the strings that it passes
into string_list_append().
> So it appears to me that sl_split_in_place(), if implemented, should
> be kept as a special case for performance-minded callers that have
> full control of the lifetime rules of the variables they use, can
> set strdup_strings to false, and can let buf modified in place, and
> can accept list that point into buf.
OK, so the bottom line would be to have two versions of the function.
One takes a (const char *) and *requires* strdup_strings to be set on
its input list:
int string_list_split(struct string_list *list, const char *string,
int delim, int maxsplit)
{
assert(list->strdup_strings);
...
}
The other takes a (char *) and modifies it in-place, and maybe even
requires strdup_strings to be false on its input list:
int string_list_split_in_place(struct string_list *list, char *string,
int delim, int maxsplit)
{
/* not an error per se but a strong suggestion of one: */
assert(!list->strdup_strings);
...
}
(The latter (modulo assert) is the one that I have implemented, but it
might not be needed immediately.) Do you agree?
>>>> + * Examples:
>>>> + * string_list_split_in_place(l, "foo:bar:baz", ':', -1) -> ["foo", "bar", "baz"]
>>>> + * string_list_split_in_place(l, "foo:bar:baz", ':', 1) -> ["foo", "bar:baz"]
>>>> + * string_list_split_in_place(l, "foo:bar:", ':', -1) -> ["foo", "bar", ""]
>>>
>>> I would find it more natural to see a sentinel value against
>>> "positive" to be 0, not -1. "-1" gives an impression as if "-2"
>>> might do something different from "-1", but Zero is a lot more
>>> special.
>>
>> You have raised a good point and I think there is a flaw in the API, but
>> I'm not sure I agree with you what the flaw is...
>>
>> The "maxsplit" argument limits the number of times the string should be
>> split. I.e., if maxsplit is set, then the output will have at most
>> (maxsplit + 1) strings.
>
> So "do not split, just give me the whole thing" would be maxsplit == 0
> to split into (maxsplit+1) == 1 string. I think we are in agreement
> that your "-1" does not make any sense, and your documentation that
> said "positive" is the saner thing to do, no?
No. I think that my handling of maxsplit=0 was incorrect but that we
should continue using -1 as the special value.
I see maxsplit=0 as a legitimate degenerate case meaning "split into 1
string". Granted, it would only be useful in specialized situations
[1]. Moreover, "-1" makes a much more obvious special value than "0";
somebody reading code with maxsplit=-1 knows immediately that this is a
special value, whereas the handling of maxsplit=0 isn't quite so
blindingly obvious unless the reader knows the outcome of our current
discussion :-)
Therefore I still prefer treating only negative values of maxsplit to
mean "unlimited" and fixing maxsplit=0 as described. But if you insist
on the other convention, let me know and I will change it.
Michael
[1] A case I can think of would be parsing a format like
NUMPARENTS [PARENT...] SUMMARY
where "string_list_split(list, rest_of_line, ' ', numparents)" does the
right thing even if numparents==0.
--
Michael Haggerty
mhagger@alum•mit.edu
http://softwareswirl.blogspot.com/
next prev parent reply other threads:[~2012-09-10 11:49 UTC|newest]
Thread overview: 23+ messages / expand[flat|nested] mbox.gz Atom feed top
2012-09-09 5:53 [PATCH 0/4] Add some string_list-related functions Michael Haggerty
2012-09-09 5:53 ` [PATCH 1/4] Add a new function, string_list_split_in_place() Michael Haggerty
2012-09-09 9:35 ` Junio C Hamano
2012-09-10 4:45 ` Michael Haggerty
2012-09-10 5:47 ` Junio C Hamano
2012-09-10 11:48 ` Michael Haggerty [this message]
2012-09-10 16:09 ` Junio C Hamano
2012-09-09 5:53 ` [PATCH 2/4] Add a new function, filter_string_list() Michael Haggerty
2012-09-09 9:40 ` Junio C Hamano
2012-09-10 8:58 ` Michael Haggerty
2012-09-09 5:53 ` [PATCH 3/4] Add a new function, string_list_remove_duplicates() Michael Haggerty
2012-09-09 9:45 ` Junio C Hamano
2012-09-10 9:15 ` Michael Haggerty
2012-09-09 5:53 ` [PATCH 4/4] Add a function string_list_longest_prefix() Michael Haggerty
2012-09-09 9:54 ` Junio C Hamano
2012-09-10 10:01 ` Michael Haggerty
2012-09-10 16:24 ` Junio C Hamano
2012-09-10 16:33 ` Jeff King
2012-09-10 17:48 ` Andreas Ericsson
2012-09-10 19:21 ` Using doxygen (or something similar) to generate API docs [was [PATCH 4/4] Add a function string_list_longest_prefix()] Michael Haggerty
2012-09-10 21:56 ` Jeff King
2012-09-10 22:09 ` Michael Haggerty
2012-09-11 1:01 ` Andreas Ericsson
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=504DD3A5.8000201@alum.mit.edu \
--to=mhagger@alum$(echo .)mit.edu \
--cc=git@vger$(echo .)kernel.org \
--cc=gitster@pobox$(echo .)com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox