public inbox for git@vger.kernel.org 
 help / color / mirror / Atom feed
From: Michael Haggerty <mhagger@alum•mit.edu>
To: Junio C Hamano <gitster@pobox•com>
Cc: git@vger•kernel.org
Subject: Re: [PATCH 1/4] Add a new function, string_list_split_in_place()
Date: Mon, 10 Sep 2012 13:48:53 +0200	[thread overview]
Message-ID: <504DD3A5.8000201@alum.mit.edu> (raw)
In-Reply-To: <7vhar6pgxs.fsf@alter.siamese.dyndns.org>

On 09/10/2012 07:47 AM, Junio C Hamano wrote:
> Michael Haggerty <mhagger@alum•mit.edu> writes:
> 
>> ...  Consider something like
>>
>>     struct string_list *split_file_into_words(FILE *f)
>>     {
>>         char buf[1024];
>>         struct string_list *list = new string list;
>>         list->strdup_strings = 1;
>>         while (not EOF) {
>>             read_line_into_buf();
>>             string_list_split_in_place(list, buf, ' ', -1);
>>         }
>>         return list;
>>     }
> 
> That is a prime example to argue that string_list_split() would make
> more sense, no?  The caller does _not_ mind if the function mucks
> with buf, but the resulting list is not allowed to point into buf.
> 
> In such a case, the caller shouldn't have to _care_ if it wants to
> allow buf to be mucked with; it is already asking that the resulting
> list _not_ point into buf by setting strdup_strings (by the way,
> that is part of the function input, so think of it like various *opt
> variables passed into functions to tweak their behaviour).  If the
> implementation can do so without sacrificing performance (and in
> this case, as you said, it can), it should take "const char *buf".

You're right, I was thinking that a caller of
string_list_split_in_place() could choose to remain ignorant of whether
strdup_strings is set, but this is incorrect because it needs to know
whether to do its own memory management of the strings that it passes
into string_list_append().

> So it appears to me that sl_split_in_place(), if implemented, should
> be kept as a special case for performance-minded callers that have
> full control of the lifetime rules of the variables they use, can
> set strdup_strings to false, and can let buf modified in place, and
> can accept list that point into buf.

OK, so the bottom line would be to have two versions of the function.
One takes a (const char *) and *requires* strdup_strings to be set on
its input list:

int string_list_split(struct string_list *list, const char *string,
		      int delim, int maxsplit)
{
	assert(list->strdup_strings);
	...
}

The other takes a (char *) and modifies it in-place, and maybe even
requires strdup_strings to be false on its input list:

int string_list_split_in_place(struct string_list *list, char *string,
			       int delim, int maxsplit)
{
	/* not an error per se but a strong suggestion of one: */
	assert(!list->strdup_strings);
	...
}

(The latter (modulo assert) is the one that I have implemented, but it
might not be needed immediately.)  Do you agree?

>>>> + * Examples:
>>>> + *   string_list_split_in_place(l, "foo:bar:baz", ':', -1) -> ["foo", "bar", "baz"]
>>>> + *   string_list_split_in_place(l, "foo:bar:baz", ':', 1) -> ["foo", "bar:baz"]
>>>> + *   string_list_split_in_place(l, "foo:bar:", ':', -1) -> ["foo", "bar", ""]
>>>
>>> I would find it more natural to see a sentinel value against
>>> "positive" to be 0, not -1.  "-1" gives an impression as if "-2"
>>> might do something different from "-1", but Zero is a lot more
>>> special.
>>
>> You have raised a good point and I think there is a flaw in the API, but
>> I'm not sure I agree with you what the flaw is...
>>
>> The "maxsplit" argument limits the number of times the string should be
>> split.  I.e., if maxsplit is set, then the output will have at most
>> (maxsplit + 1) strings.
> 
> So "do not split, just give me the whole thing" would be maxsplit == 0
> to split into (maxsplit+1) == 1 string.  I think we are in agreement
> that your "-1" does not make any sense, and your documentation that
> said "positive" is the saner thing to do, no?

No.  I think that my handling of maxsplit=0 was incorrect but that we
should continue using -1 as the special value.

I see maxsplit=0 as a legitimate degenerate case meaning "split into 1
string".  Granted, it would only be useful in specialized situations
[1].  Moreover, "-1" makes a much more obvious special value than "0";
somebody reading code with maxsplit=-1 knows immediately that this is a
special value, whereas the handling of maxsplit=0 isn't quite so
blindingly obvious unless the reader knows the outcome of our current
discussion :-)

Therefore I still prefer treating only negative values of maxsplit to
mean "unlimited" and fixing maxsplit=0 as described.  But if you insist
on the other convention, let me know and I will change it.

Michael

[1] A case I can think of would be parsing a format like

    NUMPARENTS [PARENT...] SUMMARY

where "string_list_split(list, rest_of_line, ' ', numparents)" does the
right thing even if numparents==0.

-- 
Michael Haggerty
mhagger@alum•mit.edu
http://softwareswirl.blogspot.com/

  reply	other threads:[~2012-09-10 11:49 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-09-09  5:53 [PATCH 0/4] Add some string_list-related functions Michael Haggerty
2012-09-09  5:53 ` [PATCH 1/4] Add a new function, string_list_split_in_place() Michael Haggerty
2012-09-09  9:35   ` Junio C Hamano
2012-09-10  4:45     ` Michael Haggerty
2012-09-10  5:47       ` Junio C Hamano
2012-09-10 11:48         ` Michael Haggerty [this message]
2012-09-10 16:09           ` Junio C Hamano
2012-09-09  5:53 ` [PATCH 2/4] Add a new function, filter_string_list() Michael Haggerty
2012-09-09  9:40   ` Junio C Hamano
2012-09-10  8:58     ` Michael Haggerty
2012-09-09  5:53 ` [PATCH 3/4] Add a new function, string_list_remove_duplicates() Michael Haggerty
2012-09-09  9:45   ` Junio C Hamano
2012-09-10  9:15     ` Michael Haggerty
2012-09-09  5:53 ` [PATCH 4/4] Add a function string_list_longest_prefix() Michael Haggerty
2012-09-09  9:54   ` Junio C Hamano
2012-09-10 10:01     ` Michael Haggerty
2012-09-10 16:24       ` Junio C Hamano
2012-09-10 16:33         ` Jeff King
2012-09-10 17:48           ` Andreas Ericsson
2012-09-10 19:21             ` Using doxygen (or something similar) to generate API docs [was [PATCH 4/4] Add a function string_list_longest_prefix()] Michael Haggerty
2012-09-10 21:56               ` Jeff King
2012-09-10 22:09                 ` Michael Haggerty
2012-09-11  1:01                 ` Andreas Ericsson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=504DD3A5.8000201@alum.mit.edu \
    --to=mhagger@alum$(echo .)mit.edu \
    --cc=git@vger$(echo .)kernel.org \
    --cc=gitster@pobox$(echo .)com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox