* Re attr API further revamp
@ 2016-06-13 20:55 Junio C Hamano
2016-06-13 21:31 ` Stefan Beller
0 siblings, 1 reply; 3+ messages in thread
From: Junio C Hamano @ 2016-06-13 20:55 UTC (permalink / raw)
To: git; +Cc: Stefan Beller
I hate to be doing this, but we need yet another revamp to the attr
API that affects all the callers.
In the original design, a codepath that wants to check attributes
repeatedly for many paths (e.g. "convert" that wants to see what
crlf, ident, filter, eol and text attributes are set to for paths
that it is asked to munge the blob contents for) used to allocate
static struct git_attr_check {
struct git_attr *attr;
const char *value;
} ccheck[NUM_ATTRS];
and populated the array when the first time the codepath was
entered, e.g.
if (!ccheck[0].attr) {
for (i = 0; i < NUM_ATTRS; i++)
ccheck[i].attr = git_attr(...);
}
and then make a call to the API to ask for these attributes in
ccheck[] for a path, i.e.
git_attr_check(path, NUM_ATTRS, ccheck);
The API function will fill in the ccheck[].value fields and the
caller will read from there how each attribute is set for the path.
This was updated with recent jc/attr topic so that attr_check
structure can be enriched to keep _more_ state, such as the
pre-parsed representation of all the contents of the relevant
.gitattributes files, which currently is held in a program-side
singleton instance called attr_stack, partly in preparation for
Stefan's sb/pathspec-label topic that is expected to place a lot
heavier load to the attribute subsystem.
A caller of the API after that update would do more like
static struct git_attr_check *ccheck;
if (!ccheck)
ccheck = git_attr_check_initl(...);
git_attr_check(path, ccheck);
for (i = 0; i < NUM_ATTRS; i++)
if (ccheck->check[i].value == ATTR_UNSET)
... do the Unset thing ...
This however is not thread-safe for obvious two reasons.
* Two threads can simultanously enter this section of the code and
end up initializing ccheck twice;
* Even though ccheck[].attr are constant for the purpose of this
codepath (i.e. all threads passing through are interested in
checking the same set of attributes), ccheck[].value fields
cannot be shared across simultaneous threads like the above
snippet does.
So it appears that the final API visible from the callers needs to
be updated further, perhaps something like:
static struct git_attr_check *ccheck;
const char *values[NUM_ATTRS];
git_attr_check_initl(&ccheck, "crlf", "ident", ..., NULL);
git_attr_check(path, ccheck, values);
where git_attr_check_initl() would do the "if ccheck is NULL then
initialize it to an instance" atomically in threading environment,
and git_attr_check() returns its finding to values[] array the
calling thread exclusively owns, while sharing the input ccheck
and the list of attributes the call inquires among threads.
Also unlike the earlier plan, attr_stack aka "where in the directory
hierarchy are we asking attributes for?" will not be stored directly
in git_attr_check structure. It is likely that a thread-local
structure will be introduced hidden behind this API (i.e. the
callers do not have to be aware of it), and attr_stack will be
registered in this thread-local structure keyed by &ccheck, so that
<thread, callsite> pair can have attr_stack instance of its own.
This is because a single attr_stack per ccheck will not work as an
optimization mechanism when multi-threaded. Two threads may be
running the same convert() function, and they may be interested in
the same set of attributes (e.g. "crlf", "ident", etc.) but they
will be working on different parts of the tree, so having attr_stack
per <thread, callsite> would make more sense.
All of which will be quite a lot of work, though, so do not expect
that it will appear by the end of this week ;-)
Thanks.
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: Re attr API further revamp
2016-06-13 20:55 Re attr API further revamp Junio C Hamano
@ 2016-06-13 21:31 ` Stefan Beller
2016-06-13 21:54 ` Junio C Hamano
0 siblings, 1 reply; 3+ messages in thread
From: Stefan Beller @ 2016-06-13 21:31 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git@vger•kernel.org
On Mon, Jun 13, 2016 at 1:55 PM, Junio C Hamano <gitster@pobox•com> wrote:
> I hate to be doing this, but we need yet another revamp to the attr
> API that affects all the callers.
So you don't mean origin/jc/attr-more by this?
(Given that we have jc/attr and jc/attr-more, the third thing could go
with jc/even-more-attr. Though I do not propose that)
>
> In the original design, a codepath that wants to check attributes
> repeatedly for many paths (e.g. "convert" that wants to see what
> crlf, ident, filter, eol and text attributes are set to for paths
> that it is asked to munge the blob contents for) used to allocate
>
> static struct git_attr_check {
> struct git_attr *attr;
> const char *value;
> } ccheck[NUM_ATTRS];
>
> and populated the array when the first time the codepath was
> entered, e.g.
>
> if (!ccheck[0].attr) {
> for (i = 0; i < NUM_ATTRS; i++)
> ccheck[i].attr = git_attr(...);
> }
>
> and then make a call to the API to ask for these attributes in
> ccheck[] for a path, i.e.
>
> git_attr_check(path, NUM_ATTRS, ccheck);
>
> The API function will fill in the ccheck[].value fields and the
> caller will read from there how each attribute is set for the path.
>
> This was updated with recent jc/attr topic so that attr_check
> structure can be enriched to keep _more_ state, such as the
> pre-parsed representation of all the contents of the relevant
> .gitattributes files, which currently is held in a program-side
> singleton instance called attr_stack, partly in preparation for
> Stefan's sb/pathspec-label topic that is expected to place a lot
> heavier load to the attribute subsystem.
I do not quite expect that. Some numbers:
(I am looking at https://android.googlesource.com/platform/manifest)
* There are currently different 56 groups in that manifest
* with an average of 1.5 groups for each entry that has at least one group,
* or 0.88 groups per entry (which is one resulting file/submodule after the
conversion)
For reference:
* git.git makes use of 1 attr (whitespace), in 2 value configurations
* that is applied to each file.
So comparing one entry to the 1.5 or 0.88 above doesn't seem to
be that bad. That is what needs to be done on a per-file basis?
What is different is the number of different attrs in use by about
2 orders of magnitude, which needs to be collected upfront and stored
in a smart way?
>
> A caller of the API after that update would do more like
>
> static struct git_attr_check *ccheck;
>
> if (!ccheck)
> ccheck = git_attr_check_initl(...);
> git_attr_check(path, ccheck);
> for (i = 0; i < NUM_ATTRS; i++)
> if (ccheck->check[i].value == ATTR_UNSET)
> ... do the Unset thing ...
This is origin/jc/attr-more ?
>
> This however is not thread-safe for obvious two reasons.
>
> * Two threads can simultanously enter this section of the code and
> end up initializing ccheck twice;
>
> * Even though ccheck[].attr are constant for the purpose of this
> codepath (i.e. all threads passing through are interested in
> checking the same set of attributes), ccheck[].value fields
> cannot be shared across simultaneous threads like the above
> snippet does.
>
> So it appears that the final API visible from the callers needs to
> be updated further, perhaps something like:
>
>
> static struct git_attr_check *ccheck;
> const char *values[NUM_ATTRS];
>
> git_attr_check_initl(&ccheck, "crlf", "ident", ..., NULL);
> git_attr_check(path, ccheck, values);
and later on each thread will do a
git_attr_thread_byebye(&ccheck) ?
to free its thread local stuff or rather is the calling site responsible for
freeing up all of it after all threads are done?
(This is probably too much of a detail question now?)
>
> where git_attr_check_initl() would do the "if ccheck is NULL then
> initialize it to an instance" atomically in threading environment,
> and git_attr_check() returns its finding to values[] array the
> calling thread exclusively owns, while sharing the input ccheck
> and the list of attributes the call inquires among threads.
>
> Also unlike the earlier plan, attr_stack aka "where in the directory
> hierarchy are we asking attributes for?" will not be stored directly
> in git_attr_check structure. It is likely that a thread-local
> structure will be introduced hidden behind this API (i.e. the
> callers do not have to be aware of it), and attr_stack will be
> registered in this thread-local structure keyed by &ccheck, so that
> <thread, callsite> pair can have attr_stack instance of its own.
>
> This is because a single attr_stack per ccheck will not work as an
> optimization mechanism when multi-threaded. Two threads may be
> running the same convert() function, and they may be interested in
> the same set of attributes (e.g. "crlf", "ident", etc.) but they
> will be working on different parts of the tree, so having attr_stack
> per <thread, callsite> would make more sense.
>
> All of which will be quite a lot of work, though, so do not expect
> that it will appear by the end of this week ;-)
>
> Thanks.
>
Thanks for the heads up,
Stefan
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: Re attr API further revamp
2016-06-13 21:31 ` Stefan Beller
@ 2016-06-13 21:54 ` Junio C Hamano
0 siblings, 0 replies; 3+ messages in thread
From: Junio C Hamano @ 2016-06-13 21:54 UTC (permalink / raw)
To: Stefan Beller; +Cc: git@vger•kernel.org
Stefan Beller <sbeller@google•com> writes:
> On Mon, Jun 13, 2016 at 1:55 PM, Junio C Hamano <gitster@pobox•com> wrote:
>> I hate to be doing this, but we need yet another revamp to the attr
>> API that affects all the callers.
>
> So you don't mean origin/jc/attr-more by this?
Not really; the tip of attr-more needs to be discarded after this
realization, but otherwise there is no API revamping in what is
queued on that topic.
> (Given that we have jc/attr and jc/attr-more, the third thing could go
> with jc/even-more-attr. Though I do not propose that)
The latter was merely because I couldn't rewind and rebuild the
former during the pre-release freeze. The fixup! in the latter will
be squashed into the former, etc...
>> static struct git_attr_check *ccheck;
>> const char *values[NUM_ATTRS];
>>
>> git_attr_check_initl(&ccheck, "crlf", "ident", ..., NULL);
>> git_attr_check(path, ccheck, values);
>
> and later on each thread will do a
This will have to do nothing; it is "static", and will be held to
the end of the lifetime of the program.
You _can_ allocate the check structure per thread and free them when
thread exits, but the usage pattern that appear in convert.c is
oblivious to the threading.
^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2016-06-13 21:55 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-06-13 20:55 Re attr API further revamp Junio C Hamano
2016-06-13 21:31 ` Stefan Beller
2016-06-13 21:54 ` Junio C Hamano
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox