public inbox for git@vger.kernel.org 
From: Junio C Hamano <gitster@pobox•com>
To: David Turner <dturner@twopensource•com>
Cc: git@vger•kernel.org, mhagger@alum•mit.edu, neleai@seznam•cz,
	David Turner <dturner@twitter•com>
Subject: Re: [PATCH v8] refs.c: SSE2 optimizations for check_refname_component
Date: Mon, 16 Jun 2014 17:06:31 -0700
Message-ID: <xmqqha3ko0vs.fsf@gitster.dls.corp.google.com>
In-Reply-To: <1402946566-14923-1-git-send-email-dturner@twitter.com> (David Turner's message of "Mon, 16 Jun 2014 15:22:46 -0400")

David Turner <dturner@twopensource•com> writes:

> Optimize check_refname_component using SSE2 on x86_64.
>
> git rev-parse HEAD is a good test-case for this, since it does almost
> nothing except parse refs.  For one particular repo with about 60k
> refs, almost all packed, the timings are:
>
> Look up table: 29 ms
> SSE2:          23 ms
>
> This cuts about 20% off of the runtime.
>
> The configure.ac changes include code from the GNU C Library written
> by Joseph S. Myers <joseph at codesourcery dot com>.
>
> Ondřej Bílka <neleai@seznam•cz> suggested an SSE2 approach to the

One e-mail address is obfuscated while the other is not; is that intended?

> substring searches, which netted a speed boost over the SSE4.2 code I
> had initially written.
>
> Signed-off-by: David Turner <dturner@twitter•com>
> ---
> diff --git a/git-compat-util.h b/git-compat-util.h
> index f6d3a46..291d46b 100644
> --- a/git-compat-util.h
> +++ b/git-compat-util.h
> @@ -668,6 +668,16 @@ void git_qsort(void *base, size_t nmemb, size_t size,
>  #endif
>  #endif
>  
> +#if defined(__GNUC__) && defined(__x86_64__)
> +#include <emmintrin.h>
> +/* This is the system memory page size; it's used so that we can read

Style (there are other instances of the same kind).

/*
 * This is the ...

> + * outside the bounds of an allocation without segfaulting.
> + */

> +static int check_refname_component_trailer(const char *cp, const char *refname, int flags)
> +{
> +	if (cp == refname)
> +		return 0; /* Component has zero length. */
> +	if (refname[0] == '.') {
> +		if (!(flags & REFNAME_DOT_COMPONENT))
> +			return -1; /* Component starts with '.'. */
> +		/*
> +		 * Even if leading dots are allowed, don't allow "."
> +		 * as a component (".." is prevented by a rule above).
> +		 */
> +		if (refname[1] == '\0')
> +			return -1; /* Component equals ".". */
> +	}
> +	if (cp - refname >= 5 && !memcmp(cp - 5, ".lock", 5))
> +		return -1; /* Refname ends with ".lock". */

This is merely moved code that retained the same comment, but I
suspect it really means "the current refname component ends with
.lock".  In other words, we do not allow "refs/heads/foo.lock/bar".
Am I reading the patch correctly?
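
If that reading is right, the rule applies to each component and not
only to the whole refname.  A minimal sketch of that interpretation
follows; it is not code from the patch, and both helper names are
hypothetical, introduced only for illustration:

```c
#include <string.h>

/*
 * Hypothetical helper, not from the patch: return 1 if the component
 * spanning [start, end) ends with ".lock", 0 otherwise.  It mirrors
 * the memcmp() test in the quoted hunk.
 */
static int component_ends_with_lock(const char *start, const char *end)
{
	return end - start >= 5 && !memcmp(end - 5, ".lock", 5);
}

/*
 * Hypothetical helper: walk a '/'-separated refname and return 1 if
 * any of its components ends with ".lock", 0 otherwise.
 */
static int any_component_ends_with_lock(const char *refname)
{
	const char *start = refname;
	const char *cp;

	for (cp = refname; ; cp++) {
		if (*cp == '/' || *cp == '\0') {
			if (component_ends_with_lock(start, cp))
				return 1;
			if (*cp == '\0')
				return 0;
			start = cp + 1;
		}
	}
}
```

Under this sketch, both "refs/heads/foo.lock" and
"refs/heads/foo.lock/bar" are flagged, while "refs/heads/foo" is not.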

> +#if defined(__GNUC__) && defined(__x86_64__)
> +#define SSE_VECTOR_BYTES 16
> +
> +/* Vectorized version of check_refname_format. */
> +int check_refname_format(const char *refname, int flags)
> +{
> +	const char *cp = refname;
> +
> +	const __m128i dot = _mm_set1_epi8 ('.');

Style (there are other instances of the same kind).  No SP between
function/macro name and opening parenthesis.

> +	if (refname[0] == '.') {
> +		if (refname[1] == '/' || refname[1] == '\0')
> +			return -1;
> +		if (!(flags & REFNAME_DOT_COMPONENT))
> +			return -1;
> +	}
> +	while(1) {
> +		__m128i tmp, tmp1, result;
> +		uint64_t mask;
> +
> +		if ((uintptr_t) cp % PAGE_SIZE > PAGE_SIZE - SSE_VECTOR_BYTES  - 1)

OK, so we make sure we do not overrun by reading too much near the
end of the page, as the next page might be unmapped.
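
To make the arithmetic concrete, here is a small sketch of the same
comparison in isolation.  The 4096-byte page size, the macro names and
the function name are assumptions for illustration only; the patch
takes its page size from its own PAGE_SIZE definition:

```c
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096	/* assumed page size, for illustration */
#define SKETCH_VECTOR_BYTES 16	/* width of one SSE2 register in bytes */

/*
 * Hypothetical helper: return 1 if an address is close enough to the
 * end of its page that the quoted condition would fire, 0 otherwise.
 * This is the same comparison as in the hunk, on a plain integer.
 */
static int near_page_end(uintptr_t addr)
{
	return addr % SKETCH_PAGE_SIZE >
		SKETCH_PAGE_SIZE - SKETCH_VECTOR_BYTES - 1;
}
```

With these assumed values the condition fires for page offsets 4080
through 4095 and stays false for offsets up to 4079.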

I am showing my ignorance but does cp (i.e. refname) upon entry to
this function need to be aligned in some way?

Thanks.

