public inbox for git@vger.kernel.org 
 help / color / mirror / Atom feed
From: Junio C Hamano <gitster@pobox•com>
To: Stefan Beller <sbeller@google•com>
Cc: Jeff King <peff@peff•net>,
	"git\@vger.kernel.org" <git@vger•kernel.org>,
	Jonathan Nieder <jrnieder@gmail•com>,
	Johannes Sixt <j6t@kdbg•org>, Heiko Voigt <hvoigt@hvoigt•net>,
	Jens Lehmann <jens.lehmann@web•de>
Subject: Re: [RFC PATCH 2/3] run-commands: add an async queue processor
Date: Mon, 24 Aug 2015 14:22:33 -0700	[thread overview]
Message-ID: <xmqqvbc42xgm.fsf@gitster.dls.corp.google.com> (raw)
In-Reply-To: <CAGZ79kbhRQ0SmSu6xiij6Gxdrtfh02knexz6+CNjihY4xbC83w@mail.gmail.com> (Stefan Beller's message of "Fri, 21 Aug 2015 16:40:09 -0700")

Stefan Beller <sbeller@google•com> writes:

> diff --git a/builtin/index-pack.c b/builtin/index-pack.c
> index 3f10840..159ee36 100644
> --- a/builtin/index-pack.c
> +++ b/builtin/index-pack.c
> @@ -11,6 +11,7 @@
>  #include "exec_cmd.h"
>  #include "streaming.h"
>  #include "thread-utils.h"
> +#include "run-command.h"
>
>  static const char index_pack_usage[] =
>  "git index-pack [-v] [-o <index-file>] [--keep | --keep=<msg>]
> [--verify] [--strict] (<pack-file> | --stdin [--fix-thin]
> [<pack-file>])";
> @@ -1075,7 +1076,7 @@ static void resolve_base(struct object_entry *obj)
>  }
>
>  #ifndef NO_PTHREADS
> -static void *threaded_second_pass(void *data)
> +static int threaded_second_pass(struct task_queue *tq, void *data)
>  {
>         set_thread_data(data);
>         for (;;) {
> @@ -1096,7 +1097,7 @@ static void *threaded_second_pass(void *data)
>
>                 resolve_base(&objects[i]);
>         }
> -       return NULL;
> +       return 0;
>  }
>  #endif
>
> @@ -1195,18 +1196,18 @@ static void resolve_deltas(void)
>                                           nr_ref_deltas + nr_ofs_deltas);
>
>  #ifndef NO_PTHREADS
> -       nr_dispatched = 0;
>         if (nr_threads > 1 || getenv("GIT_FORCE_THREADS")) {
> +               nr_dispatched = 0;
>                 init_thread();
> -               for (i = 0; i < nr_threads; i++) {
> -                       int ret = pthread_create(&thread_data[i].thread, NULL,
> -                                                threaded_second_pass,
> thread_data + i);
> -                       if (ret)
> -                               die(_("unable to create thread: %s"),
> -                                   strerror(ret));
> -               }
> +
> +               tq = create_task_queue(nr_threads);
> +
>                 for (i = 0; i < nr_threads; i++)
> -                       pthread_join(thread_data[i].thread, NULL);
> +                       add_task(tq, threaded_second_pass, thread_data + i);
> +
> +               if (finish_task_queue(tq))
> +                       die("Not all threads have finished");
> +
>                 cleanup_thread();
>                 return;
>         }

This looks quite straight-forward, but that is not too surprising,
as the "dispatcher" side naturally should have a similar logic to
manage threads by creating and joining them ;-)

> @@ -1075,28 +1067,24 @@ static void resolve_base(struct object_entry *obj)
>  }
>
>  #ifndef NO_PTHREADS
> -static void *threaded_second_pass(void *data)
> +static int threaded_second_pass(struct task_queue *tq, void *data)
>  {
> - set_thread_data(data);
> - for (;;) {
> - int i;
> - counter_lock();
> - display_progress(progress, nr_resolved_deltas);
> - counter_unlock();
> - work_lock();
> - while (nr_dispatched < nr_objects &&
> -       is_delta_type(objects[nr_dispatched].type))
> - nr_dispatched++;
> - if (nr_dispatched >= nr_objects) {
> - work_unlock();
> - break;
> - }
> - i = nr_dispatched++;
> - work_unlock();
> + if (!get_thread_data()) {
> + struct thread_local *t = xmalloc(sizeof(*t));
> + t->pack_fd = open(curr_pack, O_RDONLY);
> + if (t->pack_fd == -1)
> + die_errno(_("unable to open %s"), curr_pack);
>
> - resolve_base(&objects[i]);
> + set_thread_data(t);
> + /* TODO: I haven't figured out where to free this memory */

Sorry but it is hard to grok what is going on in the code with
squished indentation.

> Why did I not pick upload-pack?
> ========================
>
> I could not spot easily how to make it a typical queuing problem.
> We start in the threads, and once in a while we're like: "Uhg, this
> thread has more load than the other, let's shove a bit over there"
>
> So what we would need there is splitting the work in smallest chunks
> from the beginning and just load it into the queue via add_task

... or a way for the overload and tasks to communicate with each
other and rebalance.  If I am not mistaken, it has a big negative
consequence for pack-objects to split the work to too small a chunk,
as the chunk boundary also becomes boundary of find delta bases.

  reply	other threads:[~2015-08-24 21:22 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-08-21  1:40 [PATCH 1/3] submodule: implement `module_clone` as a builtin helper Stefan Beller
2015-08-21  1:40 ` [RFC PATCH 2/3] run-commands: add an async queue processor Stefan Beller
2015-08-21 19:05   ` Junio C Hamano
2015-08-21 19:44     ` Jeff King
2015-08-21 19:48       ` Stefan Beller
2015-08-21 19:51         ` Jeff King
2015-08-21 20:12           ` Stefan Beller
2015-08-21 20:41       ` Junio C Hamano
2015-08-21 23:40       ` Stefan Beller
2015-08-24 21:22         ` Junio C Hamano [this message]
2015-08-21 19:45     ` Stefan Beller
2015-08-21 20:47       ` Junio C Hamano
2015-08-21 20:56         ` Stefan Beller
2015-08-21  1:40 ` [WIP/PATCH 3/3] submodule: helper to run foreach in parallel Stefan Beller
2015-08-21 19:23   ` Junio C Hamano
2015-08-21 20:21     ` Stefan Beller

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=xmqqvbc42xgm.fsf@gitster.dls.corp.google.com \
    --to=gitster@pobox$(echo .)com \
    --cc=git@vger$(echo .)kernel.org \
    --cc=hvoigt@hvoigt$(echo .)net \
    --cc=j6t@kdbg$(echo .)org \
    --cc=jens.lehmann@web$(echo .)de \
    --cc=jrnieder@gmail$(echo .)com \
    --cc=peff@peff$(echo .)net \
    --cc=sbeller@google$(echo .)com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox