public inbox for git@vger.kernel.org 
 help / color / mirror / Atom feed
From: Junio C Hamano <gitster@pobox•com>
To: Jeff King <peff@peff•net>
Cc: David Turner <novalis@novalis•org>,
	Duy Nguyen <pclouds@gmail•com>,
	Git Mailing List <git@vger•kernel.org>
Subject: Re: "disabling bitmap writing, as some objects are not being packed"?
Date: Wed, 08 Feb 2017 16:18:25 -0800	[thread overview]
Message-ID: <xmqqbmuctdwu.fsf@gitster.mtv.corp.google.com> (raw)
In-Reply-To: <20170208230057.hking37uuynf4cgd@sigill.intra.peff.net> (Jeff King's message of "Wed, 8 Feb 2017 18:00:57 -0500")

Jeff King <peff@peff•net> writes:

> In my experience, auto-gc has never been a low-maintenance operation on
> the server side (and I do think it was primarily designed with clients
> in mind).

I do not think auto-gc was ever tweaked to help server usage, in its
history since it was invented strictly to help end-users (mostly new
ones).

> At GitHub we disable it entirely, and do our own gc based on a throttled
> job queue ...
> I wish regular Git were more turn-key in that respect. Maybe it is for
> smaller sites, but we certainly didn't find it so. And I don't know that
> it's feasible to really share the solution. It's entangled with our
> database (to store last-pushed and last-maintenance values for repos)
> and our job scheduler.

Thanks for sharing the insights from the trenches ;-)

> Yeah, I'm certainly open to improving Git's defaults. If it's not clear
> from the above, I mostly just gave up for a site the size of GitHub. :)
>
>> Idea 1: when gc --auto would issue this message, instead it could create
>> a file named gc.too-much-garbage (instead of gc.log), with this message.
>> If that file exists, and it is less than one day (?) old, then we don't
>> attempt to do a full gc; instead we just run git repack -A -d.  (If it's
>> more than one day old, we just delete it and continue anyway).
>
> I kind of wonder if this should apply to _any_ error. I.e., just check
> the mtime of gc.log and forcibly remove it when it's older than a day.
> You never want to get into a state that will fail to resolve itself
> eventually. That might still happen (e.g., corrupt repo), but at the
> very least it won't be because Git is too dumb to try again.

;-)

>> Idea 2 : Like idea 1, but instead of repacking, just smash the existing
>> packs together into one big pack.  In other words, don't consider
>> dangling objects, or recompute deltas.  Twitter has a tool called "git
>> combine-pack" that does this:
>> https://github.com/dturner-tw/git/blob/dturner/journal/builtin/combine-pack.c
>
> We wrote something similar at GitHub, too, but we never ended up using
> it in production. We found that with a sane scheduler, it's not too big
> a deal to just do maintenance once in a while.

Thanks again for this.  I've also been wondering about how effective
a "concatenate packs without paying reachability penalty" would be.

> I'm still not sure if it's worth making the fatal/non-fatal distinction.
> Doing so is perhaps safer, but it does mean that somebody has to decide
> which errors are important enough to block a retry totally, and which
> are not. In theory, it would be safe to always _try_ and then the gc
> process can decide when something is broken and abort. And all you've
> wasted is some processing power each day.

Yup, and somebody or something need to monitor so that repeated
failures can be dealt with.

  reply	other threads:[~2017-02-09  0:18 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-12-16 21:05 "disabling bitmap writing, as some objects are not being packed"? David Turner
2016-12-16 21:27 ` Jeff King
2016-12-16 21:28 ` Junio C Hamano
2016-12-16 21:32   ` Jeff King
2016-12-16 21:40     ` David Turner
2016-12-16 21:49       ` Jeff King
2016-12-16 23:59         ` [PATCH] pack-objects: don't warn about bitmaps on incremental pack David Turner
2016-12-17  4:04           ` Jeff King
2016-12-19 16:03             ` David Turner
2016-12-17  7:50   ` "disabling bitmap writing, as some objects are not being packed"? Duy Nguyen
2017-02-08  1:03     ` David Turner
2017-02-08  6:45       ` Duy Nguyen
2017-02-08  8:24         ` David Turner
2017-02-08  8:37           ` Duy Nguyen
2017-02-08 17:44             ` Junio C Hamano
2017-02-08 19:05               ` David Turner
2017-02-08 19:08                 ` Jeff King
2017-02-08 22:14                   ` David Turner
2017-02-08 23:00                     ` Jeff King
2017-02-09  0:18                       ` Junio C Hamano [this message]
2017-02-09  1:12                         ` Jeff King

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=xmqqbmuctdwu.fsf@gitster.mtv.corp.google.com \
    --to=gitster@pobox$(echo .)com \
    --cc=git@vger$(echo .)kernel.org \
    --cc=novalis@novalis$(echo .)org \
    --cc=pclouds@gmail$(echo .)com \
    --cc=peff@peff$(echo .)net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox