public inbox for git@vger.kernel.org 
From: Michael J Gruber <git@drmicha•warpmail.net>
To: Paolo Bonzini <bonzini@gnu•org>
Cc: Sergio Callegari <sergio.callegari@gmail•com>,
	Git Mailing List <git@vger•kernel.org>
Subject: Re: Management of opendocument (openoffice.org) files in git
Date: Thu, 02 Oct 2008 14:52:17 +0200
Message-ID: <48E4C401.90409@drmicha.warpmail.net>
In-Reply-To: <48CF6A7C.4020604@gnu.org>

Following up on the discussion about tracking oo files, I conducted a
minimalistic test. I simulated tracking an oo spreadsheet where, from
one version to the next, only a few cells are filled in in an existing
spreadsheet. These are the sizes of the individual files:

48K     0.ods
48K     1.ods
60K     2.ods
60K     3.ods
56K     4.ods
64K     5.ods
68K     6.ods
64K     7.ods
64K     8.ods
68K     9.ods
600K    total

I then tracked this in four different ways, each in a fresh repo:

"packed": copy $i.ods to t.ods as is, git add t.ods and commit.
"unpacked": use the unzipped contents of $i.ods instead.
"rezip": use the rezipped version (compression 0, using Sergio's script).
"oofilter": use clean/smudge filters (calling Sergio's rezip)
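For readers who have not set up clean/smudge filters before, the
"oofilter" variant can be sketched roughly as below. This is only a
minimal sketch, not the exact setup used for the numbers: oo-clean and
oo-smudge are hypothetical wrapper names standing in for calls to
Sergio's rezip script, whose real invocation is not shown here.

```shell
# Scratch repository so the sketch can be run without touching real data.
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"

# Route every .ods file through a filter named "oofilter".
printf '*.ods filter=oofilter\n' > .gitattributes

# oo-clean and oo-smudge are hypothetical wrapper names, not real tools.
# Each must read the document on stdin and write the result to stdout.
#   clean:  rezip with compression 0 before the blob is stored
#   smudge: recompress when the file is checked out
git config filter.oofilter.clean  oo-clean
git config filter.oofilter.smudge oo-smudge

# Confirm which filter applies to a given path.
git check-attr filter -- t.ods
```

With this in place the repository stores the uncompressed zip (which
deltifies well), while the work tree keeps the normally compressed file.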

Here are the resulting sizes: first ".git/objects" as is, then
".git/objects" after "git repack -adf", and finally the total size of
.git + the work tree (i.e. the last revision).
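The three measurements can be reproduced along these lines. This is a
sketch under assumptions: measure_repo is a helper name made up for
illustration, and the scratch repository with a small text file only
stands in for the real .ods history.

```shell
# Print the three sizes discussed above for one repository.
measure_repo() {
    repo_dir=$1
    du -sh "$repo_dir/.git/objects"          # objects as committed
    git -C "$repo_dir" repack -a -d -f -q    # repack all, recompute deltas
    du -sh "$repo_dir/.git/objects"          # objects after repacking
    du -sh "$repo_dir"                       # .git plus the work tree
}

# Scratch repository with two revisions of one file (stand-in data).
scratch=$(mktemp -d)
git init -q "$scratch"
for i in 0 1; do
    printf 'revision %s\n' "$i" > "$scratch/t.txt"
    git -C "$scratch" add t.txt
    git -C "$scratch" -c user.name=demo -c user.email=demo@example.invalid \
        commit -q -m "revision $i"
done

measure_repo "$scratch"
```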

            .git/objects    .git/objects      .git + wt
            (as is)         (after repack)
packed      708K            492K              692K
unpacked    1.3M            144K              1.5M
rezip       992K            148K              1.4M
oofilter    984K            148K              352K
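To put the repacked sizes in proportion, they can be related to the
600K total of the ten input files. This is plain arithmetic on the
numbers above; the figures are rounded du output, so the percentages
are only approximate, and ratio is a helper name made up for this sketch.

```shell
total_kib=600   # sum of the ten .ods files as reported by du

# Repacked .git/objects size as a percentage of the raw input total.
ratio() {
    awk -v s="$1" -v t="$total_kib" 'BEGIN { printf "%.0f\n", 100 * s / t }'
}

for entry in packed:492 unpacked:144 rezip:148 oofilter:148; do
    name=${entry%%:*}
    size=${entry##*:}
    echo "$name $(ratio "$size")%"
done
```

This gives roughly 82% for "packed" and about a quarter of the input
size for the three variants that store uncompressed content.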

Unsurprisingly, the total size is dominated by the work tree size if you
have few revisions. (Also, templates and such contribute.)
Note that git log --stat will report the sizes of packed files in the
first case, but the sizes of unpacked files in all other cases. In
particular, with the filters it reports a different size for the HEAD
revision than the file you have in a HEAD checkout.
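One way to see this discrepancy directly is to compare the size of the
stored blob with the size of the checked-out file. The following is a
sketch: compare_sizes is a helper name made up for illustration, it
must be run from the top of the work tree, and the scratch repository
has no filter configured, so both numbers agree there. With a
clean/smudge filter on the path they would differ.

```shell
# Compare the blob size recorded in HEAD with the file in the work tree.
compare_sizes() {
    path=$1
    stored=$(git cat-file -s "HEAD:$path")   # size of the blob in the repo
    checked_out=$(wc -c < "$path")           # size of the work tree file
    echo "stored=$stored checked_out=$((checked_out))"
}

# Scratch repository without any filter (stand-in data).
demo=$(mktemp -d)
git init -q "$demo"
cd "$demo"
printf 'hello\n' > t.txt
git add t.txt
git -c user.name=demo -c user.email=demo@example.invalid \
    commit -q -m "add t.txt"

compare_sizes t.txt
```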

I tried rewriting "packed" after configuring the filters: filter-branch
refuses to work with a dirty work-tree, even after "checkout -f HEAD"
and "reset --hard". It seems that git status is permanently confused
here. (Has anyone successfully rewritten existing oo files?)

I'm not sure about the lessons, but I wanted to share the numbers
anyway. I think this (your script and its usage) is heading in a useful
direction and should maybe be made more widely known, if not made easier
from the git side. Also, I'm still looking for a good (deterministic) pdf
recompressor.

Michael

git version 1.6.0.2.426.g2cfa6
