From: Michael J Gruber <git@drmicha•warpmail.net>
To: Paolo Bonzini <bonzini@gnu•org>
Cc: Sergio Callegari <sergio.callegari@gmail•com>,
Git Mailing List <git@vger•kernel.org>
Subject: Re: Management of opendocument (openoffice.org) files in git
Date: Thu, 02 Oct 2008 14:52:17 +0200
Message-ID: <48E4C401.90409@drmicha.warpmail.net>
In-Reply-To: <48CF6A7C.4020604@gnu.org>
Following up on the discussion about tracking oo files, I conducted a
minimalistic test. I simulated tracking an oo spreadsheet where, from
one version to the next, only a few cells are filled in in an existing
spreadsheet. These are the sizes of the individual files:
48K 0.ods
48K 1.ods
60K 2.ods
60K 3.ods
56K 4.ods
64K 5.ods
68K 6.ods
64K 7.ods
64K 8.ods
68K 9.ods
600K total
I then tracked this in four different ways, each in a fresh repo:
"packed": copy $i.ods to t.ods as is, git add t.ods and commit.
"unpacked": use the unzipped contents of $i.ods instead.
"rezip": use the rezipped version (compression 0, using Sergio's script).
"oofilter": use clean/smudge filters (calling Sergio's rezip)
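To illustrate what the "rezip" step does, here is a minimal sketch in Python. It is not Sergio's script; the names rezip_stored and run_as_filter are made up for this example. It rewrites a zip container so that every member is stored uncompressed (compression 0), keeping member order and timestamps so that the same input always gives the same output. The point is that git then sees the plain XML and can delta-compress it across revisions.

```python
import io
import zipfile


def rezip_stored(data: bytes) -> bytes:
    """Return a copy of the zip archive in `data` with all members stored (no compression)."""
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(data)) as src, \
         zipfile.ZipFile(out, "w", compression=zipfile.ZIP_STORED) as dst:
        # Keep the original member order; OpenDocument expects "mimetype" first.
        for info in src.infolist():
            new_info = zipfile.ZipInfo(info.filename, date_time=info.date_time)
            new_info.compress_type = zipfile.ZIP_STORED
            new_info.external_attr = info.external_attr
            dst.writestr(new_info, src.read(info))
    return out.getvalue()


def run_as_filter(inp, out) -> None:
    """Read a zip archive from binary stream `inp`, write the stored version to `out`."""
    out.write(rezip_stored(inp.read()))
```

A wrapper script that calls run_as_filter(sys.stdin.buffer, sys.stdout.buffer) could presumably serve as the clean side of a filter, registered through a filter.<name>.clean config entry and a matching .gitattributes line. What the smudge side should do is left open here.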
Here are the resulting sizes: first ".git/objects" as is, then after
"git repack -adf", finally the total size of .git + the work tree (i.e.
the last revision).
            .git/objects   .git/objects     .git + wt
            (as is)        (after repack)
packed      708K           492K             692K
unpacked    1.3M           144K             1.5M
rezip       992K           148K             1.4M
oofilter    984K           148K             352K
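As a quick sanity check on these figures, the following sketch computes how much smaller the repacked object store is compared with the "packed" case. The numbers are copied from the results above; the function name is made up for the example. The three uncompressed variants come out roughly 70% smaller.

```python
# Sizes of .git/objects after repacking, in K, as reported above.
repacked_k = {"packed": 492, "unpacked": 144, "rezip": 148, "oofilter": 148}


def savings_vs_packed(sizes):
    """Fraction by which each variant is smaller than the 'packed' variant."""
    base = sizes["packed"]
    return {name: 1 - size / base for name, size in sizes.items()}


savings = savings_vs_packed(repacked_k)
for name, frac in savings.items():
    print(f"{name:9s} {repacked_k[name]:4d}K  {frac:6.1%} smaller than packed")
```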
Unsurprisingly, the total size is dominated by the work tree size if you
have few revisions. (Also, templates and such contribute.)
Note that git log --stat will report the sizes of packed files in the
first case, but the sizes of unpacked files in all other cases. In
particular, it reports a different size for the HEAD revision than you
have in a HEAD checkout.
I tried rewriting "packed" after configuring the filters: filter-branch
refuses to work with a dirty work-tree, even after "checkout -f HEAD"
and "reset --hard". It seems that git status is permanently confused
here. (Has anyone successfully rewritten existing oo files?)
I'm not sure about the lessons, but I wanted to share the numbers
anyway. I think this (your script and its usage) is heading in a useful
direction and should maybe be made better known, if not made easier from
the git side. Also, I'm still looking for a good (deterministic) pdf
recompressor.
Michael
git version 1.6.0.2.426.g2cfa6