public inbox for git@vger.kernel.org 
* git repository size vs. subversion repository size
@ 2008-04-04 22:02 Sean Brown
  2008-04-04 22:17 ` Björn Steinbrink
  2008-04-05  3:11 ` Shawn O. Pearce
  0 siblings, 2 replies; 11+ messages in thread
From: Sean Brown @ 2008-04-04 22:02 UTC (permalink / raw)
  To: git

Last night I decided to see what storage size differences I might see
between an svn repo and a git one.  So I imported a highly used
subversion repository into git and was shocked to see how huge the git
version was.  I used a repo that has a lot of branches and tagged
releases just to make sure importing into git would in fact keep all
of the history.  It did keep the history, but the total disk usage was
very different:

$subversionbox # du -hs ./my_sample_website/
67M	./my_sample_website

$localhost # du -hs ./git-samplesite/
3.6GB ./git-samplesite/

Here are the steps I took (locally):

mkdir git-samplesite-tmp
cd git-samplesite-tmp
git-svn init http://subversion.myco.com/my_sample_website --no-metadata
git config svn.authorsfile ~/Desktop/users.txt   # mapped svn users to git users
git-svn fetch
git clone git-samplesite-tmp git-samplesite

I did this based on reading the documents in the git wiki, so I
assumed they were "best practice."  Did I do something wrong?  If this
much extra storage is normal, we'd likely not move to git, based on the
need for new hardware alone.

Any help would be appreciated.

Sean


* Re: git repository size vs. subversion repository size
  2008-04-04 22:02 git repository size vs. subversion repository size Sean Brown
@ 2008-04-04 22:17 ` Björn Steinbrink
  2008-04-04 23:49   ` Stephen Bannasch
  2008-04-05  2:27   ` Sean Brown
  2008-04-05  3:11 ` Shawn O. Pearce
  1 sibling, 2 replies; 11+ messages in thread
From: Björn Steinbrink @ 2008-04-04 22:17 UTC (permalink / raw)
  To: Sean Brown; +Cc: git

On 2008.04.04 18:02:56 -0400, Sean Brown wrote:
> Last night I decided to see what storage size differences I might see
> between an svn repo and a git one.  So I imported a highly used
> subversion repository into git and was shocked to see how huge the git
> version was.  I used a repo that has a lot of branches and tagged
> releases just to make sure importing into git would in fact keep all
> of the history.  It did keep the history, but the total disk usage was
> very different:
> 
> $subversionbox # du -hs ./my_sample_website/
> 67M	./my_sample_website
> 
> $localhost # du -hs ./git-samplesite/
> 3.6GB ./git-samplesite/

How much of that is in the .git/svn directory? The contents of that
directory are used to map git commits to svn revisions, and git versions
before 1.5.4 had a quite space-consuming file format for that. The new
format is a lot better. If you want to switch completely, you can even
just delete the .git/svn directory, as that's only required as long as
you want to interact with the corresponding svn repository.

And finally, you might want to repack the repository once after the
initial import, to get a smaller repo. Something like:
git repack -a -d -f --window=100 --depth=100

Björn
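
A minimal sketch of the check suggested above, assuming a POSIX shell and a
repository created by git-svn. The function name report_svn_metadata and the
path in the usage note are hypothetical; only the .git/svn location and the
repack command come from the message.

```shell
# Report how much of a repository's git directory is git-svn metadata.
# The repository path is the first argument (defaults to the current directory).
report_svn_metadata() {
    repo=${1:-.}
    # du -sk prints "<size in KB><TAB><path>"; keep only the size.
    total_kb=$(du -sk "$repo/.git" | cut -f1)
    if [ -d "$repo/.git/svn" ]; then
        svn_kb=$(du -sk "$repo/.git/svn" | cut -f1)
    else
        svn_kb=0
    fi
    echo "git dir: ${total_kb} KB, of which .git/svn: ${svn_kb} KB"
}
```

Running `report_svn_metadata ./git-samplesite` before and after the repack
gives a rough idea of whether the metadata or the object store dominates.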


* Re: git repository size vs. subversion repository size
  2008-04-04 22:17 ` Björn Steinbrink
@ 2008-04-04 23:49   ` Stephen Bannasch
  2008-04-05  0:01     ` Steven Walter
  2008-04-05  2:27   ` Sean Brown
  1 sibling, 1 reply; 11+ messages in thread
From: Stephen Bannasch @ 2008-04-04 23:49 UTC (permalink / raw)
  To: git

I'm just fooling around with git so far but I found a huge space 
savings after running git gc. Here are the rough numbers:

svn repo on server:        1GB
svn repo checked out:      2GB
git svn clone after gc:  384MB

That's saving the full history in git -- about 13000 revisions.

Using git version 1.5.4.4.


* Re: git repository size vs. subversion repository size
  2008-04-04 23:49   ` Stephen Bannasch
@ 2008-04-05  0:01     ` Steven Walter
  2008-04-05  0:04       ` Stephen Bannasch
                         ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Steven Walter @ 2008-04-05  0:01 UTC (permalink / raw)
  To: Stephen Bannasch; +Cc: git

On Fri, Apr 04, 2008 at 07:49:24PM -0400, Stephen Bannasch wrote:
> I'm just fooling around with git so far but I found a huge space savings 
> after running git gc. Here are the rough numbers:
>
> svn repo on server:        1GB
> svn repo checked out:      2GB
> git svn clone after gc:  384MB
>
> That's saving the full history in git -- about 13000 revisions.

git-gc is such an important step in importing a repository from svn.
Why doesn't git-svn take this vital step automatically?
-- 
-Steven Walter <stevenrwalter@gmail•com>
Freedom is the freedom to say that 2 + 2 = 4
B2F1 0ECC E605 7321 E818  7A65 FC81 9777 DC28 9E8F 


* Re: git repository size vs. subversion repository size
  2008-04-05  0:01     ` Steven Walter
@ 2008-04-05  0:04       ` Stephen Bannasch
  2008-04-05  0:18       ` Björn Steinbrink
  2008-04-14 15:28       ` Eric Hanchrow
  2 siblings, 0 replies; 11+ messages in thread
From: Stephen Bannasch @ 2008-04-05  0:04 UTC (permalink / raw)
  To: git

>On Fri, Apr 04, 2008 at 07:49:24PM -0400, Stephen Bannasch wrote:
>> I'm just fooling around with git so far but I found a huge space savings
>> after running git gc. Here are the rough numbers:
>>
>> svn repo on server:        1GB
>> svn repo checked out:      2GB
>> git svn clone after gc:  384MB
>>
>> That's saving the full history in git -- about 13000 revisions.
>
>git-gc is such an important step in importing a repository from svn.
>Why doesn't git-svn take this vital step automatically?

I think because it is not necessary for continued productive use of git and the gc operation is expensive. On the repo above it took about 8 hours running in the background while I was working on other stuff.


* Re: git repository size vs. subversion repository size
  2008-04-05  0:01     ` Steven Walter
  2008-04-05  0:04       ` Stephen Bannasch
@ 2008-04-05  0:18       ` Björn Steinbrink
  2008-04-14 15:28       ` Eric Hanchrow
  2 siblings, 0 replies; 11+ messages in thread
From: Björn Steinbrink @ 2008-04-05  0:18 UTC (permalink / raw)
  To: Steven Walter; +Cc: Stephen Bannasch, git

[Stephen, please stop dropping me from Cc:, thanks]

On 2008.04.04 20:01:41 -0400, Steven Walter wrote:
> On Fri, Apr 04, 2008 at 07:49:24PM -0400, Stephen Bannasch wrote:
> > I'm just fooling around with git so far but I found a huge space savings 
> > after running git gc. Here are the rough numbers:
> >
> > svn repo on server:        1GB
> > svn repo checked out:      2GB
> > git svn clone after gc:  384MB
> >
> > That's saving the full history in git -- about 13000 revisions.
> 
> git-gc is such an important step in importing a repository from svn.
> Why doesn't git-svn take this vital step automatically?

Starting from 1.5.4 (IIRC) git-svn will repack every 1000 revisions (by
default). That won't give you a reeeeally tiny pack but OTOH it won't
take ages to do the repacks either.

Björn


* Re: git repository size vs. subversion repository size
  2008-04-04 22:17 ` Björn Steinbrink
  2008-04-04 23:49   ` Stephen Bannasch
@ 2008-04-05  2:27   ` Sean Brown
  2008-04-05  2:34     ` Björn Steinbrink
  1 sibling, 1 reply; 11+ messages in thread
From: Sean Brown @ 2008-04-05  2:27 UTC (permalink / raw)
  To: Björn Steinbrink; +Cc: git

On Fri, Apr 4, 2008 at 6:17 PM, Björn Steinbrink <B.Steinbrink@gmx•de> wrote:
> On 2008.04.04 18:02:56 -0400, Sean Brown wrote:
>  > Last night I decided to see what storage size differences I might see
>  > between an svn repo and a git one.  So I imported a highly used
>  > subversion repository into git and was shocked to see how huge the git
>  > version was.  I used a repo that has a lot of branches and tagged
>  > releases just to make sure importing into git would in fact keep all
>  > of the history.  It did keep the history, but the total disk usage was
>  > very different:
>  >
>  > $subversionbox # du -hs ./my_sample_website/
>  > 67M   ./my_sample_website
>  >
>  > $localhost # du -hs ./git-samplesite/
>  > 3.6GB ./git-samplesite/
>
>  How much of that is in the .git/svn directory? The contents of that
>  directory are used to map git commits to svn revision and git versions
>  before 1.5.4 had a quite space consuming file format for that. The new
>  format is a lot better. If you want to switch completely, you can even
>  just delete the .git/svn directory, as that's only required as long as
>  you want to interact with the corresponding svn repository.
>
>  And finally, you might want to repack to repository once after the
>  initial import, to get a smaller repo. Something like:
>  git repack -a -d -f --window=100 --depth=100
>

The svn folder (in the .git directory) was only about 4.2 MB.  After
running the repack (and then after that git gc as mentioned by another
in this thread), it's still about 3.5 GB.

git-samplesite (master)]$ du -hs ./*
2.1G	./branches
1.4G	./tags
 66M	./trunk

The site does have a lot of binary files (PDFs, photographs and such).
I suppose we could leave all of the branches and tags in subversion
and just move the trunk to git, but I was hoping to make a clean break
from subversion.

If anyone has any further suggestions I'd love to hear them.

Sean

-- 

Sean Brown
seanmichaelbrown@gmail•com


* Re: git repository size vs. subversion repository size
  2008-04-05  2:27   ` Sean Brown
@ 2008-04-05  2:34     ` Björn Steinbrink
  2008-04-13  9:57       ` Jan Hudec
  0 siblings, 1 reply; 11+ messages in thread
From: Björn Steinbrink @ 2008-04-05  2:34 UTC (permalink / raw)
  To: Sean Brown; +Cc: git

On 2008.04.04 22:27:12 -0400, Sean Brown wrote:
> On Fri, Apr 4, 2008 at 6:17 PM, Björn Steinbrink <B.Steinbrink@gmx•de> wrote:
> > On 2008.04.04 18:02:56 -0400, Sean Brown wrote:
> >  > Last night I decided to see what storage size differences I might see
> >  > between an svn repo and a git one.  So I imported a highly used
> >  > subversion repository into git and was shocked to see how huge the git
> >  > version was.  I used a repo that has a lot of branches and tagged
> >  > releases just to make sure importing into git would in fact keep all
> >  > of the history.  It did keep the history, but the total disk usage was
> >  > very different:
> >  >
> >  > $subversionbox # du -hs ./my_sample_website/
> >  > 67M   ./my_sample_website
> >  >
> >  > $localhost # du -hs ./git-samplesite/
> >  > 3.6GB ./git-samplesite/
> >
> >  How much of that is in the .git/svn directory? The contents of that
> >  directory are used to map git commits to svn revision and git versions
> >  before 1.5.4 had a quite space consuming file format for that. The new
> >  format is a lot better. If you want to switch completely, you can even
> >  just delete the .git/svn directory, as that's only required as long as
> >  you want to interact with the corresponding svn repository.
> >
> >  And finally, you might want to repack to repository once after the
> >  initial import, to get a smaller repo. Something like:
> >  git repack -a -d -f --window=100 --depth=100
> >
> 
> The svn folder (in the.git directory) was only about 4.2 MB.  After
> running the repack (and then after that git gc as mentioned by another
> in this thread), it's still about 3.5 GB.
> 
> git-samplesite (master)]$ du -hs ./*
> 2.1G	./branches
> 1.4G	./tags
>  66M	./trunk

Uhm, you forgot to use -s when doing the clone. That would have created
real git branches instead of the directories... What you are counting is
the size of the checked out, uncompressed files of _all_ branches and
_all_ tags (and trunk). The repo size is basically what "du -sh .git"
would give.

Björn
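
A rough sketch of the two points above, assuming a POSIX shell. The function
names import_stdlayout and repo_vs_worktree are made up for illustration, and
the URL and directory are whatever the caller passes in. The -s flag is short
for --stdlayout, which tells git-svn that the svn repository uses the usual
trunk/branches/tags layout.

```shell
# Import an svn repository with the standard layout, so that svn branches and
# tags become git refs instead of checked-out directories.
import_stdlayout() {
    url=$1
    dir=$2
    mkdir -p "$dir" || return 1
    (
        cd "$dir" || exit 1
        git svn init -s --no-metadata "$url" &&
        git svn fetch
    )
}

# Compare the size of the repository itself (.git) with the size of the
# checked-out working files next to it.
repo_vs_worktree() {
    repo=${1:-.}
    git_kb=$(du -sk "$repo/.git" | cut -f1)
    all_kb=$(du -sk "$repo" | cut -f1)
    echo "repository (.git): ${git_kb} KB"
    echo "working tree:      $((all_kb - git_kb)) KB"
}
```

With an import done without -s, the second number would include every file of
every branch and tag, which is presumably where the gigabytes above come from.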


* Re: git repository size vs. subversion repository size
  2008-04-04 22:02 git repository size vs. subversion repository size Sean Brown
  2008-04-04 22:17 ` Björn Steinbrink
@ 2008-04-05  3:11 ` Shawn O. Pearce
  1 sibling, 0 replies; 11+ messages in thread
From: Shawn O. Pearce @ 2008-04-05  3:11 UTC (permalink / raw)
  To: Sean Brown; +Cc: git

Sean Brown <seanmichaelbrown@gmail•com> wrote:
> 
> Here are the steps I took (locally):
> 
> mkdir git-samplesite-tmp
> cd git-samplesite-tmp
> git-svn init http://subversion.myco.com/my_sample_website --no-metadata
> git config svn.authorsfile ~/Desktop/users.txt   # mapped svn users to git users
> git-svn fetch
> git clone git-samplesite-tmp git-samplesite
> 
> I did this based on reading the documents in the git wiki, so I
> assumed they were "best practice."  Did I do something wrong?

The last command there didn't get you the most efficiently packed
repository possible.  More recent versions of git-clone will prefer
to hardlink all of the loose objects and packs from the source to
the destination, so the clone can occur more quickly when they are
on the same filesystem.

Really what you want to do here is repack the cloned directory
(cd git-samplesite && git repack -a -d -f) and maybe include
some aggressive --depth and --window options (e.g. 100/100)
if you have some CPU time to burn and are reasonably certain
you will be keeping the result.  You only have to spend that
CPU time once when converting from SVN, and all future clones
from this one will benefit.

But your really major disk usage was due to what someone else
pointed out, which was missing the "-s" flag to git-svn.  So the
Git working directory was huge, as we created working files for
every single branch and every single tag.  Ouch.

-- 
Shawn.
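
The one-off repack described above could be wrapped roughly like this, assuming
a POSIX shell. The helper names run and repack_converted and the DRY_RUN
variable are inventions of this sketch; the git repack flags and the 100/100
values are the ones mentioned in the message.

```shell
# Print the command instead of executing it when DRY_RUN is set.
# (A convenience for this sketch only, useful for checking what would run.)
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Aggressively repack a freshly converted repository.
# Usage: repack_converted <repo-dir> [window] [depth]
repack_converted() {
    repo=$1
    window=${2:-100}
    depth=${3:-100}
    (
        cd "$repo" || exit 1
        # -a: pack everything into one pack, -d: drop redundant packs,
        # -f: recompute deltas from scratch rather than reusing existing ones.
        run git repack -a -d -f --window="$window" --depth="$depth"
    )
}
```

For example, `DRY_RUN=1 repack_converted git-samplesite` only prints the
command, while dropping DRY_RUN actually spends the CPU time.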


* Re: git repository size vs. subversion repository size
  2008-04-05  2:34     ` Björn Steinbrink
@ 2008-04-13  9:57       ` Jan Hudec
  0 siblings, 0 replies; 11+ messages in thread
From: Jan Hudec @ 2008-04-13  9:57 UTC (permalink / raw)
  To: Björn Steinbrink; +Cc: Sean Brown, git

On Sat, Apr 05, 2008 at 04:34:37 +0200, Björn Steinbrink wrote:
> On 2008.04.04 22:27:12 -0400, Sean Brown wrote:
> > On Fri, Apr 4, 2008 at 6:17 PM, Björn Steinbrink <B.Steinbrink@gmx•de> wrote:
> > > On 2008.04.04 18:02:56 -0400, Sean Brown wrote:
> > >  > Last night I decided to see what storage size differences I might see
> > >  > between an svn repo and a git one.  So I imported a highly used
> > >  > subversion repository into git and was shocked to see how huge the git
> > >  > version was.  I used a repo that has a lot of branches and tagged
> > >  > releases just to make sure importing into git would in fact keep all
> > >  > of the history.  It did keep the history, but the total disk usage was
> > >  > very different:
> > >  >
> > >  > $subversionbox # du -hs ./my_sample_website/
> > >  > 67M   ./my_sample_website
> > >  >
> > >  > $localhost # du -hs ./git-samplesite/
> > >  > 3.6GB ./git-samplesite/
> > >
> > >  How much of that is in the .git/svn directory? The contents of that
> > >  directory are used to map git commits to svn revision and git versions
> > >  before 1.5.4 had a quite space consuming file format for that. The new
> > >  format is a lot better. If you want to switch completely, you can even
> > >  just delete the .git/svn directory, as that's only required as long as
> > >  you want to interact with the corresponding svn repository.
> > >
> > >  And finally, you might want to repack to repository once after the
> > >  initial import, to get a smaller repo. Something like:
> > >  git repack -a -d -f --window=100 --depth=100
> > >
> > 
> > The svn folder (in the.git directory) was only about 4.2 MB.  After
> > running the repack (and then after that git gc as mentioned by another
> > in this thread), it's still about 3.5 GB.
> > 
> > git-samplesite (master)]$ du -hs ./*
> > 2.1G	./branches
> > 1.4G	./tags
> >  66M	./trunk
> 
> Uhm, you forgot to use -s when doing the clone. That would have created

No, not the clone, but the git svn init.

> real git branches instead of the directories... What you are counting is
> the size of the checked out, uncompressed files of _all_ branches and
> _all_ tags (and trunk). The repo size of basically what "du -sh .git"
> would give.
> 
> Björn
-- 
						 Jan 'Bulb' Hudec <bulb@ucw•cz>


* Re: git repository size vs. subversion repository size
  2008-04-05  0:01     ` Steven Walter
  2008-04-05  0:04       ` Stephen Bannasch
  2008-04-05  0:18       ` Björn Steinbrink
@ 2008-04-14 15:28       ` Eric Hanchrow
  2 siblings, 0 replies; 11+ messages in thread
From: Eric Hanchrow @ 2008-04-14 15:28 UTC (permalink / raw)
  To: git

>>>>> "Steven" == Steven Walter <stevenrwalter@gmail•com> writes:

    Steven> git-gc is such an important step in importing a repository
    Steven> from svn.  Why doesn't git-svn take this vital step
    Steven> automatically?

Mine did:
    git-svn version 1.5.5 (svn 1.3.2)
    git-svn init file://$HOME/svn-repos
    git-svn fetch
...

            M	trunk/home/local/bin/spam/print-subjects.ss
    r5480 = b3edab03f5bacda1db025bd2cca769abbe007f23 (git-svn)
    Auto packing your repository for optimum performance. You may also
    run "git gc" manually. See "git help gc" for more information.
    Counting objects: 11182, done.
    Compressing objects: 100% (11021/11021), done.
    Writing objects: 100% (11182/11182), done.
    Total 11182 (delta 9378), reused 0 (delta 0)
    Checked out HEAD:
      file:///home/erich/svn-repos r5480

    $ du -sh /tmp/ya/.git/ ~/svn-repos/
    23M	/tmp/ya/.git/
    88M	/home/erich/svn-repos/
-- 
The old graybeards in the Smalltalk world may not seem
relevant, but if you ask them a question about ORM, they have been
thinking about it for 20 years.
        -- Avi Bryant

