public inbox for git@vger.kernel.org 
* git svn's performance issue and strange pauses, and other thing
@ 2014-09-18  7:39 Hin-Tak Leung
  2014-09-19  8:25 ` Eric Wong
  0 siblings, 1 reply; 10+ messages in thread
From: Hin-Tak Leung @ 2014-09-18  7:39 UTC (permalink / raw)
  To: normalperson, git

(I am not on the list - please CC)

Thanks for git-svn - I have used it instead of subversion itself for many years now.

Just thought I'd ask about/report a few issues I have noticed for some time
now while tracking a particular subversion-based
development project. Broadly speaking, I think there are 3 problems,
especially noticeable against a particular repository, but
to a lesser extent with some others too.

- just doing "git svn fetch --all" seems to consume a lot of memory
for very few actual fetched changes (in the 2GB+ region, sometimes).

- "git svn fetch --all" also seems to take a long time for certain
fetched changes (in the minutes region).

- I know I can probably just "read the source", but I'd like to know
why .git/svn/.caches is even larger than .git/objects (which supposedly
contains everything that's of interest)? I hope the important parts
of .git/svn (and what not to do with them...) can be documented,
for example towards the end of the man page, without needing to
'read the source'. Here is part of "du" from a couple of days ago:

254816	.git/objects
307056	.git/svn/.caches
332452	.git/svn
588064	.git

The actual .git/config is here - this should be sufficient info for
somebody looking into reproducing the issues I mentioned above.

--------
$ more .git/config 
[core]
	repositoryformatversion = 0
	filemode = true
	bare = false
	logallrefupdates = true
[svn-remote "svn"]
	url = https://svn.r-project.org/R
	fetch = trunk:refs/remotes/trunk
	branches = branches/*:refs/remotes/*
	tags = tags/*:refs/remotes/tags/*
[pack]
	threads = 1
------------


* Re: git svn's performance issue and strange pauses, and other thing
  2014-09-18  7:39 git svn's performance issue and strange pauses, and other thing Hin-Tak Leung
@ 2014-09-19  8:25 ` Eric Wong
  2014-09-19 13:44   ` Jakob Stoklund Olesen
  2014-10-05  1:02   ` Eric Wong
  0 siblings, 2 replies; 10+ messages in thread
From: Eric Wong @ 2014-09-19  8:25 UTC (permalink / raw)
  To: Hin-Tak Leung
  Cc: git, Jakob Stoklund Olesen, Sam Vilain, Steven Walter,
	Peter Baumann, Andrew Myrick

Hin-Tak Leung <htl10@users•sourceforge.net> wrote:
> (I am not on the list - please CC)

Done, it is standard practice for git :)

> Thanks for git-svn - I use it instead of subversion itself for many years now.
> 
> Just thought I'd ask/report a few issues I noticed for some time
> now, of tracking development of a particular subversion-based
> development project. Broadly speaking, I think there are 3 problems,
> especially noticeable against a particular repository, but 
> to a lesser extent with some others too.
> 
> - just doing "git svn fetch --all" seems to consume a lot of memory,
> for very little actual fetched changes. (in the 2GB+ region, sometimes).
> 
> - "git svn fetch --all" also seems to take a long time too, for certain
> fetched changes. (in the minutes region).

Jakob sent some patches a few months ago which seem to address the
issue.  Unfortunately we forgot about them :x

Can you take a look at the following two "mergeinfo-speedups"
in my repo?  (git://bogomips.org/git-svn)

Jakob Stoklund Olesen (2):
      git-svn: only look at the new parts of svn:mergeinfo
      git-svn: only look at the root path for svn:mergeinfo

Also downloadable here:

http://bogomips.org/git-svn.git/patch?id=9b258e721b30785357535
http://bogomips.org/git-svn.git/patch?id=73409a2145e93b436d74a

Can you please give them a try?

> -  I know I can probably just "read the source", but I'd like to know
> why .git/svn/.caches is even larger than .git/objects (which supposedly
> contains everything that's of interest)? I hope this can be documented
> towards the end of the man-page, for example, of important parts
> of .git/svn (and what not to do with them...), without needing to
> 'read the source'. Here is part of "du" from a couple of days ago:
> 
> 254816	.git/objects
> 307056	.git/svn/.caches
> 332452	.git/svn
> 588064	.git
> 
> The actual .git/config is here - this should be sufficient info for
> somebody looking into experiencing the issues I mentioned above.

IIRC, the caching is unique to mergeinfo, so perhaps Jakob's patches
help, there, too.

Sorry I don't understand the mergeinfo stuff more, I've never worked on
a project which uses it.

> --------
> $ more .git/config 
> [core]
> 	repositoryformatversion = 0
> 	filemode = true
> 	bare = false
> 	logallrefupdates = true
> [svn-remote "svn"]
> 	url = https://svn.r-project.org/R
> 	fetch = trunk:refs/remotes/trunk
> 	branches = branches/*:refs/remotes/*
> 	tags = tags/*:refs/remotes/tags/*
> [pack]
> 	threads = 1
> ------------


* Re: git svn's performance issue and strange pauses, and other thing
  2014-09-19  8:25 ` Eric Wong
@ 2014-09-19 13:44   ` Jakob Stoklund Olesen
  2014-10-05  1:02   ` Eric Wong
  1 sibling, 0 replies; 10+ messages in thread
From: Jakob Stoklund Olesen @ 2014-09-19 13:44 UTC (permalink / raw)
  To: Eric Wong
  Cc: Hin-Tak Leung, git@vger•kernel.org, Sam Vilain, Steven Walter,
	Peter Baumann, Andrew Myrick



> On Sep 19, 2014, at 1:25, Eric Wong <normalperson@yhbt•net> wrote:
> 
> Hin-Tak Leung <htl10@users•sourceforge.net> wrote:
> 
>> -  I know I can probably just "read the source", but I'd like to know
>> why .git/svn/.caches is even larger than .git/objects (which supposedly
>> contains everything that's of interest)? I hope this can be documented
>> towards the end of the man-page, for example, of important parts
>> of .git/svn (and what not to do with them...), without needing to
>> 'read the source'. Here is part of "du" from a couple of days ago:
>> 
>> 254816    .git/objects
>> 307056    .git/svn/.caches
>> 332452    .git/svn
>> 588064    .git
>> 
>> The actual .git/config is here - this should be sufficient info for
>> somebody looking into experiencing the issues I mentioned above.
> 
> IIRC, the caching is unique to mergeinfo, so perhaps Jakob's patches
> help, there, too.

IIRC the caches are used for memoization, and with my two patches applied the memoization doesn't improve performance much.

You could try removing the memoization after applying my patches.

Thanks,
/Jakob


* Re: git svn's performance issue and strange pauses, and other thing
  2014-09-19  8:25 ` Eric Wong
  2014-09-19 13:44   ` Jakob Stoklund Olesen
@ 2014-10-05  1:02   ` Eric Wong
  1 sibling, 0 replies; 10+ messages in thread
From: Eric Wong @ 2014-10-05  1:02 UTC (permalink / raw)
  To: Hin-Tak Leung
  Cc: git, Jakob Stoklund Olesen, Sam Vilain, Steven Walter,
	Peter Baumann, Andrew Myrick

Eric Wong <normalperson@yhbt•net> wrote:
> Jakob sent some patches a few months ago which seem to address the
> issue.  Unfortunately we forgot about them :x

Hin-Tak: have you tried Jakob's patches?  I've taken another look,
signed-off and pushed to my master.

> Can you take a look at the following two "mergeinfo-speedups"
> in my repo?  (git://bogomips.org/git-svn)
> 
> Jakob Stoklund Olesen (2):
>       git-svn: only look at the new parts of svn:mergeinfo
>       git-svn: only look at the root path for svn:mergeinfo
> 
> Also downloadable here:
> 
> http://bogomips.org/git-svn.git/patch?id=9b258e721b30785357535
> http://bogomips.org/git-svn.git/patch?id=73409a2145e93b436d74a
> 
> Can you please give them a try?


* Re: git svn's performance issue and strange pauses, and other thing
@ 2014-10-06 23:51 Hin-Tak Leung
  0 siblings, 0 replies; 10+ messages in thread
From: Hin-Tak Leung @ 2014-10-06 23:51 UTC (permalink / raw)
  To: normalperson; +Cc: git, stoklund, sam, stevenrwalter, waste.manager, amyrick

------------------------------
On Sun, Oct 5, 2014 02:02 BST Eric Wong wrote:

>Eric Wong <normalperson@yhbt•net> wrote:
>> Jakob sent some patches a few months ago which seem to address the
>> issue.  Unfortunately we forgot about them :x
>
>Hin-Tak: have you tried Jakob's patches?  I've taken another look,
>signed-off and pushed to my master.
>
>> Can you take a look at the following two "mergeinfo-speedups"
>> in my repo?  (git://bogomips.org/git-svn)
>> 
>> Jakob Stoklund Olesen (2):
>>       git-svn: only look at the new parts of svn:mergeinfo
>>       git-svn: only look at the root path for svn:mergeinfo
>> 
>> Also downloadable here:
>> 
>> http://bogomips.org/git-svn.git/patch?id=9b258e721b30785357535
>> http://bogomips.org/git-svn.git/patch?id=73409a2145e93b436d74a
>> 
>> Can you please give them a try?

Apologies - I applied them on top of 2.1.0 earlier today, but the svn repo just
hasn't changed enough recently to show any interesting behavior
with 'git svn fetch --all', so I wondered whether I should wait before reporting. Then
I changed my mind, and decided what the hell, let's clone the whole
thing again :-). So I made a new directory, ran 'git init', copied
.git/config from the old repo, and am now running 'git svn fetch --all' in the new empty
directory.

So far it seems to be good. But I am only at revision 35700-ish at the moment,
and the whole thing is 66700-ish. Oh, I forgot to mention that the strange
pauses seem to be followed by messages like these:

W:svn cherry-pick ignored (/branches/R-2-12-branch:52939,54476,55265) - missing 492 commit(s) (eg 9bf20dca6a8b05dff28e6486b1613f10825972c9)
W:svn cherry-pick ignored (/branches/R-2-13-branch:55265,55432) - missing 231 commit(s) (eg 9290cf6ce2d7f6cca168cf326eed6e9fe760895f)
W:svn cherry-pick ignored (/branches/R-2-15-branch:58894,59717) - missing 405 commit(s) (eg ed84a373b33f728949edf3371829fc3414c343a8)
W:svn cherry-pick ignored (/branches/R-3-0-branch:62497) - missing 154 commit(s) (eg 9e4742d201771c9658417c2d2f83838e550e3162)
W:svn cherry-pick ignored (/trunk:

So presumably I'd only see interesting behavior when there are a number of branches.
It seems the first branches are around revision 48000-ish, so I might have
to wait a bit.

So far, the new clone hasn't created ".git/svn/.caches/" yet; and memory consumption seems
okay also.


* Re: git svn's performance issue and strange pauses, and other thing
@ 2014-10-07 18:20 Hin-Tak Leung
  2014-10-19  4:12 ` Eric Wong
  0 siblings, 1 reply; 10+ messages in thread
From: Hin-Tak Leung @ 2014-10-07 18:20 UTC (permalink / raw)
  To: normalperson; +Cc: git, stoklund, sam, stevenrwalter, waste.manager, amyrick

------------------------------
On Tue, Oct 7, 2014 00:51 BST Hin-Tak Leung wrote:

>------------------------------
>On Sun, Oct 5, 2014 02:02 BST Eric Wong wrote:

<snipped>
>>Hin-Tak: have you tried Jakob's patches?  I've taken another look,
>>signed-off and pushed to my master.

... Then
>I changed my mind, and decided what the hell, let's clone the whole
>thing again :-). So I made a new directory, run 'git init', just copy
>.git/config from the old reop and am doing 'git svn fetch --all' in the new empty
>directory again.
>
>So far it seems to be good. But I am only at revision 35700-ish at the moment,
>and the whole thing is 66700-ish. Oh, I forgot to mention that the strange
>pauses seem to be followed by messages like these:
>
>W:svn cherry-pick ignored (/branches/R-2-12-branch:52939,54476,55265) - missing 492 commit(s) (eg 9bf20dca6a8b05dff28e6486b1613f10825972c9)
>W:svn cherry-pick ignored (/branches/R-2-13-branch:55265,55432) - missing 231 commit(s) (eg 9290cf6ce2d7f6cca168cf326eed6e9fe760895f)
>W:svn cherry-pick ignored (/branches/R-2-15-branch:58894,59717) - missing 405 commit(s) (eg ed84a373b33f728949edf3371829fc3414c343a8)
>W:svn cherry-pick ignored (/branches/R-3-0-branch:62497) - missing 154 commit(s) (eg 9e4742d201771c9658417c2d2f83838e550e3162)
>W:svn cherry-pick ignored (/trunk:
>
>So presumably I'd only see interesting behavior when there are a number of branches.
>It seems the first branches are around revision 48000-ish, so I might have
>to wait a bit.
>
>So far, the new clone hasn't created ".git/svn/.caches/" yet; and memory consumption seems
>okay also.

The changes are definitely an improvement, as far as my impression goes. There was only one notable pause, around
r50651, probably because of the rather large "Checking svn:mergeinfo changes since r15413"
step? That took about 12 minutes. Other instances of "W:svn cherry-pick ignored",
though they do take a while, are in the seconds region - before the code changes they could
take minutes, if memory serves.

<--
	M	src/library/tools/R/toHTML.R
r50650 = bed91d435c535f2643cf0d48623fecf86d264bd9 (refs/remotes/trunk)
	M	src/modules/X11/rotated.c
	M	src/modules/X11/dataentry.c
Checking svn:mergeinfo changes since r15413: 1 sources, 1 changed
W:svn cherry-pick ignored (/trunk:28840) - missing 9372 commit(s) (eg cea6142c76300539a0d0c9c743738e31a9f7d523)
r50651 = ad139a5bf91f9ad6690ff5fb4a3f71cea591a944 (refs/remotes/R-uthreads)
-->

The new clone has:

<--
$ ls -ltr .git/svn/.caches/
total 144788
-rw-rw-r--. 1 Hin-Tak Hin-Tak  1166138 Oct  7 13:44 lookup_svn_merge.yaml
-rw-rw-r--. 1 Hin-Tak Hin-Tak 72849741 Oct  7 13:48 check_cherry_pick.yaml
-rw-rw-r--. 1 Hin-Tak Hin-Tak  1133855 Oct  7 13:49 has_no_changes.yaml
-rw-rw-r--. 1 Hin-Tak Hin-Tak 73109005 Oct  7 13:53 _rev_list.yaml
-->

The old clone has:

<---
$ ls -ltr .git/svn/.caches/
total 318824
-rw-rw-r--. 1 Hin-Tak Hin-Tak   5711724 Jul 24  2012 lookup_svn_merge.db
-rw-rw-r--. 1 Hin-Tak Hin-Tak  30523628 Jul 24  2012 check_cherry_pick.db
-rw-rw-r--. 1 Hin-Tak Hin-Tak    296592 Jul 24  2012 has_no_changes.db
-rw-rw-r--. 1 Hin-Tak Hin-Tak  40241189 Oct  5 16:42 lookup_svn_merge.yaml
-rw-rw-r--. 1 Hin-Tak Hin-Tak 225323456 Oct  5 16:49 check_cherry_pick.yaml
-rw-rw-r--. 1 Hin-Tak Hin-Tak    242547 Oct  5 16:49 has_no_changes.yaml
-rw-rw-r--. 1 Hin-Tak Hin-Tak  24120007 Oct  5 16:50 _rev_list.yaml
-->

I had to suspend somewhere around r59000, but it is interesting to see
that the max memory consumption of the later part is almost double,
and that it also runs at close to 100% CPU rather than 60% overall. I don't know what
to make of that - probably just smaller changes versus
larger ones, or different time of day and network load (yes,
I guess it is just bandwidth-limited, since the bulk of the CPU time is in system
rather than user).

I am somewhat worried about the dramatic difference between the two .git/svn/.caches directories -
check_cherry_pick.yaml is 225MB in one and 73MB in the other, while
_rev_list.yaml goes the opposite way - 24MB vs 73MB. How do I reconcile that?
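
One way to put the two cache directories side by side is a small shell
helper. This is only a sketch: the function name compare_caches and the
clone paths in the usage comment are made up for illustration, and the four
file names are simply the .yaml files from the listings above.

```shell
# compare_caches OLD_CLONE NEW_CLONE
# Print "<cache name> <bytes in old> <bytes in new>" for each .yaml cache
# file seen in the listings above; a missing file is reported as 0 bytes.
compare_caches() {
    old_dir="$1/.git/svn/.caches"
    new_dir="$2/.git/svn/.caches"
    for name in lookup_svn_merge check_cherry_pick has_no_changes _rev_list; do
        old_size=0
        new_size=0
        if [ -f "$old_dir/$name.yaml" ]; then
            old_size=$(( $(wc -c < "$old_dir/$name.yaml") ))
        fi
        if [ -f "$new_dir/$name.yaml" ]; then
            new_size=$(( $(wc -c < "$new_dir/$name.yaml") ))
        fi
        printf '%s %d %d\n' "$name" "$old_size" "$new_size"
    done
}

# Usage (hypothetical paths):
#   compare_caches ../old-clone ../new-clone
```

The output is one line per cache, which makes it easy to watch how the
sizes move after each fetch on either clone.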

<--
	M	src/main/dotcode.c
	M	doc/NEWS.Rd
r59140 = b6014a226aebf9e016c89c0bd1aca1979796a057 (refs/remotes/trunk)
	M	src/main/dotcode.c
	M	doc/NEWS.Rd
Checking svn:mergeinfo changes since r59138: 4 sources, 1 changed
W:svn cherry-pick ignored (/trunk:59137,59140) - missing 369 commit(s) (eg 8a2a36083ba39be27fc9940acc3f51eab6a7a0c3)
r59141 = 38c6d05f164d34e4b5cc545bda387be9d910f748 (refs/remotes/R-2-15-branch)
Connection timed out: Connection timed out at /usr/share/perl5/vendor_perl/Git/SVN/Ra.pm line 290.

Command exited with non-zero status 1
	Command being timed: "git svn fetch --all"
	User time (seconds): 5642.19
	System time (seconds): 23552.44
	Percent of CPU this job got: 57%
	Elapsed (wall clock) time (h:mm:ss or m:ss): 14:06:58
	Average shared text size (kbytes): 0
	Average unshared data size (kbytes): 0
	Average stack size (kbytes): 0
	Average total size (kbytes): 0
	Maximum resident set size (kbytes): 349324
	Average resident set size (kbytes): 0
	Major (requiring I/O) page faults: 39
	Minor (reclaiming a frame) page faults: 744713614
	Voluntary context switches: 4761489
	Involuntary context switches: 8595950
	Swaps: 0
	File system inputs: 7712
	File system outputs: 121404296
	Socket messages sent: 0
	Socket messages received: 0
	Signals delivered: 0
	Page size (bytes): 4096
	Exit status: 1
-->
<--
	M	src/include/Defn.h
r66719 = 1e3288d3ae4cfb15f6e4e4116f18d38b3efc5bb5 (refs/remotes/trunk)
	M	doc/NEWS.Rd
r66720 = 1c184e5fc2b71a27767215a45a1270f3edbc616f (refs/remotes/trunk)
Checked out HEAD:
  https://svn.r-project.org/R/trunk r66720
creating empty directory: tests/Pkgs/exNSS4/man
	Command being timed: "git svn fetch --all"
	User time (seconds): 2126.00
	System time (seconds): 7852.44
	Percent of CPU this job got: 96%
	Elapsed (wall clock) time (h:mm:ss or m:ss): 2:52:38
	Average shared text size (kbytes): 0
	Average unshared data size (kbytes): 0
	Average stack size (kbytes): 0
	Average total size (kbytes): 0
	Maximum resident set size (kbytes): 755256
	Average resident set size (kbytes): 0
	Major (requiring I/O) page faults: 6
	Minor (reclaiming a frame) page faults: 142730534
	Voluntary context switches: 898725
	Involuntary context switches: 1842056
	Swaps: 0
	File system inputs: 1800
	File system outputs: 28606392
	Socket messages sent: 0
	Socket messages received: 0
	Signals delivered: 0
	Page size (bytes): 4096
	Exit status: 0
-->
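
The "Percent of CPU" lines in the two reports above can be reproduced from
the other fields as (user + system) / elapsed. A minimal sketch in shell
with awk; the helper names to_seconds and cpu_percent are made up here, and
truncating to a whole percent is an assumption about how the figure is
rounded.

```shell
# to_seconds H:MM:SS | M:SS(.ss): convert an elapsed-time string to seconds.
to_seconds() {
    echo "$1" | awk -F: '{ t = 0; for (i = 1; i <= NF; i++) t = t * 60 + $i; print t }'
}

# cpu_percent USER_SECONDS SYSTEM_SECONDS ELAPSED_SECONDS
# Print (user + system) / elapsed as a whole percentage, truncated.
cpu_percent() {
    awk -v u="$1" -v s="$2" -v e="$3" 'BEGIN { printf "%d\n", (u + s) * 100 / e }'
}

cpu_percent 5642.19 23552.44 "$(to_seconds 14:06:58)"   # first report
cpu_percent 2126.00 7852.44 "$(to_seconds 2:52:38)"     # second report
```

The two calls print 57 and 96, matching the reported figures, so the
difference between the runs comes down to how much of the wall-clock time
was spent waiting rather than computing.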


* Re: git svn's performance issue and strange pauses, and other thing
  2014-10-07 18:20 Hin-Tak Leung
@ 2014-10-19  4:12 ` Eric Wong
  2014-10-19 14:41   ` Jakob Stoklund Olesen
  0 siblings, 1 reply; 10+ messages in thread
From: Eric Wong @ 2014-10-19  4:12 UTC (permalink / raw)
  To: Hin-Tak Leung; +Cc: git, stoklund, sam, stevenrwalter, waste.manager, amyrick

Hin-Tak Leung <htl10@users•sourceforge.net> wrote:
> The new clone has:
> 
> <--
> $ ls -ltr .git/svn/.caches/
> total 144788
> -rw-rw-r--. 1 Hin-Tak Hin-Tak  1166138 Oct  7 13:44 lookup_svn_merge.yaml
> -rw-rw-r--. 1 Hin-Tak Hin-Tak 72849741 Oct  7 13:48 check_cherry_pick.yaml
> -rw-rw-r--. 1 Hin-Tak Hin-Tak  1133855 Oct  7 13:49 has_no_changes.yaml
> -rw-rw-r--. 1 Hin-Tak Hin-Tak 73109005 Oct  7 13:53 _rev_list.yaml
> -->
> 
> The old clone has:

<snip>
> -rw-rw-r--. 1 Hin-Tak Hin-Tak  40241189 Oct  5 16:42 lookup_svn_merge.yaml
> -rw-rw-r--. 1 Hin-Tak Hin-Tak 225323456 Oct  5 16:49 check_cherry_pick.yaml
> -rw-rw-r--. 1 Hin-Tak Hin-Tak    242547 Oct  5 16:49 has_no_changes.yaml
> -rw-rw-r--. 1 Hin-Tak Hin-Tak  24120007 Oct  5 16:50 _rev_list.yaml
> -->
> 
> I had to suspend somewhat around r59000 - but it is interesting to see
> that the max memory consumption of the later part is almost double?
> and it also runs at 100% rather than 60% overall; I don't know what
> to make of that - probably just smaller changes versus
> larger ones, or different time of day and network loads (yes,
> I guess it is just bandwidth-limited?, since the bulk of CPU time is in system
> rather than user).

git-svn memory usage is insane, and we need to reduce it.
(on Linux, fork() performance is reduced as memory size of the parent
 grows, and I don't think we can easily call vfork() from Perl)

> I am somwhat worry about the dramatic difference between the two .svn/.caches -
> check_cherry_pick.yaml is 225MB in one and 73MB in the other, and also
> _rev_list.yaml is opposite - 24MB vs 73MB. How do I reconcile that?

Calling patterns changed, and it looks like Jakob's changes avoided some
calls.  The main thing to care about:
	Does the repository history look right?

The check_cherry_pick cache can be made smaller, too:
----------------------- 8< -----------------------------
From: Eric Wong <normalperson@yhbt•net>
Subject: [PATCH] git-svn: reduce check_cherry_pick cache overhead

We do not need to store entire lists of commits, only the
number of incomplete and the first commit for reference.
This reduces the amount of data we need to store in memory
and on disk stores.

Signed-off-by: Eric Wong <normalperson@yhbt•net>
---
 perl/Git/SVN.pm | 28 +++++++++++++++-------------
 1 file changed, 15 insertions(+), 13 deletions(-)

diff --git a/perl/Git/SVN.pm b/perl/Git/SVN.pm
index 25dbcd5..b2d37cb 100644
--- a/perl/Git/SVN.pm
+++ b/perl/Git/SVN.pm
@@ -1537,7 +1537,7 @@ sub _rev_list {
 	@rv;
 }
 
-sub check_cherry_pick {
+sub check_cherry_pick2 {
 	my $base = shift;
 	my $tip = shift;
 	my $parents = shift;
@@ -1552,7 +1552,8 @@ sub check_cherry_pick {
 			delete $commits{$commit};
 		}
 	}
-	return (keys %commits);
+	my @k = (keys %commits);
+	return (scalar @k, $k[0]);
 }
 
 sub has_no_changes {
@@ -1597,7 +1598,7 @@ sub tie_for_persistent_memoization {
 		mkpath([$cache_path]) unless -d $cache_path;
 
 		my %lookup_svn_merge_cache;
-		my %check_cherry_pick_cache;
+		my %check_cherry_pick2_cache;
 		my %has_no_changes_cache;
 		my %_rev_list_cache;
 
@@ -1608,11 +1609,11 @@ sub tie_for_persistent_memoization {
 			LIST_CACHE => ['HASH' => \%lookup_svn_merge_cache],
 		;
 
-		tie_for_persistent_memoization(\%check_cherry_pick_cache,
-		    "$cache_path/check_cherry_pick");
-		memoize 'check_cherry_pick',
+		tie_for_persistent_memoization(\%check_cherry_pick2_cache,
+		    "$cache_path/check_cherry_pick2");
+		memoize 'check_cherry_pick2',
 			SCALAR_CACHE => 'FAULT',
-			LIST_CACHE => ['HASH' => \%check_cherry_pick_cache],
+			LIST_CACHE => ['HASH' => \%check_cherry_pick2_cache],
 		;
 
 		tie_for_persistent_memoization(\%has_no_changes_cache,
@@ -1636,7 +1637,7 @@ sub tie_for_persistent_memoization {
 		$memoized = 0;
 
 		Memoize::unmemoize 'lookup_svn_merge';
-		Memoize::unmemoize 'check_cherry_pick';
+		Memoize::unmemoize 'check_cherry_pick2';
 		Memoize::unmemoize 'has_no_changes';
 		Memoize::unmemoize '_rev_list';
 	}
@@ -1648,7 +1649,8 @@ sub tie_for_persistent_memoization {
 		return unless -d $cache_path;
 
 		for my $cache_file (("$cache_path/lookup_svn_merge",
-				     "$cache_path/check_cherry_pick",
+				     "$cache_path/check_cherry_pick", # old
+				     "$cache_path/check_cherry_pick2",
 				     "$cache_path/has_no_changes")) {
 			for my $suffix (qw(yaml db)) {
 				my $file = "$cache_file.$suffix";
@@ -1817,15 +1819,15 @@ sub find_extra_svn_parents {
 		}
 
 		# double check that there are no missing non-merge commits
-		my (@incomplete) = check_cherry_pick(
+		my ($ninc, $ifirst) = check_cherry_pick2(
 			$merge_base, $merge_tip,
 			$parents,
 			@all_ranges,
 		       );
 
-		if ( @incomplete ) {
-			warn "W:svn cherry-pick ignored ($spec) - missing "
-				.@incomplete." commit(s) (eg $incomplete[0])\n";
+		if ($ninc) {
+			warn "W:svn cherry-pick ignored ($spec) - missing " .
+				"$ninc commit(s) (eg $ifirst)\n";
 		} else {
 			warn
 				"Found merge parent ($spec): ",
-- 
EW
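
To get a feel for the saving, take the largest warning quoted earlier in
this thread, which reported 9372 missing commits for /trunk:28840. Caching
the full list means storing 9372 commit IDs, while the patched function
returns only a count and one ID. A rough estimate, assuming 40 hex
characters per ID plus one separator byte (the real serialized format will
differ, so treat the numbers as ballpark figures; list_bytes is a made-up
helper):

```shell
# list_bytes COUNT: approximate bytes needed to store COUNT commit IDs,
# assuming 40 hex characters plus one separator byte per ID.
list_bytes() {
    echo $(( $1 * 41 ))
}

list_bytes 9372   # full list, as cached before the patch
list_bytes 1      # single example ID, as cached after the patch (plus a short count)
```

That is 384252 bytes against 41 bytes for this one cache entry, before any
per-entry overhead of the on-disk format is counted.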


* Re: git svn's performance issue and strange pauses, and other thing
@ 2014-10-19 14:04 Hin-Tak Leung
  0 siblings, 0 replies; 10+ messages in thread
From: Hin-Tak Leung @ 2014-10-19 14:04 UTC (permalink / raw)
  To: normalperson; +Cc: git, stoklund, sam, stevenrwalter, waste.manager, amyrick







* Re: git svn's performance issue and strange pauses, and other thing
@ 2014-10-19 14:22 Hin-Tak Leung
  0 siblings, 0 replies; 10+ messages in thread
From: Hin-Tak Leung @ 2014-10-19 14:22 UTC (permalink / raw)
  To: normalperson; +Cc: git, stoklund, sam, stevenrwalter, waste.manager, amyrick

(sorry about the last blank reply - mobile phone and finger accident...)

------------------------------
On Sun, Oct 19, 2014 05:12 BST Eric Wong wrote:

>Hin-Tak Leung <htl10@users•sourceforge.net> wrote:
> The new clone has:
> 
> <--
> $ ls -ltr .git/svn/.caches/
> total 144788
> -rw-rw-r--. 1 Hin-Tak Hin-Tak  1166138 Oct  7 13:44 lookup_svn_merge.yaml
> -rw-rw-r--. 1 Hin-Tak Hin-Tak 72849741 Oct  7 13:48 check_cherry_pick.yaml
> -rw-rw-r--. 1 Hin-Tak Hin-Tak  1133855 Oct  7 13:49 has_no_changes.yaml
> -rw-rw-r--. 1 Hin-Tak Hin-Tak 73109005 Oct  7 13:53 _rev_list.yaml
> -->
> 
> The old clone has:
>
><snip>
> -rw-rw-r--. 1 Hin-Tak Hin-Tak  40241189 Oct  5 16:42 lookup_svn_merge.yaml
> -rw-rw-r--. 1 Hin-Tak Hin-Tak 225323456 Oct  5 16:49 check_cherry_pick.yaml
> -rw-rw-r--. 1 Hin-Tak Hin-Tak    242547 Oct  5 16:49 has_no_changes.yaml
> -rw-rw-r--. 1 Hin-Tak Hin-Tak  24120007 Oct  5 16:50 _rev_list.yaml
> -->
> 
> I had to suspend somewhat around r59000 - but it is interesting to see
> that the max memory consumption of the later part is almost double?
> and it also runs at 100% rather than 60% overall; I don't know what
> to make of that - probably just smaller changes versus
> larger ones, or different time of day and network loads (yes,
> I guess it is just bandwidth-limited?, since the bulk of CPU time is in system
> rather than user).
>
>git-svn memory usage is insane, and we need to reduce it.
>(on Linux, fork() performance is reduced as memory size of the parent
> grows, and I don't think we can easily call vfork() from Perl)
>

Yes, I think the memory consumption is a bit crazy. I ran 'git svn fetch' on
the old clone again and it was a bit slow, so I timed the new one, and here it is.
For just fetching 45 changes, it took 36 minutes and the memory
consumption shot up to over 1GB. (There were one or two mergeinfo checks
in the middle, not shown.)

<---
cd ../R-2/
[Hin-Tak@localhost R-2]$ /usr/bin/time -v git svn fetch --all
	M	src/library/base/R/apply.R
	M	src/library/base/man/apply.Rd
	M	doc/NEWS.Rd
r66721 = e26e52bf4b2cdbe291d5899fd0a449f197aa2133 (refs/remotes/trunk)
...
	M	src/library/tools/R/utils.R
r66765 = c64d1828ada98395892529ce59b5760de1bdc60b (refs/remotes/R-3-1-branch)
---
	Command being timed: "git svn fetch --all"
	User time (seconds): 2042.81
	System time (seconds): 115.98
	Percent of CPU this job got: 99%
	Elapsed (wall clock) time (h:mm:ss or m:ss): 36:13.74
	Average shared text size (kbytes): 0
	Average unshared data size (kbytes): 0
	Average stack size (kbytes): 0
	Average total size (kbytes): 0
	Maximum resident set size (kbytes): 1019092
	Average resident set size (kbytes): 0
	Major (requiring I/O) page faults: 1149
	Minor (reclaiming a frame) page faults: 1482219
	Voluntary context switches: 9470
	Involuntary context switches: 226683
	Swaps: 0
	File system inputs: 358864
	File system outputs: 510680
	Socket messages sent: 0
	Socket messages received: 0
	Signals delivered: 0
	Page size (bytes): 4096
	Exit status: 0
[Hin-Tak@localhost R-2]$ cd ../R
--->


> I am somwhat worry about the dramatic difference between the two .svn/.caches -
> check_cherry_pick.yaml is 225MB in one and 73MB in the other, and also
> _rev_list.yaml is opposite - 24MB vs 73MB. How do I reconcile that?
>
>Calling patterns changed, and it looks like Jakob's changes avoided some
>calls.  The main thing to care about:
>    Does the repository history look right?
>

I'll check soon and report. It looks superficially okay. I suppose
I'd need to check every branch to be sure. I know the fetch history is
different - but reflog entries (or the svn equivalent) expire and are pruned
after two weeks?

>The check_cherry_pick cache can be made smaller, too:
>----------------------- 8< -----------------------------
>From: Eric Wong <normalperson@yhbt•net>
>Subject: [PATCH] git-svn: reduce check_cherry_pick cache overhead
>
>We do not need to store entire lists of commits, only the
>number of incomplete and the first commit for reference.
>This reduces the amount of data we need to store in memory
>and on disk stores.
>

Is there a way of retrospectively compressing/trimming the cache or, better
still, examining it before compressing?
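
(As a first step towards examining it, something like the sketch below
would at least show which cache files are the big ones. It is only an
illustration: cache_file_sizes is a made-up helper, not part of git-svn,
and the .git/svn/.caches path is assumed from the du output earlier in
this thread. It only reads file sizes and does not touch the caches.)

```perl
use strict;
use warnings;

# Hypothetical helper (not part of git-svn): map each regular file in
# $dir to its size in bytes. Returns an empty list if $dir is unreadable.
sub cache_file_sizes {
    my ($dir) = @_;
    opendir(my $dh, $dir) or return;
    my %size;
    for my $name (readdir $dh) {
        my $path = "$dir/$name";
        next unless -f $path;           # skip directories, "." and ".."
        $size{$name} = (-s _) || 0;     # -s yields false for empty files
    }
    closedir $dh;
    return %size;
}

# Assumed cache location; run from the top of the git-svn clone.
my %size = cache_file_sizes('.git/svn/.caches');
printf "%12d  %s\n", $size{$_}, $_
    for sort { $size{$b} <=> $size{$a} } keys %size;
```

Run in both the old and the new clone, this would give the two lists
side by side, largest file first.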

I intend to hold on to both the new and the old clone for a while until
I can reconcile the differences... though I am running the same git svn code
on both now.

>Signed-off-by: Eric Wong <normalperson@yhbt•net>
>---
> perl/Git/SVN.pm | 28 +++++++++++++++-------------
> 1 file changed, 15 insertions(+), 13 deletions(-)
>
>diff --git a/perl/Git/SVN.pm b/perl/Git/SVN.pm
>index 25dbcd5..b2d37cb 100644
>--- a/perl/Git/SVN.pm
>+++ b/perl/Git/SVN.pm
>@@ -1537,7 +1537,7 @@ sub _rev_list {
>     @rv;
> }
> 
>-sub check_cherry_pick {
>+sub check_cherry_pick2 {
>     my $base = shift;
>     my $tip = shift;
>     my $parents = shift;
>@@ -1552,7 +1552,8 @@ sub check_cherry_pick {
>             delete $commits{$commit};
>         }
>     }
>-    return (keys %commits);
>+    my @k = (keys %commits);
>+    return (scalar @k, $k[0]);
> }
> 
> sub has_no_changes {
>@@ -1597,7 +1598,7 @@ sub tie_for_persistent_memoization {
>         mkpath([$cache_path]) unless -d $cache_path;
> 
>         my %lookup_svn_merge_cache;
>-        my %check_cherry_pick_cache;
>+        my %check_cherry_pick2_cache;
>         my %has_no_changes_cache;
>         my %_rev_list_cache;
> 
>@@ -1608,11 +1609,11 @@ sub tie_for_persistent_memoization {
>             LIST_CACHE => ['HASH' => \%lookup_svn_merge_cache],
>         ;
> 
>-        tie_for_persistent_memoization(\%check_cherry_pick_cache,
>-            "$cache_path/check_cherry_pick");
>-        memoize 'check_cherry_pick',
>+        tie_for_persistent_memoization(\%check_cherry_pick2_cache,
>+            "$cache_path/check_cherry_pick2");
>+        memoize 'check_cherry_pick2',
>             SCALAR_CACHE => 'FAULT',
>-            LIST_CACHE => ['HASH' => \%check_cherry_pick_cache],
>+            LIST_CACHE => ['HASH' => \%check_cherry_pick2_cache],
>         ;
> 
>         tie_for_persistent_memoization(\%has_no_changes_cache,
>@@ -1636,7 +1637,7 @@ sub tie_for_persistent_memoization {
>         $memoized = 0;
> 
>         Memoize::unmemoize 'lookup_svn_merge';
>-        Memoize::unmemoize 'check_cherry_pick';
>+        Memoize::unmemoize 'check_cherry_pick2';
>         Memoize::unmemoize 'has_no_changes';
>         Memoize::unmemoize '_rev_list';
>     }
>@@ -1648,7 +1649,8 @@ sub tie_for_persistent_memoization {
>         return unless -d $cache_path;
> 
>         for my $cache_file (("$cache_path/lookup_svn_merge",
>-                     "$cache_path/check_cherry_pick",
>+                     "$cache_path/check_cherry_pick", # old
>+                     "$cache_path/check_cherry_pick2",
>                      "$cache_path/has_no_changes")) {
>             for my $suffix (qw(yaml db)) {
>                 my $file = "$cache_file.$suffix";
>@@ -1817,15 +1819,15 @@ sub find_extra_svn_parents {
>         }
> 
>         # double check that there are no missing non-merge commits
>-        my (@incomplete) = check_cherry_pick(
>+        my ($ninc, $ifirst) = check_cherry_pick2(
>             $merge_base, $merge_tip,
>             $parents,
>             @all_ranges,
>                );
> 
>-        if ( @incomplete ) {
>-            warn "W:svn cherry-pick ignored ($spec) - missing "
>-                .@incomplete." commit(s) (eg $incomplete[0])\n";
>+        if ($ninc) {
>+            warn "W:svn cherry-pick ignored ($spec) - missing " .
>+                "$ninc commit(s) (eg $ifirst)\n";
>         } else {
>             warn
>                 "Found merge parent ($spec): ",
>-- 
>EW
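
(To get a feel for how much the patch above might save, here is a rough
sketch of the idea in its commit message: store only the count and the
first commit instead of the whole list. The commit ids are made up, and
Storable's freeze is used here only as a convenient way to get a byte
count; the actual on-disk cache format and sizes may well differ.)

```perl
use strict;
use warnings;
use Storable qw(freeze);

# Made-up data: 1000 fake 40-character commit ids (not real history).
my @incomplete = map { sprintf('%040x', $_) } 1 .. 1000;

# Old-style return value: the entire list of commits.
my $full_list = freeze([@incomplete]);

# New-style return value: just the count and the first commit.
my $summary = freeze([scalar(@incomplete), $incomplete[0]]);

printf "full list:     %d bytes\n", length($full_list);
printf "count + first: %d bytes\n", length($summary);
```

The second value stays the same size however many commits are missing,
whereas the first grows with the length of the list.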


* Re: git svn's performance issue and strange pauses, and other thing
  2014-10-19  4:12 ` Eric Wong
@ 2014-10-19 14:41   ` Jakob Stoklund Olesen
  0 siblings, 0 replies; 10+ messages in thread
From: Jakob Stoklund Olesen @ 2014-10-19 14:41 UTC (permalink / raw)
  To: Eric Wong
  Cc: Hin-Tak Leung, git@vger•kernel.org, sam@vilain•net,
	stevenrwalter@gmail•com, waste.manager@gmx•de, amyrick@apple•com

On Oct 18, 2014, at 21:12, Eric Wong <normalperson@yhbt•net> wrote:

>> I am somewhat worried about the dramatic difference between the two .svn/.caches -
>> check_cherry_pick.yaml is 225MB in one and 73MB in the other, and also
>> _rev_list.yaml is opposite - 24MB vs 73MB. How do I reconcile that?
> 
> Calling patterns changed, and it looks like Jakob's changes avoided some
> calls.

It is possible that those functions don't need to be memoized any more. My patch is trying to avoid calling them with the same arguments over and over, and memoizing doesn't help when arguments are changing.
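
A minimal sketch of that effect, using the core Memoize module with the same SCALAR_CACHE/LIST_CACHE options as the patch quoted earlier in the thread. The function "expensive" and its arguments are made up for illustration and are not git-svn code:

```perl
use strict;
use warnings;
use Memoize;

my $calls = 0;   # times the real function body ran
my %cache;       # memoization table, kept visible for inspection

# Hypothetical stand-in for an expensive lookup (not a git-svn function).
sub expensive {
    my ($base, $tip) = @_;
    $calls++;
    return ("$base..$tip");
}

memoize('expensive',
    SCALAR_CACHE => 'FAULT',
    LIST_CACHE   => [HASH => \%cache],
);

# Same arguments every time: the body runs once and one entry is stored.
for (1 .. 100) { my @r = expensive('base', 'tip'); }
my ($same_calls, $same_entries) = ($calls, scalar keys %cache);

# Different arguments every time: no cache hits, one new entry per call.
$calls = 0;
%cache = ();
for my $i (1 .. 100) { my @r = expensive('base', "tip$i"); }
my ($diff_calls, $diff_entries) = ($calls, scalar keys %cache);

print "same args:     $same_calls call(s), $same_entries cache entries\n";
print "changing args: $diff_calls call(s), $diff_entries cache entries\n";
```

In the second loop the cache only grows and never saves a call, which is the case where memoizing costs memory and disk without helping.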

Thanks,
/jakob
