* git svn's performance issue and strange pauses, and other thing
@ 2014-09-18 7:39 Hin-Tak Leung
2014-09-19 8:25 ` Eric Wong
0 siblings, 1 reply; 10+ messages in thread
From: Hin-Tak Leung @ 2014-09-18 7:39 UTC (permalink / raw)
To: normalperson, git
(I am not on the list - please CC)
Thanks for git-svn - I have used it instead of Subversion itself for many years now.
Just thought I'd ask about/report a few issues I have noticed for some time
now while tracking development of a particular Subversion-based
project. Broadly speaking, I think there are 3 problems,
especially noticeable against a particular repository, but
to a lesser extent with some others too.
- just doing "git svn fetch --all" seems to consume a lot of memory
(sometimes in the 2GB+ region), even when very few changes are actually fetched.
- "git svn fetch --all" also seems to take a long time (in the minutes
region) for certain fetched changes.
- I know I can probably just "read the source", but I'd like to know
why .git/svn/.caches is even larger than .git/objects (which supposedly
contains everything that's of interest). I hope the important parts
of .git/svn (and what not to do with them...) can be documented, for
example towards the end of the man-page, without needing to
'read the source'. Here is part of "du" (1K blocks) from a couple of days ago:
254816 .git/objects
307056 .git/svn/.caches
332452 .git/svn
588064 .git
The actual .git/config is here - this should be sufficient info for
somebody looking into reproducing the issues I mentioned above.
--------
$ more .git/config
[core]
repositoryformatversion = 0
filemode = true
bare = false
logallrefupdates = true
[svn-remote "svn"]
url = https://svn.r-project.org/R
fetch = trunk:refs/remotes/trunk
branches = branches/*:refs/remotes/*
tags = tags/*:refs/remotes/tags/*
[pack]
threads = 1
------------
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: git svn's performance issue and strange pauses, and other thing
2014-09-18 7:39 git svn's performance issue and strange pauses, and other thing Hin-Tak Leung
@ 2014-09-19 8:25 ` Eric Wong
2014-09-19 13:44 ` Jakob Stoklund Olesen
2014-10-05 1:02 ` Eric Wong
0 siblings, 2 replies; 10+ messages in thread
From: Eric Wong @ 2014-09-19 8:25 UTC (permalink / raw)
To: Hin-Tak Leung
Cc: git, Jakob Stoklund Olesen, Sam Vilain, Steven Walter,
Peter Baumann, Andrew Myrick
Hin-Tak Leung <htl10@users•sourceforge.net> wrote:
> (I am not on the list - please CC)
Done, it is standard practice for git :)
> Thanks for git-svn - I have used it instead of Subversion itself for many years now.
>
> Just thought I'd ask about/report a few issues I have noticed for some time
> now while tracking development of a particular Subversion-based
> project. Broadly speaking, I think there are 3 problems,
> especially noticeable against a particular repository, but
> to a lesser extent with some others too.
>
> - just doing "git svn fetch --all" seems to consume a lot of memory
> (sometimes in the 2GB+ region), even when very few changes are actually fetched.
>
> - "git svn fetch --all" also seems to take a long time (in the minutes
> region) for certain fetched changes.
Jakob sent some patches a few months ago which seem to address the
issue. Unfortunately we forgot about them :x
Can you take a look at the following two "mergeinfo-speedups"
in my repo? (git://bogomips.org/git-svn)
Jakob Stoklund Olesen (2):
git-svn: only look at the new parts of svn:mergeinfo
git-svn: only look at the root path for svn:mergeinfo
Also downloadable here:
http://bogomips.org/git-svn.git/patch?id=9b258e721b30785357535
http://bogomips.org/git-svn.git/patch?id=73409a2145e93b436d74a
Can you please give them a try?
> - I know I can probably just "read the source", but I'd like to know
> why .git/svn/.caches is even larger than .git/objects (which supposedly
> contains everything that's of interest). I hope the important parts
> of .git/svn (and what not to do with them...) can be documented, for
> example towards the end of the man-page, without needing to
> 'read the source'. Here is part of "du" (1K blocks) from a couple of days ago:
>
> 254816 .git/objects
> 307056 .git/svn/.caches
> 332452 .git/svn
> 588064 .git
>
> The actual .git/config is here - this should be sufficient info for
> somebody looking into reproducing the issues I mentioned above.
IIRC, the caching is unique to mergeinfo, so perhaps Jakob's patches
help there, too.
Sorry I don't understand the mergeinfo stuff better; I've never worked on
a project which uses it.
> --------
> $ more .git/config
> [core]
> repositoryformatversion = 0
> filemode = true
> bare = false
> logallrefupdates = true
> [svn-remote "svn"]
> url = https://svn.r-project.org/R
> fetch = trunk:refs/remotes/trunk
> branches = branches/*:refs/remotes/*
> tags = tags/*:refs/remotes/tags/*
> [pack]
> threads = 1
> ------------
* Re: git svn's performance issue and strange pauses, and other thing
2014-09-19 8:25 ` Eric Wong
@ 2014-09-19 13:44 ` Jakob Stoklund Olesen
2014-10-05 1:02 ` Eric Wong
1 sibling, 0 replies; 10+ messages in thread
From: Jakob Stoklund Olesen @ 2014-09-19 13:44 UTC (permalink / raw)
To: Eric Wong
Cc: Hin-Tak Leung, git@vger•kernel.org, Sam Vilain, Steven Walter,
Peter Baumann, Andrew Myrick
> On Sep 19, 2014, at 1:25, Eric Wong <normalperson@yhbt•net> wrote:
>
> Hin-Tak Leung <htl10@users•sourceforge.net> wrote:
>
>> - I know I can probably just "read the source", but I'd like to know
>> why .git/svn/.caches is even larger than .git/objects (which supposedly
>> contains everything that's of interest). I hope the important parts
>> of .git/svn (and what not to do with them...) can be documented, for
>> example towards the end of the man-page, without needing to
>> 'read the source'. Here is part of "du" (1K blocks) from a couple of days ago:
>>
>> 254816 .git/objects
>> 307056 .git/svn/.caches
>> 332452 .git/svn
>> 588064 .git
>>
>> The actual .git/config is here - this should be sufficient info for
>> somebody looking into reproducing the issues I mentioned above.
>
> IIRC, the caching is unique to mergeinfo, so perhaps Jakob's patches
> help there, too.
IIRC the caches are used for memoization, and with my two patches applied the memoization doesn't improve performance much.
You could try removing it after applying my patches.
Thanks,
/Jakob
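For readers unfamiliar with the memoization Jakob mentions: git-svn wraps helpers such as check_cherry_pick with Perl's core Memoize module and, judging by the patch later in this thread, ties each LIST_CACHE hash to a file under .git/svn/.caches. The minimal sketch below keeps the cache in an ordinary in-memory hash and uses a made-up function, missing_commits, purely for illustration. It shows that every distinct argument list adds one entry holding the whole returned list, so the cache grows with both the number of distinct calls and the length of each result.

```perl
use strict;
use warnings;
use Memoize qw(memoize);

# Hypothetical stand-in for an expensive helper such as check_cherry_pick:
# it "computes" a list of missing commit ids for a (base, tip) pair.
my $calls = 0;
sub missing_commits {
    my ($base, $tip) = @_;
    $calls++;
    return map { "commit-$base-$_" } 1 .. $tip;
}

# git-svn ties a hash like this to a .yaml or .db file; here it stays in memory.
my %list_cache;
memoize 'missing_commits',
    SCALAR_CACHE => 'FAULT',                 # calling in scalar context dies
    LIST_CACHE   => [HASH => \%list_cache];  # list-context results live here

my @first  = missing_commits(1, 3);   # computed
my @again  = missing_commits(1, 3);   # same arguments: served from %list_cache
my @second = missing_commits(2, 5);   # new arguments: computed

# Memoize keeps each list result as an array reference, so count its elements.
my $stored = 0;
$stored += @$_ for values %list_cache;
printf "calls=%d entries=%d stored_ids=%d\n",
    $calls, scalar(keys %list_cache), $stored;
```

Removing the memoization, as Jakob suggests, would trade this stored data for recomputing each result on every call.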
* Re: git svn's performance issue and strange pauses, and other thing
2014-09-19 8:25 ` Eric Wong
2014-09-19 13:44 ` Jakob Stoklund Olesen
@ 2014-10-05 1:02 ` Eric Wong
1 sibling, 0 replies; 10+ messages in thread
From: Eric Wong @ 2014-10-05 1:02 UTC (permalink / raw)
To: Hin-Tak Leung
Cc: git, Jakob Stoklund Olesen, Sam Vilain, Steven Walter,
Peter Baumann, Andrew Myrick
Eric Wong <normalperson@yhbt•net> wrote:
> Jakob sent some patches a few months ago which seem to address the
> issue. Unfortunately we forgot about them :x
Hin-Tak: have you tried Jakob's patches? I've taken another look,
signed-off and pushed to my master.
> Can you take a look at the following two "mergeinfo-speedups"
> in my repo? (git://bogomips.org/git-svn)
>
> Jakob Stoklund Olesen (2):
> git-svn: only look at the new parts of svn:mergeinfo
> git-svn: only look at the root path for svn:mergeinfo
>
> Also downloadable here:
>
> http://bogomips.org/git-svn.git/patch?id=9b258e721b30785357535
> http://bogomips.org/git-svn.git/patch?id=73409a2145e93b436d74a
>
> Can you please give them a try?
* Re: git svn's performance issue and strange pauses, and other thing
@ 2014-10-06 23:51 Hin-Tak Leung
0 siblings, 0 replies; 10+ messages in thread
From: Hin-Tak Leung @ 2014-10-06 23:51 UTC (permalink / raw)
To: normalperson; +Cc: git, stoklund, sam, stevenrwalter, waste.manager, amyrick
------------------------------
On Sun, Oct 5, 2014 02:02 BST Eric Wong wrote:
>Eric Wong <normalperson@yhbt•net> wrote:
>> Jakob sent some patches a few months ago which seem to address the
>> issue. Unfortunately we forgot about them :x
>
>Hin-Tak: have you tried Jakob's patches? I've taken another look,
>signed-off and pushed to my master.
>
>> Can you take a look at the following two "mergeinfo-speedups"
>> in my repo? (git://bogomips.org/git-svn)
>>
>> Jakob Stoklund Olesen (2):
>> git-svn: only look at the new parts of svn:mergeinfo
>> git-svn: only look at the root path for svn:mergeinfo
>>
>> Also downloadable here:
>>
>> http://bogomips.org/git-svn.git/patch?id=9b258e721b30785357535
>> http://bogomips.org/git-svn.git/patch?id=73409a2145e93b436d74a
>>
>> Can you please give them a try?
Apologies - I applied them on top of 2.1.0 earlier today, but the svn repo just
hasn't changed much recently, so 'git svn fetch --all' has nothing interesting to show,
and I wondered whether I should wait before reporting. Then
I changed my mind and decided, what the hell, let's clone the whole
thing again :-). So I made a new directory, ran 'git init', copied
.git/config from the old repo, and am now running 'git svn fetch --all' in the new empty
directory.
So far it seems to be good. But I am only at revision 35700-ish at the moment,
and the whole thing is 66700-ish. Oh, I forgot to mention that the strange
pauses seem to be followed by messages like these:
W:svn cherry-pick ignored (/branches/R-2-12-branch:52939,54476,55265) - missing 492 commit(s) (eg 9bf20dca6a8b05dff28e6486b1613f10825972c9)
W:svn cherry-pick ignored (/branches/R-2-13-branch:55265,55432) - missing 231 commit(s) (eg 9290cf6ce2d7f6cca168cf326eed6e9fe760895f)
W:svn cherry-pick ignored (/branches/R-2-15-branch:58894,59717) - missing 405 commit(s) (eg ed84a373b33f728949edf3371829fc3414c343a8)
W:svn cherry-pick ignored (/branches/R-3-0-branch:62497) - missing 154 commit(s) (eg 9e4742d201771c9658417c2d2f83838e550e3162)
W:svn cherry-pick ignored (/trunk:
So presumably I'd only see interesting behavior when there are a number of branches.
It seems the first branches are around revision 48000-ish, so I might have
to wait a bit.
So far, the new clone hasn't created ".git/svn/.caches/" yet; and memory consumption seems
okay also.
* Re: git svn's performance issue and strange pauses, and other thing
@ 2014-10-07 18:20 Hin-Tak Leung
2014-10-19 4:12 ` Eric Wong
0 siblings, 1 reply; 10+ messages in thread
From: Hin-Tak Leung @ 2014-10-07 18:20 UTC (permalink / raw)
To: normalperson; +Cc: git, stoklund, sam, stevenrwalter, waste.manager, amyrick
------------------------------
On Tue, Oct 7, 2014 00:51 BST Hin-Tak Leung wrote:
>------------------------------
>On Sun, Oct 5, 2014 02:02 BST Eric Wong wrote:
<snipped>
>>Hin-Tak: have you tried Jakob's patches? I've taken another look,
>>signed-off and pushed to my master.
... Then
>I changed my mind and decided, what the hell, let's clone the whole
>thing again :-). So I made a new directory, ran 'git init', copied
>.git/config from the old repo, and am now running 'git svn fetch --all' in the new empty
>directory.
>
>So far it seems to be good. But I am only at revision 35700-ish at the moment,
>and the whole thing is 66700-ish. Oh, I forgot to mention that the strange
>pauses seem to be followed by messages like these:
>
>W:svn cherry-pick ignored (/branches/R-2-12-branch:52939,54476,55265) - missing 492 commit(s) (eg 9bf20dca6a8b05dff28e6486b1613f10825972c9)
>W:svn cherry-pick ignored (/branches/R-2-13-branch:55265,55432) - missing 231 commit(s) (eg 9290cf6ce2d7f6cca168cf326eed6e9fe760895f)
>W:svn cherry-pick ignored (/branches/R-2-15-branch:58894,59717) - missing 405 commit(s) (eg ed84a373b33f728949edf3371829fc3414c343a8)
>W:svn cherry-pick ignored (/branches/R-3-0-branch:62497) - missing 154 commit(s) (eg 9e4742d201771c9658417c2d2f83838e550e3162)
>W:svn cherry-pick ignored (/trunk:
>
>So presumably I'd only see interesting behavior when there are a number of branches.
>It seems the first branches are around revision 48000-ish, so I might have
>to wait a bit.
>
>So far, the new clone hasn't created ".git/svn/.caches/" yet; and memory consumption seems
>okay also.
The changes are definitely an improvement, as far as I can tell. There was only one notable pause, around
r50651, probably because of the rather large "Checking svn:mergeinfo changes since r15413"
(all the way from r15413?). That took about 12 minutes. Other instances of "W:svn cherry-pick ignored",
though they still take a while, are in the seconds region; before the code changes they could
take minutes, if memory serves.
<--
M src/library/tools/R/toHTML.R
r50650 = bed91d435c535f2643cf0d48623fecf86d264bd9 (refs/remotes/trunk)
M src/modules/X11/rotated.c
M src/modules/X11/dataentry.c
Checking svn:mergeinfo changes since r15413: 1 sources, 1 changed
W:svn cherry-pick ignored (/trunk:28840) - missing 9372 commit(s) (eg cea6142c76300539a0d0c9c743738e31a9f7d523)
r50651 = ad139a5bf91f9ad6690ff5fb4a3f71cea591a944 (refs/remotes/R-uthreads)
-->
The new clone has:
<--
$ ls -ltr .git/svn/.caches/
total 144788
-rw-rw-r--. 1 Hin-Tak Hin-Tak 1166138 Oct 7 13:44 lookup_svn_merge.yaml
-rw-rw-r--. 1 Hin-Tak Hin-Tak 72849741 Oct 7 13:48 check_cherry_pick.yaml
-rw-rw-r--. 1 Hin-Tak Hin-Tak 1133855 Oct 7 13:49 has_no_changes.yaml
-rw-rw-r--. 1 Hin-Tak Hin-Tak 73109005 Oct 7 13:53 _rev_list.yaml
-->
The old clone has:
<---
$ ls -ltr .git/svn/.caches/
total 318824
-rw-rw-r--. 1 Hin-Tak Hin-Tak 5711724 Jul 24 2012 lookup_svn_merge.db
-rw-rw-r--. 1 Hin-Tak Hin-Tak 30523628 Jul 24 2012 check_cherry_pick.db
-rw-rw-r--. 1 Hin-Tak Hin-Tak 296592 Jul 24 2012 has_no_changes.db
-rw-rw-r--. 1 Hin-Tak Hin-Tak 40241189 Oct 5 16:42 lookup_svn_merge.yaml
-rw-rw-r--. 1 Hin-Tak Hin-Tak 225323456 Oct 5 16:49 check_cherry_pick.yaml
-rw-rw-r--. 1 Hin-Tak Hin-Tak 242547 Oct 5 16:49 has_no_changes.yaml
-rw-rw-r--. 1 Hin-Tak Hin-Tak 24120007 Oct 5 16:50 _rev_list.yaml
-->
I had to suspend it somewhere around r59000, but it is interesting to see
that the max memory consumption of the later part is roughly double,
and that it also runs at nearly 100% CPU rather than ~60% overall. I don't know what
to make of that; probably just smaller changes versus
larger ones, or different time of day and network load (I
guess it is just bandwidth-limited, since the bulk of the CPU time is system
rather than user time?).
I am somewhat worried about the dramatic difference between the two .git/svn/.caches:
check_cherry_pick.yaml is 225MB in the old clone and 73MB in the new one, while
_rev_list.yaml goes the other way, 24MB vs 73MB. How do I reconcile that?
<--
M src/main/dotcode.c
M doc/NEWS.Rd
r59140 = b6014a226aebf9e016c89c0bd1aca1979796a057 (refs/remotes/trunk)
M src/main/dotcode.c
M doc/NEWS.Rd
Checking svn:mergeinfo changes since r59138: 4 sources, 1 changed
W:svn cherry-pick ignored (/trunk:59137,59140) - missing 369 commit(s) (eg 8a2a36083ba39be27fc9940acc3f51eab6a7a0c3)
r59141 = 38c6d05f164d34e4b5cc545bda387be9d910f748 (refs/remotes/R-2-15-branch)
Connection timed out: Connection timed out at /usr/share/perl5/vendor_perl/Git/SVN/Ra.pm line 290.
Command exited with non-zero status 1
Command being timed: "git svn fetch --all"
User time (seconds): 5642.19
System time (seconds): 23552.44
Percent of CPU this job got: 57%
Elapsed (wall clock) time (h:mm:ss or m:ss): 14:06:58
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 349324
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 39
Minor (reclaiming a frame) page faults: 744713614
Voluntary context switches: 4761489
Involuntary context switches: 8595950
Swaps: 0
File system inputs: 7712
File system outputs: 121404296
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 1
-->
<--
M src/include/Defn.h
r66719 = 1e3288d3ae4cfb15f6e4e4116f18d38b3efc5bb5 (refs/remotes/trunk)
M doc/NEWS.Rd
r66720 = 1c184e5fc2b71a27767215a45a1270f3edbc616f (refs/remotes/trunk)
Checked out HEAD:
https://svn.r-project.org/R/trunk r66720
creating empty directory: tests/Pkgs/exNSS4/man
Command being timed: "git svn fetch --all"
User time (seconds): 2126.00
System time (seconds): 7852.44
Percent of CPU this job got: 96%
Elapsed (wall clock) time (h:mm:ss or m:ss): 2:52:38
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 755256
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 6
Minor (reclaiming a frame) page faults: 142730534
Voluntary context switches: 898725
Involuntary context switches: 1842056
Swaps: 0
File system inputs: 1800
File system outputs: 28606392
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
-->
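To put numbers on the comparison between the two runs above (CPU utilisation, how much of the CPU time is system time, and peak memory), here is a small Perl sketch that parses the relevant lines of GNU `/usr/bin/time -v` output. parse_time_v is a made-up helper that only handles the four fields shown; the figures are copied from the two logs above.

```perl
use strict;
use warnings;

# Pull a few fields out of GNU `/usr/bin/time -v` output and derive
# CPU utilisation and the share of CPU time spent in the kernel.
sub parse_time_v {
    my ($text) = @_;
    my %r;
    ($r{user})   = $text =~ /User time \(seconds\): ([\d.]+)/;
    ($r{system}) = $text =~ /System time \(seconds\): ([\d.]+)/;
    ($r{rss_kb}) = $text =~ /Maximum resident set size \(kbytes\): (\d+)/;
    my ($clock)  = $text =~ /Elapsed \(wall clock\) time \(h:mm:ss or m:ss\): ([\d:.]+)/;
    die "unrecognised input\n" unless defined $clock && defined $r{user};

    # "14:06:58" (h:mm:ss) or "36:13.74" (m:ss) -> seconds
    my $secs = 0;
    $secs = $secs * 60 + $_ for split /:/, $clock;

    my $cpu = $r{user} + $r{system};
    $r{elapsed}   = $secs;
    $r{cpu_pct}   = 100 * $cpu / $secs;
    $r{sys_share} = 100 * $r{system} / $cpu;
    return \%r;
}

# First run (ended with "Connection timed out" after r59141).
my $first = parse_time_v(<<'EOF');
User time (seconds): 5642.19
System time (seconds): 23552.44
Elapsed (wall clock) time (h:mm:ss or m:ss): 14:06:58
Maximum resident set size (kbytes): 349324
EOF

# Resumed run (finished at r66720).
my $second = parse_time_v(<<'EOF');
User time (seconds): 2126.00
System time (seconds): 7852.44
Elapsed (wall clock) time (h:mm:ss or m:ss): 2:52:38
Maximum resident set size (kbytes): 755256
EOF

for my $run ($first, $second) {
    printf "cpu %.1f%%, system share %.1f%%, max RSS %d MiB\n",
        $run->{cpu_pct}, $run->{sys_share}, $run->{rss_kb} / 1024;
}
printf "max RSS ratio (resumed/first): %.2f\n", $second->{rss_kb} / $first->{rss_kb};
```

Both runs spend roughly 80% of their CPU time in the kernel, so the difference between them is mostly in utilisation and peak memory rather than in the user/system split.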
* Re: git svn's performance issue and strange pauses, and other thing
2014-10-07 18:20 Hin-Tak Leung
@ 2014-10-19 4:12 ` Eric Wong
2014-10-19 14:41 ` Jakob Stoklund Olesen
0 siblings, 1 reply; 10+ messages in thread
From: Eric Wong @ 2014-10-19 4:12 UTC (permalink / raw)
To: Hin-Tak Leung; +Cc: git, stoklund, sam, stevenrwalter, waste.manager, amyrick
Hin-Tak Leung <htl10@users•sourceforge.net> wrote:
> The new clone has:
>
> <--
> $ ls -ltr .git/svn/.caches/
> total 144788
> -rw-rw-r--. 1 Hin-Tak Hin-Tak 1166138 Oct 7 13:44 lookup_svn_merge.yaml
> -rw-rw-r--. 1 Hin-Tak Hin-Tak 72849741 Oct 7 13:48 check_cherry_pick.yaml
> -rw-rw-r--. 1 Hin-Tak Hin-Tak 1133855 Oct 7 13:49 has_no_changes.yaml
> -rw-rw-r--. 1 Hin-Tak Hin-Tak 73109005 Oct 7 13:53 _rev_list.yaml
> -->
>
> The old clone has:
<snip>
> -rw-rw-r--. 1 Hin-Tak Hin-Tak 40241189 Oct 5 16:42 lookup_svn_merge.yaml
> -rw-rw-r--. 1 Hin-Tak Hin-Tak 225323456 Oct 5 16:49 check_cherry_pick.yaml
> -rw-rw-r--. 1 Hin-Tak Hin-Tak 242547 Oct 5 16:49 has_no_changes.yaml
> -rw-rw-r--. 1 Hin-Tak Hin-Tak 24120007 Oct 5 16:50 _rev_list.yaml
> -->
>
> I had to suspend it somewhere around r59000, but it is interesting to see
> that the max memory consumption of the later part is roughly double,
> and that it also runs at nearly 100% CPU rather than ~60% overall. I don't know what
> to make of that; probably just smaller changes versus
> larger ones, or different time of day and network load (I
> guess it is just bandwidth-limited, since the bulk of the CPU time is system
> rather than user time?).
git-svn memory usage is insane, and we need to reduce it.
(On Linux, fork() performance degrades as the memory size of the parent
grows, and I don't think we can easily call vfork() from Perl.)
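The fork() effect can be observed with a rough Perl sketch. As far as I understand, git-svn runs many git subprocesses, and each one starts with a fork() of the (possibly very large) Perl process. The sketch grows the parent's heap with throwaway strings (the @ballast array and the step sizes are arbitrary choices) and times fork() plus waitpid() at each size. Timings depend heavily on the kernel and hardware, so treat the numbers as indicative only; this does not measure git-svn itself.

```perl
use strict;
use warnings;
use POSIX ();
use Time::HiRes qw(time);

# Time one fork() + waitpid(); the child exits immediately.
sub time_fork {
    my $t0  = time;
    my $pid = fork();
    die "fork failed: $!\n" unless defined $pid;
    # Child: POSIX::_exit skips Perl cleanup, so buffered output is not flushed twice.
    POSIX::_exit(0) if $pid == 0;
    waitpid($pid, 0);
    return time - $t0;
}

# Grow the parent's (touched) heap in steps and time fork() at each total size.
my @ballast;
my $total_mb = 0;
my %best_ms;
for my $step_mb (0, 16, 48) {
    push @ballast, 'x' x ($step_mb * 1024 * 1024);
    $total_mb += $step_mb;
    my $best = 1e9;
    for (1 .. 5) {                       # keep the fastest of a few tries
        my $t = time_fork();
        $best = $t if $t < $best;
    }
    $best_ms{$total_mb} = $best * 1000;
    printf "%3d MiB of ballast: fork+wait took %.3f ms\n",
        $total_mb, $best_ms{$total_mb};
}
```

On a typical Linux box the time tends to rise with the ballast because the kernel has to copy the parent's page tables, even though the pages themselves are shared copy-on-write.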
> I am somewhat worried about the dramatic difference between the two .git/svn/.caches:
> check_cherry_pick.yaml is 225MB in the old clone and 73MB in the new one, while
> _rev_list.yaml goes the other way, 24MB vs 73MB. How do I reconcile that?
Calling patterns changed, and it looks like Jakob's changes avoided some
calls. The main thing to care about:
Does the repository history look right?
The check_cherry_pick cache can be made smaller, too:
----------------------- 8< -----------------------------
From: Eric Wong <normalperson@yhbt•net>
Subject: [PATCH] git-svn: reduce check_cherry_pick cache overhead
We do not need to store entire lists of commits, only the
number of incomplete commits and the first one for reference.
This reduces the amount of data we need to store in memory
and in the on-disk stores.
Signed-off-by: Eric Wong <normalperson@yhbt•net>
---
perl/Git/SVN.pm | 28 +++++++++++++++-------------
1 file changed, 15 insertions(+), 13 deletions(-)
diff --git a/perl/Git/SVN.pm b/perl/Git/SVN.pm
index 25dbcd5..b2d37cb 100644
--- a/perl/Git/SVN.pm
+++ b/perl/Git/SVN.pm
@@ -1537,7 +1537,7 @@ sub _rev_list {
@rv;
}
-sub check_cherry_pick {
+sub check_cherry_pick2 {
my $base = shift;
my $tip = shift;
my $parents = shift;
@@ -1552,7 +1552,8 @@ sub check_cherry_pick {
delete $commits{$commit};
}
}
- return (keys %commits);
+ my @k = (keys %commits);
+ return (scalar @k, $k[0]);
}
sub has_no_changes {
@@ -1597,7 +1598,7 @@ sub tie_for_persistent_memoization {
mkpath([$cache_path]) unless -d $cache_path;
my %lookup_svn_merge_cache;
- my %check_cherry_pick_cache;
+ my %check_cherry_pick2_cache;
my %has_no_changes_cache;
my %_rev_list_cache;
@@ -1608,11 +1609,11 @@ sub tie_for_persistent_memoization {
LIST_CACHE => ['HASH' => \%lookup_svn_merge_cache],
;
- tie_for_persistent_memoization(\%check_cherry_pick_cache,
- "$cache_path/check_cherry_pick");
- memoize 'check_cherry_pick',
+ tie_for_persistent_memoization(\%check_cherry_pick2_cache,
+ "$cache_path/check_cherry_pick2");
+ memoize 'check_cherry_pick2',
SCALAR_CACHE => 'FAULT',
- LIST_CACHE => ['HASH' => \%check_cherry_pick_cache],
+ LIST_CACHE => ['HASH' => \%check_cherry_pick2_cache],
;
tie_for_persistent_memoization(\%has_no_changes_cache,
@@ -1636,7 +1637,7 @@ sub tie_for_persistent_memoization {
$memoized = 0;
Memoize::unmemoize 'lookup_svn_merge';
- Memoize::unmemoize 'check_cherry_pick';
+ Memoize::unmemoize 'check_cherry_pick2';
Memoize::unmemoize 'has_no_changes';
Memoize::unmemoize '_rev_list';
}
@@ -1648,7 +1649,8 @@ sub tie_for_persistent_memoization {
return unless -d $cache_path;
for my $cache_file (("$cache_path/lookup_svn_merge",
- "$cache_path/check_cherry_pick",
+ "$cache_path/check_cherry_pick", # old
+ "$cache_path/check_cherry_pick2",
"$cache_path/has_no_changes")) {
for my $suffix (qw(yaml db)) {
my $file = "$cache_file.$suffix";
@@ -1817,15 +1819,15 @@ sub find_extra_svn_parents {
}
# double check that there are no missing non-merge commits
- my (@incomplete) = check_cherry_pick(
+ my ($ninc, $ifirst) = check_cherry_pick2(
$merge_base, $merge_tip,
$parents,
@all_ranges,
);
- if ( @incomplete ) {
- warn "W:svn cherry-pick ignored ($spec) - missing "
- .@incomplete." commit(s) (eg $incomplete[0])\n";
+ if ($ninc) {
+ warn "W:svn cherry-pick ignored ($spec) - missing " .
+ "$ninc commit(s) (eg $ifirst)\n";
} else {
warn
"Found merge parent ($spec): ",
--
EW
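The effect of this patch on the cache can be illustrated with a minimal sketch, assuming the cache behaves like a plain Memoize LIST_CACHE hash (git-svn ties it to a file instead). cherry_old and cherry_new are made-up stand-ins for the old check_cherry_pick and new check_cherry_pick2 return shapes, and fake_ids fabricates commit ids; only the shape of the return value mirrors the patch. The missing-commit counts are borrowed from the warnings quoted earlier in the thread.

```perl
use strict;
use warnings;
use Memoize qw(memoize);

# Hypothetical helper: fabricate $n fake 40-hex-digit commit ids.
sub fake_ids { my ($n) = @_; return map { sprintf '%040x', $_ } 1 .. $n }

# Old shape (like check_cherry_pick): return every missing commit.
sub cherry_old { my ($n) = @_; return fake_ids($n) }

# New shape (like check_cherry_pick2): return (count, one example).
sub cherry_new {
    my ($n) = @_;
    my @k = fake_ids($n);
    return (scalar @k, $k[0]);
}

my (%old_cache, %new_cache);
memoize 'cherry_old', SCALAR_CACHE => 'FAULT', LIST_CACHE => [HASH => \%old_cache];
memoize 'cherry_new', SCALAR_CACHE => 'FAULT', LIST_CACHE => [HASH => \%new_cache];

for my $n (492, 231, 405, 154) {
    my @incomplete = cherry_old($n);
    my ($ninc, $ifirst) = cherry_new($n);
    die "mismatch\n" unless $ninc == @incomplete;
    print "missing $ninc commit(s) (eg $ifirst)\n" if $ninc;
}

# Count how many values each cache retains (Memoize stores array refs).
sub stored { my ($cache) = @_; my $t = 0; $t += @$_ for values %$cache; return $t }
printf "old cache: %d values, new cache: %d values\n",
    stored(\%old_cache), stored(\%new_cache);
```

Nothing user-visible is lost: the old warning already printed only the count and one example. In both versions the example comes from `keys %commits`, so which commit is shown depends on hash order.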
* Re: git svn's performance issue and strange pauses, and other thing
@ 2014-10-19 14:04 Hin-Tak Leung
0 siblings, 0 replies; 10+ messages in thread
From: Hin-Tak Leung @ 2014-10-19 14:04 UTC (permalink / raw)
To: normalperson; +Cc: git, stoklund, sam, stevenrwalter, waste.manager, amyrick
* Re: git svn's performance issue and strange pauses, and other thing
@ 2014-10-19 14:22 Hin-Tak Leung
0 siblings, 0 replies; 10+ messages in thread
From: Hin-Tak Leung @ 2014-10-19 14:22 UTC (permalink / raw)
To: normalperson; +Cc: git, stoklund, sam, stevenrwalter, waste.manager, amyrick
(sorry about the last blank reply - mobile phone and finger accident...)
------------------------------
On Sun, Oct 19, 2014 05:12 BST Eric Wong wrote:
>Hin-Tak Leung <htl10@users•sourceforge.net> wrote:
> The new clone has:
>
> <--
> $ ls -ltr .git/svn/.caches/
> total 144788
> -rw-rw-r--. 1 Hin-Tak Hin-Tak 1166138 Oct 7 13:44 lookup_svn_merge.yaml
> -rw-rw-r--. 1 Hin-Tak Hin-Tak 72849741 Oct 7 13:48 check_cherry_pick.yaml
> -rw-rw-r--. 1 Hin-Tak Hin-Tak 1133855 Oct 7 13:49 has_no_changes.yaml
> -rw-rw-r--. 1 Hin-Tak Hin-Tak 73109005 Oct 7 13:53 _rev_list.yaml
> -->
>
> The old clone has:
>
><snip>
> -rw-rw-r--. 1 Hin-Tak Hin-Tak 40241189 Oct 5 16:42 lookup_svn_merge.yaml
> -rw-rw-r--. 1 Hin-Tak Hin-Tak 225323456 Oct 5 16:49 check_cherry_pick.yaml
> -rw-rw-r--. 1 Hin-Tak Hin-Tak 242547 Oct 5 16:49 has_no_changes.yaml
> -rw-rw-r--. 1 Hin-Tak Hin-Tak 24120007 Oct 5 16:50 _rev_list.yaml
> -->
>
> I had to suspend it somewhere around r59000, but it is interesting to see
> that the max memory consumption of the later part is roughly double,
> and that it also runs at nearly 100% CPU rather than ~60% overall. I don't know what
> to make of that; probably just smaller changes versus
> larger ones, or different time of day and network load (I
> guess it is just bandwidth-limited, since the bulk of the CPU time is system
> rather than user time?).
>
>git-svn memory usage is insane, and we need to reduce it.
>(On Linux, fork() performance degrades as the memory size of the parent
> grows, and I don't think we can easily call vfork() from Perl.)
>
Yes, I think the memory consumption is a bit crazy. I ran git svn fetch on
the old clone again and it was a bit slow, so I timed it on the new one; here it is.
Just fetching 45 changes took 36 minutes, and the memory
consumption shot up to over 1GB (there were one or two mergeinfo checks
in the middle, not shown).
<---
cd ../R-2/
[Hin-Tak@localhost R-2]$ /usr/bin/time -v git svn fetch --all
M src/library/base/R/apply.R
M src/library/base/man/apply.Rd
M doc/NEWS.Rd
r66721 = e26e52bf4b2cdbe291d5899fd0a449f197aa2133 (refs/remotes/trunk)
...
M src/library/tools/R/utils.R
r66765 = c64d1828ada98395892529ce59b5760de1bdc60b (refs/remotes/R-3-1-branch)
---
Command being timed: "git svn fetch --all"
User time (seconds): 2042.81
System time (seconds): 115.98
Percent of CPU this job got: 99%
Elapsed (wall clock) time (h:mm:ss or m:ss): 36:13.74
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 1019092
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 1149
Minor (reclaiming a frame) page faults: 1482219
Voluntary context switches: 9470
Involuntary context switches: 226683
Swaps: 0
File system inputs: 358864
File system outputs: 510680
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
[Hin-Tak@localhost R-2]$ cd ../R
--->
> I am somewhat worried about the dramatic difference between the two .git/svn/.caches:
> check_cherry_pick.yaml is 225MB in the old clone and 73MB in the new one, while
> _rev_list.yaml goes the other way, 24MB vs 73MB. How do I reconcile that?
>
>Calling patterns changed, and it looks like Jakob's changes avoided some
>calls. The main thing to care about:
> Does the repository history look right?
>
I'll check soon and report. It looks superficially okay, though I suppose
I'd need to check every branch to be sure. I know the fetch history is
different, but doesn't the reflog (or its svn equivalent) expire and get pruned
after two weeks?
>The check_cherry_pick cache can be made smaller, too:
>----------------------- 8< -----------------------------
>From: Eric Wong <normalperson@yhbt•net>
>Subject: [PATCH] git-svn: reduce check_cherry_pick cache overhead
>
>We do not need to store entire lists of commits, only the
>number of incomplete commits and the first one for reference.
>This reduces the amount of data we need to store in memory
>and in the on-disk stores.
>
Is there a way to retrospectively compress/trim the cache, or better
still, to examine it before compressing?
I intend to hold on to both the new and the old clones for a while until
I can reconcile the differences... though I am running the same git svn code
on both now.
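On examining or trimming an old cache, here is a hypothetical sketch. It assumes the cache has already been loaded into a Perl hash shaped like a Memoize LIST_CACHE (each memoized argument key mapped to an array ref of commit ids). Loading and writing the real .yaml or .db file under .git/svn/.caches is deliberately left out because it depends on which backend produced the file. summarize and trim_cherry_pick_cache are made-up names, not git-svn functions, and the sample data is invented. Note also that the patch earlier in this thread appears to treat the old check_cherry_pick file as something to delete rather than convert.

```perl
use strict;
use warnings;

# Hypothetical: count entries and stored values in a LIST_CACHE-shaped hash.
sub summarize {
    my ($cache) = @_;
    my ($entries, $values) = (0, 0);
    for my $list (values %$cache) {
        $entries++;
        $values += @$list;
    }
    return ($entries, $values);
}

# Hypothetical: keep only (count, first example) per entry, the same
# shape that check_cherry_pick2 in the patch returns.
sub trim_cherry_pick_cache {
    my ($old) = @_;
    my %new;
    for my $key (keys %$old) {
        my $list = $old->{$key};
        $new{$key} = [ scalar @$list, $list->[0] ];
    }
    return \%new;
}

# Made-up sample data standing in for an already-loaded cache.
my %old = (
    'merge-a' => [ map { "c$_" } 1 .. 100 ],
    'merge-b' => [],
);

my ($e1, $v1) = summarize(\%old);
my $new = trim_cherry_pick_cache(\%old);
my ($e2, $v2) = summarize($new);
print "before: $e1 entries, $v1 values\n";
print "after:  $e2 entries, $v2 values\n";
```

Running summarize alone covers the "examine it first" case; the trimmed hash would only help if it were written back under the new check_cherry_pick2 name.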
>Signed-off-by: Eric Wong <normalperson@yhbt•net>
>---
> perl/Git/SVN.pm | 28 +++++++++++++++-------------
> 1 file changed, 15 insertions(+), 13 deletions(-)
>
>diff --git a/perl/Git/SVN.pm b/perl/Git/SVN.pm
>index 25dbcd5..b2d37cb 100644
>--- a/perl/Git/SVN.pm
>+++ b/perl/Git/SVN.pm
>@@ -1537,7 +1537,7 @@ sub _rev_list {
> @rv;
> }
>
>-sub check_cherry_pick {
>+sub check_cherry_pick2 {
> my $base = shift;
> my $tip = shift;
> my $parents = shift;
>@@ -1552,7 +1552,8 @@ sub check_cherry_pick {
> delete $commits{$commit};
> }
> }
>- return (keys %commits);
>+ my @k = (keys %commits);
>+ return (scalar @k, $k[0]);
> }
>
> sub has_no_changes {
>@@ -1597,7 +1598,7 @@ sub tie_for_persistent_memoization {
> mkpath([$cache_path]) unless -d $cache_path;
>
> my %lookup_svn_merge_cache;
>- my %check_cherry_pick_cache;
>+ my %check_cherry_pick2_cache;
> my %has_no_changes_cache;
> my %_rev_list_cache;
>
>@@ -1608,11 +1609,11 @@ sub tie_for_persistent_memoization {
> LIST_CACHE => ['HASH' => \%lookup_svn_merge_cache],
> ;
>
>- tie_for_persistent_memoization(\%check_cherry_pick_cache,
>- "$cache_path/check_cherry_pick");
>- memoize 'check_cherry_pick',
>+ tie_for_persistent_memoization(\%check_cherry_pick2_cache,
>+ "$cache_path/check_cherry_pick2");
>+ memoize 'check_cherry_pick2',
> SCALAR_CACHE => 'FAULT',
>- LIST_CACHE => ['HASH' => \%check_cherry_pick_cache],
>+ LIST_CACHE => ['HASH' => \%check_cherry_pick2_cache],
> ;
>
> tie_for_persistent_memoization(\%has_no_changes_cache,
>@@ -1636,7 +1637,7 @@ sub tie_for_persistent_memoization {
> $memoized = 0;
>
> Memoize::unmemoize 'lookup_svn_merge';
>- Memoize::unmemoize 'check_cherry_pick';
>+ Memoize::unmemoize 'check_cherry_pick2';
> Memoize::unmemoize 'has_no_changes';
> Memoize::unmemoize '_rev_list';
> }
>@@ -1648,7 +1649,8 @@ sub tie_for_persistent_memoization {
> return unless -d $cache_path;
>
> for my $cache_file (("$cache_path/lookup_svn_merge",
>- "$cache_path/check_cherry_pick",
>+ "$cache_path/check_cherry_pick", # old
>+ "$cache_path/check_cherry_pick2",
> "$cache_path/has_no_changes")) {
> for my $suffix (qw(yaml db)) {
> my $file = "$cache_file.$suffix";
>@@ -1817,15 +1819,15 @@ sub find_extra_svn_parents {
> }
>
> # double check that there are no missing non-merge commits
>- my (@incomplete) = check_cherry_pick(
>+ my ($ninc, $ifirst) = check_cherry_pick2(
> $merge_base, $merge_tip,
> $parents,
> @all_ranges,
> );
>
>- if ( @incomplete ) {
>- warn "W:svn cherry-pick ignored ($spec) - missing "
>- .@incomplete." commit(s) (eg $incomplete[0])\n";
>+ if ($ninc) {
>+ warn "W:svn cherry-pick ignored ($spec) - missing " .
>+ "$ninc commit(s) (eg $ifirst)\n";
> } else {
> warn
> "Found merge parent ($spec): ",
>--
>EW
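The size difference the patch targets can be seen in isolation with a minimal sketch. It assumes, as Memoize's in-memory list cache does, that each list-context result is kept as an array reference holding the whole returned list; `missing_full`, `missing_summary` and `_missing_ids` are hypothetical names, not git-svn code. Only the `memoize` options (`SCALAR_CACHE => 'FAULT'`, `LIST_CACHE => ['HASH' => \%cache]`) mirror what the patch uses.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Memoize qw(memoize);

# Hypothetical helper (not git-svn code): fabricates $n "missing" commit ids.
sub _missing_ids {
	my ($n) = @_;
	return map { sprintf('c%04d', $_) } 1 .. $n;
}

# Old style: return the whole list, so the cache stores every id.
sub missing_full {
	my ($n) = @_;
	return _missing_ids($n);
}

# New style, like check_cherry_pick2: return only (count, first id).
sub missing_summary {
	my ($n) = @_;
	my @k = _missing_ids($n);
	return (scalar @k, $k[0]);
}

my (%full_cache, %summary_cache);
memoize 'missing_full',
	SCALAR_CACHE => 'FAULT',
	LIST_CACHE   => ['HASH' => \%full_cache];
memoize 'missing_summary',
	SCALAR_CACHE => 'FAULT',
	LIST_CACHE   => ['HASH' => \%summary_cache];

for my $n (1000, 2000) {
	my @all = missing_full($n);    # list context; scalar context would FAULT
	my ($count, $first) = missing_summary($n);
	print "n=$n: full list has ", scalar(@all), " ids; summary is ($count, $first)\n";
}

# Each list-context cache value is an array ref holding the returned list.
my ($full_items, $summary_items) = (0, 0);
$full_items    += @$_ for values %full_cache;
$summary_items += @$_ for values %summary_cache;
print "stored items: full=$full_items summary=$summary_items\n";
```

Running it prints `stored items: full=3000 summary=4`: the summary cache stays at two values per call no matter how many commits are missing, which is the trade-off the patch makes in exchange for the warning showing only one example commit.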
* Re: git svn's performance issue and strange pauses, and other thing
2014-10-19 4:12 ` Eric Wong
@ 2014-10-19 14:41 ` Jakob Stoklund Olesen
0 siblings, 0 replies; 10+ messages in thread
From: Jakob Stoklund Olesen @ 2014-10-19 14:41 UTC
To: Eric Wong
Cc: Hin-Tak Leung, git@vger•kernel.org, sam@vilain•net,
stevenrwalter@gmail•com, waste.manager@gmx•de, amyrick@apple•com
On Oct 18, 2014, at 21:12, Eric Wong <normalperson@yhbt•net> wrote:
>> I am somewhat worried about the dramatic difference between the two .git/svn/.caches directories:
>> check_cherry_pick.yaml is 225MB in one and 73MB in the other, while
>> _rev_list.yaml is the opposite: 24MB vs 73MB. How do I reconcile that?
>
> Calling patterns changed, and it looks like Jakob's changes avoided some
> calls.
It is possible that those functions don't need to be memoized any more. My patch is trying to avoid calling them with the same arguments over and over, and memoizing doesn't help when arguments are changing.
Thanks,
/jakob
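Jakob's point, that memoizing only pays off when the same arguments recur, can be illustrated with a minimal sketch. `expensive_range` is a hypothetical stand-in for functions like `_rev_list`, and the call counter exists only to make cache hits and misses visible; the `memoize` options are the same ones the patch above uses.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Memoize qw(memoize);

my $real_calls = 0;    # how often the underlying function actually runs

# Hypothetical expensive function (not git-svn code).
sub expensive_range {
	my ($base, $tip) = @_;
	$real_calls++;
	return ($tip - $base);
}

my %cache;
memoize 'expensive_range',
	SCALAR_CACHE => 'FAULT',
	LIST_CACHE   => ['HASH' => \%cache];

# Same arguments repeated: one real call, the other four are cache hits.
my @r;
@r = expensive_range(10, 20) for 1 .. 5;
my $calls_repeated = $real_calls;

# Arguments that change every time: each call is a miss and adds an entry.
$real_calls = 0;
@r = expensive_range(10, $_) for 21 .. 25;
my $calls_changing = $real_calls;

print "repeated args: $calls_repeated real call(s)\n";
print "changing args: $calls_changing real call(s), ",
	scalar(keys %cache), " cache entries\n";
```

With changing arguments the cache saves no work and only grows, which is consistent with the suggestion that these functions may no longer need to be memoized once the repeated calls are avoided.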
Thread overview: 10+ messages
2014-09-18 7:39 git svn's performance issue and strange pauses, and other thing Hin-Tak Leung
2014-09-19 8:25 ` Eric Wong
2014-09-19 13:44 ` Jakob Stoklund Olesen
2014-10-05 1:02 ` Eric Wong
-- strict thread matches above, loose matches on Subject: below --
2014-10-06 23:51 Hin-Tak Leung
2014-10-07 18:20 Hin-Tak Leung
2014-10-19 4:12 ` Eric Wong
2014-10-19 14:41 ` Jakob Stoklund Olesen
2014-10-19 14:04 Hin-Tak Leung
2014-10-19 14:22 Hin-Tak Leung