Hi Stewart,

On 04/20/2016 03:41 AM, Stewart Smith wrote:
> Akshay Adiga writes:
>> Iozone results show fairly consistent performance boost.
>> YCSB on redis shows improved Max latencies in most cases.
> What about power consumption?
>
>> Iozone write/rewrite tests were made with filesizes 200704Kb and 401408Kb
>> with different record sizes. The following table shows IO operations/sec
>> with and without patch.
>> Iozone Results ( in op/sec) ( mean over 3 iterations )
> What's the variance between runs?

Re-ran the Iozone test.

w/o   : without patch
w     : with patch
stdev : standard deviation
avg   : average

Iozone results for ReWrite (op/sec)
+----------+--------+-----------+------------+-----------+-----------+---------+
| filesize | reclen | w/o(avg)  | w/o(stdev) | w(avg)    | w(stdev)  | change% |
+----------+--------+-----------+------------+-----------+-----------+---------+
| 200704   | 1      | 795070.4  | 5813.51    | 805127.8  | 16872.59  | 1.264   |
| 200704   | 2      | 1448973.8 | 23058.79   | 1472098.8 | 18062.73  | 1.595   |
| 200704   | 4      | 2413444   | 85988.09   | 2562535.8 | 48649.35  | 6.177   |
| 200704   | 8      | 3827453   | 87710.52   | 3846888.2 | 86438.51  | 0.507   |
| 200704   | 16     | 5276096.8 | 73208.19   | 5425961.6 | 170774.75 | 2.840   |
| 200704   | 32     | 6742930.6 | 22789.45   | 6848904.4 | 257768.84 | 1.571   |
| 200704   | 64     | 7059479.2 | 300725.26  | 7373635   | 285106.90 | 4.450   |
| 200704   | 128    | 7097647.2 | 408171.71  | 7716500   | 266139.68 | 8.719   |
| 200704   | 256    | 6710810   | 314594.13  | 7661752.6 | 454049.27 | 14.170  |
| 200704   | 512    | 7034675.4 | 516152.97  | 7378583.2 | 613617.57 | 4.888   |
| 200704   | 1024   | 6265317.2 | 446101.38  | 7540629.6 | 294865.20 | 20.355  |
| 401408   | 1      | 802233.2  | 4263.92    | 817507    | 17727.09  | 1.903   |
| 401408   | 2      | 1461892.8 | 53678.12   | 1482872   | 45670.30  | 1.435   |
| 401408   | 4      | 2629686.8 | 24365.33   | 2673196.2 | 41576.78  | 1.654   |
| 401408   | 8      | 4156353.8 | 70636.85   | 4149330.4 | 56521.84  | -0.168  |
| 401408   | 16     | 5895437   | 63762.43   | 5924167.4 | 396311.75 | 0.487   |
| 401408   | 32     | 7330826.6 | 167080.53  | 7785889.2 | 245434.99 | 6.207   |
| 401408   | 64     | 8298555.2 | 328890.89  | 8482416.8 | 249698.02 | 2.215   |
| 401408   | 128    | 8241108.6 | 490560.96  | 8686478   | 224816.21 | 5.404   |
| 401408   | 256    | 8038080.6 | 327704.66  | 8372327.4 | 210978.18 | 4.158   |
| 401408   | 512    | 8229523.4 | 371701.73  | 8654695.2 | 296715.07 | 5.166   |
+----------+--------+-----------+------------+-----------+-----------+---------+

Iozone results for Write (op/sec)
+----------+--------+-----------+------------+-----------+-----------+---------+
| filesize | reclen | w/o(avg)  | w/o(stdev) | w(avg)    | w(stdev)  | change% |
+----------+--------+-----------+------------+-----------+-----------+---------+
| 200704   | 1      | 575825    | 7876.69    | 569388.4  | 6699.59   | -1.12   |
| 200704   | 2      | 1061229.4 | 7589.50    | 1045193.2 | 19785.85  | -1.51   |
| 200704   | 4      | 1808329   | 13040.67   | 1798138.4 | 50367.19  | -0.56   |
| 200704   | 8      | 2822953.4 | 19948.89   | 2830305.6 | 21202.77  | 0.26    |
| 200704   | 16     | 3976987   | 62201.72   | 3909063.8 | 268640.51 | -1.71   |
| 200704   | 32     | 4959358.2 | 112052.99  | 4760303   | 330343.73 | -4.01   |
| 200704   | 64     | 5452454.6 | 628078.72  | 5692265.6 | 190562.91 | 4.40    |
| 200704   | 128    | 5645246.8 | 10455.85   | 5653330.2 | 18153.76  | 0.14    |
| 200704   | 256    | 5855897.2 | 184854.25  | 5402069   | 538523.04 | -7.75   |
| 200704   | 512    | 5515904   | 326198.86  | 5639976.4 | 8480.46   | 2.25    |
| 200704   | 1024   | 5471718.2 | 415179.15  | 5399414.6 | 686124.50 | -1.32   |
| 401408   | 1      | 584786.6  | 1256.59    | 587237.2  | 6552.55   | 0.42    |
| 401408   | 2      | 1047018.8 | 26567.72   | 1040926.8 | 16495.93  | -0.58   |
| 401408   | 4      | 1815465.8 | 16426.92   | 1773652.6 | 38169.02  | -2.30   |
| 401408   | 8      | 2814285   | 27374.53   | 2756608   | 96689.13  | -2.05   |
| 401408   | 16     | 3931646   | 129648.79  | 3805793.4 | 141368.40 | -3.20   |
| 401408   | 32     | 4875353.4 | 146203.70  | 4884084   | 265484.01 | 0.18    |
| 401408   | 64     | 5479805.8 | 349995.36  | 5565292.2 | 20645.45  | 1.56    |
| 401408   | 128    | 5598486   | 195680.23  | 5645125   | 62017.38  | 0.83    |
| 401408   | 256    | 5803148   | 328683.02  | 5657215   | 20579.28  | -2.51   |
| 401408   | 512    | 5565091.4 | 166123.57  | 5725974.4 | 169506.29 | 2.89    |
+----------+--------+-----------+------------+-----------+-----------+---------+

>> Tested with YCSB workload (50% update + 50% read) over redis for 1 million
>> records and 1 million operations. Each test was carried out with target
>> operations per second and persistence disabled.
>>
>> Max-latency (in us)( mean over 5 iterations )
> What's the variance between runs?
>
> std dev? 95th percentile?
>
>> ---------------------------------------------------------------
>> op/s    Operation   with patch   without patch   %change
>> ---------------------------------------------------------------
>> 15000   Read        61480.6      50261.4         22.32
> This seems fairly significant regression. Any idea why at 15K op/s
> there's such a regression?

Just re-ran the test to collect power numbers.

Results for YCSB+Redis test.

P95 : 95th percentile
P99 : 99th percentile

Power numbers are taken from one run of the YCSB+redis test, which has
50% Read + 50% Update. Maximum latency has clearly gone down in all cases,
with a power increase of roughly 2% to 5.5%.

+------------+----------+--------+------------+---------+---------+---------+
| Op/sec     | Testcase | AvgLat | MaxLat     | P95     | P99     | Power   |
+------------+----------+--------+------------+---------+---------+---------+
| 15000      | Read     | -      | -          | -       | -       | -       |
| w/o patch  | Average  | 51.8   | 127903.0   | 55.8    | 145.2   | 602.7   |
| w/o patch  | StdDev   | 5.692  | 105355.497 | 11.232  | 2.04    | 5.11    |
| with patch | Average  | 53.28  | 30834.2    | 72.2    | 151.2   | 629.01  |
| with patch | StdDev   | 2.348  | 8928.323   | 15.74   | 3.544   | 3.25    |
| -          | Change%  | 2.86   | -75.89     | 29.39   | 4.13    | 4.37    |
| 25000      | Read     | -      | -          | -       | -       | -       |
| w/o patch  | Average  | 53.78  | 123743.0   | 85.4    | 152.2   | 617.95  |
| w/o patch  | StdDev   | 4.593  | 80224.53   | 5.886   | 4.49    | 1.32    |
| with patch | Average  | 49.65  | 84101.4    | 84.2    | 154.4   | 651.64  |
| with patch | StdDev   | 1.658  | 72656.042  | 4.261   | 2.332   | 8.76    |
| -          | Change%  | -7.68  | -32.04     | -1.41   | 1.45    | 5.45    |
| 35000      | Read     | -      | -          | -       | -       | -       |
| w/o patch  | Average  | 56.07  | 57391.0    | 93.0    | 147.6   | 636.39  |
| w/o patch  | StdDev   | 1.391  | 34494.839  | 1.789   | 2.871   | 2.92    |
| with patch | Average  | 56.46  | 39634.2    | 95.0    | 149.2   | 653.44  |
| with patch | StdDev   | 3.174  | 6089.848   | 3.347   | 3.37    | 4.4     |
| -          | Change%  | 0.69   | -30.94     | 2.15    | 1.08    | 2.68    |
| 40000      | Read     | -      | -          | -       | -       | -       |
| w/o patch  | Average  | 58.6   | 80427.8    | 97.2    | 147.4   | 636.85  |
| w/o patch  | StdDev   | 1.105  | 59327.584  | 0.748   | 2.498   | 1.51    |
| with patch | Average  | 58.76  | 45291.8    | 97.2    | 149.0   | 656.12  |
| with patch | StdDev   | 1.675  | 10486.954  | 2.482   | 3.406   | 6.97    |
| -          | Change%  | 0.27   | -43.69     | 0.0     | 1.09    | 3.03    |
| 45000      | Read     | -      | -          | -       | -       | -       |
| w/o patch  | Average  | 69.02  | 120027.8   | 102.6   | 149.6   | 640.68  |
| w/o patch  | StdDev   | 0.74   | 96288.811  | 1.855   | 1.497   | 7.65    |
| with patch | Average  | 69.65  | 98024.6    | 102.0   | 147.8   | 653.09  |
| with patch | StdDev   | 1.14   | 78041.439  | 2.28    | 1.939   | 3.91    |
| -          | Change%  | 0.92   | -18.33     | -0.58   | -1.2    | 1.94    |
| 15000      | Update   | -      | -          | -       | -       | -       |
| w/o patch  | Average  | 48.144 | 86847.0    | 52.4    | 189.2   | 602.7   |
| w/o patch  | StdDev   | 5.971  | 41580.919  | 16.427  | 8.376   | 5.11    |
| with patch | Average  | 47.964 | 31106.2    | 58.4    | 182.2   | 629.01  |
| with patch | StdDev   | 3.003  | 4906.179   | 7.088   | 6.177   | 3.25    |
| -          | Change%  | -0.37  | -64.18     | 11.45   | -3.7    | 4.37    |
| 25000      | Update   | -      | -          | -       | -       | -       |
| w/o patch  | Average  | 51.856 | 102808.6   | 87.0    | 182.4   | 617.95  |
| w/o patch  | StdDev   | 5.721  | 79308.823  | 4.899   | 7.965   | 1.32    |
| with patch | Average  | 46.07  | 74623.0    | 86.2    | 183.0   | 651.64  |
| with patch | StdDev   | 1.779  | 77511.229  | 4.069   | 7.014   | 8.76    |
| -          | Change%  | -11.16 | -27.42     | -0.92   | 0.33    | 5.45    |
| 35000      | Update   | -      | -          | -       | -       | -       |
| w/o patch  | Average  | 54.142 | 51074.2    | 93.6    | 181.8   | 636.39  |
| w/o patch  | StdDev   | 1.671  | 36877.588  | 1.497   | 8.035   | 2.92    |
| with patch | Average  | 54.034 | 44731.8    | 94.4    | 184.4   | 653.44  |
| with patch | StdDev   | 3.363  | 13400.4    | 1.02    | 7.172   | 4.4     |
| -          | Change%  | -0.2   | -12.42     | 0.85    | 1.43    | 2.68    |
| 40000      | Update   | -      | -          | -       | -       | -       |
| w/o patch  | Average  | 57.528 | 71672.6    | 98.4    | 184.8   | 636.85  |
| w/o patch  | StdDev   | 1.111  | 63103.862  | 1.744   | 9.282   | 1.51    |
| with patch | Average  | 57.738 | 32101.4    | 98.0    | 186.4   | 656.12  |
| with patch | StdDev   | 1.294  | 4481.801   | 1.673   | 7.71    | 6.97    |
| -          | Change%  | 0.37   | -55.21     | -0.41   | 0.87    | 3.03    |
| 45000      | Update   | -      | -          | -       | -       | -       |
| w/o patch  | Average  | 69.97  | 117183.0   | 105.4   | 182.4   | 640.68  |
| w/o patch  | StdDev   | 0.925  | 99836.076  | 1.2     | 9.091   | 7.65    |
| with patch | Average  | 70.508 | 104175.0   | 103.2   | 185.4   | 653.09  |
| with patch | StdDev   | 1.463  | 74438.13   | 1.47    | 7.915   | 3.91    |
| -          | Change%  | 0.77   | -11.1      | -2.09   | 1.64    | 1.94    |
+------------+----------+--------+------------+---------+---------+---------+

>> --- a/drivers/cpufreq/powernv-cpufreq.c
>> +++ b/drivers/cpufreq/powernv-cpufreq.c
>> @@ -36,12 +36,56 @@
>> #include
>> #include /* Required for cpu_sibling_mask() in UP configs */
>> #include
>> +#include
>>
>> #define POWERNV_MAX_PSTATES 256
>> #define PMSR_PSAFE_ENABLE (1UL << 30)
>> #define PMSR_SPR_EM_DISABLE (1UL << 31)
>> #define PMSR_MAX(x) ((x >> 32) & 0xFF)
>>
>> +#define MAX_RAMP_DOWN_TIME 5120
>> +/*
>> + * On an idle system we want the global pstate to ramp-down from max value to
>> + * min over a span of ~5 secs. Also we want it to initially ramp-down slowly and
>> + * then ramp-down rapidly later on.
> Where does 5 seconds come from?
>
> Why 5 and not 10, or not 2? Is there some time period inherent in
> hardware or software that this is computed from?

Global pstates are per-chip, and a chip has at most 12 cores. So if the
system is really idle, allowing 5 seconds for each core, it should take
about 60 seconds for the chip to go to pmin.

>> +/* Interval after which the timer is queued to bring down global pstate */
>> +#define GPSTATE_TIMER_INTERVAL 2000
> in ms?

Yes, it's 2000 ms.
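
To make the two timing constants above concrete, here is a small user-space sketch. The quoted hunk does not show the ramp-down formula, so the quadratic curve below is an assumption, picked only because it matches the comment (slow at first, faster later, 100% at 5120 ms). The helper names ramp_down_percent() and worst_case_chip_ramp_ms() are illustrative for this sketch, and the second one just restates the 12 cores x ~5 s estimate.

```c
#define MAX_RAMP_DOWN_TIME	5120	/* ms, from the quoted hunk */
#define GPSTATE_TIMER_INTERVAL	2000	/* ms, from the quoted hunk */

/*
 * ASSUMPTION: a quadratic ramp, (ms * ms) >> 18, which is 0 at 0 ms and
 * exactly 100 at 5120 ms (5120 * 5120 == 100 << 18). The quoted hunk only
 * describes the shape of the curve, not this formula.
 */
static unsigned int ramp_down_percent(unsigned int elapsed_ms)
{
	/* Clamp so callers never see more than a full ramp-down. */
	if (elapsed_ms >= MAX_RAMP_DOWN_TIME)
		return 100;

	/* elapsed_ms < 5120, so the square fits easily in 32 bits. */
	return (elapsed_ms * elapsed_ms) >> 18;
}

/*
 * Hypothetical helper: rough upper bound, in ms, for a chip to reach pmin
 * if each core takes one full ramp-down window in turn.
 */
static unsigned int worst_case_chip_ramp_ms(unsigned int cores)
{
	return cores * MAX_RAMP_DOWN_TIME;
}
```

Under these assumptions, a timer firing every GPSTATE_TIMER_INTERVAL would see 15% at 2000 ms, 61% at 4000 ms, and 100% at 6000 ms, and worst_case_chip_ramp_ms(12) gives 61440 ms, i.e. roughly the 60 seconds mentioned above.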