From: npiggin@gmail•com (Nicholas Piggin)
To: linux-arm-kernel@lists•infradead.org
Subject: No subject
Date: Tue, 22 Aug 2017 11:38:51 +1000 [thread overview]
Message-ID: <20170822013851.32583-1-npiggin@gmail.com> (raw)
Date: Sun, 20 Aug 2017 13:16:16 +1000
Subject: [PATCH] timers: Fix excessive granularity of new timers after a nohz
idle
When a timer base is idle, it is forwarded when a new timer is added
to ensure that granularity does not become excessive. When not idle,
the timer tick is expected to increment the base.
However there are several problems:
- If an existing timer is modified, the base is forwarded only after
the index is calculated.
- The base is not forwarded by add_timer_on.
- There is a window after a timer is restarted from a nohz ide, after
it is marked not-idle and before the timer tick on this CPU, where a
timer may be added but the ancient base does not get forwarded.
These result in excessive granularity (a 1 jiffy timeout can blow out
to 100s of jiffies), which cause the rcu lockup detector to trigger,
among other things.
Fix this by keeping track of whether the timer base has been idle
since it was last run or forwarded, and if so then forward it before
adding a new timer.
There is still a problem where the mod_timer optimization where it's
modified with the same expiry time can result in excessive granularity
relative to the new shorter interval. That is not addressed by this
patch because checking base->was_idle would increase overhead and it's
a rather special case (you could reason that the caller should not
expect change in absolute expiry time due to such an operation). So
that is noted as a comment.
As well as fixing the visible RCU softlockup failures, I tested an
idle system (with no lockup watchdogs running) and traced all
non-deferrable timer expiries for 1000s, and analysed wakeup latency
relative to requested latency. 1.0 means we slept for as many jiffies
as requested, 2.0 means we slept 2x the time (this suffers jiffies
round-up skew at low absolute times):
max avg std
upstream 506.0 1.20 4.68
patched 2.0 1.08 0.15
This was noticed due to the lockup detector Kconfig changes dropping it
out of people's .configs. When the lockup detectors are enabled, no CPU
can go idle for longer than 4 seconds, which limits the granularity
errors. Sub-optimal timer behaviour is observable on a smaller scale:
max avg std
upstream 9.0 1.05 0.19
patched 2.0 1.04 0.11
Tested-by: David Miller <davem@davemloft•net>
Signed-off-by: Nicholas Piggin <npiggin@gmail•com>
---
Hi Andrew,
I would have preferred to get comments from the timer maintainers, but
they've been busy or away for the past copule of weeks. Perhaps you
would consider carrying it until then?
Thanks,
Nick
kernel/time/timer.c | 44 ++++++++++++++++++++++++++++++++++++--------
1 file changed, 36 insertions(+), 8 deletions(-)
diff --git a/kernel/time/timer.c b/kernel/time/timer.c
index 8f5d1bf18854..dd7be9fe6839 100644
--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -203,6 +203,7 @@ struct timer_base {
bool migration_enabled;
bool nohz_active;
bool is_idle;
+ bool was_idle; /* was it idle since last run/fwded */
DECLARE_BITMAP(pending_map, WHEEL_SIZE);
struct hlist_head vectors[WHEEL_SIZE];
} ____cacheline_aligned;
@@ -856,13 +857,19 @@ get_target_base(struct timer_base *base, unsigned tflags)
static inline void forward_timer_base(struct timer_base *base)
{
- unsigned long jnow = READ_ONCE(jiffies);
+ unsigned long jnow;
/*
- * We only forward the base when it's idle and we have a delta between
- * base clock and jiffies.
+ * We only forward the base when we are idle or have just come out
+ * of idle (was_idle logic), and have a delta between base clock
+ * and jiffies. In the common case, run_timers will take care of it.
*/
- if (!base->is_idle || (long) (jnow - base->clk) < 2)
+ if (likely(!base->was_idle))
+ return;
+
+ jnow = READ_ONCE(jiffies);
+ base->was_idle = base->is_idle;
+ if ((long)(jnow - base->clk) < 2)
return;
/*
@@ -938,6 +945,13 @@ __mod_timer(struct timer_list *timer, unsigned long expires, bool pending_only)
* same array bucket then just return:
*/
if (timer_pending(timer)) {
+ /*
+ * The downside of this optimization is that it can result in
+ * larger granularity than you would get from adding a new
+ * timer with this expiry. Would a timer flag for networking
+ * be appropriate, then we can try to keep expiry of general
+ * timers within ~1/8th of their interval?
+ */
if (timer->expires == expires)
return 1;
@@ -948,6 +962,7 @@ __mod_timer(struct timer_list *timer, unsigned long expires, bool pending_only)
* dequeue/enqueue dance.
*/
base = lock_timer_base(timer, &flags);
+ forward_timer_base(base);
clk = base->clk;
idx = calc_wheel_index(expires, clk);
@@ -964,6 +979,7 @@ __mod_timer(struct timer_list *timer, unsigned long expires, bool pending_only)
}
} else {
base = lock_timer_base(timer, &flags);
+ forward_timer_base(base);
}
ret = detach_if_pending(timer, base, false);
@@ -991,12 +1007,10 @@ __mod_timer(struct timer_list *timer, unsigned long expires, bool pending_only)
raw_spin_lock(&base->lock);
WRITE_ONCE(timer->flags,
(timer->flags & ~TIMER_BASEMASK) | base->cpu);
+ forward_timer_base(base);
}
}
- /* Try to forward a stale timer base clock */
- forward_timer_base(base);
-
timer->expires = expires;
/*
* If 'idx' was calculated above and the base time did not advance
@@ -1112,6 +1126,7 @@ void add_timer_on(struct timer_list *timer, int cpu)
WRITE_ONCE(timer->flags,
(timer->flags & ~TIMER_BASEMASK) | cpu);
}
+ forward_timer_base(base);
debug_activate(timer, timer->expires);
internal_add_timer(base, timer);
@@ -1499,8 +1514,10 @@ u64 get_next_timer_interrupt(unsigned long basej, u64 basem)
/*
* If we expect to sleep more than a tick, mark the base idle:
*/
- if ((expires - basem) > TICK_NSEC)
+ if ((expires - basem) > TICK_NSEC) {
+ base->was_idle = true;
base->is_idle = true;
+ }
}
raw_spin_unlock(&base->lock);
@@ -1611,6 +1628,17 @@ static __latent_entropy void run_timer_softirq(struct softirq_action *h)
{
struct timer_base *base = this_cpu_ptr(&timer_bases[BASE_STD]);
+ /*
+ * was_idle must be cleared before running timers so that any timer
+ * functions that call mod_timer will not try to forward the base.
+ *
+ * The deferrable base does not do idle tracking at all, so we do
+ * not forward it. This can result in very large variations in
+ * granularity for deferrable timers, but they can be deferred for
+ * long periods due to idle.
+ */
+ base->was_idle = false;
+
__run_timers(base);
if (IS_ENABLED(CONFIG_NO_HZ_COMMON) && base->nohz_active)
__run_timers(this_cpu_ptr(&timer_bases[BASE_DEF]));
--
2.13.3
next reply other threads:[~2017-08-22 1:38 UTC|newest]
Thread overview: 88+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-08-22 1:38 Nicholas Piggin [this message]
-- strict thread matches above, loose matches on Subject: below --
2018-07-06 21:16 No subject Santosh Shilimkar
2018-07-06 21:16 ` Santosh Shilimkar
2018-07-06 21:16 ` Santosh Shilimkar
2018-07-06 21:18 ` Santosh Shilimkar
2018-06-23 21:08 David Lechner
2018-05-04 20:06 Bjorn Helgaas
2018-04-20 8:02 Christoph Hellwig
2018-04-16 1:22 Andrew Worsley
2017-11-30 10:25 Mary Cuevas
2017-06-26 13:16 [PATCH] arm64: use readq() instead of readl() to read 64bit entry_point Luc Van Oostenryck
2017-07-03 23:46 ` No subject Khuong Dinh
2017-06-04 11:59 Yury Norov
[not found] <CAMj-D2DO_CfvD77izsGfggoKP45HSC9aD6auUPAYC9Yeq_aX7w@mail.gmail.com>
2017-05-04 16:44 ` gengdongjiu
2017-01-31 7:58 Andy Gross
2016-12-01 10:00 Ramana Radhakrishnan
2016-11-11 3:38 Chunyan Zhang
2016-09-30 14:37 Maxime Ripard
2016-07-10 9:24 Neil Armstrong
2016-04-22 8:25 Daniel Lezcano
2016-04-22 8:27 ` Daniel Lezcano
2016-04-11 7:51 Paul Walmsley
2015-12-13 21:57 何旦洁
2015-10-27 0:44 xuyiping
[not found] <E1ZqY3A-0004Mt-KH@feisty.vs19.net>
2015-10-26 3:21 ` Jiada Wang
2015-09-01 14:14 Mika Penttilä
2015-09-01 15:22 ` Fabio Estevam
2015-07-22 14:05 Chunfeng Yun
2015-07-15 9:32 Yuan Yao
2015-05-18 20:00 raghu MG
2015-04-21 10:18 Ard Biesheuvel
2015-02-26 16:56 Jorge Ramirez-Ortiz
2015-02-18 16:14 Lee Jones
2014-10-28 14:13 Mark Rutland
2014-09-22 19:41 Santosh Shilimkar
2014-09-22 7:45 Jingchang Lu
2014-07-09 17:49 Sebastian Andrzej Siewior
2014-05-24 1:21 Loc Ho
2014-05-12 16:40 Santosh Shilimkar
2014-05-12 16:38 Santosh Shilimkar
2014-02-22 15:53 Hans de Goede
2014-01-21 4:09 John Tobias
2014-01-16 16:11 Loc Ho
2014-01-16 16:09 Loc Ho
2014-01-13 10:32 Lothar Waßmann
2014-01-13 10:29 Lothar Waßmann
2013-12-12 7:30 Loc Ho
2013-11-01 7:04 Xiubo Li
2013-09-24 3:13 Rohit Vaswani
2013-09-02 17:01 Drasko DRASKOVIC
2013-08-24 9:29 Haojian Zhuang
2013-07-26 10:05 Haojian Zhuang
2013-06-28 5:49 Wang, Yalin
2013-06-19 10:57 Ben Dooks
2013-04-24 18:07 Viral Mehta
2012-12-29 9:17 steve.zhan
2012-11-11 14:16 Sammy Chan
2012-11-08 8:07 Abhimanyu Kapur
2012-10-14 10:05 Alexey Dobriyan
2012-08-27 6:40 Simon Horman
2012-06-21 18:26 Paul Walmsley
2012-06-06 10:33 Sascha Hauer
2012-05-18 12:27 Sascha Hauer
2012-03-20 18:28 John Szakmeister
2011-12-28 14:01 Shawn Guo
2011-12-02 16:01 Will Deacon
2011-11-28 2:35 Jett.Zhou
2011-09-19 1:45 Saleem Abdulrasool
2011-09-15 2:03 Jongpill Lee
2011-07-21 11:12 Padmavathi Venna
2011-06-27 20:47 John Ogness
2011-06-27 20:47 Jongpill Lee
2011-06-13 17:29 Andre Silva
2011-06-05 18:33 Hector Oron
2011-05-17 9:28 Javier Martin
2011-05-13 19:35 Vadim Bendebury
2011-03-01 14:02 Javier Martin
2011-01-13 9:13 Uwe Kleine-König
2010-12-03 1:08 tarek attia
2010-10-08 6:02 Daein Moon
2010-09-09 3:33 tarek attia
2010-06-24 13:48 Uwe Kleine-König
2010-06-07 17:58 Dave Hylands
2010-05-18 10:38 Marek Szyprowski
2010-04-17 21:43 nelakurthi koteswararao
2010-02-25 9:36 Thomas Weber
2009-09-17 9:37 Marc Kleine-Budde
2009-09-07 14:07 Somshekar ChandrashekarKadam
2009-08-25 10:34 Syed Rafiuddin
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20170822013851.32583-1-npiggin@gmail.com \
--to=npiggin@gmail$(echo .)com \
--cc=linux-arm-kernel@lists$(echo .)infradead.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox