* Re: Strange delays / what usually happens every 10 min?
[not found] <fhcc39$51b$1@ger.gmane.org>
@ 2007-11-13 16:23 ` Eric Dumazet
2007-11-13 17:54 ` Florian Boelstler
0 siblings, 1 reply; 2+ messages in thread
From: Eric Dumazet @ 2007-11-13 16:23 UTC (permalink / raw)
To: Florian Boelstler; +Cc: linux-kernel, netdev
Florian Boelstler a écrit :
> Hi,
>
> this issue has been already discussed on the kernelnewbies mailing
> list [1],[2] and suggested to be further discussed here.
>
> I am currently working on a MPC8540-based custom board, which runs Linux
> 2.6.15 (arch/ppc). The original Linux sources have been modified to
> support that custom board. (Additional patches to support LTT are
> applied as well, though disabled in the running kernel)
>
> I set up a periodically running kernel thread, which is delayed for a
> single jiffy using schedule_timeout() in an infinite loop. It is used to
> measure delays between invocations of that thread. For measuring the
> distance in time the PPC's time base lower half register is used
> (obtained using get_cycles() defined in asm/timex.h).
>
> The thread calculates the delay to the previous run and only outputs the
> result if a new maximum value has been determined (in respect to all
> previous cycles). Further the thread outputs a warning if a very "high"
> delay was determined. I.e. a delay greater than 5ms.
>
> While running that test driver a delay of about 10ms _exactly_ occurs
> every 10 minutes.
>
> The kernel is configured using CONFIG_HZ=1000 and CONFIG_PREEMPT.
> The CCB is at 333MHz, whereas the TBR update rate is 333 MHz / 8, i.e.
> 41,625 MHz.
> Kernel configuration as a whole is found here:
> http://nopaste.info/5e4d0283bb.html
>
> And now the funny part starts.
> I got a response from Bruce Rowen on kernelnewbies, telling me that he
> came across the same problem. He increased his AMD-Geode-based
> platform to 1GB of RAM (256MB before) and also hit the
> 10-minutes-issue a few month ago (using Linux 2.6.13).
> Going back to 256MB cured the problem. I did the same thing by
> instructing the boot loader in order to only use 256 MB of RAM
> (instead of 512MB) and yes, the 10-minutes-issue was gone as well.
>
> Apart of some kernel threads almost all user processes have been killed
> during the test. Only SSH and a bash were running (whereas a test with
> network interfaces completely disabled and only operated from a serial
> console turned out the same results).
> The kernel comes with compiled in CIFS support, some kernel debugging
> features like soft-lockup detection and preemption debugging. I.e. ps
> lists the kernel threads ksoftirqd, watchdog, events, khelper, kthread,
> kblockd, pdflush, aio, cifsoplockd and cifsdnotifyd.
>
> An appropriate userspace test tool based on nanosleep() determined the
> same results like the kernel thread:
>
> root@mpc0:/# /tmp/wait.rt
> looping 1 milli seconds nanosleep ...
> 15:26:16: #1 FRAME MAX 1996 us (at 4139773004 ticks)
> 15:26:16: #2 FRAME MAX 2002 us (at 4139856360 ticks)
> 15:26:16: #155 FRAME MAX 2102 us (at 4152597854 ticks)
> 15:41:37: #460398 FRAME MAX 8941 us (at 3813406605 ticks)
> 15:41:37: #460398 FRAME HIGH 8941 us (at 3813406605 ticks)
> 15:51:37: #760394 FRAME MAX 9936 us (at 3018602602 ticks)
> 15:51:37: #760394 FRAME HIGH 9936 us (at 3018602602 ticks)
> 16:01:37: #1060390 FRAME HIGH 9935 us (at 2223798809 ticks)
> 16:11:37: #1360386 FRAME HIGH 9934 us (at 1428994989 ticks)
> 16:21:37: #1660382 FRAME HIGH 9935 us (at 634191241 ticks)
> [...]
>
> Thanks for any help!
>
> Cheers,
>
> Florian
>
> [1] http://thread.gmane.org/gmane.linux.kernel.kernelnewbies/23419
> [2] http://thread.gmane.org/gmane.linux.kernel.kernelnewbies/23426
>
Hi Florian
I think you hit the periodic flush of IP route cache, which is fired
every 600 seconds per default.
(Check /proc/sys/net/ipv4/route/secret_interval )
For a 1GB machine, this hash table is so big that a full scan might take
more than 10 ms, even if empty.
Instead of using less RAM, you could just boot with rhash_entries=1024
to lower the size of this table.
Or just change secret_interval to 2000000 for example (not much more
because * HZ could overflow)
Eric
^ permalink raw reply [flat|nested] 2+ messages in thread
* Re: Strange delays / what usually happens every 10 min?
2007-11-13 16:23 ` Strange delays / what usually happens every 10 min? Eric Dumazet
@ 2007-11-13 17:54 ` Florian Boelstler
0 siblings, 0 replies; 2+ messages in thread
From: Florian Boelstler @ 2007-11-13 17:54 UTC (permalink / raw)
To: linux-kernel; +Cc: netdev
Hi Eric,
Eric Dumazet wrote:
> Instead of using less RAM, you could just boot with rhash_entries=1024
> to lower the size of this table.
I just tried that and it seems to reduce the scan time. This is the
result for the first 40 minutes of runtime:
root@mpc0:/# /tmp/wait.rt
looping 1 milli seconds nanosleep ...
17:10:11/425384 #1 MAX 1996/83117/-268599896 us/tick/usec (at 2107848557)
17:10:11/427385 #2 MAX 2001/83327/2001 us/tick/usec (at 2107931884)
17:10:11/433534 #5 MAX 2149/89477/2150 us/tick/usec (at 2108187839)
17:27:02/5897 #505291 MAX 2512/104576/2513 us/tick/usec (at 1223589469)
The first ~10ms delay usually occurred after ~15 minutes. So one could
argue that the reported HIGH-value at 17:27:02 (GMT) is the first flush
of IP route cache. And all later flushes weren't longer than 2,5ms.
Thanks to all of you, especially Eric. Now it seems I got an instrument
to lower system response time.
Cheers,
Florian
PS: Unfortunately I had to remove some CC:-entries since the local
firewall seems to not allow anything but NNTP (for gmane) and HTTP.
^ permalink raw reply [flat|nested] 2+ messages in thread
end of thread, other threads:[~2007-11-13 17:54 UTC | newest]
Thread overview: 2+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <fhcc39$51b$1@ger.gmane.org>
2007-11-13 16:23 ` Strange delays / what usually happens every 10 min? Eric Dumazet
2007-11-13 17:54 ` Florian Boelstler
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox