From: Jarek Poplawski <jarkao2@gmail•com>
To: Eric Dumazet <eric.dumazet@gmail•com>
Cc: Paweł Staszewski <pstaszewski@itcare•pl>,
	"Linux Network Development list" <netdev@vger•kernel.org>
Subject: Re: weird problem
Date: Fri, 26 Jun 2009 09:05:45 +0000
Message-ID: <20090626090545.GB6445@ff.dom.local>
In-Reply-To: <20090626083719.GA6445@ff.dom.local>

On Fri, Jun 26, 2009 at 08:37:19AM +0000, Jarek Poplawski wrote:
> On 25-06-2009 22:18, Eric Dumazet wrote:
> > Paweł Staszewski wrote:
> >> Ok
> >>
> >> After a day of observation I am nearly 100% sure that this CPU load is
> >> caused by route cache flushes.
> >> When the route cache grows to its "net.ipv4.route.gc_thresh" size, or
> >> close to it, the system starts dropping some routes from the cache and
> >> CPU load rises from 2% to nearly 80%.
> >> After the cache is cleaned / flushed and is filling again, CPU load
> >> returns to the normal 2%.
> >>
> >> Does anyone know how to resolve this?
> >> On kernels < 2.6.29 I don't see this; it all started after the upgrade
> >> from 2.6.28 to 2.6.29. I then tried 2.6.29.1, 2.6.29.3 and 2.6.30, and on
> >> all of these kernels >= 2.6.29 the CPU load problem is the same.
> >>
> >> I can reduce these CPU fluctuations by changing the route cache /proc
> >> parameters, but the best result for my router was
> >>
> >> 15 sec of 2% CPU
> >> followed by
> >> 15 sec of 80% CPU
> >>
> >>
> >> Regards
> >> Pawel Staszewski
> > 
> > 
> > I believe these are known 2.6.29 regressions.
> > 
> > The following two commits should correct the problem you have.
> > 
> > Your best bet would be to try 2.6.31-rc1, and tell us if this recent kernel
> > is ok on your machine ?
> 
> 
> Btw., the first of these commits is in 2.6.30, which according to

And the second as well.

Jarek P.

> Pawel was tried. And IMHO trying -rc1 on a production system needs
> a lot of bravery.
> 
> Jarek P.
> 
> > 
> > 
> > 
> > commit 1ddbcb005c395518c2cd0df504cff3d4b5c85853
> > Author: Eric Dumazet <dada1@cosmosbay•com>
> > Date:   Tue May 19 20:14:28 2009 +0000
> > 
> >     net: fix rtable leak in net/ipv4/route.c
> > 
> >     Alexander V. Lukyanov found a regression in 2.6.29 and made a complete
> >     analysis found in http://bugzilla.kernel.org/show_bug.cgi?id=13339
> >     Quoted here because it's a perfect one:
> > 
> >     begin_of_quotation
> >      2.6.29 patch has introduced flexible route cache rebuilding. Unfortunately the
> >      patch has at least one critical flaw, and another problem.
> > 
> >      rt_intern_hash calculates rthi pointer, which is later used for new entry
> >      insertion. The same loop calculates cand pointer which is used to clean the
> >      list. If the pointers are the same, rtable leak occurs, as first the cand is
> >      removed then the new entry is appended to it.
> > 
> >      This leak leads to unregister_netdevice problem (usage count > 0).
> > 
> >      Another problem of the patch is that it tries to insert the entries in certain
> >      order, to facilitate counting of entries distinct by all but QoS parameters.
> >      Unfortunately, referencing an existing rtable entry moves it to list beginning,
> >      to speed up further lookups, so the carefully built order is destroyed.
> > 
> >      For the first problem the simplest patch is to set rthi=0 when rthi==cand, but
> >      it will also destroy the ordering.
> >     end_of_quotation
> > 
> >     Problematic commit is 1080d709fb9d8cd4392f93476ee46a9d6ea05a5b
> >     (net: implement emergency route cache rebulds when gc_elasticity is exceeded)
> > 
> >     Trying to keep dst_entries ordered is too complex and breaks the fact that
> >     order should depend on the frequency of use for garbage collection.
> > 
> >     A possible fix is to make rt_intern_hash() simpler, and only make
> >     rt_check_expire() a little bit smarter, able to cope with an arbitrary
> >     order of entries. The added loop runs on cache-hot data while the CPU
> >     is prefetching the next object, so it should go unnoticed.
> > 
> >     Reported-and-analyzed-by: Alexander V. Lukyanov <lav@yar•ru>
> > 
> > commit cf8da764fc6959b7efb482f375dfef9830e98205
> > Author: Eric Dumazet <dada1@cosmosbay•com>
> > Date:   Tue May 19 18:54:22 2009 +0000
> > 
> >     net: fix length computation in rt_check_expire()
> > 
> >     rt_check_expire() computes the average and standard deviation of chain
> >     lengths, but does not correctly reset length to 0 at the beginning of
> >     each chain. This probably gives overflows for sum2 (and sum) on loaded
> >     machines instead of meaningful results.
> > 
> >     Signed-off-by: Eric Dumazet <dada1@cosmosbay•com>
> >     Acked-by: Neil Horman <nhorman@tuxdriver•com>
> >     Signed-off-by: David S. Miller <davem@davemloft•net>
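
To illustrate the second commit message above: the sketch below is a
simplified Python model, not the kernel code. It assumes the statistics
are accumulated in a single pass as a running sum and sum of squares.
The function name chain_stats and the reset_per_chain switch are made
up for illustration. Python integers do not overflow, so the model only
shows how a missing per-chain reset inflates the mean and standard
deviation. It does not reproduce the overflow of sum and sum2 mentioned
in the commit message.

```python
import math


def chain_stats(chain_lengths, reset_per_chain=True):
    """Return (mean, standard deviation) of chain lengths, accumulated in one pass.

    chain_lengths: number of entries found on each hash chain.
    reset_per_chain: if False, mimic the bug by never resetting the counter.
    """
    total = 0       # running sum of lengths ("sum" in the commit message)
    total_sq = 0    # running sum of squared lengths ("sum2")
    samples = 0
    length = 0
    for entries in chain_lengths:
        if reset_per_chain:
            length = 0      # the fix: count each chain from zero
        length += entries   # stands in for walking the chain entry by entry
        total += length
        total_sq += length * length
        samples += 1
    if samples == 0:
        return 0.0, 0.0
    mean = total / samples
    variance = max(total_sq / samples - mean * mean, 0.0)
    return mean, math.sqrt(variance)


chains = [2, 2, 2, 2]  # four chains, two entries each
fixed = chain_stats(chains, reset_per_chain=True)
buggy = chain_stats(chains, reset_per_chain=False)
print("with reset:   ", fixed)
print("without reset:", buggy)
```

Without the reset, each chain is recorded with the cumulative count of
all chains walked before it, so the computed mean and deviation grow
with the number of chains scanned rather than with the real chain
lengths.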
