From: Flavio Leitner <fbl@redhat•com>
To: Michal Kubecek <mkubecek@suse•cz>
Cc: netdev@vger•kernel.org, Jay Vosburgh <fubar@us•ibm.com>,
	Andy Gospodarek <andy@greyhouse•net>
Subject: Re: [PATCH] bonding: start slaves with link down for ARP monitor
Date: Sat, 14 Apr 2012 01:53:19 -0300
Message-ID: <20120414015319.11e196d4@asterix.rh>
In-Reply-To: <94e5ccf29d92f9a4b815f895b6bb8d9f326566cb.1334256203.git.mkubecek@suse.cz>

On Thu, 12 Apr 2012 20:38:09 +0200
Michal Kubecek <mkubecek@suse•cz> wrote:

> Initialize slave device link state as down if ARP monitor
> is active. Also shift initial value of its last_arp_rx so that
> it doesn't immediately cause fake detection of "up" state.
> 
> When ARP monitoring is used, initializing the slave device with
> up link state can cause ARP monitor to detect link failure
> before the device is really up (with igb driver, this can take
> more than two seconds).
> 
> Signed-off-by: Michal Kubecek <mkubecek@suse•cz>
> ---
> 
> When MII monitoring is active for a bond, initial link state of slaves
> is set according to real link state of the corresponding device,
> otherwise it is always set to UP. This makes sense if no monitoring is
> active but with ARP monitoring, it can lead to situations like this:
> 
> [ 1280.431383] bonding: bond0: setting mode to active-backup (1).
> [ 1280.443305] bonding: bond0: adding ARP target 10.11.0.8.
> [ 1280.454079] bonding: bond0: setting arp_validate to all (3).
> [ 1280.465561] bonding: bond0: Setting ARP monitoring interval to 500.
> [ 1280.480366] ADDRCONF(NETDEV_UP): bond0: link is not ready
> [ 1280.491471] bonding: bond0: Adding slave eth1.
> [ 1280.584158] bonding: bond0: making interface eth1 the new active one.
> [ 1280.597274] bonding: bond0: first active interface up!
> [ 1280.607675] bonding: bond0: enslaving eth1 as an active interface with an up link.
> [ 1280.623567] ADDRCONF(NETDEV_CHANGE): bond0: link becomes ready
> [ 1280.635511] bonding: bond0: Adding slave eth2.
> [ 1280.726423] bonding: bond0: enslaving eth2 as a backup interface with an up link.
> [ 1281.976030] bonding: bond0: link status definitely down for interface eth1, disabling it
> [ 1281.992350] bonding: bond0: making interface eth2 the new active one.
> [ 1282.639276] igb: eth1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
> [ 1283.002282] bonding: bond0: link status definitely down for interface eth2, disabling it
> [ 1283.018713] bonding: bond0: now running without any active interface !
> [ 1283.529415] bonding: bond0: link status definitely up for interface eth1.
> [ 1283.543075] bonding: bond0: making interface eth1 the new active one.
> [ 1283.556614] bonding: bond0: first active interface up!
> 
> Here eth1 is enslaved with link state UP but before the device is really
> UP, ARP monitor detects it is actually down (it takes more than two
> seconds and arp_interval was set to 500). This causes a spurious failure
> in logs and in statistics.
> 
> I propose to initialize slaves with DOWN link state if ARP monitor is
> active so that the ARP monitor can switch it to UP when appropriate.
> This also requires adjusting the initial value of last_arp_rx as setting
> it to current jiffies would pretend a packet arrived when slave was
> initialized, leading to DOWN -> UP -> DOWN -> UP sequence.
> 
> ---
>  drivers/net/bonding/bond_main.c |   36 ++++++++++++++++++++++--------------
>  1 files changed, 22 insertions(+), 14 deletions(-)
> 
> diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
> index 62d2409..c1eda74 100644
> --- a/drivers/net/bonding/bond_main.c
> +++ b/drivers/net/bonding/bond_main.c
> @@ -1727,6 +1727,9 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev)
>  	read_lock(&bond->lock);
>  
>  	new_slave->last_arp_rx = jiffies;
> +	if (bond->params.arp_interval)
> +		new_slave->last_arp_rx -=
> +			(msecs_to_jiffies(bond->params.arp_interval) + 1);


I don't see the point of checking bond->params.arp_interval.
Why not simply:

- 	new_slave->last_arp_rx = jiffies;
+	/* put it behind to avoid fake initial link up detection */
+	new_slave->last_arp_rx = jiffies -
+		 (msecs_to_jiffies(bond->params.arp_interval) + 1);

Other than that, works here.
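
To illustrate why the unconditional form should be harmless, here is a
small userspace sketch. It is not kernel code: HZ, the model_* helpers
and the exact shape of the freshness check (a time_in_range()-style
window around last_arp_rx) are assumptions made for the example. With
arp_interval == 0 the offset collapses to a single tick, which still
keeps the slave from looking "recently heard from".

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed tick rate for this model only; the kernel's HZ is configurable. */
#define MODEL_HZ 250UL

/* Simplified stand-in for msecs_to_jiffies(): round up to whole ticks. */
unsigned long model_msecs_to_jiffies(unsigned int ms)
{
	return ((unsigned long)ms * MODEL_HZ + 999UL) / 1000UL;
}

/* Wraparound-safe "a is within [lo, hi]", in the spirit of time_in_range(). */
bool model_time_in_range(unsigned long a, unsigned long lo, unsigned long hi)
{
	return (long)(a - lo) >= 0 && (long)(a - hi) <= 0;
}

/* Hypothetical check: does 'now' fall within one interval of last_rx? */
bool model_rx_is_fresh(unsigned long now, unsigned long last_rx,
		       unsigned int arp_interval_ms)
{
	unsigned long delta = model_msecs_to_jiffies(arp_interval_ms);

	return model_time_in_range(now, last_rx - delta, last_rx + delta);
}

/* Initial timestamp as suggested above: one tick older than a full interval. */
unsigned long model_initial_last_rx(unsigned long now,
				    unsigned int arp_interval_ms)
{
	return now - (model_msecs_to_jiffies(arp_interval_ms) + 1);
}

/* Small self-check covering the cases discussed in this thread. */
void model_self_check(void)
{
	unsigned long now = 1000;

	/* last_arp_rx = jiffies looks like a fresh reply right away. */
	assert(model_rx_is_fresh(now, now, 500));
	/* Shifted back by interval + 1, it does not. */
	assert(!model_rx_is_fresh(now, model_initial_last_rx(now, 500), 500));
	/* arp_interval == 0: the offset is one tick, still not fresh. */
	assert(!model_rx_is_fresh(now, model_initial_last_rx(now, 0), 0));
	/* Also holds shortly after the counter has wrapped around. */
	assert(!model_rx_is_fresh(5, model_initial_last_rx(5, 500), 500));
}
```

Calling model_self_check() from a main() exercises all four cases. The
point is only that the subtraction needs no arp_interval guard in this
model; the real monitor code should be checked separately.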

fbl
