From: Arvid Brodin <arvid.brodin@enea•com>
To: <netdev@vger•kernel.org>
Cc: arbr <Arvid.Brodin@enea•com>
Subject: Re: bridge: HSR support - possible recursive locking?
Date: Thu, 12 Jan 2012 19:02:23 +0100
Message-ID: <4F0F202F.8060901@enea.com>
In-Reply-To: <4F073954.7040001@enea.com>
Arvid Brodin wrote:
> Arvid Brodin wrote:
>>> On Tue, 11 Oct 2011 20:25:08 +0200
>>> Arvid Brodin <arvid.brodin@enea•com> wrote:
>>>
>>>> Hi,
>>>>
>>>> I want to add support for HSR ("High-availability Seamless Redundancy",
>>>> IEC-62439-3) to the bridge code. With HSR, all connected units have two network
>>>> ports and are connected in a ring. All new Ethernet packets are sent on both
>>>> ports (or passed through if the current unit is not the originating unit). The
>>>> same packet is never passed twice. Non-HSR units are not allowed in the ring.
>>>>
>>>> This gives instant, reconfiguration-free failover.
>>>>
> *snip*
>> I need to do two things:
>>
>> 1) Bind two network interfaces into one (say, eth0 & eth1 => hsr0). Frames sent on
>> hsr0 should get an HSR tag (including the correct EtherType) and go out on both
>> eth0 and eth1.
>>
>> 2) Ingress frames on eth0 & eth1, with EtherType 0x88fb, should be captured and
>> handled specially (either received on hsr0 or forwarded to the other bound
>> physical interface).
>>
>
> I'm slowly getting there! :)
>
> But what is net_device->header_ops->rebuild supposed to do?
>
I get a "possible recursive locking detected" warning from lockdep when I send
cloned packets, and I can't figure out why. Here's the stack dump and some debug
printouts:
hsr_dev_xmit:286: sent on first slave
=============================================
[ INFO: possible recursive locking detected ]
2.6.37 #43
---------------------------------------------
swapper/0 is trying to acquire lock:
(_xmit_ETHER#2){+.-...}, at: [<901b9aae>] sch_direct_xmit+0x24/0x152
but task is already holding lock:
(_xmit_ETHER#2){+.-...}, at: [<901afc4a>] dev_queue_xmit+0x2ce/0x37c
other info that might help us debug this:
4 locks held by swapper/0:
#0: (&n->timer){+.-...}, at: [<9002b2b4>] run_timer_softirq+0x98/0x184
#1: (rcu_read_lock_bh){.+....}, at: [<901af97c>] dev_queue_xmit+0x0/0x37c
#2: (_xmit_ETHER#2){+.-...}, at: [<901afc4a>] dev_queue_xmit+0x2ce/0x37c
#3: (rcu_read_lock_bh){.+....}, at: [<901af97c>] dev_queue_xmit+0x0/0x37c
stack backtrace:
Call trace:
[<9001c264>] dump_stack+0x18/0x20
[<9003fdbc>] validate_chain+0x40c/0x9ac
[<90040968>] __lock_acquire+0x60c/0x670
[<90041cda>] lock_acquire+0x3a/0x48
[<90216c5c>] _raw_spin_lock+0x20/0x44
[<901b9aae>] sch_direct_xmit+0x24/0x152
[<901afb44>] dev_queue_xmit+0x1c8/0x37c
[<90213090>] nf_hook_xmit+0x8/0xc
[<902130a2>] slave_xmit+0xe/0x10
[<902131d6>] hsr_dev_xmit+0xa6/0xcc
[<901af8c2>] dev_hard_start_xmit+0x382/0x43c
[<901afc64>] dev_queue_xmit+0x2e8/0x37c
[<901dc8a0>] arp_xmit+0x8/0xc
[<901dcf86>] arp_send+0x2a/0x2c
[<901dd978>] arp_solicit+0x110/0x130
[<901b54a4>] neigh_timer_handler+0x1c2/0x206
[<9002b31e>] run_timer_softirq+0x102/0x184
[<90027eb8>] __do_softirq+0x64/0xe0
[<9002804a>] do_softirq+0x26/0x48
[<90028146>] irq_exit+0x2e/0x64
[<90019bae>] do_IRQ+0x46/0x5c
[<90018424>] irq_level0+0x18/0x60
[<902136ae>] rest_init+0x72/0x90
[<9000063c>] start_kernel+0x21c/0x258
[<00000000>] 0x0
hsr_dev_xmit:289: sent on second slave
The code looks like this (from my hsr_dev_xmit() function):

	...
	skb2 = skb_clone(skb, GFP_ATOMIC);
	slave_xmit(skb, hsr_priv->slave_data[0].dev);
	printk(KERN_INFO "%s:%d: sent on first slave\n", __func__, __LINE__);
	if (skb2)
		slave_xmit(skb2, hsr_priv->slave_data[1].dev);
	printk(KERN_INFO "%s:%d: sent on second slave\n", __func__, __LINE__);
	...

and slave_xmit looks like this:

int nf_hook_xmit(struct sk_buff *skb)
{
	dev_queue_xmit(skb);
	return 0;
}

static int slave_xmit(struct sk_buff *skb, struct net_device *dev)
{
	int res;

	skb->dev = dev;
	skb->priority = 1; // FIXME: what does this mean?
	res = NF_HOOK(NFPROTO_BRIDGE, NF_BR_POST_ROUTING, skb, NULL, skb->dev,
		      nf_hook_xmit);
//	res = dev_queue_xmit(skb);
	/* Buffer is consumed on errors too, so nothing to do here, really... */
	return res;
}

I believe I'm doing exactly the same thing as the bridging code (but of course I
can't be). So what is it that I'm doing wrong???
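
One thing the report above does show: the lock being acquired and the lock
already held are both printed as _xmit_ETHER#2, i.e. the same lock class.
Lockdep tracks lock classes rather than individual lock instances, so taking the
xmit lock of a slave while holding the xmit lock of the upper device looks
recursive to it whenever both locks share a class. Below is a small user-space
sketch of that check. It is a toy model, not kernel code: toy_lock, toy_task,
toy_acquire() and toy_nested_acquire() are hypothetical names made up for this
illustration, and the real validator does far more than this.

```c
#include <assert.h>
#include <string.h>

#define TOY_MAX_HELD 8

/* Toy stand-in for a lock: only its class name matters to the checker. */
struct toy_lock {
	const char *class_name;		/* e.g. "_xmit_ETHER" */
};

/* Toy stand-in for a task: the stack of locks it currently holds. */
struct toy_task {
	const struct toy_lock *held[TOY_MAX_HELD];
	int depth;
};

/*
 * Record that the task takes the lock. Returns 1 if a lock of the same
 * class is already held (what the toy model calls "recursive"), else 0.
 * Two different lock instances with the same class still return 1.
 */
int toy_acquire(struct toy_task *task, const struct toy_lock *lock)
{
	int i;
	int recursive = 0;

	for (i = 0; i < task->depth; i++)
		if (strcmp(task->held[i]->class_name, lock->class_name) == 0)
			recursive = 1;

	assert(task->depth < TOY_MAX_HELD);
	task->held[task->depth++] = lock;
	return recursive;
}

/*
 * Model the nesting seen in the trace: the upper device's xmit lock is
 * held while the lower (slave) device's xmit lock is taken.
 */
int toy_nested_acquire(const char *upper_class, const char *lower_class)
{
	struct toy_lock upper = { upper_class };
	struct toy_lock lower = { lower_class };
	struct toy_task task = { { 0 }, 0 };

	toy_acquire(&task, &upper);
	return toy_acquire(&task, &lower);
}
```

In this model, toy_nested_acquire("_xmit_ETHER", "_xmit_ETHER") reports a
recursion even though the two locks are distinct objects, while giving the upper
device a class of its own does not. As far as I understand, stacked drivers in
the kernel deal with this either by setting NETIF_F_LLTX so that the core does
not take the upper device's xmit lock at all, or by putting their own tx queue
locks into a separate class with lockdep_set_class(). Whether either of those is
the right fix here is an assumption, not something confirmed in this thread.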
--
Arvid Brodin
Enea Services Stockholm AB