public inbox for netdev@vger.kernel.org 
 help / color / mirror / Atom feed
From: Vlad Buslov <vladbu@nvidia•com>
To: Antoine Tenart <atenart@kernel•org>
Cc: <davem@davemloft•net>, <kuba@kernel•org>, <echaudro@redhat•com>,
	<sbrivio@redhat•com>, <netdev@vger•kernel.org>, <pshelar@ovn•org>
Subject: Re: [PATCH net 1/2] vxlan: do not modify the shared tunnel info when PMTU triggers an ICMP reply
Date: Thu, 20 Jan 2022 14:58:18 +0200	[thread overview]
Message-ID: <ygnhee52lg2d.fsf@nvidia.com> (raw)
In-Reply-To: <164267447125.4497.8151505359440130213@kwain>

On Thu 20 Jan 2022 at 12:27, Antoine Tenart <atenart@kernel•org> wrote:
> Hello Vlad,
>
> Quoting Vlad Buslov (2022-01-20 08:38:05)
>> On Thu 25 Mar 2021 at 17:35, Antoine Tenart <atenart@kernel•org> wrote:
>> >
>> > diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
>> > index 666dd201c3d5..53dbc67e8a34 100644
>> > --- a/drivers/net/vxlan.c
>> > +++ b/drivers/net/vxlan.c
>> > @@ -2725,12 +2725,17 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev,
>> >                       goto tx_error;
>> >               } else if (err) {
>> >                       if (info) {
>> > +                             struct ip_tunnel_info *unclone;
>> >                               struct in_addr src, dst;
>> >  
>> > +                             unclone = skb_tunnel_info_unclone(skb);
>> > +                             if (unlikely(!unclone))
>> > +                                     goto tx_error;
>> > +
>> 
>> We have been getting memleaks in one of our tests that point to this
>> code (test deletes vxlan device while running traffic redirected by OvS
>> TC at the same time):
>> 
>> unreferenced object 0xffff8882d0114200 (size 256):
>>   comm "softirq", pid 0, jiffies 4296140292 (age 1435.992s)
>>   hex dump (first 32 bytes):
>>     00 00 00 00 00 00 00 00 00 3b 85 84 ff ff ff ff  .........;......
>>     a1 26 b7 83 ff ff ff ff 00 00 00 00 00 00 00 00  .&..............
>>   backtrace:
>>     [<0000000097659d47>] metadata_dst_alloc+0x1f/0x470
>>     [<000000007571c30f>] tun_dst_unclone+0xee/0x360 [vxlan]
>>     [<00000000d2dcfd00>] vxlan_xmit_one+0x131d/0x2a00 [vxlan]
>>     [<00000000281572b6>] vxlan_xmit+0x8e6/0x4cd0 [vxlan]
>>     [<00000000d49d33fe>] dev_hard_start_xmit+0x1ba/0x710
>>     [<00000000eac444f5>] __dev_queue_xmit+0x17c5/0x25f0
>>     [<000000005fbd8585>] tcf_mirred_act+0xb1d/0xf70 [act_mirred]
>>     [<0000000064b6eb2d>] tcf_action_exec+0x10e/0x350
>>     [<00000000352821e8>] fl_classify+0x4e3/0x610 [cls_flower]
>>     [<0000000011d3f765>] tcf_classify+0x33d/0x800
>>     [<000000006c69b225>] __netif_receive_skb_core+0x18d6/0x2ae0
>>     [<00000000dd256fe3>] __netif_receive_skb_one_core+0xaf/0x180
>>     [<0000000065d43bd6>] process_backlog+0x2e3/0x710
>>     [<00000000964357ae>] __napi_poll+0x9f/0x560
>>     [<0000000059a93cf6>] net_rx_action+0x357/0xa60
>>     [<00000000766481bc>] __do_softirq+0x282/0x94e
>> 
>> Looking at the code the potential issue seems to be that
>> tun_dst_unclone() creates new metadata_dst instance with refcount==1,
>> increments the refcount with dst_hold() to value 2, then returns it.
>> This seems to imply that caller is expected to release one of the
>> references (second one if for skb), but none of the callers (including
>> original dev_fill_metadata_dst()) do that, so I guess I'm
>> misunderstanding something here.
>> 
>> Any tips or suggestions?
>
> I'd say there is no need to increase the dst refcount here after calling
> metadata_dst_alloc, as the metadata is local to the skb and the dst
> refcount was already initialized to 1. This might be an issue with
> commit fc4099f17240 ("openvswitch: Fix egress tunnel info."); I CCed
> Pravin, he might recall if there was a reason to increase the refcount.

I tried to remove the dst_hold(), but that caused underflows[0], so I
guess the current reference counting is required at least for some
use-cases.

[0]:

[  118.803011] ------------[ cut here ]------------                                    
[  118.803011] dst_release: dst:000000001fc13e61 refcnt:-2                             
[  118.803019] dst_release: dst:000000001fc13e61 refcnt:-2                             
[  118.803027] dst_release: dst:000000001fc13e61 refcnt:-2                             
[  118.803041] dst_release: dst:000000001fc13e61 refcnt:-2                             
[  118.803046] dst_release: dst:000000001fc13e61 refcnt:-2                             
[  118.803060] dst_release: dst:000000001fc13e61 refcnt:-2                             
[  118.803065] dst_release: dst:000000001fc13e61 refcnt:-2                             
[  118.803078] dst_release: dst:000000001fc13e61 refcnt:-2                             
[  118.803083] dst_release: dst:000000001fc13e61 refcnt:-2                             
[  118.803096] dst_release: dst:000000001fc13e61 refcnt:-2                             
[  118.803920] dst_release underflow                                                   
[  118.803937] WARNING: CPU: 4 PID: 0 at net/core/dst.c:173 dst_release+0x79/0x90                                                                                             
[  118.815961] Modules linked in: act_tunnel_key act_mirred act_skbedit veth vxlan ip6_udp_tunnel udp_tunnel act_gact cls_flower sch_ingress openvswitch nsh nf_conncount mlx5_ib mlx5_core mlxfw pci_hyperv_intf ptp pps_core nfsv3 nfs_acl xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xt_addrtype iptable_filter iptable_nat nf_nat nf_connt
rack nf_defrag_ipv6 nf_defrag_ipv4 br_netfilter bridge stp llc rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache netfs rfkill overlay rpcrdma rdma_ucm ib_srpt ib_isert iscsi_target_mod target_core_mod ib_iser ib_umad rdma_cm ib_ipoib iw_cm ib_cm ib_uverbs sunrpc ib_core iTCO_wdt iTCO_vendor_support lpc_ich mfd_core virtio_net 
net_failover failover i2c_i801 i2c_smbus kvm_intel kvm pcspkr irqbypass crc32_pclmul ghash_clmulni_intel sch_fq_codel drm i2c_core ip_tables crc32c_intel serio_raw fuse [last unloaded: mlxfw]
[  118.829464] CPU: 4 PID: 0 Comm: swapper/4 Not tainted 5.16.0+ #3                                                                                                           
[  118.830567] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014                                                                                                                                                                                                                              
[  118.832541] RIP: 0010:dst_release+0x79/0x90                                         
[  118.833372] Code: 04 e8 db 14 01 00 8b 4c 24 04 85 c0 74 c7 e9 4d 60 22 00 48 c7 c7 35 00 32 82 89 4c 24 04 c6 05 c5 47 e5 00 01 e8 6f c2 1b 00 <0f> 0b 8b 4c 24 04 eb cb 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40                                                                                                                                       
[  118.836566] RSP: 0018:ffffc90000180ee0 EFLAGS: 00010282                             
[  118.837546] RAX: 0000000000000000 RBX: ffff88813a139ab0 RCX: 0000000000000000                                                                                              
[  118.838912] RDX: 0000000000000102 RSI: ffffffff82286273 RDI: 00000000ffffffff                                                                                              
[  118.840278] RBP: 0000000000000004 R08: 0000000000000015 R09: ffffc90000180e78                                                                                              
[  118.841646] R10: ffffffff825c7000 R11: 0000000000000001 R12: ffff88811d3ab480                                                                                              
[  118.843008] R13: 0000000000000170 R14: 0000000000000000 R15: ffff8882f5a2e1b8                                                                                              
[  118.844371] FS:  0000000000000000(0000) GS:ffff8882f5a00000(0000) knlGS:0000000000000000                                                                                   
[  118.845968] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033                                                                                                              
[  118.847100] CR2: 00007fa5e5abd7e0 CR3: 0000000175408005 CR4: 0000000000770ee0                                                                                              
[  118.848461] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000                                                                                              
[  118.849834] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400                                                                                              
[  118.851198] PKRU: 55555554                                                          
[  118.851815] Call Trace:                                                             
[  118.852387]  <IRQ>                                                                  
[  118.852894]  dst_cache_destroy+0x33/0x60                                            
[  118.853728]  dst_destroy+0xaa/0xb0                                                  
[  118.854465]  rcu_core+0x2a3/0x960                                                   
[  118.855197]  __do_softirq+0xf0/0x2f9                                                
[  118.855976]  __irq_exit_rcu+0xcc/0x120                                              
[  118.856765]  sysvec_apic_timer_interrupt+0xa2/0xd0                                  
[  118.857746]  </IRQ>                                                                 
[  118.858270]  <TASK>                                                                 
[  118.858775]  asm_sysvec_apic_timer_interrupt+0x12/0x20                              
[  118.859801] RIP: 0010:native_safe_halt+0xb/0x10                                     
[  118.860731] Code: 7e ff ff ff 7f 5b c3 65 48 8b 04 25 c0 cb 01 00 f0 80 48 02 20 48 8b 00 a8 08 75 c3 eb 80 cc eb 07 0f 00 2d 8f f5 4f 00 fb f4 <c3> 0f 1f 40 00 eb 07 0f 00 2d 7f f5 4f 00 f4 c3 cc cc cc cc cc 0f                                                                                                                                       
[  118.864175] RSP: 0018:ffffc9000009fef0 EFLAGS: 00000206                             
[  118.865223] RAX: 0000000000027f6c RBX: 0000000000000004 RCX: 0000000000000000                                                                                              
[  118.866504] RDX: ffff88817c090d50 RSI: ffffffff82286273 RDI: ffffffff8225bf7f                                                                                              
[  118.867774] RBP: ffff8881009d2e80 R08: 0000000000027f6b R09: 0000000000000000                                                                                              
[  118.869051] R10: 00000000fffd3b17 R11: 0000000000000001 R12: 0000000000000000                                                                                              
[  118.870326] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000                                                                                              
[  118.871601]  default_idle+0xa/0x10                                                  
[  118.872300]  default_idle_call+0x33/0xe0                                            
[  118.873081]  do_idle+0x208/0x270                                                    
[  118.873741]  cpu_startup_entry+0x19/0x20                                            
[  118.874562]  secondary_startup_64_no_verify+0xc3/0xcb                               
[  118.875604]  </TASK>                                                                
[  118.876140] ---[ end trace dec5061c76371ce7 ]---   

  reply	other threads:[~2022-01-20 12:58 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-03-25 15:35 [PATCH net 0/2] net: do not modify the shared tunnel info when PMTU triggers an ICMP reply Antoine Tenart
2021-03-25 15:35 ` [PATCH net 1/2] vxlan: " Antoine Tenart
2022-01-20  7:38   ` Vlad Buslov
2022-01-20 10:27     ` Antoine Tenart
2022-01-20 12:58       ` Vlad Buslov [this message]
2022-01-28 17:01         ` Antoine Tenart
2022-01-31 11:26           ` Vlad Buslov
2022-01-31 13:26             ` Antoine Tenart
2022-01-31 14:04               ` Stefano Brivio
2022-01-31 14:42                 ` Antoine Tenart
2022-01-31 17:55               ` Vlad Buslov
2021-03-25 15:35 ` [PATCH net 2/2] geneve: " Antoine Tenart
2021-03-25 20:28 ` [PATCH net 0/2] net: " Stefano Brivio
2021-03-26  0:40 ` patchwork-bot+netdevbpf

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ygnhee52lg2d.fsf@nvidia.com \
    --to=vladbu@nvidia$(echo .)com \
    --cc=atenart@kernel$(echo .)org \
    --cc=davem@davemloft$(echo .)net \
    --cc=echaudro@redhat$(echo .)com \
    --cc=kuba@kernel$(echo .)org \
    --cc=netdev@vger$(echo .)kernel.org \
    --cc=pshelar@ovn$(echo .)org \
    --cc=sbrivio@redhat$(echo .)com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox