From: Jakub Kicinski <kuba@kernel•org>
To: Mina Almasry <almasrymina@google•com>
Cc: Stanislav Fomichev <stfomichev@gmail•com>,
Taehee Yoo <ap420073@gmail•com>,
davem@davemloft•net, pabeni@redhat•com, edumazet@google•com,
andrew+netdev@lunn•ch, horms@kernel•org, asml.silence@gmail•com,
dw@davidwei•uk, sdf@fomichev•me, skhawaja@google•com,
simona.vetter@ffwll•ch, kaiyuanz@google•com,
netdev@vger•kernel.org
Subject: Re: [PATCH net] net: devmem: fix kernel panic when socket close after module unload
Date: Wed, 16 Apr 2025 17:27:11 -0700 [thread overview]
Message-ID: <20250416172711.0c6a1da8@kernel.org> (raw)
In-Reply-To: <CAHS8izNUi1v3sjODWYm4jNEb6uOq44RD0d015G=7aXEYMvrinQ@mail.gmail.com>
On Wed, 16 Apr 2025 08:47:14 -0700 Mina Almasry wrote:
> > Right, tho a bit of work and tricky handling will be necessary to get
> > that right. We're not holding a ref on binding->dev.
> >
> > I think we need to invert the socket mutex vs instance lock ordering.
> > Make the priv mutex protect the binding->list and binding->dev.
> > For that to work the binding needs to also store a pointer to its
> > owning socket?
> >
> > Then in both uninstall paths (from socket and from netdev unreg) we can
> > take the socket mutex, delete from list, clear the ->dev pointer,
> > unlock, release the ref on the binding.
>
> I don't like that the ref obtained by the socket can be released by
> both the socket and the netdev unreg :( It creates a weird mental
> model where the ref owned by the socket can actually be dropped by the
> netdev unreg path and then the socket close needs to detect that
> something else dropped its ref. It also creates a weird scenario where
> the device got unregistered and reregistered (I assume that's
> possible? Or no?) and the socket is alive and the device is registered
> but actually the binding is not active.
I agree. But it's be best I could come up with (and what we ended up
with in io-uring)...
> > The socket close path would probably need to lock the socket, look at
> > the first entry, if entry has ->dev call netdev_hold(), release the
> > socket, lock the netdev, lock the socket again, look at the ->dev, if
> > NULL we raced - done. If not NULL release the socket, call unbind.
> > netdev_put(). Restart this paragraph.
> >
> > I can't think of an easier way.
> >
>
> How about, roughly:
>
> - the binding holds a ref on dev, making sure that the dev is alive
> until the last ref on the binding is dropped and the binding is freed.
> - The ref owned by the socket is only ever dropped by the socket close.
> - When we netdev_lock(binding->dev) to later do a
> net_devmem_dmabuf_unbind, we must first grab another ref on the
> binding->dev, so that it doesn't get freed if the unbind drops the
> last ref.
Right now you can't hold a reference on a netdevice forever.
You have to register a notifier and when NETDEV_UNREGISTER triggers
you must give up the reference you took. Also, fun note, it is illegal
to take an "additional reference". You must re-lookup the device or
otherwise safely ensure device is not getting torn down.
See netdev_wait_allrefs_any(), that blocks whoever called unregister
until all refs are reclaimed.
> I think that would work too?
>
> Can you remind me why we do a dev_memory_provider_uninstall on a
> device unregister? If the device gets unregistered then re-registered
> (again, I'm kinda assuming that is possible, I'm not sure)
It's not legal right now. I think there's a BUG_ON() somewhere.
> I expect it
> to still be memory provider bound, because the netlink socket is still
> alive and the userspace is still expecting a live binding. Maybe
> delete the dev_memory_provider_uninstall code I added on unregister,
> and sorry I put it there...? Or is there some reason I'm forgetting
> that we have to uninstall the memory provider on unregister?
IIRC bound_rxqs will point to freed memory once the netdev is gone.
If we had a ref on the netdev then yeah we could possibly potentially
keep the queues around. But holding refs on a netdev is.. a topic for
another time. I'm trying to limit amount of code we'd need to revert
if the instance locking turns out to be fundamentally broken :S
next prev parent reply other threads:[~2025-04-17 0:27 UTC|newest]
Thread overview: 18+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-04-15 9:24 [PATCH net] net: devmem: fix kernel panic when socket close after module unload Taehee Yoo
2025-04-15 17:33 ` Stanislav Fomichev
2025-04-15 18:22 ` Mina Almasry
2025-04-15 18:59 ` Stanislav Fomichev
2025-04-16 2:59 ` Jakub Kicinski
2025-04-16 14:40 ` Stanislav Fomichev
2025-04-17 0:15 ` Jakub Kicinski
2025-04-16 15:01 ` Taehee Yoo
2025-04-17 0:35 ` Jakub Kicinski
2025-04-17 6:57 ` Taehee Yoo
2025-04-17 14:09 ` Jakub Kicinski
2025-04-18 10:46 ` Taehee Yoo
2025-04-16 15:47 ` Mina Almasry
2025-04-17 0:27 ` Jakub Kicinski [this message]
2025-04-17 21:07 ` Mina Almasry
2025-04-18 10:52 ` Taehee Yoo
2025-05-05 17:34 ` Jakub Kicinski
2025-05-06 11:41 ` Taehee Yoo
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20250416172711.0c6a1da8@kernel.org \
--to=kuba@kernel$(echo .)org \
--cc=almasrymina@google$(echo .)com \
--cc=andrew+netdev@lunn$(echo .)ch \
--cc=ap420073@gmail$(echo .)com \
--cc=asml.silence@gmail$(echo .)com \
--cc=davem@davemloft$(echo .)net \
--cc=dw@davidwei$(echo .)uk \
--cc=edumazet@google$(echo .)com \
--cc=horms@kernel$(echo .)org \
--cc=kaiyuanz@google$(echo .)com \
--cc=netdev@vger$(echo .)kernel.org \
--cc=pabeni@redhat$(echo .)com \
--cc=sdf@fomichev$(echo .)me \
--cc=simona.vetter@ffwll$(echo .)ch \
--cc=skhawaja@google$(echo .)com \
--cc=stfomichev@gmail$(echo .)com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox