From: Stanislaw Gruszka <stanislaw.gruszka@linux•intel.com>
To: Jakub Kicinski <kuba@kernel•org>
Cc: Heiner Kallweit <hkallweit1@gmail•com>,
Johannes Berg <johannes@sipsolutions•net>,
netdev@vger•kernel.org, Johannes Berg <johannes.berg@intel•com>,
Marc MERLIN <marc@merlins•org>,
Przemek Kitszel <przemyslaw.kitszel@intel•com>
Subject: Re: [PATCH net v3] net: ethtool: do runtime PM outside RTNL
Date: Fri, 5 Jan 2024 12:53:42 +0100
Message-ID: <ZZftxhXzQykx8j6b@linux.intel.com>
In-Reply-To: <20240104081656.67c6030c@kernel.org>
On Thu, Jan 04, 2024 at 08:16:56AM -0800, Jakub Kicinski wrote:
> On Thu, 4 Jan 2024 10:05:12 +0100 Heiner Kallweit wrote:
> > > If device was not suspended, pm_runtime_get_sync() will increase
> > > dev->power.usage_count counter and cancel pending rpm suspend
> > > request if any. There is race condition though, more about that
> > > below.
> > >
> > > If device was suspended, we could not get to igc_open() since it
> > > was marked as detached and fail netif_device_present() check in
> > > __dev_open(). That was the behaviour before bd869245a3dc.
>
> __dev_open() tries to resume as well, and is also under rtnl_lock.
This one is a plain 100% deadlock for igc (and for igb before
ac8c58f5b535). I'm opting for removing those rpm calls from
__dev_open() and ethtool.
The only thing that prevents that deadlock from happening all the
time is that rpm is disabled by default (for PCI devices). When a PCI
driver wants rpm to be enabled by default, it has to call
pm_runtime_allow(). Otherwise the user has to enable it by:
  echo auto > /sys/bus/pci/devices/PCI_ID/power/control
But this could also be done by some power-saving user-space
software. This is the most probable reason why Marc reported
that he cannot boot his laptop due to this deadlock.
The other, unlikely, possibility is that for some reason rpm was
enabled by default, but it should not be for PCI since:
bb910a7040e9 ("PCI/PM Runtime: Make runtime PM of PCI devices
inactive by default")
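To make the power/control semantics above concrete, here is a minimal
userspace sketch. The helper name rpm_allowed_by_control() is
hypothetical (it is not a kernel or libc function); it only interprets
the string read from the sysfs power/control attribute, where "auto"
allows runtime PM and "on" keeps the device always active.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper: decide from the contents of a device's
 * power/control attribute whether runtime PM is allowed.
 * "auto" -> runtime PM allowed, anything else (e.g. "on") -> not allowed.
 */
static bool rpm_allowed_by_control(const char *value)
{
	/* sysfs values normally end with a newline; ignore it. */
	size_t len = strcspn(value, "\n");

	return len == 4 && strncmp(value, "auto", 4) == 0;
}
```

A tool that wants to know whether a given setup can hit the deadlock
could read power/control for the NIC and pass the string to such a
helper.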
> So that resume call somehow must never happen or users would see
> -ENODEV? Sorry for the basic questions, the flow is confusing :S
If we talk about the situation before rpm calls were added to net core
(i.e. < 5.9), open/ethtool returned an -ENODEV error when igc/igb
was runtime suspended, because the netif_device_present() check failed.
That was by design: why open the device and lose energy if there is
no cable and the device cannot be used anyway?
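The pre-5.9 behaviour described above can be modelled in a few lines of
userspace C. This is only a sketch: struct model_netdev and the
model_*() functions are made-up stand-ins for the net_device state and
the driver callbacks, not kernel code.

```c
#include <errno.h>
#include <stdbool.h>

/* Stand-in for the part of struct net_device that matters here. */
struct model_netdev {
	bool present;	/* models netif_device_present() */
};

/* Runtime suspend detaches the netdev, resume attaches it again. */
static void model_runtime_suspend(struct model_netdev *dev)
{
	dev->present = false;
}

static void model_runtime_resume(struct model_netdev *dev)
{
	dev->present = true;
}

/* Open path without any rpm call: a detached device is simply refused. */
static int model_open_no_rpm(const struct model_netdev *dev)
{
	if (!dev->present)
		return -ENODEV;
	return 0;
}
```

In this model open never waits on runtime PM, so it cannot block under
rtnl_lock; the price is that the user sees -ENODEV while the device is
suspended.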
> > > There is small race window between with igc_open() and scheduled
> > > runtime suspend, if at the same time dev_open() is done and
> > > dev->power.suspend_timer expire:
> > >
> > > open:                         pm_suspend_timer_fh:
> > >
> > > rtnl_lock()
> > >                               rpm_suspend()
> > >                                 igc_runtime_suspend()
> > >                                   __igc_shutdown()
> > >                                     rtnl_lock()
> > >
> > > __igc_open()
> > >   pm_runtime_get_sync():
> > >     waits for rpm suspend callback done
> > >
> > > This needs to be addressed, but it's not that this can happen
> > > all the time. To trigger this someone has to remove the
> > > cable and exactly after 5 seconds do ip link set up.
>
> Or tries to up exactly 5 sec after probe?
Just after probe rpm is disabled, so it would be 5 sec after enabling
rpm (with cable removed) or 5 sec after cable removal (with rpm enabled).
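The race quoted above is a classic circular wait, and it can be checked
with a tiny wait-for graph. This is a userspace sketch under my own
assumptions: the task indices and wait_graph_has_cycle() are
illustrative names, nothing from the kernel. Task 0 is the open path
(holds rtnl, waits for the suspend callback to finish), task 1 is the
rpm suspend callback (waits for rtnl).

```c
#include <stdbool.h>

#define NO_WAIT (-1)

enum { TASK_OPEN = 0, TASK_RPM_SUSPEND = 1 };

/*
 * waits_for[i] is the index of the task that task i is blocked on,
 * or NO_WAIT if task i can make progress. A cycle means deadlock.
 */
static bool wait_graph_has_cycle(const int waits_for[], int ntasks)
{
	for (int start = 0; start < ntasks; start++) {
		int cur = start;

		/* Follow at most ntasks edges; coming back to start is a cycle. */
		for (int hops = 0; hops < ntasks; hops++) {
			cur = waits_for[cur];
			if (cur == NO_WAIT)
				break;
			if (cur == start)
				return true;
		}
	}
	return false;
}
```

With waits_for = { TASK_RPM_SUSPEND, TASK_OPEN } (the race as drawn)
the function reports a cycle. With the rpm call removed from the open
path, i.e. waits_for = { NO_WAIT, TASK_OPEN }, there is no cycle: the
suspend callback just waits until open drops rtnl.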
> > For me the main question is the following. In igc_resume() you have
> >
> > rtnl_lock();
> > if (!err && netif_running(netdev))
> > err = __igc_open(netdev, true);
> >
> > if (!err)
> > netif_device_attach(netdev);
> > rtnl_unlock();
> >
> > Why is the global rtnl_lock() needed here? The netdev is in detached
> > state what protects from e.g. userspace activity, see all the
> > netif_device_present() checks in net core.
>
> That'd assume there are no RPM calls outside networking in this driver.
> Perhaps there aren't but that also sounds wobbly.
There are, in the PCI layer. For example, when disabling rpm
(reverting auto in power/control) by:
  echo on > /sys/bus/pci/devices/PCI_ID/power/control
Regards
Stanislaw