public inbox for netdev@vger.kernel.org 
From: "Russell King (Oracle)" <linux@armlinux•org.uk>
To: Jijie Shao <shaojijie@huawei•com>
Cc: f.fainelli@gmail•com, Andrew Lunn <andrew@lunn•ch>,
	davem@davemloft•net, edumazet@google•com, hkallweit1@gmail•com,
	kuba@kernel•org, netdev@vger•kernel.org, pabeni@redhat•com,
	"shenjian15@huawei•com" <shenjian15@huawei•com>,
	"liuyonglong@huawei•com" <liuyonglong@huawei•com>,
	wangjie125@huawei•com, chenhao418@huawei•com,
	Hao Lan <lanhao@huawei•com>,
	"wangpeiyang1@huawei•com" <wangpeiyang1@huawei•com>
Subject: Re: [PATCH net-next] net: phy: avoid kernel warning dump when stopping an errored PHY
Date: Mon, 4 Sep 2023 15:42:50 +0100
Message-ID: <ZPXs6i2S8GSCpVOV@shell.armlinux.org.uk>
In-Reply-To: <8e7e02d8-2b2a-8619-e607-fbac50706252@huawei.com>

On Mon, Sep 04, 2023 at 05:50:32PM +0800, Jijie Shao wrote:
> Hi all,
> We recently encountered an issue when resetting our netdevice that seems
> to be related to this patch.
> 
> During our reset process, we stop the PHY first and call phy_start() later.
> phy_check_link_status() returns an error because the MDIO read fails. This
> happens because the cmdq is unusable while we reset, so we cannot access
> MDIO.

Are you suggesting that the sequence is:

phy_stop();
reset netdev
phy_start();

?

Are you doing this because you've already detected an issue with the
hardware and are trying to recover from it, so the hardware is already
dead before you've called phy_stop()?

If that is the case, I'm not really sure what you expect to happen
here. You've identified a race where the state machine is running
concurrently with phy_stop(), but in this circumstance it is also
possible that the state machine could have finished executing and
called phy_error_precise() before phy_stop() has even been called. In
that case, you'll still get a warning splat on the console from
phy_error_precise().

The only difference is that phy_stop() won't warn.

All that said, this is obviously buggy, because phy_stop() has set
the phydev state to PHY_HALTED and the state machine has then
unexpectedly changed that state.

I wonder whether we should be tracking the phy_start/stop state
separately, since we've had issues with phy_stop() warning when an
error has occurred (see commit 59088b5a946e).

Maybe something like this (untested)?

diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c
index df54c137c5f5..d57f6de8a562 100644
--- a/drivers/net/phy/phy.c
+++ b/drivers/net/phy/phy.c
@@ -810,7 +810,8 @@ int phy_start_cable_test(struct phy_device *phydev,
 		goto out;
 	}
 
-	if (phydev->state < PHY_UP ||
+	if (phydev->oper_state != PHY_OPER_STARTED ||
+	    phydev->state < PHY_UP ||
 	    phydev->state > PHY_CABLETEST) {
 		NL_SET_ERR_MSG(extack,
 			       "PHY not configured. Try setting interface up");
@@ -881,7 +882,8 @@ int phy_start_cable_test_tdr(struct phy_device *phydev,
 		goto out;
 	}
 
-	if (phydev->state < PHY_UP ||
+	if (phydev->oper_state != PHY_OPER_STARTED ||
+	    phydev->state < PHY_UP ||
 	    phydev->state > PHY_CABLETEST) {
 		NL_SET_ERR_MSG(extack,
 			       "PHY not configured. Try setting interface up");
@@ -1364,10 +1366,8 @@ void phy_stop(struct phy_device *phydev)
 	struct net_device *dev = phydev->attached_dev;
 	enum phy_state old_state;
 
-	if (!phy_is_started(phydev) && phydev->state != PHY_DOWN &&
-	    phydev->state != PHY_ERROR) {
-		WARN(1, "called from state %s\n",
-		     phy_state_to_str(phydev->state));
+	if (phydev->oper_state != PHY_OPER_STARTED) {
+		WARN(1, "called when not started\n");
 		return;
 	}
 
@@ -1382,6 +1382,7 @@ void phy_stop(struct phy_device *phydev)
 	if (phydev->sfp_bus)
 		sfp_upstream_stop(phydev->sfp_bus);
 
+	phydev->oper_state = PHY_OPER_STOPPED;
 	phydev->state = PHY_HALTED;
 	phy_process_state_change(phydev, old_state);
 
@@ -1411,9 +1412,8 @@ void phy_start(struct phy_device *phydev)
 {
 	mutex_lock(&phydev->lock);
 
-	if (phydev->state != PHY_READY && phydev->state != PHY_HALTED) {
-		WARN(1, "called from state %s\n",
-		     phy_state_to_str(phydev->state));
+	if (phydev->oper_state != PHY_OPER_STOPPED) {
+		WARN(1, "called when not stopped\n");
 		goto out;
 	}
 
@@ -1423,6 +1423,7 @@ void phy_start(struct phy_device *phydev)
 	/* if phy was suspended, bring the physical link up again */
 	__phy_resume(phydev);
 
+	phydev->oper_state = PHY_OPER_STARTED;
 	phydev->state = PHY_UP;
 
 	phy_start_machine(phydev);
@@ -1442,14 +1443,18 @@ void phy_state_machine(struct work_struct *work)
 			container_of(dwork, struct phy_device, state_queue);
 	struct net_device *dev = phydev->attached_dev;
 	bool needs_aneg = false, do_suspend = false;
-	enum phy_state old_state;
+	enum phy_state old_state, state;
 	const void *func = NULL;
 	bool finished = false;
 	int err = 0;
 
 	mutex_lock(&phydev->lock);
 
-	old_state = phydev->state;
+	state = old_state = phydev->state;
+
+	/* If the PHY is stopped, then force state to halted. */
+	if (phydev->oper_state == PHY_OPER_STOPPED)
+		state = PHY_HALTED;
 
-	switch (phydev->state) {
+	switch (state) {
 	case PHY_DOWN:
diff --git a/include/linux/phy.h b/include/linux/phy.h
index 5dcab361a220..b128d903adb3 100644
--- a/include/linux/phy.h
+++ b/include/linux/phy.h
@@ -519,6 +519,11 @@ enum phy_state {
 	PHY_CABLETEST,
 };
 
+enum phy_oper_state {
+	PHY_OPER_STOPPED,
+	PHY_OPER_STARTED,
+};
+
 #define MDIO_MMD_NUM 32
 
 /**
@@ -670,6 +675,7 @@ struct phy_device {
 	int rate_matching;
 
 	enum phy_state state;
+	enum phy_oper_state oper_state;
 
 	u32 dev_flags;
 
@@ -1221,7 +1227,8 @@ int phy_speed_down_core(struct phy_device *phydev);
  */
 static inline bool phy_is_started(struct phy_device *phydev)
 {
-	return phydev->state >= PHY_UP;
+	return phydev->oper_state == PHY_OPER_STARTED &&
+	       phydev->state >= PHY_UP;
 }
 
 void phy_resolve_aneg_pause(struct phy_device *phydev);
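
To illustrate the idea behind the patch above, here is a minimal
userspace sketch, not kernel code. The struct phy_model type and the
model_*() helpers are hypothetical stand-ins invented for this example,
and the enum lists only a subset of the real states. It assumes that an
MDIO error moves state to PHY_ERROR while leaving oper_state untouched,
which is what lets the stop check succeed without a warning.

```c
#include <stdbool.h>

/* Simplified subset of the kernel's enum phy_state (illustration only). */
enum phy_state { PHY_DOWN, PHY_READY, PHY_HALTED, PHY_ERROR, PHY_UP };

/* Start/stop tracking kept separate from the state machine state. */
enum phy_oper_state { PHY_OPER_STOPPED, PHY_OPER_STARTED };

/* Hypothetical stand-in for struct phy_device. */
struct phy_model {
	enum phy_state state;
	enum phy_oper_state oper_state;
	int warnings;	/* counts what would be WARN() splats */
};

/* Models phy_start(): only valid while operationally stopped. */
static bool model_phy_start(struct phy_model *phy)
{
	if (phy->oper_state != PHY_OPER_STOPPED) {
		phy->warnings++;
		return false;
	}
	phy->oper_state = PHY_OPER_STARTED;
	phy->state = PHY_UP;
	return true;
}

/* Models the state machine hitting an MDIO error: oper_state is untouched. */
static void model_phy_error(struct phy_model *phy)
{
	phy->state = PHY_ERROR;
}

/* Models phy_stop(): the check no longer depends on the machine state. */
static bool model_phy_stop(struct phy_model *phy)
{
	if (phy->oper_state != PHY_OPER_STARTED) {
		phy->warnings++;
		return false;
	}
	phy->oper_state = PHY_OPER_STOPPED;
	phy->state = PHY_HALTED;
	return true;
}

/* State the state machine should act on: a stopped PHY is treated as halted. */
static enum phy_state model_effective_state(const struct phy_model *phy)
{
	return phy->oper_state == PHY_OPER_STOPPED ? PHY_HALTED : phy->state;
}
```

In this model, start, error, stop completes with no warning because
oper_state is still PHY_OPER_STARTED when the stop is requested, while
a second stop in a row still warns.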
-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!

