From: Nick Child <nnac123@linux•ibm.com>
To: netdev@vger•kernel.org
Cc: haren@linux•ibm.com, ricklind@us•ibm.com, danymadden@us•ibm.com,
tlfalcon@linux•ibm.com, bjking1@linux•ibm.com,
Nick Child <nnac123@linux•ibm.com>
Subject: [PATCH net 5/5] ibmvnic: Ensure login failure recovery is safe from other resets
Date: Thu, 3 Aug 2023 15:20:10 -0500
Message-ID: <20230803202010.37149-5-nnac123@linux.ibm.com>
In-Reply-To: <20230803202010.37149-1-nnac123@linux.ibm.com>

If a login request fails, the recovery process should be protected
against parallel resets. It is a known issue that freeing and
registering CRQs in quick succession can result in a failover CRQ from
the VIOS. Processing a failover during login recovery is dangerous for
two reasons:
 1. It results in two parallel initialization processes, which can
    cause serious issues during login.
 2. It is possible that the failover CRQ is received but never executed.
    We get notified of a pending failover through a transport event CRQ.
    The reset is not performed until an INIT CRQ request is received.

Previously, if CRQ init failed during login recovery, the ibmvnic
irq was freed and the login process returned an error. If
failover_pending was true (a transport event was received), the ibmvnic
device would never be able to process the reset, since it could not
receive the CRQ_INIT request with the irq freed. This left the device
in an inoperable state.
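
The stuck state described above can be illustrated with a small
userspace model. This is only a sketch, not driver code: struct
sketch_dev and the sketch_* helpers are hypothetical names, and the
model assumes nothing beyond what the text states (a transport event
marks a failover as pending, and the reset only runs once an INIT CRQ
is actually received).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the device state relevant to the failure. */
struct sketch_dev {
	bool irq_enabled;      /* false once the irq has been freed */
	bool failover_pending; /* set by a transport event CRQ */
	bool reset_done;       /* set once the pending failover has run */
};

/* A transport event only announces the failover; it does not run it. */
static void sketch_transport_event(struct sketch_dev *d)
{
	assert(d != NULL);
	if (d->irq_enabled)
		d->failover_pending = true;
}

/* The INIT CRQ triggers the reset, but only if it can be delivered.
 * Returns true if the CRQ was received by the device.
 */
static bool sketch_init_crq(struct sketch_dev *d)
{
	assert(d != NULL);
	if (!d->irq_enabled)
		return false; /* irq freed: the CRQ is never seen */
	if (d->failover_pending) {
		d->failover_pending = false;
		d->reset_done = true;
	}
	return true;
}
```

In this model, a transport event followed by freeing the irq leaves
failover_pending set forever, because sketch_init_crq() can no longer
be delivered.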

Therefore, the login failure recovery process must be hardened against
these possible issues. Possible failovers (due to quick CRQ free and
init) must be avoided and any issues during re-initialization should be
dealt with instead of being propagated up the stack. This logic is
similar to that of ibmvnic_probe().
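
The shape of the hardened recovery loop can be sketched in userspace as
follows. This is a simplified model under stated assumptions, not the
driver implementation: struct fake_adapter and the fake_* functions are
hypothetical stand-ins, the sleep and locking are omitted, and the model
assumes that a failed pass is the one that leaves a stray failover
behind.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the adapter fields the loop touches. */
struct fake_adapter {
	bool failover_pending; /* a transport event was received */
	int queued_resets;     /* models entries on the reset queue */
	int eagain_budget;     /* how many passes should fail with -EAGAIN */
	int passes;            /* loop iterations performed so far */
};

/* Models re-initialization: fails with -EAGAIN while the budget lasts,
 * and (by assumption) each failed pass leaves a stray failover behind.
 */
static int fake_reset_init(struct fake_adapter *a)
{
	if (a->eagain_budget > 0) {
		a->eagain_budget--;
		a->failover_pending = true;
		a->queued_resets++;
		return -EAGAIN;
	}
	return 0;
}

/* Retry until re-initialization stops reporting -EAGAIN, discarding
 * any failover state left over from the previous pass each time.
 */
static int fake_login_recovery(struct fake_adapter *a)
{
	int rc;

	assert(a != NULL);
	do {
		a->passes++;
		a->failover_pending = false; /* clear stale failover */
		a->queued_resets = 0;        /* flush queued resets */
		rc = fake_reset_init(a);
	} while (rc == -EAGAIN);

	return rc;
}
```

With eagain_budget set to 2, the loop runs three passes and ends with
no pending failover and an empty reset queue, instead of returning an
error on the first failed pass.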
Fixes: dff515a3e71d ("ibmvnic: Harden device login requests")
Signed-off-by: Nick Child <nnac123@linux•ibm.com>
---
drivers/net/ethernet/ibm/ibmvnic.c | 67 ++++++++++++++++++++----------
1 file changed, 46 insertions(+), 21 deletions(-)
diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c
index 8fd9639665a0..77df62511574 100644
--- a/drivers/net/ethernet/ibm/ibmvnic.c
+++ b/drivers/net/ethernet/ibm/ibmvnic.c
@@ -116,6 +116,7 @@ static void ibmvnic_tx_scrq_clean_buffer(struct ibmvnic_adapter *adapter,
static void free_long_term_buff(struct ibmvnic_adapter *adapter,
struct ibmvnic_long_term_buff *ltb);
static void ibmvnic_disable_irqs(struct ibmvnic_adapter *adapter);
+static void flush_reset_queue(struct ibmvnic_adapter *adapter);
struct ibmvnic_stat {
char name[ETH_GSTRING_LEN];
@@ -1508,7 +1509,7 @@ static const char *adapter_state_to_string(enum vnic_state state)
static int ibmvnic_login(struct net_device *netdev)
{
struct ibmvnic_adapter *adapter = netdev_priv(netdev);
- unsigned long timeout = msecs_to_jiffies(20000);
+ unsigned long flags, timeout = msecs_to_jiffies(20000);
int retry_count = 0;
int retries = 10;
bool retry;
@@ -1590,27 +1591,52 @@ static int ibmvnic_login(struct net_device *netdev)
adapter->init_done_rc = 0;
retry_count++;
release_sub_crqs(adapter, true);
- reinit_init_done(adapter);
- release_crq_queue(adapter);
- /* If we don't sleep here then we risk an unnecessary
- * failover event from the VIOS. This is a known VIOS
- * issue caused by a vnic device freeing and registering
- * a CRQ too quickly.
+ /* Much of this is similar logic as ibmvnic_probe(),
+ * we are essentially re-initializing communication
+ * with the server. We really should not run any
+ * resets/failovers here because this is already a form
+ * of reset and we do not want parallel resets occurring
*/
- msleep(1500);
- rc = init_crq_queue(adapter);
- if (rc) {
- netdev_err(netdev, "login recovery: init CRQ failed %d\n",
- rc);
- return -EIO;
- }
+ do {
+ reinit_init_done(adapter);
+ /* Clear any failovers we got in the previous
+ * pass since we are re-initializing the CRQ
+ */
+ adapter->failover_pending = false;
+ release_crq_queue(adapter);
+ /* If we don't sleep here then we risk an
+ * unnecessary failover event from the VIOS.
+ * This is a known VIOS issue caused by a vnic
+ * device freeing and registering a CRQ too
+ * quickly.
+ */
+ msleep(1500);
+ /* Avoid any resets, since we are currently
+ * resetting.
+ */
+ spin_lock_irqsave(&adapter->rwi_lock, flags);
+ flush_reset_queue(adapter);
+ spin_unlock_irqrestore(&adapter->rwi_lock,
+ flags);
+
+ rc = init_crq_queue(adapter);
+ if (rc) {
+ netdev_err(netdev, "login recovery: init CRQ failed %d\n",
+ rc);
+ return -EIO;
+ }
- rc = ibmvnic_reset_init(adapter, false);
- if (rc) {
- netdev_err(netdev, "login recovery: Reset init failed %d\n",
- rc);
- return -EIO;
- }
+ rc = ibmvnic_reset_init(adapter, false);
+ if (rc)
+ netdev_err(netdev, "login recovery: Reset init failed %d\n",
+ rc);
+ /* IBMVNIC_CRQ_INIT will return EAGAIN if it
+ * fails, since ibmvnic_reset_init will free
+ * irq's in failure, we won't be able to receive
+ * new CRQs so we need to keep trying. probe()
+ * handles this similarly.
+ */
+ } while (rc == -EAGAIN);
}
} while (retry);
@@ -1903,7 +1929,6 @@ static int ibmvnic_open(struct net_device *netdev)
int rc;
ASSERT_RTNL();
-
/* If device failover is pending or we are about to reset, just set
* device state and return. Device operation will be handled by reset
* routine.
--
2.39.3
Thread overview: 11+ messages
2023-08-03 20:20 [PATCH net 1/5] ibmvnic: Enforce stronger sanity checks on login response Nick Child
2023-08-03 20:20 ` [PATCH net 2/5] ibmvnic: Unmap DMA login rsp buffer on send login fail Nick Child
2023-08-05 7:19 ` Simon Horman
2023-08-03 20:20 ` [PATCH net 3/5] ibmvnic: Handle DMA unmapping of login buffs in release functions Nick Child
2023-08-05 7:19 ` Simon Horman
2023-08-03 20:20 ` [PATCH net 4/5] ibmvnic: Do partial reset on login failure Nick Child
2023-08-05 7:20 ` Simon Horman
2023-08-03 20:20 ` Nick Child [this message]
2023-08-05 7:20 ` [PATCH net 5/5] ibmvnic: Ensure login failure recovery is safe from other resets Simon Horman
2023-08-08 2:13 ` Jakub Kicinski
2023-08-05 7:18 ` [PATCH net 1/5] ibmvnic: Enforce stronger sanity checks on login response Simon Horman