From: Simon Horman <horms@kernel•org>
To: Dipayaan Roy <dipayanroy@linux•microsoft.com>
Cc: kys@microsoft•com, haiyangz@microsoft•com, wei.liu@kernel•org,
decui@microsoft•com, andrew+netdev@lunn•ch, davem@davemloft•net,
edumazet@google•com, kuba@kernel•org, pabeni@redhat•com,
leon@kernel•org, longli@microsoft•com, kotaranov@microsoft•com,
shradhagupta@linux•microsoft.com, ssengar@linux•microsoft.com,
ernis@linux•microsoft.com, shirazsaleem@microsoft•com,
linux-hyperv@vger•kernel.org, netdev@vger•kernel.org,
linux-kernel@vger•kernel.org, linux-rdma@vger•kernel.org,
dipayanroy@microsoft•com
Subject: Re: [PATCH net-next, v2] net: mana: Trigger VF reset/recovery on health check failure due to HWC timeout
Date: Mon, 2 Mar 2026 11:27:26 +0000
Message-ID: <aaV0HvxQneKM8p-c@horms.kernel.org>
In-Reply-To: <aaFShvKnwR5FY8dH@linuxonhyperv3.guj3yctzbm1etfxqx2vob5hsef.xx.internal.cloudapp.net>
On Fri, Feb 27, 2026 at 12:15:02AM -0800, Dipayaan Roy wrote:
> The GF stats periodic query is used as a mechanism for the HWC health
> check. If this HWC command times out, it is a strong indication that
> the device/SoC is in a faulty state and requires recovery.
>
> Today, when a timeout is detected, the driver marks
> hwc_timeout_occurred, clears cached stats, and stops rescheduling the
> periodic work. However, the device itself is left in the same failing
> state.
>
> Extend the timeout handling path to trigger the existing MANA VF
> recovery service by queueing a GDMA_EQE_HWC_RESET_REQUEST work item.
> This is expected to initiate the appropriate recovery flow: first a
> suspend/resume and, if that fails, a bus rescan.
>
> This change is intentionally limited to HWC command timeouts and does
> not trigger recovery for errors reported by the SoC as a normal command
> response.
>
> Signed-off-by: Dipayaan Roy <dipayanroy@linux•microsoft.com>
> ---
> Changes in v2:
> - Added common helper, proper clearing of gc flags.
Thanks for the update.
Reviewed-by: Simon Horman <horms@kernel•org>
...
Thread overview: 4+ messages
2026-02-27 8:15 [PATCH net-next, v2] net: mana: Trigger VF reset/recovery on health check failure due to HWC timeout Dipayaan Roy
2026-02-27 19:24 ` Haiyang Zhang
2026-03-02 11:27 ` Simon Horman [this message]
2026-03-03 10:30 ` patchwork-bot+netdevbpf