From: Jason Gunthorpe <jgg@nvidia•com>
To: Leon Romanovsky <leon@kernel•org>
Cc: Patrisious Haddad <phaddad@nvidia•com>,
"David S. Miller" <davem@davemloft•net>,
Eric Dumazet <edumazet@google•com>,
Jakub Kicinski <kuba@kernel•org>,
linux-rdma@vger•kernel.org, netdev@vger•kernel.org,
Paolo Abeni <pabeni@redhat•com>,
Saeed Mahameed <saeedm@nvidia•com>
Subject: Re: [PATCH rdma-next v1 2/3] RDMA/mlx5: Handling dct common resource destruction upon firmware failure
Date: Tue, 21 Mar 2023 08:53:35 -0300
Message-ID: <ZBmav4CF1yqRvyzZ@nvidia.com>
In-Reply-To: <20230321075458.GP36557@unreal>
On Tue, Mar 21, 2023 at 09:54:58AM +0200, Leon Romanovsky wrote:
> On Mon, Mar 20, 2023 at 04:18:14PM -0300, Jason Gunthorpe wrote:
> > On Thu, Mar 16, 2023 at 03:39:27PM +0200, Leon Romanovsky wrote:
> > > From: Patrisious Haddad <phaddad@nvidia•com>
> > >
> > > Previously when destroying a DCT, if the firmware function for the
> > > destruction failed, the common resource would have been destroyed
> > > either way, since it was destroyed before the firmware object.
> > > Which leads to kernel warning "refcount_t: underflow" which indicates
> > > possible use-after-free.
> > > Which is triggered when we try to destroy the common resource for the
> > > second time and execute refcount_dec_and_test(&common->refcount).
> > >
> > > So, currently before destroying the common resource we check its
> > > refcount and continue with the destruction only if it isn't zero.
> >
> > This seems super sketchy
> >
> > If the destruction fails why not set the refcount back to 1?
>
> Because destruction will fail in destroy_rq_tracked() which is after
> destroy_resource_common().
>
> In first destruction attempt, we delete qp from radix tree and wait for all
> reference to drop. In order do not undo all this logic (setting 1 alone is
> not enough), it is much safer simply skip destroy_resource_common() in reentry
> case.
This is the bug I pointed out a long time ago: it is the wrong order to
remove restrack before destruction is assured.
Jason
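
[Editorial illustration, not part of the original message.] The ordering
point above can be sketched with a small userspace toy. It is only a sketch
under stated assumptions: every name in it (toy_res, toy_fw_destroy,
toy_untrack, and the two toy_destroy_* helpers) is hypothetical, and it does
not reproduce the mlx5 or restrack code. It contrasts untracking a resource
before the firmware command with untracking it only after the firmware
command has succeeded.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a tracked firmware object; not an mlx5 struct. */
struct toy_res {
	int refcount;   /* 1 while the object is live and tracked */
	bool tracked;   /* still visible to lookups */
	bool fw_alive;  /* firmware object still exists */
};

/* Simulated firmware destroy command: 0 on success, -1 on failure. */
static int toy_fw_destroy(struct toy_res *res, bool fw_fails)
{
	if (fw_fails)
		return -1;
	res->fw_alive = false;
	return 0;
}

/*
 * Untrack the resource and drop its initial reference.
 * Returns -2 if the reference was already dropped, which is the toy
 * equivalent of a refcount underflow on a second destroy attempt.
 */
static int toy_untrack(struct toy_res *res)
{
	res->tracked = false;
	if (res->refcount == 0)
		return -2;
	res->refcount--;
	return 0;
}

/* Ordering A: untrack first, then ask the firmware to destroy. */
static int toy_destroy_untrack_first(struct toy_res *res, bool fw_fails)
{
	int err = toy_untrack(res);

	if (err)
		return err; /* re-entry after an earlier firmware failure */
	return toy_fw_destroy(res, fw_fails);
}

/* Ordering B: ask the firmware first, untrack only once that succeeded. */
static int toy_destroy_fw_first(struct toy_res *res, bool fw_fails)
{
	int err = toy_fw_destroy(res, fw_fails);

	if (err)
		return err; /* nothing was torn down, a retry starts clean */
	return toy_untrack(res);
}
```

With ordering A, a failed firmware command leaves the toy resource
untracked with a zero refcount while the firmware object still exists, so
a retry hits the underflow path. With ordering B, a failed firmware
command leaves all three fields untouched and a later retry can complete
normally.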