From: Vadim Fedorenko <vadim.fedorenko@linux•dev>
To: Saeed Mahameed <saeed@kernel•org>
Cc: Vadim Fedorenko <vadfed@meta•com>,
Jakub Kicinski <kuba@kernel•org>,
Rahul Rameshbabu <rrameshbabu@nvidia•com>,
Tariq Toukan <ttoukan.linux@gmail•com>,
Gal Pressman <gal@nvidia•com>,
netdev@vger•kernel.org
Subject: Re: [PATCH net v4 2/2] mlx5: fix possible ptp queue fifo use-after-free
Date: Thu, 2 Feb 2023 01:34:32 +0000 [thread overview]
Message-ID: <39126b10-c22f-5393-b7bd-d699b8b8eed5@linux.dev> (raw)
In-Reply-To: <Y9r4UHZUUAdYdTPp@x130>
On 01/02/2023 23:40, Saeed Mahameed wrote:
> On 01 Feb 21:36, Vadim Fedorenko wrote:
>> On 01/02/2023 18:19, Saeed Mahameed wrote:
>>> On 01 Feb 04:26, Vadim Fedorenko wrote:
>>>> Fifo indexes were not checked during pop operations and it leads to
>>>> potential use-after-free when poping from empty queue. Such case was
>>>> possible during re-sync action.
>>>>
>>>> There were out-of-order cqe spotted which lead to drain of the queue
>>>> and
>>>> use-after-free because of lack of fifo pointers check. Special check
>>>> is added to avoid resync operation if SKB could not exist in the fifo
>>>> because of OOO cqe (skb_id must be between consumer and producer
>>>> index).
>>>>
>>>> Fixes: 58a518948f60 ("net/mlx5e: Add resiliency for PTP TX port
>>>> timestamp")
>>>> Signed-off-by: Vadim Fedorenko <vadfed@meta•com>
>>>> ---
>>>> .../net/ethernet/mellanox/mlx5/core/en/ptp.c | 23 ++++++++++++++-----
>>>> .../net/ethernet/mellanox/mlx5/core/en/txrx.h | 4 +++-
>>>> 2 files changed, 20 insertions(+), 7 deletions(-)
>>>>
>>>> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.c
>>>> b/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.c
>>>> index b72de2b520ec..5df726185192 100644
>>>> --- a/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.c
>>>> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.c
>>>> @@ -86,7 +86,7 @@ static bool mlx5e_ptp_ts_cqe_drop(struct
>>>> mlx5e_ptpsq *ptpsq, u16 skb_cc, u16 skb
>>>> return (ptpsq->ts_cqe_ctr_mask && (skb_cc != skb_id));
>>>> }
>>>>
>>>> -static void mlx5e_ptp_skb_fifo_ts_cqe_resync(struct mlx5e_ptpsq
>>>> *ptpsq, u16 skb_cc,
>>>> +static bool mlx5e_ptp_skb_fifo_ts_cqe_resync(struct mlx5e_ptpsq
>>>> *ptpsq, u16 skb_cc,
>>>> u16 skb_id, int budget)
>>>> {
>>>> struct skb_shared_hwtstamps hwts = {};
>>>> @@ -94,14 +94,23 @@ static void
>>>> mlx5e_ptp_skb_fifo_ts_cqe_resync(struct mlx5e_ptpsq *ptpsq, u16 skb_
>>>>
>>>> ptpsq->cq_stats->resync_event++;
>>>>
>>>> - while (skb_cc != skb_id) {
>>>> - skb = mlx5e_skb_fifo_pop(&ptpsq->skb_fifo);
>>>> + if (skb_cc > skb_id || PTP_WQE_CTR2IDX(ptpsq->skb_fifo_pc) <
>>>> skb_id) {
>>>
>>> To avoid returning boolean and add more functionality to this function,
>>> I prefer to put this check in mlx5e_ptp_handle_ts_cqe(), see below.
>>>
>>
>> Let's discuss this point, see comments below.
>>
>>>> + mlx5_core_err_rl(ptpsq->txqsq.mdev, "out-of-order ptp cqe\n");
>>>
>>> it's better to add a counter for this, eg:
>>> ptpsq->cq_stats->ooo_cqe_drop++;
>>
>> Sure, will do.
>>
>>>
>>>> + return false;
>>>> + }
>>>> +
>>>> + while (skb_cc != skb_id && (skb =
>>>> mlx5e_skb_fifo_pop(&ptpsq->skb_fifo))) {
>>>> hwts.hwtstamp = mlx5e_skb_cb_get_hwts(skb)->cqe_hwtstamp;
>>>> skb_tstamp_tx(skb, &hwts);
>>>> ptpsq->cq_stats->resync_cqe++;
>>>> napi_consume_skb(skb, budget);
>>>> skb_cc = PTP_WQE_CTR2IDX(ptpsq->skb_fifo_cc);
>>>> }
>>>> +
>>>> + if (!skb)
>>>> + return false;
>>>> +
>>>> + return true;
>>>> }
>>>>
>>>> static void mlx5e_ptp_handle_ts_cqe(struct mlx5e_ptpsq *ptpsq,
>>>> @@ -111,7 +120,7 @@ static void mlx5e_ptp_handle_ts_cqe(struct
>>>> mlx5e_ptpsq *ptpsq,
>>>> u16 skb_id = PTP_WQE_CTR2IDX(be16_to_cpu(cqe->wqe_counter));
>>>> u16 skb_cc = PTP_WQE_CTR2IDX(ptpsq->skb_fifo_cc);
>>>> struct mlx5e_txqsq *sq = &ptpsq->txqsq;
>>>> - struct sk_buff *skb;
>>>> + struct sk_buff *skb = NULL;
>>>> ktime_t hwtstamp;
>>>>
>>>> if (unlikely(MLX5E_RX_ERR_CQE(cqe))) {
>>>> @@ -120,8 +129,10 @@ static void mlx5e_ptp_handle_ts_cqe(struct
>>>> mlx5e_ptpsq *ptpsq,
>>>> goto out;
>>>> }
>>>>
>>>> - if (mlx5e_ptp_ts_cqe_drop(ptpsq, skb_cc, skb_id))
>>>> - mlx5e_ptp_skb_fifo_ts_cqe_resync(ptpsq, skb_cc, skb_id,
>>>> budget);
>>>
>>> you can check here:
>>> /* ignore ooo cqe as it was already handled by a previous resync */
>>> if (ooo_cqe(cqe))
>>> return;
>>
>> I assume that mlx5e_ptp_skb_fifo_ts_cqe_resync() is error-recovery
>> path and should not happen frequently. If we move this check to
>> mlx5e_ptp_handle_ts_cqe() we will have additional if with 2 checks for
>> every cqe coming from ptp queue - the fast path. With our load of
>
> we could do:
>
> if (mlx5e_ptp_ts_cqe_drop(ptpsq, skb_cc, skb_id))
> {
> if (ooo_cqe) /* already handled by a previous resync */
> return;
> mlx5e_ptp_skb_fifo_ts_cqe_resync(..);
> }
>
Yep, that's one of the options I suggested. Ok, let's do it this way.
>> 1+Mpps it could be countable. Another point is that
>> mlx5e_ptp_skb_fifo_ts_cqe_resync() will assume that skb_id must always
>> be within fifo indexes and any other (future?) code has to implement
>> this check. Otherwise it will cause use-after-free, double-free and
>> then crash, especially if we remove check in mlx5e_skb_fifo_pop() that
>> you commented below. I think it's ok to have additional check in error
>> path to prevent anything like that in the future.
>>
>> If you stongly against converting mlx5e_ptp_skb_fifo_ts_cqe_resync()
>> to return bool, I can add the check to 'if mlx5e_ptp_ts_cqe_drop()'
>> scope before calling resync(), but it will not remove problems from my
>> second point and I just would like to see explicit 'yes, we agree to
>> have dangerous code in our driver' from you or other maintainers in
>
> what do you mean ? define dangerous..
> we don't willingly push dangerous code :) the code is built and designed
> for the current HW spec under the assumption that HW/FW is bug free,
> otherwise what's the point of the spec if we are going to write doubtful,
> inefficient, paranoid driver code..
If FW/HW is bug free - yes, I agree. But the real world is not that
perfect. And I believe that's the reason why kfifo implementation has
all these checks in place.
> I understand your concern but we don't design data path code to be future
> proof.
>
> This patch is just a temporary fix for another underlying issue that
> we need to continue debug. so let's keep it minimal until we find the
> real issue.
>
Yeah, with minimal changes we will definitely revisit this code once we
find a root cause of the issue.
> keep in mind that the whole code is designed with fifos and only in-order
> queues guaranteed by both HW and Firmware, so there's no reason to be
> too-paranoid .. just fix the known bugs :).
>
TBH, I simply don't want to spend more days debugging unclear kernel
crashes if/once we hit another FW/HW bug. It easier to debug issue when
kernel continues to run. But anyway, let's re-think it once we have root
cause of the issue.
I'll publish v5 next day, thanks for the review!
>>> static inline
>>>> struct sk_buff *mlx5e_skb_fifo_pop(struct mlx5e_skb_fifo *fifo)
>>>> {
>>>> + if (*fifo->pc == *fifo->cc)
>>>> + return NULL;
>>>> +
>>>
>>> I think this won't be necessary if you check for ooo early on
>>> mlx5e_ptp_handle_ts_cqe() like i suggested above.
>>>
>> And again the only concern here is that we don't have any checks
>> whether FIFO has anything to pop before actually poping the value.
>> That can easily bring use-after-free in the futuee, especially because
>> the function is not ptp specific and is already used for other fifos.
>> I can add unlikely() for this check if it helps, but I would like to
>> have this check in the code.
>>
>
> you shouldn't access the fifo if by design it's guaranteed nothing is
> there.
> We don't build for a future/fool proof code, the fifo is only accessed
> when we know there's something there by design, this is not a general
> purpose fifo, it's a fifo used by mlx5 ordered cqs..
Got it.
> According to your logic, kfree should also check for double free.. ? :)
Such kfree will have unacceptable performance regressions, but I believe
we have something like this in debug kernels :)
next prev parent reply other threads:[~2023-02-02 1:40 UTC|newest]
Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-02-01 12:26 [PATCH net v4 0/2] mlx5: ptp fifo bugfixes Vadim Fedorenko
2023-02-01 12:26 ` [PATCH net v4 1/2] mlx5: fix skb leak while fifo resync and push Vadim Fedorenko
2023-02-01 12:26 ` [PATCH net v4 2/2] mlx5: fix possible ptp queue fifo use-after-free Vadim Fedorenko
2023-02-01 18:19 ` Saeed Mahameed
2023-02-01 21:36 ` Vadim Fedorenko
2023-02-01 23:40 ` Saeed Mahameed
2023-02-02 1:34 ` Vadim Fedorenko [this message]
2023-02-02 3:01 ` Jakub Kicinski
2023-02-02 3:08 ` Jakub Kicinski
2023-02-02 11:48 ` Vadim Fedorenko
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=39126b10-c22f-5393-b7bd-d699b8b8eed5@linux.dev \
--to=vadim.fedorenko@linux$(echo .)dev \
--cc=gal@nvidia$(echo .)com \
--cc=kuba@kernel$(echo .)org \
--cc=netdev@vger$(echo .)kernel.org \
--cc=rrameshbabu@nvidia$(echo .)com \
--cc=saeed@kernel$(echo .)org \
--cc=ttoukan.linux@gmail$(echo .)com \
--cc=vadfed@meta$(echo .)com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox