public inbox for netdev@vger.kernel.org 
From: Jiri Pirko <jiri@resnulli•us>
To: John Fastabend <john.fastabend@gmail•com>
Cc: "Alexei Starovoitov" <alexei.starovoitov@gmail•com>,
	"Thomas Graf" <tgraf@suug•ch>, "Jakub Kicinski" <kubakici@wp•pl>,
	netdev@vger•kernel.org, davem@davemloft•net, jhs@mojatatu•com,
	roopa@cumulusnetworks•com, simon.horman@netronome•com,
	ast@kernel•org, daniel@iogearbox•net, prem@barefootnetworks•com,
	hannes@stressinduktion•org, jbenc@redhat•com,
	tom@herbertland•com, mattyk@mellanox•com, idosch@mellanox•com,
	eladr@mellanox•com, yotamg@mellanox•com, nogahf@mellanox•com,
	ogerlitz@mellanox•com, linville@tuxdriver•com,
	andy@greyhouse•net, f.fainelli@gmail•com,
	dsa@cumulusnetworks•com, vivien.didelot@savoirfairelinux•com,
	andrew@lunn•ch, ivecera@redhat•com,
	"Maciej Żenczykowski" <zenczykowski@gmail•com>
Subject: Re: Let's do P4
Date: Wed, 2 Nov 2016 09:07:23 +0100
Message-ID: <20161102080723.GD1713@nanopsycho.orion>
In-Reply-To: <5818B11C.2040004@gmail.com>

Tue, Nov 01, 2016 at 04:13:32PM CET, john.fastabend@gmail•com wrote:
>[...]
>
>>>> P4 is meant to program programmable hw, not a fixed pipeline.
>>>>
>>>
>>> I'm guessing there are no upstream drivers at the moment that support
>>> this though right? The rocker universe bits though could leverage this.
>> 
>> mlxsw. But this is naturally not implemented yet, as there is no
>> infrastructure.
>
>Really? What is re-programmable?
>
>Can the parse graph support arbitrary parse graph?
>Can the table topology be reconfigured?
>Can new tables be created?
>What about "new" actions being defined at configuration time?
>
>Or is this just the normal TCAM configuration of defining key widths and
>fields.

At this point, only TCAM configuration.


>
>> 
>> 
>>>
>>>>
>>>>>
>>>>>>
>>>>>>> since I cannot see how one can put the whole p4 language compiler
>>>>>>> into the driver, so this last step of p4ast->hw, I presume, will be
>>>>>>> done by firmware, which will be running full compiler in an embedded cpu
>>>>>>
>>>>>> In case of mlxsw, that compiler would be in driver.
>>>>>>
>>>>>>
>>>>>>> on the switch. To me that's precisely the kernel bypass, since we won't
>>>>>>> have a clue what HW capabilities actually are and won't be able to fine
>>>>>>> grain control them.
>>>>>>> Please correct me if I'm wrong.
>>>>>>
>>>>>> You are wrong. By your definition, everything has to be figured out in
>>>>>> driver and FW does nothing. Otherwise it could do "something else" and
>>>>>> that would be a bypass? Does not make any sense to me whatsoever.
>>>>>>
>>>>>>
>>>>>>>
>>>>>>>> Plus the thing I cannot imagine in the model you propose is table fillup.
>>>>>>>> For ebpf, you use maps. For p4 you would have to have a separate HW-only
>>>>>>>> API. This is very similar to the original John's Flow-API. And therefore
>>>>>>>> a kernel bypass.
>>>>>>>
>>>>>>> I think John's flow api is a better way to expose mellanox switch capabilities.
>>>>>>
>>>>>> We are under impression that p4 suits us nicely. But it is not about
>>>>>> us, it is about finding the common way to do this.
>>>>>>
>>>>>
>>>>> I'll just poke at my Flow-API question again. For fixed ASICs, what is
>>>>> the Flow-API missing? We have a few proof points that show it is both
>>>>> sufficient and usable for the handful of use cases we care about.
>>>>
>>>> Yeah, it is most probably fine. Even for flex ASICs to some point. The
>>>> question is how it stands comparing to other alternatives, like p4
>>>>
>>>
>>> Just to be clear the Flow-API _was_ generated from the initial P4 spec.
>>> The header files and tools used with it were autogenerated ("compiled"
>>> in a loose sense) from the P4 program. The piece I never exposed
>>> was the set_* operations to reconfigure running systems. I'm not sure
>>> how valuable this is in practice though.
>>>
>>> Also there is a P4-16 spec that will be released shortly that is more
>>> flexible and also more complex.
>> 
>> Would it be able to easily extend the Flow-API to include the changes?
>> 
>
>P4-16 will allow externs, "functions" to execute in the control flow and
>possibly inside the parse graph. None of this was considered in the
>Flow-API. So none of this is supported.
>
>I still have the question are you trying to push the "programming" of
>the device via 'tc' or just the runtime configuration of tables? If it
>is just runtime, Flow-API is sufficient IMO. If it's programming the
>device using the complete P4-16 spec, then no, it's not sufficient. But

Sure, we need both.


>I don't believe vendors will expose the complete programmability of the
>device in the driver, this is going to look more like a fw update than
>a runtime change at least on the devices I'm aware of.

Depends on the driver. I think it is fine whether the driver processes it
into some hw configuration sequence or simply pushes the program down to
fw. Both use cases are legit.


>
>> 
>>>
>>>>
>>>>>
>>>>>>
>>>>>>> I also think it's not fair to call it 'bypass'. I see nothing in it
>>>>>>> that justify such 'swear word' ;)
>>>>>>
>>>>>> John's Flow-API was a kernel bypass. Why? It was an API specifically
>>>>>> designed to directly work with HW tables, without kernel being involved.
>>>>>
>>>>> I don't think that is a fair definition of HW bypass. The SKIP_SW flag
>>>>> does exactly that for 'tc' based offloads and it was not rejected.
>>>>
>>>> No, no, no. You still have possibility to do the same thing in kernel,
>>>> same functionality, with the same API. That is a big difference.
>>>>
>>>>
>>>>>
>>>>> The _real_ reason that seems to have fallen out of this and other
>>>>> discussion is the Flow-API didn't provide an in-kernel translation into
>>>>> an emulated path. Note we always had a usermode translation to eBPF.
>>>>> A secondary reason appears to be overhead of adding yet another netlink
>>>>> family.
>>>>
>>>> Yeah. Maybe you remember, back then when Flow-API was being discussed,
>>>> I suggested to wrap it under TC as cls_xflows and cls_xflowsaction of
>>>> some sort and do in-kernel datapath implementation. I believe that after
>>>> that, it would be acceptable.
>>>>
>>>
>>> As I understand the thread here that is exactly the proposal here right?
>>> With a discussion around if the structures/etc are sufficient or any
>>> alternative representations exist.
>> 
>> Might be the way, yes. But I fear that with other p4 extensions this
>> might not be easy to align with. Therefore I thought about something more
>> generic, like the p4ast.
>> 
>
>Same question as above are we _really_ talking about pushing the entire
>programmability of the device via 'tc'. If so we need to have a vendor
>say they will support and implement this?

We need some API, and I believe that TC is perfectly suitable for that.
Why do you think it's a problem?



>
>> 
>>>
>>>>
>>>>>
>>>>>>
>>>>>>
>>>>>>> The goal of flow api was to expose HW features to user space, so that
>>>>>>> user space can program it. For something simple as mellanox switch
>>>>>>> asic it fits perfectly well.
>>>>>>
>>>>>> Again, this is not mlx-asic-specific. And again, that is a kernel bypass.
>>>>>>
>>>>>>
>>>>>>> Unless I misunderstand the bigger goal of this discussion and it's
>>>>>>> about programming ezchip devices.
>>>>>>
>>>>>> No. For network processors, I believe that BPF is nicely offloadable, no
>>>>>> need to do the exercise for that.
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> If the goal is to model hw tcam in the linux kernel then just introduce
>>>>>>> tcam bpf map type. It will be dog slow in user space, but it will
>>>>>>> match exactly what is happening in the HW and user space can make
>>>>>>> sensible trade-offs.
>>>>>>
>>>>>> No, you got me completely wrong. This is not about the TCAM. This is
>>>>>> about differences in the 2 worlds (p4/bpf).
>>>>>> Again, for "p4-ish" devices, you have to translate BPF. And as you
>>>>>> noted, it's an instruction set. Very hard if not impossible to parse in
>>>>>> order to get back the original semantics.
>>>>>>
>>>>>
>>>>> I think in this discussion "p4-ish" devices means devices with multiple
>>>>> tables in a pipeline? Not devices that have programmable/configurable
>>>>> pipelines right? And if we get to talking about reconfigurable devices
>>>>> I believe this should be done out of band as it typically means
>>>>> reloading some ucode, etc.
>>>>
>>>> I'm talking about both. But I think we should focus on reconfigurable
>>>> ones, as we probably won't see that much fixed ones in the future.
>>>>
>>>
>>> hmm maybe but the 10/40/100Gbps devices are going to be around for some
>>> time. So we need to ensure these work well.
>> 
>> Yes, but I would like to emphasize, if we are defining new api
>> the primary focus should be on new devices.
>> 
>> 
>
>What device though. Back to mlxsw question about actually supporting
>this stuff.
>
