[bpf-next PATCH 0/3] bpf/xdp: fix generic-XDP and demonstrate VLAN manipulation

public inbox for netdev@vger.kernel.org 
 help / color / mirror / Atom feed

* [bpf-next PATCH 0/3] bpf/xdp: fix generic-XDP and demonstrate VLAN manipulation
@ 2018-09-25 14:25 Jesper Dangaard Brouer
  2018-09-25 14:25 ` [bpf-next PATCH 1/3] net: fix generic XDP to handle if eth header was mangled Jesper Dangaard Brouer
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Jesper Dangaard Brouer @ 2018-09-25 14:25 UTC (permalink / raw)
  To: netdev, Jesper Dangaard Brouer; +Cc: Daniel Borkmann, yoel, Alexei Starovoitov

While implementing PoC building blocks for eBPF code XDP+TC that can
manipulate VLANs headers, I discovered a bug in generic-XDP.

The fix should be backported to stable kernels.  Even-though
generic-XDP were introduced in v4.12, I think the bug is not exposed
until v4.14 in the mentined fixes commit.

---
p.s. send from Embedded Recipes 2018 in Paris

Jesper Dangaard Brouer (3):
      net: fix generic XDP to handle if eth header was mangled
      bpf: make TC vlan bpf_helpers avail to selftests
      selftests/bpf: add XDP selftests for modifying and popping VLAN headers


 net/core/dev.c                               |   14 +
 tools/testing/selftests/bpf/Makefile         |    5 
 tools/testing/selftests/bpf/bpf_helpers.h    |    4 
 tools/testing/selftests/bpf/test_xdp_vlan.c  |  292 ++++++++++++++++++++++++++
 tools/testing/selftests/bpf/test_xdp_vlan.sh |  195 +++++++++++++++++
 5 files changed, 508 insertions(+), 2 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/test_xdp_vlan.c
 create mode 100755 tools/testing/selftests/bpf/test_xdp_vlan.sh

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [bpf-next PATCH 1/3] net: fix generic XDP to handle if eth header was mangled
  2018-09-25 14:25 [bpf-next PATCH 0/3] bpf/xdp: fix generic-XDP and demonstrate VLAN manipulation Jesper Dangaard Brouer
@ 2018-09-25 14:25 ` Jesper Dangaard Brouer
  2018-09-26  5:36   ` Song Liu
  2018-09-25 14:25 ` [bpf-next PATCH 2/3] bpf: make TC vlan bpf_helpers avail to selftests Jesper Dangaard Brouer
  2018-09-25 14:25 ` [bpf-next PATCH 3/3] selftests/bpf: add XDP selftests for modifying and popping VLAN headers Jesper Dangaard Brouer
  2 siblings, 1 reply; 8+ messages in thread
From: Jesper Dangaard Brouer @ 2018-09-25 14:25 UTC (permalink / raw)
  To: netdev, Jesper Dangaard Brouer; +Cc: Daniel Borkmann, yoel, Alexei Starovoitov

XDP can modify (and resize) the Ethernet header in the packet.

There is a bug in generic-XDP, because skb->protocol and skb->pkt_type
are setup before reaching (netif_receive_)generic_xdp.

This bug was hit when XDP were popping VLAN headers (changing
eth->h_proto), as skb->protocol still contains VLAN-indication
(ETH_P_8021Q) causing invocation of skb_vlan_untag(skb), which corrupt
the packet (basically popping the VLAN again).

This patch catch if XDP changed eth header in such a way, that SKB
fields needs to be updated.

Fixes: d445516966dc ("net: xdp: support xdp generic on virtual devices")
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat•com>
---
 net/core/dev.c |   14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/net/core/dev.c b/net/core/dev.c
index ca78dc5a79a3..db6d89f536cb 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4258,6 +4258,9 @@ static u32 netif_receive_generic_xdp(struct sk_buff *skb,
 	struct netdev_rx_queue *rxqueue;
 	void *orig_data, *orig_data_end;
 	u32 metalen, act = XDP_DROP;
+	__be16 orig_eth_type;
+	struct ethhdr *eth;
+	bool orig_bcast;
 	int hlen, off;
 	u32 mac_len;
 
@@ -4298,6 +4301,9 @@ static u32 netif_receive_generic_xdp(struct sk_buff *skb,
 	xdp->data_hard_start = skb->data - skb_headroom(skb);
 	orig_data_end = xdp->data_end;
 	orig_data = xdp->data;
+	eth = (struct ethhdr *)xdp->data;
+	orig_bcast = is_multicast_ether_addr_64bits(eth->h_dest);
+	orig_eth_type = eth->h_proto;
 
 	rxqueue = netif_get_rxqueue(skb);
 	xdp->rxq = &rxqueue->xdp_rxq;
@@ -4321,6 +4327,14 @@ static u32 netif_receive_generic_xdp(struct sk_buff *skb,
 
 	}
 
+	/* check if XDP changed eth hdr such SKB needs update */
+	eth = (struct ethhdr *)xdp->data;
+	if ((orig_eth_type != eth->h_proto) ||
+	    (orig_bcast != is_multicast_ether_addr_64bits(eth->h_dest))) {
+		__skb_push(skb, mac_len);
+		skb->protocol = eth_type_trans(skb, skb->dev);
+	}
+
 	switch (act) {
 	case XDP_REDIRECT:
 	case XDP_TX:

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [bpf-next PATCH 2/3] bpf: make TC vlan bpf_helpers avail to selftests
  2018-09-25 14:25 [bpf-next PATCH 0/3] bpf/xdp: fix generic-XDP and demonstrate VLAN manipulation Jesper Dangaard Brouer
  2018-09-25 14:25 ` [bpf-next PATCH 1/3] net: fix generic XDP to handle if eth header was mangled Jesper Dangaard Brouer
@ 2018-09-25 14:25 ` Jesper Dangaard Brouer
  2018-09-25 14:25 ` [bpf-next PATCH 3/3] selftests/bpf: add XDP selftests for modifying and popping VLAN headers Jesper Dangaard Brouer
  2 siblings, 0 replies; 8+ messages in thread
From: Jesper Dangaard Brouer @ 2018-09-25 14:25 UTC (permalink / raw)
  To: netdev, Jesper Dangaard Brouer; +Cc: Daniel Borkmann, yoel, Alexei Starovoitov

The helper bpf_skb_vlan_push is needed by next patch, and the helper
bpf_skb_vlan_pop is added for completeness, regarding VLAN helpers.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat•com>
---
 tools/testing/selftests/bpf/bpf_helpers.h |    4 ++++
 1 file changed, 4 insertions(+)

diff --git a/tools/testing/selftests/bpf/bpf_helpers.h b/tools/testing/selftests/bpf/bpf_helpers.h
index e4be7730222d..d057e6891a6b 100644
--- a/tools/testing/selftests/bpf/bpf_helpers.h
+++ b/tools/testing/selftests/bpf/bpf_helpers.h
@@ -143,6 +143,10 @@ static unsigned long long (*bpf_skb_cgroup_id)(void *ctx) =
 	(void *) BPF_FUNC_skb_cgroup_id;
 static unsigned long long (*bpf_skb_ancestor_cgroup_id)(void *ctx, int level) =
 	(void *) BPF_FUNC_skb_ancestor_cgroup_id;
+static int (*bpf_skb_vlan_push)(void *ctx, __be16 vlan_proto, __u16 vlan_tci) =
+	(void *) BPF_FUNC_skb_vlan_push;
+static int (*bpf_skb_vlan_pop)(void *ctx) =
+	(void *) BPF_FUNC_skb_vlan_pop;
 
 /* llvm builtin functions that eBPF C program may use to
  * emit BPF_LD_ABS and BPF_LD_IND instructions

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [bpf-next PATCH 3/3] selftests/bpf: add XDP selftests for modifying and popping VLAN headers
  2018-09-25 14:25 [bpf-next PATCH 0/3] bpf/xdp: fix generic-XDP and demonstrate VLAN manipulation Jesper Dangaard Brouer
  2018-09-25 14:25 ` [bpf-next PATCH 1/3] net: fix generic XDP to handle if eth header was mangled Jesper Dangaard Brouer
  2018-09-25 14:25 ` [bpf-next PATCH 2/3] bpf: make TC vlan bpf_helpers avail to selftests Jesper Dangaard Brouer
@ 2018-09-25 14:25 ` Jesper Dangaard Brouer
  2 siblings, 0 replies; 8+ messages in thread
From: Jesper Dangaard Brouer @ 2018-09-25 14:25 UTC (permalink / raw)
  To: netdev, Jesper Dangaard Brouer; +Cc: Daniel Borkmann, yoel, Alexei Starovoitov

This XDP selftest also contain a small TC-bpf component. It provoke
the generic-XDP bug fixed in previous commit.

The selftest itself shows how to do VLAN manipulation from XDP and TC.
The test demonstrate how XDP ingress can remove a VLAN tag, and how TC
egress can add back a VLAN tag.

This use-case originates from a production need by ISP (kviknet.dk),
who gets DSL-lines terminated as VLAN Q-in-Q tagged packets, and want
to avoid having an net_device for every end-customer on the box doing
the L2 to L3 termination.
  The test-setup is done via a veth-pair and creating two network
namespaces (ns1 and ns2).  The 'ns1' simulate the ISP network that are
loading the BPF-progs stripping and adding VLAN IDs.  The 'ns2'
simulate the DSL-customer that are using VLAN tagged packets.

Running the script with --interactive, will simply not call the
cleanup function.  This gives the effect of creating a testlab, that
the users can inspect and play with.  The --verbose option will simply
request that the shell will print input lines as they are read, this
include comments, which in effect make the comments visible docs.

Reported-by: Yoel Caspersen <yoel@kviknet•dk>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat•com>
---
 tools/testing/selftests/bpf/Makefile         |    5 
 tools/testing/selftests/bpf/test_xdp_vlan.c  |  292 ++++++++++++++++++++++++++
 tools/testing/selftests/bpf/test_xdp_vlan.sh |  195 +++++++++++++++++
 3 files changed, 490 insertions(+), 2 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/test_xdp_vlan.c
 create mode 100755 tools/testing/selftests/bpf/test_xdp_vlan.sh

diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
index fd3851d5c079..b2c80a73b148 100644
--- a/tools/testing/selftests/bpf/Makefile
+++ b/tools/testing/selftests/bpf/Makefile
@@ -35,7 +35,7 @@ TEST_GEN_FILES = test_pkt_access.o test_xdp.o test_l4lb.o test_tcp_estats.o test
 	test_get_stack_rawtp.o test_sockmap_kern.o test_sockhash_kern.o \
 	test_lwt_seg6local.o sendmsg4_prog.o sendmsg6_prog.o test_lirc_mode2_kern.o \
 	get_cgroup_id_kern.o socket_cookie_prog.o test_select_reuseport_kern.o \
-	test_skb_cgroup_id_kern.o bpf_flow.o
+	test_skb_cgroup_id_kern.o bpf_flow.o test_xdp_vlan.o
 
 # Order correspond to 'make run_tests' order
 TEST_PROGS := test_kmod.sh \
@@ -48,7 +48,8 @@ TEST_PROGS := test_kmod.sh \
 	test_lwt_seg6local.sh \
 	test_lirc_mode2.sh \
 	test_skb_cgroup_id.sh \
-	test_flow_dissector.sh
+	test_flow_dissector.sh \
+	test_xdp_vlan.sh
 
 # Compile but not part of 'make run_tests'
 TEST_GEN_PROGS_EXTENDED = test_libbpf_open test_sock_addr test_skb_cgroup_id_user \
diff --git a/tools/testing/selftests/bpf/test_xdp_vlan.c b/tools/testing/selftests/bpf/test_xdp_vlan.c
new file mode 100644
index 000000000000..365a7d2d9f5c
--- /dev/null
+++ b/tools/testing/selftests/bpf/test_xdp_vlan.c
@@ -0,0 +1,292 @@
+/* SPDX-License-Identifier: GPL-2.0
+ *  Copyright(c) 2018 Jesper Dangaard Brouer.
+ *
+ * XDP/TC VLAN manipulation example
+ *
+ * GOTCHA: Remember to disable NIC hardware offloading of VLANs,
+ * else the VLAN tags are NOT inlined in the packet payload:
+ *
+ *  # ethtool -K ixgbe2 rxvlan off
+ *
+ * Verify setting:
+ *  # ethtool -k ixgbe2 | grep rx-vlan-offload
+ *  rx-vlan-offload: off
+ *
+ */
+#include <stddef.h>
+#include <stdbool.h>
+#include <string.h>
+#include <linux/bpf.h>
+#include <linux/if_ether.h>
+#include <linux/if_vlan.h>
+#include <linux/in.h>
+#include <linux/pkt_cls.h>
+
+#include "bpf_helpers.h"
+#include "bpf_endian.h"
+
+/* linux/if_vlan.h have not exposed this as UAPI, thus mirror some here
+ *
+ *	struct vlan_hdr - vlan header
+ *	@h_vlan_TCI: priority and VLAN ID
+ *	@h_vlan_encapsulated_proto: packet type ID or len
+ */
+struct _vlan_hdr {
+	__be16 h_vlan_TCI;
+	__be16 h_vlan_encapsulated_proto;
+};
+#define VLAN_PRIO_MASK		0xe000 /* Priority Code Point */
+#define VLAN_PRIO_SHIFT		13
+#define VLAN_CFI_MASK		0x1000 /* Canonical Format Indicator */
+#define VLAN_TAG_PRESENT	VLAN_CFI_MASK
+#define VLAN_VID_MASK		0x0fff /* VLAN Identifier */
+#define VLAN_N_VID		4096
+
+struct parse_pkt {
+	__u16 l3_proto;
+	__u16 l3_offset;
+	__u16 vlan_outer;
+	__u16 vlan_inner;
+	__u8  vlan_outer_offset;
+	__u8  vlan_inner_offset;
+};
+
+char _license[] SEC("license") = "GPL";
+
+static __always_inline
+bool parse_eth_frame(struct ethhdr *eth, void *data_end, struct parse_pkt *pkt)
+{
+	__u16 eth_type;
+	__u8 offset;
+
+	offset = sizeof(*eth);
+	/* Make sure packet is large enough for parsing eth + 2 VLAN headers */
+	if ((void *)eth + offset + (2*sizeof(struct _vlan_hdr)) > data_end)
+		return false;
+
+	eth_type = eth->h_proto;
+
+	/* Handle outer VLAN tag */
+	if (eth_type == bpf_htons(ETH_P_8021Q)
+	    || eth_type == bpf_htons(ETH_P_8021AD)) {
+		struct _vlan_hdr *vlan_hdr;
+
+		vlan_hdr = (void *)eth + offset;
+		pkt->vlan_outer_offset = offset;
+		pkt->vlan_outer = bpf_ntohs(vlan_hdr->h_vlan_TCI)
+				& VLAN_VID_MASK;
+		eth_type        = vlan_hdr->h_vlan_encapsulated_proto;
+		offset += sizeof(*vlan_hdr);
+	}
+
+	/* Handle inner (double) VLAN tag */
+	if (eth_type == bpf_htons(ETH_P_8021Q)
+	    || eth_type == bpf_htons(ETH_P_8021AD)) {
+		struct _vlan_hdr *vlan_hdr;
+
+		vlan_hdr = (void *)eth + offset;
+		pkt->vlan_inner_offset = offset;
+		pkt->vlan_inner = bpf_ntohs(vlan_hdr->h_vlan_TCI)
+				& VLAN_VID_MASK;
+		eth_type        = vlan_hdr->h_vlan_encapsulated_proto;
+		offset += sizeof(*vlan_hdr);
+	}
+
+	pkt->l3_proto = bpf_ntohs(eth_type); /* Convert to host-byte-order */
+	pkt->l3_offset = offset;
+
+	return true;
+}
+
+/* Hint, VLANs are choosen to hit network-byte-order issues */
+#define TESTVLAN 4011 /* 0xFAB */
+// #define TO_VLAN  4000 /* 0xFA0 (hint 0xOA0 = 160) */
+
+SEC("xdp_drop_vlan_4011")
+int  xdp_prognum0(struct xdp_md *ctx)
+{
+	void *data_end = (void *)(long)ctx->data_end;
+	void *data     = (void *)(long)ctx->data;
+	struct parse_pkt pkt = { 0 };
+
+	if (!parse_eth_frame(data, data_end, &pkt))
+		return XDP_ABORTED;
+
+	/* Drop specific VLAN ID example */
+	if (pkt.vlan_outer == TESTVLAN)
+		return XDP_ABORTED;
+	/*
+	 * Using XDP_ABORTED makes it possible to record this event,
+	 * via tracepoint xdp:xdp_exception like:
+	 *  # perf record -a -e xdp:xdp_exception
+	 *  # perf script
+	 */
+	return XDP_PASS;
+}
+/*
+Commands to setup VLAN on Linux to test packets gets dropped:
+
+ export ROOTDEV=ixgbe2
+ export VLANID=4011
+ ip link add link $ROOTDEV name $ROOTDEV.$VLANID type vlan id $VLANID
+ ip link set dev  $ROOTDEV.$VLANID up
+
+ ip link set dev $ROOTDEV mtu 1508
+ ip addr add 100.64.40.11/24 dev $ROOTDEV.$VLANID
+
+Load prog with ip tool:
+
+ ip link set $ROOTDEV xdp off
+ ip link set $ROOTDEV xdp object xdp_vlan01_kern.o section xdp_drop_vlan_4011
+
+*/
+
+/* Changing VLAN to zero, have same practical effect as removing the VLAN. */
+#define TO_VLAN	0
+
+SEC("xdp_vlan_change")
+int  xdp_prognum1(struct xdp_md *ctx)
+{
+	void *data_end = (void *)(long)ctx->data_end;
+	void *data     = (void *)(long)ctx->data;
+	struct parse_pkt pkt = { 0 };
+
+	if (!parse_eth_frame(data, data_end, &pkt))
+		return XDP_ABORTED;
+
+	/* Change specific VLAN ID */
+	if (pkt.vlan_outer == TESTVLAN) {
+		struct _vlan_hdr *vlan_hdr = data + pkt.vlan_outer_offset;
+
+		/* Modifying VLAN, preserve top 4 bits */
+		vlan_hdr->h_vlan_TCI =
+			bpf_htons((bpf_ntohs(vlan_hdr->h_vlan_TCI) & 0xf000)
+				  | TO_VLAN);
+	}
+
+	return XDP_PASS;
+}
+
+/*
+ * Show XDP+TC can cooperate, on creating a VLAN rewriter.
+ * 1. Create a XDP prog that can "pop"/remove a VLAN header.
+ * 2. Create a TC-bpf prog that egress can add a VLAN header.
+ */
+
+#ifndef ETH_ALEN /* Ethernet MAC address length */
+#define ETH_ALEN	6	/* bytes */
+#endif
+#define VLAN_HDR_SZ	4	/* bytes */
+
+SEC("xdp_vlan_remove_outer")
+int  xdp_prognum2(struct xdp_md *ctx)
+{
+	void *data_end = (void *)(long)ctx->data_end;
+	void *data     = (void *)(long)ctx->data;
+	struct parse_pkt pkt = { 0 };
+	char *dest;
+
+	if (!parse_eth_frame(data, data_end, &pkt))
+		return XDP_ABORTED;
+
+	/* Skip packet if no outer VLAN was detected */
+	if (pkt.vlan_outer_offset == 0)
+		return XDP_PASS;
+
+	/* Moving Ethernet header, dest overlap with src, memmove handle this */
+	dest = data;
+	dest+= VLAN_HDR_SZ;
+	/*
+	 * Notice: Taking over vlan_hdr->h_vlan_encapsulated_proto, by
+	 * only moving two MAC addrs (12 bytes), not overwriting last 2 bytes
+	 */
+	__builtin_memmove(dest, data, ETH_ALEN * 2);
+	/* Note: LLVM built-in memmove inlining require size to be constant */
+
+	/* Move start of packet header seen by Linux kernel stack */
+	bpf_xdp_adjust_head(ctx, VLAN_HDR_SZ);
+
+	return XDP_PASS;
+}
+
+static __always_inline
+void shift_mac_4bytes_16bit(void *data)
+{
+	__u16 *p = data;
+
+	p[7] = p[5]; /* delete p[7] was vlan_hdr->h_vlan_TCI */
+	p[6] = p[4]; /* delete p[6] was ethhdr->h_proto */
+	p[5] = p[3];
+	p[4] = p[2];
+	p[3] = p[1];
+	p[2] = p[0];
+}
+
+static __always_inline
+void shift_mac_4bytes_32bit(void *data)
+{
+	__u32 *p = data;
+
+	/* Assuming VLAN hdr present. The 4 bytes in p[3] that gets
+	 * overwritten, is ethhdr->h_proto and vlan_hdr->h_vlan_TCI.
+	 * The vlan_hdr->h_vlan_encapsulated_proto take over role as
+	 * ethhdr->h_proto.
+	 */
+	p[3] = p[2];
+	p[2] = p[1];
+	p[1] = p[0];
+}
+
+SEC("xdp_vlan_remove_outer2")
+int  xdp_prognum3(struct xdp_md *ctx)
+{
+	void *data_end = (void *)(long)ctx->data_end;
+	void *data     = (void *)(long)ctx->data;
+	struct ethhdr *orig_eth = data;
+	struct parse_pkt pkt = { 0 };
+
+	if (!parse_eth_frame(orig_eth, data_end, &pkt))
+		return XDP_ABORTED;
+
+	/* Skip packet if no outer VLAN was detected */
+	if (pkt.vlan_outer_offset == 0)
+		return XDP_PASS;
+
+	/* Simply shift down MAC addrs 4 bytes, overwrite h_proto + TCI */
+	shift_mac_4bytes_32bit(data);
+
+	/* Move start of packet header seen by Linux kernel stack */
+	bpf_xdp_adjust_head(ctx, VLAN_HDR_SZ);
+
+	return XDP_PASS;
+}
+
+/*=====================================
+ *  BELOW: TC-hook based ebpf programs
+ * ====================================
+ * The TC-clsact eBPF programs (currently) need to be attach via TC commands
+ */
+
+SEC("tc_vlan_push")
+int _tc_progA(struct __sk_buff *ctx)
+{
+	bpf_skb_vlan_push(ctx, bpf_htons(ETH_P_8021Q), TESTVLAN);
+
+	return TC_ACT_OK;
+}
+/*
+Commands to setup TC to use above bpf prog:
+
+export ROOTDEV=ixgbe2
+export FILE=xdp_vlan01_kern.o
+
+# Re-attach clsact to clear/flush existing role
+tc qdisc del dev $ROOTDEV clsact 2> /dev/null ;\
+tc qdisc add dev $ROOTDEV clsact
+
+# Attach BPF prog EGRESS
+tc filter add dev $ROOTDEV egress \
+  prio 1 handle 1 bpf da obj $FILE sec tc_vlan_push
+
+tc filter show dev $ROOTDEV egress
+*/
diff --git a/tools/testing/selftests/bpf/test_xdp_vlan.sh b/tools/testing/selftests/bpf/test_xdp_vlan.sh
new file mode 100755
index 000000000000..51a3a31d1aac
--- /dev/null
+++ b/tools/testing/selftests/bpf/test_xdp_vlan.sh
@@ -0,0 +1,195 @@
+#!/bin/bash
+
+TESTNAME=xdp_vlan
+
+usage() {
+  echo "Testing XDP + TC eBPF VLAN manipulations: $TESTNAME"
+  echo ""
+  echo "Usage: $0 [-vfh]"
+  echo "  -v | --verbose : Verbose"
+  echo "  --flush        : Flush before starting (e.g. after --interactive)"
+  echo "  --interactive  : Keep netns setup running after test-run"
+  echo ""
+}
+
+cleanup()
+{
+	local status=$?
+
+	if [ "$status" = "0" ]; then
+		echo "selftests: $TESTNAME [PASS]";
+	else
+		echo "selftests: $TESTNAME [FAILED]";
+	fi
+
+	if [ -n "$INTERACTIVE" ]; then
+		echo "Namespace setup still active explore with:"
+		echo " ip netns exec ns1 bash"
+		echo " ip netns exec ns2 bash"
+		exit $status
+	fi
+
+	set +e
+	ip link del veth1 2> /dev/null
+	ip netns del ns1 2> /dev/null
+	ip netns del ns2 2> /dev/null
+}
+
+# Using external program "getopt" to get --long-options
+OPTIONS=$(getopt -o hvfi: \
+    --long verbose,flush,help,interactive,debug -- "$@")
+if (( $? != 0 )); then
+    usage
+    echo "selftests: $TESTNAME [FAILED] Error calling getopt, unknown option?"
+    exit 2
+fi
+eval set -- "$OPTIONS"
+
+##  --- Parse command line arguments / parameters ---
+while true; do
+	case "$1" in
+	    -v | --verbose)
+		export VERBOSE=yes
+		shift
+		;;
+	    -i | --interactive | --debug )
+		INTERACTIVE=yes
+		shift
+		;;
+	    -f | --flush )
+		cleanup
+		shift
+		;;
+	    -- )
+		shift
+		break
+		;;
+	    -h | --help )
+		usage;
+		echo "selftests: $TESTNAME [SKIP] usage help info requested"
+		exit 0
+		;;
+	    * )
+		shift
+		break
+		;;
+	esac
+done
+
+if [ "$EUID" -ne 0 ]; then
+	echo "selftests: $TESTNAME [FAILED] need root privileges"
+	exit 1
+fi
+
+ip link set dev lo xdp off 2>/dev/null > /dev/null
+if [ $? -ne 0 ];then
+	echo "selftests: $TESTNAME [SKIP] need ip xdp support"
+	exit 0
+fi
+
+# Interactive mode likely require us to cleanup netns
+if [ -n "$INTERACTIVE" ]; then
+	ip link del veth1 2> /dev/null
+	ip netns del ns1 2> /dev/null
+	ip netns del ns2 2> /dev/null
+fi
+
+# Exit on failure
+set -e
+
+# Some shell-tools dependencies
+which ip > /dev/null
+which tc > /dev/null
+which ethtool > /dev/null
+
+# Make rest of shell verbose, showing comments as doc/info
+if [ -n "$VERBOSE" ]; then
+    set -v
+fi
+
+# Create two namespaces
+ip netns add ns1
+ip netns add ns2
+
+# Run cleanup if failing or on kill
+trap cleanup 0 2 3 6 9
+
+# Create veth pair
+ip link add veth1 type veth peer name veth2
+
+# Move veth1 and veth2 into the respective namespaces
+ip link set veth1 netns ns1
+ip link set veth2 netns ns2
+
+# NOTICE: XDP require VLAN header inside packet payload
+#  - Thus, disable VLAN offloading driver features
+#  - For veth REMEMBER TX side VLAN-offload
+#
+# Disable rx-vlan-offload (mostly needed on ns1)
+ip netns exec ns1 ethtool -K veth1 rxvlan off
+ip netns exec ns2 ethtool -K veth2 rxvlan off
+#
+# Disable tx-vlan-offload (mostly needed on ns2)
+ip netns exec ns2 ethtool -K veth2 txvlan off
+ip netns exec ns1 ethtool -K veth1 txvlan off
+
+export IPADDR1=100.64.41.1
+export IPADDR2=100.64.41.2
+
+# In ns1/veth1 add IP-addr on plain net_device
+ip netns exec ns1 ip addr add ${IPADDR1}/24 dev veth1
+ip netns exec ns1 ip link set veth1 up
+
+# In ns2/veth2 create VLAN device
+export VLAN=4011
+export DEVNS2=veth2
+ip netns exec ns2 ip link add link $DEVNS2 name $DEVNS2.$VLAN type vlan id $VLAN
+ip netns exec ns2 ip addr add ${IPADDR2}/24 dev $DEVNS2.$VLAN
+ip netns exec ns2 ip link set $DEVNS2 up
+ip netns exec ns2 ip link set $DEVNS2.$VLAN up
+
+# Bringup lo in netns (to avoids confusing people using --interactive)
+ip netns exec ns1 ip link set lo up
+ip netns exec ns2 ip link set lo up
+
+# At this point, the hosts cannot reach each-other,
+# because ns2 are using VLAN tags on the packets.
+
+ip netns exec ns2 sh -c 'ping -W 1 -c 1 100.64.41.1 || echo "Okay ping fails"'
+
+
+# Now we can use the test_xdp_vlan.c program to pop/push these VLAN tags
+# ----------------------------------------------------------------------
+# In ns1: ingress use XDP to remove VLAN tags
+export DEVNS1=veth1
+export FILE=test_xdp_vlan.o
+
+# First test: Remove VLAN by setting VLAN ID 0, using "xdp_vlan_change"
+export XDP_PROG=xdp_vlan_change
+ip netns exec ns1 ip link set $DEVNS1 xdp object $FILE section $XDP_PROG
+
+# In ns1: egress use TC to add back VLAN tag 4011
+#  (del cmd)
+#  tc qdisc del dev $DEVNS1 clsact 2> /dev/null
+#
+ip netns exec ns1 tc qdisc add dev $DEVNS1 clsact
+ip netns exec ns1 tc filter add dev $DEVNS1 egress \
+  prio 1 handle 1 bpf da obj $FILE sec tc_vlan_push
+
+# Now the namespaces can reach each-other, test with ping:
+ip netns exec ns2 ping -W 2 -c 3 $IPADDR1
+ip netns exec ns1 ping -W 2 -c 3 $IPADDR2
+
+# Second test: Replace xdp prog, that fully remove vlan header
+#
+# Catch kernel bug for generic-XDP, that does didn't allow us to
+# remove a VLAN header, because skb->protocol still contain VLAN
+# ETH_P_8021Q indication, and this cause overwriting of our changes.
+#
+export XDP_PROG=xdp_vlan_remove_outer2
+ip netns exec ns1 ip link set $DEVNS1 xdp off
+ip netns exec ns1 ip link set $DEVNS1 xdp object $FILE section $XDP_PROG
+
+# Now the namespaces should still be able reach each-other, test with ping:
+ip netns exec ns2 ping -W 2 -c 3 $IPADDR1
+ip netns exec ns1 ping -W 2 -c 3 $IPADDR2

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [bpf-next PATCH 1/3] net: fix generic XDP to handle if eth header was mangled
  2018-09-25 14:25 ` [bpf-next PATCH 1/3] net: fix generic XDP to handle if eth header was mangled Jesper Dangaard Brouer
@ 2018-09-26  5:36   ` Song Liu
  2018-10-01 19:13     ` Daniel Borkmann
  2018-10-03 14:04     ` Jesper Dangaard Brouer
  0 siblings, 2 replies; 8+ messages in thread
From: Song Liu @ 2018-09-26  5:36 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: Networking, Daniel Borkmann, yoel, Alexei Starovoitov

On Tue, Sep 25, 2018 at 7:26 AM Jesper Dangaard Brouer
<brouer@redhat•com> wrote:
>
> XDP can modify (and resize) the Ethernet header in the packet.
>
> There is a bug in generic-XDP, because skb->protocol and skb->pkt_type
> are setup before reaching (netif_receive_)generic_xdp.
>
> This bug was hit when XDP were popping VLAN headers (changing
> eth->h_proto), as skb->protocol still contains VLAN-indication
> (ETH_P_8021Q) causing invocation of skb_vlan_untag(skb), which corrupt
> the packet (basically popping the VLAN again).
>
> This patch catch if XDP changed eth header in such a way, that SKB
> fields needs to be updated.
>
> Fixes: d445516966dc ("net: xdp: support xdp generic on virtual devices")
> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat•com>
> ---
>  net/core/dev.c |   14 ++++++++++++++
>  1 file changed, 14 insertions(+)
>
> diff --git a/net/core/dev.c b/net/core/dev.c
> index ca78dc5a79a3..db6d89f536cb 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -4258,6 +4258,9 @@ static u32 netif_receive_generic_xdp(struct sk_buff *skb,
>         struct netdev_rx_queue *rxqueue;
>         void *orig_data, *orig_data_end;
>         u32 metalen, act = XDP_DROP;
> +       __be16 orig_eth_type;
> +       struct ethhdr *eth;
> +       bool orig_bcast;
>         int hlen, off;
>         u32 mac_len;
>
> @@ -4298,6 +4301,9 @@ static u32 netif_receive_generic_xdp(struct sk_buff *skb,
>         xdp->data_hard_start = skb->data - skb_headroom(skb);
>         orig_data_end = xdp->data_end;
>         orig_data = xdp->data;
> +       eth = (struct ethhdr *)xdp->data;
> +       orig_bcast = is_multicast_ether_addr_64bits(eth->h_dest);
> +       orig_eth_type = eth->h_proto;
>
>         rxqueue = netif_get_rxqueue(skb);
>         xdp->rxq = &rxqueue->xdp_rxq;
> @@ -4321,6 +4327,14 @@ static u32 netif_receive_generic_xdp(struct sk_buff *skb,
>
>         }
>
> +       /* check if XDP changed eth hdr such SKB needs update */
> +       eth = (struct ethhdr *)xdp->data;
> +       if ((orig_eth_type != eth->h_proto) ||
> +           (orig_bcast != is_multicast_ether_addr_64bits(eth->h_dest))) {

Is the actions below always correct for the condition above? Do we need
to confirm the SKB is updated properly?

> +               __skb_push(skb, mac_len);
> +               skb->protocol = eth_type_trans(skb, skb->dev);
> +       }
> +
>         switch (act) {
>         case XDP_REDIRECT:
>         case XDP_TX:
>

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [bpf-next PATCH 1/3] net: fix generic XDP to handle if eth header was mangled
  2018-09-26  5:36   ` Song Liu
@ 2018-10-01 19:13     ` Daniel Borkmann
  2018-10-03 14:04     ` Jesper Dangaard Brouer
  1 sibling, 0 replies; 8+ messages in thread
From: Daniel Borkmann @ 2018-10-01 19:13 UTC (permalink / raw)
  To: Song Liu, Jesper Dangaard Brouer
  Cc: Networking, Daniel Borkmann, yoel, Alexei Starovoitov

[ ping to Jesper wrt feedback ]

On 09/26/2018 07:36 AM, Song Liu wrote:
> On Tue, Sep 25, 2018 at 7:26 AM Jesper Dangaard Brouer
> <brouer@redhat•com> wrote:
>>
>> XDP can modify (and resize) the Ethernet header in the packet.
>>
>> There is a bug in generic-XDP, because skb->protocol and skb->pkt_type
>> are setup before reaching (netif_receive_)generic_xdp.
>>
>> This bug was hit when XDP were popping VLAN headers (changing
>> eth->h_proto), as skb->protocol still contains VLAN-indication
>> (ETH_P_8021Q) causing invocation of skb_vlan_untag(skb), which corrupt
>> the packet (basically popping the VLAN again).
>>
>> This patch catch if XDP changed eth header in such a way, that SKB
>> fields needs to be updated.
>>
>> Fixes: d445516966dc ("net: xdp: support xdp generic on virtual devices")
>> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat•com>
>> ---
>>  net/core/dev.c |   14 ++++++++++++++
>>  1 file changed, 14 insertions(+)
>>
>> diff --git a/net/core/dev.c b/net/core/dev.c
>> index ca78dc5a79a3..db6d89f536cb 100644
>> --- a/net/core/dev.c
>> +++ b/net/core/dev.c
>> @@ -4258,6 +4258,9 @@ static u32 netif_receive_generic_xdp(struct sk_buff *skb,
>>         struct netdev_rx_queue *rxqueue;
>>         void *orig_data, *orig_data_end;
>>         u32 metalen, act = XDP_DROP;
>> +       __be16 orig_eth_type;
>> +       struct ethhdr *eth;
>> +       bool orig_bcast;
>>         int hlen, off;
>>         u32 mac_len;
>>
>> @@ -4298,6 +4301,9 @@ static u32 netif_receive_generic_xdp(struct sk_buff *skb,
>>         xdp->data_hard_start = skb->data - skb_headroom(skb);
>>         orig_data_end = xdp->data_end;
>>         orig_data = xdp->data;
>> +       eth = (struct ethhdr *)xdp->data;
>> +       orig_bcast = is_multicast_ether_addr_64bits(eth->h_dest);
>> +       orig_eth_type = eth->h_proto;
>>
>>         rxqueue = netif_get_rxqueue(skb);
>>         xdp->rxq = &rxqueue->xdp_rxq;
>> @@ -4321,6 +4327,14 @@ static u32 netif_receive_generic_xdp(struct sk_buff *skb,
>>
>>         }
>>
>> +       /* check if XDP changed eth hdr such SKB needs update */
>> +       eth = (struct ethhdr *)xdp->data;
>> +       if ((orig_eth_type != eth->h_proto) ||
>> +           (orig_bcast != is_multicast_ether_addr_64bits(eth->h_dest))) {
> 
> Is the actions below always correct for the condition above? Do we need
> to confirm the SKB is updated properly?
> 
>> +               __skb_push(skb, mac_len);
>> +               skb->protocol = eth_type_trans(skb, skb->dev);
>> +       }
>> +
>>         switch (act) {
>>         case XDP_REDIRECT:
>>         case XDP_TX:
>>

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [bpf-next PATCH 1/3] net: fix generic XDP to handle if eth header was mangled
  2018-09-26  5:36   ` Song Liu
  2018-10-01 19:13     ` Daniel Borkmann
@ 2018-10-03 14:04     ` Jesper Dangaard Brouer
  2018-10-04 23:20       ` Song Liu
  1 sibling, 1 reply; 8+ messages in thread
From: Jesper Dangaard Brouer @ 2018-10-03 14:04 UTC (permalink / raw)
  To: Song Liu; +Cc: Networking, Daniel Borkmann, yoel, Alexei Starovoitov, brouer

On Tue, 25 Sep 2018 22:36:39 -0700
Song Liu <liu.song.a23@gmail•com> wrote:

> On Tue, Sep 25, 2018 at 7:26 AM Jesper Dangaard Brouer
> <brouer@redhat•com> wrote:
> >
> > XDP can modify (and resize) the Ethernet header in the packet.
> >
> > There is a bug in generic-XDP, because skb->protocol and skb->pkt_type
> > are setup before reaching (netif_receive_)generic_xdp.
> >
> > This bug was hit when XDP were popping VLAN headers (changing
> > eth->h_proto), as skb->protocol still contains VLAN-indication
> > (ETH_P_8021Q) causing invocation of skb_vlan_untag(skb), which corrupt
> > the packet (basically popping the VLAN again).
> >
> > This patch catch if XDP changed eth header in such a way, that SKB
> > fields needs to be updated.
> >
> > Fixes: d445516966dc ("net: xdp: support xdp generic on virtual devices")
> > Signed-off-by: Jesper Dangaard Brouer <brouer@redhat•com>
> > ---
> >  net/core/dev.c |   14 ++++++++++++++
> >  1 file changed, 14 insertions(+)
> >
> > diff --git a/net/core/dev.c b/net/core/dev.c
> > index ca78dc5a79a3..db6d89f536cb 100644
> > --- a/net/core/dev.c
> > +++ b/net/core/dev.c
> > @@ -4258,6 +4258,9 @@ static u32 netif_receive_generic_xdp(struct sk_buff *skb,
> >         struct netdev_rx_queue *rxqueue;
> >         void *orig_data, *orig_data_end;
> >         u32 metalen, act = XDP_DROP;
> > +       __be16 orig_eth_type;
> > +       struct ethhdr *eth;
> > +       bool orig_bcast;
> >         int hlen, off;
> >         u32 mac_len;
> >
> > @@ -4298,6 +4301,9 @@ static u32 netif_receive_generic_xdp(struct sk_buff *skb,
> >         xdp->data_hard_start = skb->data - skb_headroom(skb);
> >         orig_data_end = xdp->data_end;
> >         orig_data = xdp->data;
> > +       eth = (struct ethhdr *)xdp->data;
> > +       orig_bcast = is_multicast_ether_addr_64bits(eth->h_dest);
> > +       orig_eth_type = eth->h_proto;
> >
> >         rxqueue = netif_get_rxqueue(skb);
> >         xdp->rxq = &rxqueue->xdp_rxq;
> > @@ -4321,6 +4327,14 @@ static u32 netif_receive_generic_xdp(struct sk_buff *skb,
> >
> >         }
> >
> > +       /* check if XDP changed eth hdr such SKB needs update */
> > +       eth = (struct ethhdr *)xdp->data;
> > +       if ((orig_eth_type != eth->h_proto) ||
> > +           (orig_bcast != is_multicast_ether_addr_64bits(eth->h_dest))) {  
> 
> Is the actions below always correct for the condition above? Do we need
> to confirm the SKB is updated properly?

I cannot find the issue that you are hinting to?

If the BPF prog used bpf_xdp_adjust_head(), which the included selftest
program does, then skb->data have been appropriately adjusted just
above (with __skb_pull(skb, off) or __skb_push(skb, -off)), which makes
the call to skb_reset_mac_header(skb) inside eth_type_trans() correct.

I've double checked the code, and I cannot find anything wrong...
please let me know if I missed something!?


> > +               __skb_push(skb, mac_len);
> > +               skb->protocol = eth_type_trans(skb, skb->dev);

We could change mac_len to be ETH_HLEN, because inside eth_type_trans()
the constant ETH_HLEN is used, that way we are 100% sure the
skb_push/skb_pull are "paired".  Will that be better for you?


> > +       }
> > +
> >         switch (act) {
> >         case XDP_REDIRECT:
> >         case XDP_TX:
> >  

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [bpf-next PATCH 1/3] net: fix generic XDP to handle if eth header was mangled
  2018-10-03 14:04     ` Jesper Dangaard Brouer
@ 2018-10-04 23:20       ` Song Liu
  0 siblings, 0 replies; 8+ messages in thread
From: Song Liu @ 2018-10-04 23:20 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: Networking, Daniel Borkmann, yoel, Alexei Starovoitov

Hi Jesper,

On Wed, Oct 3, 2018 at 7:04 AM Jesper Dangaard Brouer <brouer@redhat•com> wrote:
>
> On Tue, 25 Sep 2018 22:36:39 -0700
> Song Liu <liu.song.a23@gmail•com> wrote:
>
> > On Tue, Sep 25, 2018 at 7:26 AM Jesper Dangaard Brouer
> > <brouer@redhat•com> wrote:
> > >
> > > XDP can modify (and resize) the Ethernet header in the packet.
> > >
> > > There is a bug in generic-XDP, because skb->protocol and skb->pkt_type
> > > are setup before reaching (netif_receive_)generic_xdp.
> > >
> > > This bug was hit when XDP were popping VLAN headers (changing
> > > eth->h_proto), as skb->protocol still contains VLAN-indication
> > > (ETH_P_8021Q) causing invocation of skb_vlan_untag(skb), which corrupt
> > > the packet (basically popping the VLAN again).
> > >
> > > This patch catch if XDP changed eth header in such a way, that SKB
> > > fields needs to be updated.
> > >
> > > Fixes: d445516966dc ("net: xdp: support xdp generic on virtual devices")
> > > Signed-off-by: Jesper Dangaard Brouer <brouer@redhat•com>
> > > ---
> > >  net/core/dev.c |   14 ++++++++++++++
> > >  1 file changed, 14 insertions(+)
> > >
> > > diff --git a/net/core/dev.c b/net/core/dev.c
> > > index ca78dc5a79a3..db6d89f536cb 100644
> > > --- a/net/core/dev.c
> > > +++ b/net/core/dev.c
> > > @@ -4258,6 +4258,9 @@ static u32 netif_receive_generic_xdp(struct sk_buff *skb,
> > >         struct netdev_rx_queue *rxqueue;
> > >         void *orig_data, *orig_data_end;
> > >         u32 metalen, act = XDP_DROP;
> > > +       __be16 orig_eth_type;
> > > +       struct ethhdr *eth;
> > > +       bool orig_bcast;
> > >         int hlen, off;
> > >         u32 mac_len;
> > >
> > > @@ -4298,6 +4301,9 @@ static u32 netif_receive_generic_xdp(struct sk_buff *skb,
> > >         xdp->data_hard_start = skb->data - skb_headroom(skb);
> > >         orig_data_end = xdp->data_end;
> > >         orig_data = xdp->data;
> > > +       eth = (struct ethhdr *)xdp->data;
> > > +       orig_bcast = is_multicast_ether_addr_64bits(eth->h_dest);
> > > +       orig_eth_type = eth->h_proto;
> > >
> > >         rxqueue = netif_get_rxqueue(skb);
> > >         xdp->rxq = &rxqueue->xdp_rxq;
> > > @@ -4321,6 +4327,14 @@ static u32 netif_receive_generic_xdp(struct sk_buff *skb,
> > >
> > >         }
> > >
> > > +       /* check if XDP changed eth hdr such SKB needs update */
> > > +       eth = (struct ethhdr *)xdp->data;
> > > +       if ((orig_eth_type != eth->h_proto) ||
> > > +           (orig_bcast != is_multicast_ether_addr_64bits(eth->h_dest))) {
> >
> > Is the actions below always correct for the condition above? Do we need
> > to confirm the SKB is updated properly?
>
> I cannot find the issue that you are hinting to?
>
> If the BPF prog used bpf_xdp_adjust_head(), which the included selftest
> program does, then skb->data have been appropriately adjusted just
> above (with __skb_pull(skb, off) or __skb_push(skb, -off)), which makes
> the call to skb_reset_mac_header(skb) inside eth_type_trans() correct.
>
> I've double checked the code, and I cannot find anything wrong...
> please let me know if I missed something!?
>
>
> > > +               __skb_push(skb, mac_len);
> > > +               skb->protocol = eth_type_trans(skb, skb->dev);
>
> We could change mac_len to be ETH_HLEN, because inside eth_type_trans()
> the constant ETH_HLEN is used, that way we are 100% sure the
> skb_push/skb_pull are "paired".  Will that be better for you?

Thanks for the explanation. Now I understand it better.
I think using ETH_HLEN is better. Both napi_frags_finish() and
__dev_forward_skb()
use similar pattern.

Song

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2018-10-05  6:16 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2018-09-25 14:25 [bpf-next PATCH 0/3] bpf/xdp: fix generic-XDP and demonstrate VLAN manipulation Jesper Dangaard Brouer
2018-09-25 14:25 ` [bpf-next PATCH 1/3] net: fix generic XDP to handle if eth header was mangled Jesper Dangaard Brouer
2018-09-26  5:36   ` Song Liu
2018-10-01 19:13     ` Daniel Borkmann
2018-10-03 14:04     ` Jesper Dangaard Brouer
2018-10-04 23:20       ` Song Liu
2018-09-25 14:25 ` [bpf-next PATCH 2/3] bpf: make TC vlan bpf_helpers avail to selftests Jesper Dangaard Brouer
2018-09-25 14:25 ` [bpf-next PATCH 3/3] selftests/bpf: add XDP selftests for modifying and popping VLAN headers Jesper Dangaard Brouer

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox