public inbox for netdev@vger.kernel.org 
 help / color / mirror / Atom feed
* [RFC PATCH 0/2] netdev: show the number of tx-packets in device
@ 2010-05-24  4:49 Koki Sanagi
  2010-05-24  4:51 ` [RFC PATCH 1/2] netdev: add tracepoint to dev_hard_start_xmit, consume_skb and dev_kfree_skb_irq Koki Sanagi
  2010-05-24  4:52 ` [RFC PATCH 2/2] netdev: perf script to show the number of tx-packets in device Koki Sanagi
  0 siblings, 2 replies; 3+ messages in thread
From: Koki Sanagi @ 2010-05-24  4:49 UTC (permalink / raw)
  To: netdev; +Cc: davem, nhorman, kaneshige.kenji, izumi.taku

This patch-set adds tracepoints to dev_hard_start_xmit, consume_skb and
dev_kfree_skb_irq and perf script which calculates the time from entry of
ndo_start_xmit to dev_kfree_skb_* and the number of tx-packets in device.

-Perf script description
This script calculates two metric.

metric1: lap time between start_xmit - free_skb
This script calculate the time a packet passes from entry of ndo_start_xmit to
dev_kfree_skb_irq or consume_skb. It indicate a time driver/device owns that
packet. This script outputs an average time of all packets and a longest of
that.

metric2: number of packets in device
>From the above time, we can calculate the number of packets in device at a
certain time. This script outputs an average of the number of packets in device
and a largest of that.

-Merit
These tracepoints and script have two merits.

1. Detecting a packed tx-ring of network device
2. Detecting a defect of transmit functionality of network device

merit1: Detecting a packed tx-ring of network device

Using attached scripts, we can get a maximum number of packets in a device. If
it reaches to the number of packets a device can own, tx-ring of that device is
full and causes loss of network transmit performance. Because the driver of the
device drop packet or stops tx-ring and reject it until it keeps some space.
So, to keep good network transmit performance, it is good to keep some space in
tx-ring. To keep some space in tx-ring, these tracepoints and script are
useful.

To check this merit, I did a test.

Before starting a test, I want you to know that a maximum number of tx-packets
e1000e can own is (tx-ring size - 20) / 2 packets.
Because e1000e keeps 20 descriptors for frags and 1 packet needs 2 descriptors
due to tx-checksum.
So, if tx-ring size is 256,
(256 - 20) / 2 = 118 packets
If tx-ring size is 512,
(512 - 20) / 2 = 246 packets

Environment:
Test NIC:     Intel 82571EB (InterruptThrottleRate=1000)
Opponent NIC: Broadcom BCM5703X
InterruptThrottleRate was set to 1000 to make tx-ring packed deliberately.

Test load tool:
netperf -H XXX.XXX.XXX.XXX -t UDP_STREAM -- -m 1

With this environment, I compared following 2 cases.
1.Tx-ring size=256 case
2.Tx-ring size=512 case

Result:
1.The case of Tx-ring=256:
eth0    TX packets=1137811
        lap time between start_xmit - free_skb:
            avg=0.795687msec
            max=0.985911msec
         number of packets in device:
            avg=  64.66
            max= 118

netperf's result:
Socket  Message  Elapsed      Messages                
Size    Size     Time         Okay Errors   Throughput
bytes   bytes    secs            #      #   10^6bits/sec
112640       1   10.00     1179077      0       0.94
109568           10.00      544750              0.44

2.The case of Tx-ring=512:
eth0    TX packets=1531629
        lap time between start_xmit - free_skb:
            avg=0.370052msec
            max=0.982069msec
        number of packets in device:
            avg=  49.70
            max= 164

netperf's result:
Socket  Message  Elapsed      Messages                
Size    Size     Time         Okay Errors   Throughput
bytes   bytes    secs            #      #   10^6bits/sec
112640       1   10.00     1577840      0       1.26
109568           10.00      871058              0.70

In the case of tx-ring size=256(default),  maximum number of packets in device
reaches to 118. So this tx-ring is full and becomes a cause of network
performance loss.
On the other hand, in the case of tx-ring size=512, e1000e can own 246 packets,
but maximum number of packets in device doesn't reach it. so tx-ring has always
some space and there is  no network performance loss caused by packed tx-ring.
Actually, about transmit throughput, The case of Tx-ring size=512 is better
than the case of tx-ring size=256.
Like this, the number of packets in device is available to tune tx-ring size or
other parameters to avoid packed tx-ring and is related to network transmit
performance.

merit2: Detecting a defect of transmit functionality of network device

When device can't transmit due to hardware fault or driver's bug(I've
encountered this), we can detect it. Because in such case, dev_hard_start_xmit
is passed, but dev_kfree_skb_* is not passed.

NOTE:
This script has some problem,

-The number of tx-packets of netperf and  that of this script are not equal.
-Sometimes The max number of packets in device larger than the device can own.

Thanks,
Koki Sanagi.


^ permalink raw reply	[flat|nested] 3+ messages in thread

* [RFC PATCH 1/2] netdev: add tracepoint to dev_hard_start_xmit, consume_skb and dev_kfree_skb_irq
  2010-05-24  4:49 [RFC PATCH 0/2] netdev: show the number of tx-packets in device Koki Sanagi
@ 2010-05-24  4:51 ` Koki Sanagi
  2010-05-24  4:52 ` [RFC PATCH 2/2] netdev: perf script to show the number of tx-packets in device Koki Sanagi
  1 sibling, 0 replies; 3+ messages in thread
From: Koki Sanagi @ 2010-05-24  4:51 UTC (permalink / raw)
  To: netdev; +Cc: davem, nhorman, kaneshige.kenji, izumi.taku

This patch adds tracepoint to dev_hard_start_xmit, consume_skb and
dev_kfree_skb_irq to monitor tx-packets in device.

Signed-off-by: Koki Sanagi <sanagi.koki@jp•fujitsu.com>
---
 include/trace/events/skb.h |   64 ++++++++++++++++++++++++++++++++++++++++++++
 net/core/dev.c             |    4 +++
 net/core/skbuff.c          |    1 +
 3 files changed, 69 insertions(+), 0 deletions(-)

diff --git a/include/trace/events/skb.h b/include/trace/events/skb.h
index 4b2be6d..ed97580 100644
--- a/include/trace/events/skb.h
+++ b/include/trace/events/skb.h
@@ -9,6 +9,52 @@
 #include <linux/tracepoint.h>
 
 /*
+ *  netdev_start_xmit - invoked when skb is passed to the driver
+ *  @skb:		pointer to struct sk_buff
+ *  @dev:		pointer to struct net_device
+ */
+TRACE_EVENT(netdev_start_xmit,
+
+	TP_PROTO(struct sk_buff *skb,
+		 struct net_device *dev),
+
+	TP_ARGS(skb, dev),
+
+	TP_STRUCT__entry(
+		__field(	const void *,	skbaddr		)
+		__field(	unsigned int,	len		)
+		__string(	name,		dev->name	)
+       ),
+
+	TP_fast_assign(
+		__entry->skbaddr = skb;
+		__entry->len = skb->len;
+		__assign_str(name, dev->name);
+	),
+
+	TP_printk("dev=%s skbaddr=%p len=%u",
+		__get_str(name), __entry->skbaddr, __entry->len)
+);
+
+TRACE_EVENT(consume_skb,
+
+	TP_PROTO(struct sk_buff *skb),
+
+	TP_ARGS(skb),
+
+	TP_STRUCT__entry(
+		__field(	void *,		skbaddr		)
+	),
+
+	TP_fast_assign(
+		__entry->skbaddr = skb;
+	),
+
+	TP_printk("skbaddr=%p",
+		__entry->skbaddr)
+);
+
+/*
  * Tracepoint for free an sk_buff:
  */
 TRACE_EVENT(kfree_skb,
@@ -35,6 +81,24 @@ TRACE_EVENT(kfree_skb,
 		__entry->skbaddr, __entry->protocol, __entry->location)
 );
 
+TRACE_EVENT(dev_kfree_skb_irq,
+
+	TP_PROTO(struct sk_buff *skb),
+
+	TP_ARGS(skb),
+
+	TP_STRUCT__entry(
+		__field(	void *,		skbaddr		)
+	),
+
+	TP_fast_assign(
+		__entry->skbaddr = skb;
+	),
+
+	TP_printk("skbaddr=%p",
+		__entry->skbaddr)
+);
+
 TRACE_EVENT(skb_copy_datagram_iovec,
 
 	TP_PROTO(const struct sk_buff *skb, int len),
diff --git a/net/core/dev.c b/net/core/dev.c
index 32611c8..647d812 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -130,6 +130,7 @@
 #include <linux/jhash.h>
 #include <linux/random.h>
 #include <trace/events/napi.h>
+#include <trace/events/skb.h>
 #include <linux/pci.h>
 
 #include "net-sysfs.h"
@@ -1577,6 +1578,7 @@ void dev_kfree_skb_irq(struct sk_buff *skb)
 		struct softnet_data *sd;
 		unsigned long flags;
 
+		trace_dev_kfree_skb_irq(skb);
 		local_irq_save(flags);
 		sd = &__get_cpu_var(softnet_data);
 		skb->next = sd->completion_queue;
@@ -1919,6 +1921,7 @@ int dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev,
 				goto gso;
 		}
 
+		trace_netdev_start_xmit(skb, dev);
 		rc = ops->ndo_start_xmit(skb, dev);
 		if (rc == NETDEV_TX_OK)
 			txq_trans_update(txq);
@@ -1939,6 +1942,7 @@ gso:
 		if (dev->priv_flags & IFF_XMIT_DST_RELEASE)
 			skb_dst_drop(nskb);
 
+		trace_netdev_start_xmit(nskb, dev);
 		rc = ops->ndo_start_xmit(nskb, dev);
 		if (unlikely(rc != NETDEV_TX_OK)) {
 			if (rc & ~NETDEV_TX_MASK)
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index a9b0e1f..b9c963c 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -466,6 +466,7 @@ void consume_skb(struct sk_buff *skb)
 		smp_rmb();
 	else if (likely(!atomic_dec_and_test(&skb->users)))
 		return;
+	trace_consume_skb(skb);
 	__kfree_skb(skb);
 }
 EXPORT_SYMBOL(consume_skb);


^ permalink raw reply related	[flat|nested] 3+ messages in thread

* [RFC PATCH 2/2] netdev: perf script to show the number of tx-packets in device
  2010-05-24  4:49 [RFC PATCH 0/2] netdev: show the number of tx-packets in device Koki Sanagi
  2010-05-24  4:51 ` [RFC PATCH 1/2] netdev: add tracepoint to dev_hard_start_xmit, consume_skb and dev_kfree_skb_irq Koki Sanagi
@ 2010-05-24  4:52 ` Koki Sanagi
  1 sibling, 0 replies; 3+ messages in thread
From: Koki Sanagi @ 2010-05-24  4:52 UTC (permalink / raw)
  To: netdev; +Cc: davem, nhorman, kaneshige.kenji, izumi.taku

This patch adds perf script to calculate the time from entry of
ndo_start_xmit to dev_kfree_skb_* and the number of tx-packets in device.

Signed-off-by: Koki Sanagi <sanagi.koki@jp•fujitsu.com>
---
 .../scripts/python/bin/tx-packet-in-device-record  |    2 +
 .../scripts/python/bin/tx-packet-in-device-report  |    4 +
 tools/perf/scripts/python/tx-packet-in-device.py   |  109 ++++++++++++++++++++
 3 files changed, 115 insertions(+), 0 deletions(-)

diff --git a/tools/perf/scripts/python/bin/tx-packet-in-device-record b/tools/perf/scripts/python/bin/tx-packet-in-device-record
new file mode 100644
index 0000000..18f2356
--- /dev/null
+++ b/tools/perf/scripts/python/bin/tx-packet-in-device-record
@@ -0,0 +1,2 @@
+#!/bin/bash
+perf record -c 1 -f -a -M -R -e skb:netdev_start_xmit -e skb:consume_skb -e skb:dev_kfree_skb_irq
diff --git a/tools/perf/scripts/python/bin/tx-packet-in-device-report b/tools/perf/scripts/python/bin/tx-packet-in-device-report
new file mode 100644
index 0000000..8ef4cc2
--- /dev/null
+++ b/tools/perf/scripts/python/bin/tx-packet-in-device-report
@@ -0,0 +1,4 @@
+#!/bin/bash
+# description: netif_receive_skb counts per poll
+# args: [comm]
+perf trace -s ~/libexec/perf-core/scripts/python/tx-packet-in-device.py $1
diff --git a/tools/perf/scripts/python/tx-packet-in-device.py b/tools/perf/scripts/python/tx-packet-in-device.py
new file mode 100644
index 0000000..fb1933f
--- /dev/null
+++ b/tools/perf/scripts/python/tx-packet-in-device.py
@@ -0,0 +1,109 @@
+# perf trace event handlers, generated by perf trace -g python
+# Licensed under the terms of the GNU GPL License version 2
+
+# The common_* event handler fields are the most useful fields common to
+# all events.  They don't necessarily correspond to the 'common_*' fields
+# in the format files.  Those fields not available as handler params can
+# be retrieved using Python functions of the form common_*(context).
+# See the perf-trace-python Documentation for the list of available functions.
+
+import os
+import sys
+
+sys.path.append(os.environ['PERF_EXEC_PATH'] + \
+	'/scripts/python/Perf-Trace-Util/lib/Perf/Trace')
+
+from perf_trace_context import *
+from Core import *
+from Util import *
+
+# skb_dic = {skbaddr: {name:*, start_time:*}}
+#
+# skbaddr: address of skb through dev_hard_start_xmit
+# name: name of device
+# start_time: the time dev_start_hard_xmit pass
+skb_dic = {};
+
+# dev_stat_dic = {name: {pkt_in_tx:*, max_pkt_in_tx:*, total_pkt:*,
+#			 prev_time:*, total_time:*, max_lap_time:*,
+#			 total_lap_time:*}}
+#
+# name: name of device
+# pkt_in_tx: tx-packets a device has currently
+# max_pkt_in_tx: maximum of the above
+# total_pkt: total tx-packets through a device
+# prev_time: the time starting xmit or freeing skb
+#            happened previously
+# total_time: the time from first starting xmit to now
+# max_lap_time: maximum time from starting xmit to freein skb
+# total_lap_time: sum of time tx-packet is in device
+dev_stat_dic = {};
+
+def trace_end():
+	for name in sorted(dev_stat_dic.keys()):
+		cstat = dev_stat_dic[name]
+		print "%s\tTX packets=%d" %\
+			(name, cstat['total_pkt'])
+		print "\tlap time between start_xmit - free_skb:"
+		avg_nsec = avg(1.0 * cstat['total_lap_time'],
+				cstat['total_pkt'])
+		print "\t    avg=%fmsec" % (avg_nsec / 1000000.0)
+		print "\t    max=%fmsec" % (cstat['max_lap_time'] / 1000000.0)
+		print "\tnumber of packets in device:"
+		print "\t    avg=%7.2f" % avg(cstat['total_lap_time'] * 1.0,
+						cstat['total_time'])
+		print "\t    max=%4d" % cstat['max_pkt_in_tx']
+		print ""
+
+def skb__dev_kfree_skb_irq(event_name, context, common_cpu,
+	common_secs, common_nsecs, common_pid, common_comm,
+	skbaddr):
+		free_skb(event_name, context, common_cpu,
+		common_secs, common_nsecs, common_pid, common_comm,
+		skbaddr)
+
+def skb__consume_skb(event_name, context, common_cpu,
+	common_secs, common_nsecs, common_pid, common_comm,
+	skbaddr):
+		free_skb(event_name, context, common_cpu,
+		common_secs, common_nsecs, common_pid, common_comm,
+		skbaddr)
+
+def free_skb(event_name, context, common_cpu,
+	common_secs, common_nsecs, common_pid, common_comm,
+	skbaddr):
+		if skbaddr in skb_dic.keys():
+			ctime = nsecs(common_secs, common_nsecs)
+			lap_time = ctime - skb_dic[skbaddr]['start_time']
+			cstat = dev_stat_dic[skb_dic[skbaddr]['name']]
+			cstat['total_lap_time'] += lap_time;
+			cstat['total_pkt'] += 1;
+			cstat = dev_stat_dic[skb_dic[skbaddr]['name']]
+			cstat['total_time'] += ctime - cstat['prev_time']
+			cstat['prev_time'] = ctime
+			cstat['pkt_in_tx'] -= 1;
+			if lap_time > cstat['max_lap_time']:
+				cstat['max_lap_time'] = lap_time
+			del skb_dic[skbaddr]
+
+def skb__netdev_start_xmit(event_name, context, common_cpu,
+	common_secs, common_nsecs, common_pid, common_comm,
+	skbaddr, len, name):
+		retry = 0
+		ctime = nsecs(common_secs, common_nsecs)
+		if skbaddr in skb_dic.keys():
+			retry = 1;
+		skb_dic[skbaddr] = {'name':name, 'start_time':ctime}
+		if name not in dev_stat_dic.keys():
+			dev_stat_dic[name] = {'pkt_in_tx':0, 'max_pkt_in_tx':0,\
+					'total_pkt':0,\
+					'prev_time':ctime, 'total_time':0,\
+					'max_lap_time':0, 'total_lap_time':0}
+		cstat = dev_stat_dic[name]
+		cstat['total_time'] += ctime - cstat['prev_time']
+		cstat['prev_time'] = ctime
+		if retry == 0:
+			cstat['pkt_in_tx'] += 1
+		if cstat['pkt_in_tx'] > cstat['max_pkt_in_tx']:
+			cstat['max_pkt_in_tx'] = cstat['pkt_in_tx']
+


^ permalink raw reply related	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2010-05-24  4:52 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-05-24  4:49 [RFC PATCH 0/2] netdev: show the number of tx-packets in device Koki Sanagi
2010-05-24  4:51 ` [RFC PATCH 1/2] netdev: add tracepoint to dev_hard_start_xmit, consume_skb and dev_kfree_skb_irq Koki Sanagi
2010-05-24  4:52 ` [RFC PATCH 2/2] netdev: perf script to show the number of tx-packets in device Koki Sanagi

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox