* [RFC PATCH 0/2] netdev: show the number of tx-packets in device
@ 2010-05-24 4:49 Koki Sanagi
2010-05-24 4:51 ` [RFC PATCH 1/2] netdev: add tracepoint to dev_hard_start_xmit, consume_skb and dev_kfree_skb_irq Koki Sanagi
2010-05-24 4:52 ` [RFC PATCH 2/2] netdev: perf script to show the number of tx-packets in device Koki Sanagi
0 siblings, 2 replies; 3+ messages in thread
From: Koki Sanagi @ 2010-05-24 4:49 UTC (permalink / raw)
To: netdev; +Cc: davem, nhorman, kaneshige.kenji, izumi.taku
This patch-set adds tracepoints to dev_hard_start_xmit, consume_skb and
dev_kfree_skb_irq and perf script which calculates the time from entry of
ndo_start_xmit to dev_kfree_skb_* and the number of tx-packets in device.
-Perf script description
This script calculates two metric.
metric1: lap time between start_xmit - free_skb
This script calculate the time a packet passes from entry of ndo_start_xmit to
dev_kfree_skb_irq or consume_skb. It indicate a time driver/device owns that
packet. This script outputs an average time of all packets and a longest of
that.
metric2: number of packets in device
>From the above time, we can calculate the number of packets in device at a
certain time. This script outputs an average of the number of packets in device
and a largest of that.
-Merit
These tracepoints and script have two merits.
1. Detecting a packed tx-ring of network device
2. Detecting a defect of transmit functionality of network device
merit1: Detecting a packed tx-ring of network device
Using attached scripts, we can get a maximum number of packets in a device. If
it reaches to the number of packets a device can own, tx-ring of that device is
full and causes loss of network transmit performance. Because the driver of the
device drop packet or stops tx-ring and reject it until it keeps some space.
So, to keep good network transmit performance, it is good to keep some space in
tx-ring. To keep some space in tx-ring, these tracepoints and script are
useful.
To check this merit, I did a test.
Before starting a test, I want you to know that a maximum number of tx-packets
e1000e can own is (tx-ring size - 20) / 2 packets.
Because e1000e keeps 20 descriptors for frags and 1 packet needs 2 descriptors
due to tx-checksum.
So, if tx-ring size is 256,
(256 - 20) / 2 = 118 packets
If tx-ring size is 512,
(512 - 20) / 2 = 246 packets
Environment:
Test NIC: Intel 82571EB (InterruptThrottleRate=1000)
Opponent NIC: Broadcom BCM5703X
InterruptThrottleRate was set to 1000 to make tx-ring packed deliberately.
Test load tool:
netperf -H XXX.XXX.XXX.XXX -t UDP_STREAM -- -m 1
With this environment, I compared following 2 cases.
1.Tx-ring size=256 case
2.Tx-ring size=512 case
Result:
1.The case of Tx-ring=256:
eth0 TX packets=1137811
lap time between start_xmit - free_skb:
avg=0.795687msec
max=0.985911msec
number of packets in device:
avg= 64.66
max= 118
netperf's result:
Socket Message Elapsed Messages
Size Size Time Okay Errors Throughput
bytes bytes secs # # 10^6bits/sec
112640 1 10.00 1179077 0 0.94
109568 10.00 544750 0.44
2.The case of Tx-ring=512:
eth0 TX packets=1531629
lap time between start_xmit - free_skb:
avg=0.370052msec
max=0.982069msec
number of packets in device:
avg= 49.70
max= 164
netperf's result:
Socket Message Elapsed Messages
Size Size Time Okay Errors Throughput
bytes bytes secs # # 10^6bits/sec
112640 1 10.00 1577840 0 1.26
109568 10.00 871058 0.70
In the case of tx-ring size=256(default), maximum number of packets in device
reaches to 118. So this tx-ring is full and becomes a cause of network
performance loss.
On the other hand, in the case of tx-ring size=512, e1000e can own 246 packets,
but maximum number of packets in device doesn't reach it. so tx-ring has always
some space and there is no network performance loss caused by packed tx-ring.
Actually, about transmit throughput, The case of Tx-ring size=512 is better
than the case of tx-ring size=256.
Like this, the number of packets in device is available to tune tx-ring size or
other parameters to avoid packed tx-ring and is related to network transmit
performance.
merit2: Detecting a defect of transmit functionality of network device
When device can't transmit due to hardware fault or driver's bug(I've
encountered this), we can detect it. Because in such case, dev_hard_start_xmit
is passed, but dev_kfree_skb_* is not passed.
NOTE:
This script has some problem,
-The number of tx-packets of netperf and that of this script are not equal.
-Sometimes The max number of packets in device larger than the device can own.
Thanks,
Koki Sanagi.
^ permalink raw reply [flat|nested] 3+ messages in thread
* [RFC PATCH 1/2] netdev: add tracepoint to dev_hard_start_xmit, consume_skb and dev_kfree_skb_irq
2010-05-24 4:49 [RFC PATCH 0/2] netdev: show the number of tx-packets in device Koki Sanagi
@ 2010-05-24 4:51 ` Koki Sanagi
2010-05-24 4:52 ` [RFC PATCH 2/2] netdev: perf script to show the number of tx-packets in device Koki Sanagi
1 sibling, 0 replies; 3+ messages in thread
From: Koki Sanagi @ 2010-05-24 4:51 UTC (permalink / raw)
To: netdev; +Cc: davem, nhorman, kaneshige.kenji, izumi.taku
This patch adds tracepoint to dev_hard_start_xmit, consume_skb and
dev_kfree_skb_irq to monitor tx-packets in device.
Signed-off-by: Koki Sanagi <sanagi.koki@jp•fujitsu.com>
---
include/trace/events/skb.h | 64 ++++++++++++++++++++++++++++++++++++++++++++
net/core/dev.c | 4 +++
net/core/skbuff.c | 1 +
3 files changed, 69 insertions(+), 0 deletions(-)
diff --git a/include/trace/events/skb.h b/include/trace/events/skb.h
index 4b2be6d..ed97580 100644
--- a/include/trace/events/skb.h
+++ b/include/trace/events/skb.h
@@ -9,6 +9,52 @@
#include <linux/tracepoint.h>
/*
+ * netdev_start_xmit - invoked when skb is passed to the driver
+ * @skb: pointer to struct sk_buff
+ * @dev: pointer to struct net_device
+ */
+TRACE_EVENT(netdev_start_xmit,
+
+ TP_PROTO(struct sk_buff *skb,
+ struct net_device *dev),
+
+ TP_ARGS(skb, dev),
+
+ TP_STRUCT__entry(
+ __field( const void *, skbaddr )
+ __field( unsigned int, len )
+ __string( name, dev->name )
+ ),
+
+ TP_fast_assign(
+ __entry->skbaddr = skb;
+ __entry->len = skb->len;
+ __assign_str(name, dev->name);
+ ),
+
+ TP_printk("dev=%s skbaddr=%p len=%u",
+ __get_str(name), __entry->skbaddr, __entry->len)
+);
+
+TRACE_EVENT(consume_skb,
+
+ TP_PROTO(struct sk_buff *skb),
+
+ TP_ARGS(skb),
+
+ TP_STRUCT__entry(
+ __field( void *, skbaddr )
+ ),
+
+ TP_fast_assign(
+ __entry->skbaddr = skb;
+ ),
+
+ TP_printk("skbaddr=%p",
+ __entry->skbaddr)
+);
+
+/*
* Tracepoint for free an sk_buff:
*/
TRACE_EVENT(kfree_skb,
@@ -35,6 +81,24 @@ TRACE_EVENT(kfree_skb,
__entry->skbaddr, __entry->protocol, __entry->location)
);
+TRACE_EVENT(dev_kfree_skb_irq,
+
+ TP_PROTO(struct sk_buff *skb),
+
+ TP_ARGS(skb),
+
+ TP_STRUCT__entry(
+ __field( void *, skbaddr )
+ ),
+
+ TP_fast_assign(
+ __entry->skbaddr = skb;
+ ),
+
+ TP_printk("skbaddr=%p",
+ __entry->skbaddr)
+);
+
TRACE_EVENT(skb_copy_datagram_iovec,
TP_PROTO(const struct sk_buff *skb, int len),
diff --git a/net/core/dev.c b/net/core/dev.c
index 32611c8..647d812 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -130,6 +130,7 @@
#include <linux/jhash.h>
#include <linux/random.h>
#include <trace/events/napi.h>
+#include <trace/events/skb.h>
#include <linux/pci.h>
#include "net-sysfs.h"
@@ -1577,6 +1578,7 @@ void dev_kfree_skb_irq(struct sk_buff *skb)
struct softnet_data *sd;
unsigned long flags;
+ trace_dev_kfree_skb_irq(skb);
local_irq_save(flags);
sd = &__get_cpu_var(softnet_data);
skb->next = sd->completion_queue;
@@ -1919,6 +1921,7 @@ int dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev,
goto gso;
}
+ trace_netdev_start_xmit(skb, dev);
rc = ops->ndo_start_xmit(skb, dev);
if (rc == NETDEV_TX_OK)
txq_trans_update(txq);
@@ -1939,6 +1942,7 @@ gso:
if (dev->priv_flags & IFF_XMIT_DST_RELEASE)
skb_dst_drop(nskb);
+ trace_netdev_start_xmit(nskb, dev);
rc = ops->ndo_start_xmit(nskb, dev);
if (unlikely(rc != NETDEV_TX_OK)) {
if (rc & ~NETDEV_TX_MASK)
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index a9b0e1f..b9c963c 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -466,6 +466,7 @@ void consume_skb(struct sk_buff *skb)
smp_rmb();
else if (likely(!atomic_dec_and_test(&skb->users)))
return;
+ trace_consume_skb(skb);
__kfree_skb(skb);
}
EXPORT_SYMBOL(consume_skb);
^ permalink raw reply related [flat|nested] 3+ messages in thread
* [RFC PATCH 2/2] netdev: perf script to show the number of tx-packets in device
2010-05-24 4:49 [RFC PATCH 0/2] netdev: show the number of tx-packets in device Koki Sanagi
2010-05-24 4:51 ` [RFC PATCH 1/2] netdev: add tracepoint to dev_hard_start_xmit, consume_skb and dev_kfree_skb_irq Koki Sanagi
@ 2010-05-24 4:52 ` Koki Sanagi
1 sibling, 0 replies; 3+ messages in thread
From: Koki Sanagi @ 2010-05-24 4:52 UTC (permalink / raw)
To: netdev; +Cc: davem, nhorman, kaneshige.kenji, izumi.taku
This patch adds perf script to calculate the time from entry of
ndo_start_xmit to dev_kfree_skb_* and the number of tx-packets in device.
Signed-off-by: Koki Sanagi <sanagi.koki@jp•fujitsu.com>
---
.../scripts/python/bin/tx-packet-in-device-record | 2 +
.../scripts/python/bin/tx-packet-in-device-report | 4 +
tools/perf/scripts/python/tx-packet-in-device.py | 109 ++++++++++++++++++++
3 files changed, 115 insertions(+), 0 deletions(-)
diff --git a/tools/perf/scripts/python/bin/tx-packet-in-device-record b/tools/perf/scripts/python/bin/tx-packet-in-device-record
new file mode 100644
index 0000000..18f2356
--- /dev/null
+++ b/tools/perf/scripts/python/bin/tx-packet-in-device-record
@@ -0,0 +1,2 @@
+#!/bin/bash
+perf record -c 1 -f -a -M -R -e skb:netdev_start_xmit -e skb:consume_skb -e skb:dev_kfree_skb_irq
diff --git a/tools/perf/scripts/python/bin/tx-packet-in-device-report b/tools/perf/scripts/python/bin/tx-packet-in-device-report
new file mode 100644
index 0000000..8ef4cc2
--- /dev/null
+++ b/tools/perf/scripts/python/bin/tx-packet-in-device-report
@@ -0,0 +1,4 @@
+#!/bin/bash
+# description: netif_receive_skb counts per poll
+# args: [comm]
+perf trace -s ~/libexec/perf-core/scripts/python/tx-packet-in-device.py $1
diff --git a/tools/perf/scripts/python/tx-packet-in-device.py b/tools/perf/scripts/python/tx-packet-in-device.py
new file mode 100644
index 0000000..fb1933f
--- /dev/null
+++ b/tools/perf/scripts/python/tx-packet-in-device.py
@@ -0,0 +1,109 @@
+# perf trace event handlers, generated by perf trace -g python
+# Licensed under the terms of the GNU GPL License version 2
+
+# The common_* event handler fields are the most useful fields common to
+# all events. They don't necessarily correspond to the 'common_*' fields
+# in the format files. Those fields not available as handler params can
+# be retrieved using Python functions of the form common_*(context).
+# See the perf-trace-python Documentation for the list of available functions.
+
+import os
+import sys
+
+sys.path.append(os.environ['PERF_EXEC_PATH'] + \
+ '/scripts/python/Perf-Trace-Util/lib/Perf/Trace')
+
+from perf_trace_context import *
+from Core import *
+from Util import *
+
+# skb_dic = {skbaddr: {name:*, start_time:*}}
+#
+# skbaddr: address of skb through dev_hard_start_xmit
+# name: name of device
+# start_time: the time dev_start_hard_xmit pass
+skb_dic = {};
+
+# dev_stat_dic = {name: {pkt_in_tx:*, max_pkt_in_tx:*, total_pkt:*,
+# prev_time:*, total_time:*, max_lap_time:*,
+# total_lap_time:*}}
+#
+# name: name of device
+# pkt_in_tx: tx-packets a device has currently
+# max_pkt_in_tx: maximum of the above
+# total_pkt: total tx-packets through a device
+# prev_time: the time starting xmit or freeing skb
+# happened previously
+# total_time: the time from first starting xmit to now
+# max_lap_time: maximum time from starting xmit to freein skb
+# total_lap_time: sum of time tx-packet is in device
+dev_stat_dic = {};
+
+def trace_end():
+ for name in sorted(dev_stat_dic.keys()):
+ cstat = dev_stat_dic[name]
+ print "%s\tTX packets=%d" %\
+ (name, cstat['total_pkt'])
+ print "\tlap time between start_xmit - free_skb:"
+ avg_nsec = avg(1.0 * cstat['total_lap_time'],
+ cstat['total_pkt'])
+ print "\t avg=%fmsec" % (avg_nsec / 1000000.0)
+ print "\t max=%fmsec" % (cstat['max_lap_time'] / 1000000.0)
+ print "\tnumber of packets in device:"
+ print "\t avg=%7.2f" % avg(cstat['total_lap_time'] * 1.0,
+ cstat['total_time'])
+ print "\t max=%4d" % cstat['max_pkt_in_tx']
+ print ""
+
+def skb__dev_kfree_skb_irq(event_name, context, common_cpu,
+ common_secs, common_nsecs, common_pid, common_comm,
+ skbaddr):
+ free_skb(event_name, context, common_cpu,
+ common_secs, common_nsecs, common_pid, common_comm,
+ skbaddr)
+
+def skb__consume_skb(event_name, context, common_cpu,
+ common_secs, common_nsecs, common_pid, common_comm,
+ skbaddr):
+ free_skb(event_name, context, common_cpu,
+ common_secs, common_nsecs, common_pid, common_comm,
+ skbaddr)
+
+def free_skb(event_name, context, common_cpu,
+ common_secs, common_nsecs, common_pid, common_comm,
+ skbaddr):
+ if skbaddr in skb_dic.keys():
+ ctime = nsecs(common_secs, common_nsecs)
+ lap_time = ctime - skb_dic[skbaddr]['start_time']
+ cstat = dev_stat_dic[skb_dic[skbaddr]['name']]
+ cstat['total_lap_time'] += lap_time;
+ cstat['total_pkt'] += 1;
+ cstat = dev_stat_dic[skb_dic[skbaddr]['name']]
+ cstat['total_time'] += ctime - cstat['prev_time']
+ cstat['prev_time'] = ctime
+ cstat['pkt_in_tx'] -= 1;
+ if lap_time > cstat['max_lap_time']:
+ cstat['max_lap_time'] = lap_time
+ del skb_dic[skbaddr]
+
+def skb__netdev_start_xmit(event_name, context, common_cpu,
+ common_secs, common_nsecs, common_pid, common_comm,
+ skbaddr, len, name):
+ retry = 0
+ ctime = nsecs(common_secs, common_nsecs)
+ if skbaddr in skb_dic.keys():
+ retry = 1;
+ skb_dic[skbaddr] = {'name':name, 'start_time':ctime}
+ if name not in dev_stat_dic.keys():
+ dev_stat_dic[name] = {'pkt_in_tx':0, 'max_pkt_in_tx':0,\
+ 'total_pkt':0,\
+ 'prev_time':ctime, 'total_time':0,\
+ 'max_lap_time':0, 'total_lap_time':0}
+ cstat = dev_stat_dic[name]
+ cstat['total_time'] += ctime - cstat['prev_time']
+ cstat['prev_time'] = ctime
+ if retry == 0:
+ cstat['pkt_in_tx'] += 1
+ if cstat['pkt_in_tx'] > cstat['max_pkt_in_tx']:
+ cstat['max_pkt_in_tx'] = cstat['pkt_in_tx']
+
^ permalink raw reply related [flat|nested] 3+ messages in thread
end of thread, other threads:[~2010-05-24 4:52 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-05-24 4:49 [RFC PATCH 0/2] netdev: show the number of tx-packets in device Koki Sanagi
2010-05-24 4:51 ` [RFC PATCH 1/2] netdev: add tracepoint to dev_hard_start_xmit, consume_skb and dev_kfree_skb_irq Koki Sanagi
2010-05-24 4:52 ` [RFC PATCH 2/2] netdev: perf script to show the number of tx-packets in device Koki Sanagi
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox