[PATCH net-next 0/4] bpf: add two helpers to read perf event enabled/running time

public inbox for netdev@vger.kernel.org 

* [PATCH net-next 0/4] bpf: add two helpers to read perf event enabled/running time
@ 2017-09-01 16:53 Yonghong Song
  2017-09-01 16:53 ` [PATCH net-next 1/4] bpf: add helper bpf_perf_read_counter_time for perf event array map Yonghong Song
                   ` (3 more replies)
  0 siblings, 4 replies; 9+ messages in thread
From: Yonghong Song @ 2017-09-01 16:53 UTC (permalink / raw)
  To: peterz, rostedt, ast, daniel, netdev; +Cc: kernel-team

Hardware pmu counters are limited resources. When more pmu based
perf events are opened than there are available counters, the kernel
multiplexes these events so that each event gets only a percentage
(less than 100%) of the pmu time. When multiplexing happens, the
number of samples or the counter value no longer matches what would
have been observed without multiplexing. This makes comparisons
between different runs difficult.

Typically, the number of samples or counter value should be
normalized before comparing to other experiments. The typical
normalization is done like:
  normalized_num_samples = num_samples * time_enabled / time_running
  normalized_counter_value = counter_value * time_enabled / time_running
where time_enabled is how long the event has been enabled and
time_running is how long it has actually been running on the pmu, both
measured since the last normalization.
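
As a rough illustration of the formula above, the following userspace C
sketch scales a raw count by time_enabled / time_running. The function
name scale_count and the choice to return 0 when the event never ran are
assumptions made for this example; they are not part of the patch set.

```c
#include <stdint.h>

/*
 * Hypothetical helper: estimate what the count would have been if the
 * event had been on the pmu for the whole time it was enabled.
 *
 * Floating point avoids overflow in count * enabled, at the cost of
 * precision for values above 2^53.
 */
static uint64_t scale_count(uint64_t count, uint64_t enabled, uint64_t running)
{
	if (running == 0)
		return 0;	/* event never ran: nothing to extrapolate from */

	return (uint64_t)((double)count * (double)enabled / (double)running + 0.5);
}
```

For example, a count of 1000 taken while the event ran for half of its
enabled time is scaled to 2000, and a count taken without multiplexing
(enabled == running) is returned unchanged.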

This patch set implements two helper functions.
The helper bpf_perf_read_counter_time reads counter/time_enabled/time_running
for a perf event array map. The helper bpf_perf_prog_read_time reads
time_enabled/time_running for a bpf prog of type BPF_PROG_TYPE_PERF_EVENT.

Yonghong Song (4):
  bpf: add helper bpf_perf_read_counter_time for perf event array map
  bpf: add a test case for helper bpf_perf_read_counter_time
  bpf: add helper bpf_perf_prog_read_time
  bpf: add a test case for helper bpf_perf_prog_read_time

 include/linux/perf_event.h                |  3 ++
 include/uapi/linux/bpf.h                  | 29 +++++++++++-
 kernel/bpf/verifier.c                     |  4 +-
 kernel/events/core.c                      |  3 +-
 kernel/trace/bpf_trace.c                  | 73 +++++++++++++++++++++++++++++--
 samples/bpf/trace_event_kern.c            |  5 +++
 samples/bpf/tracex6_kern.c                | 26 +++++++++++
 samples/bpf/tracex6_user.c                | 13 +++++-
 tools/testing/selftests/bpf/bpf_helpers.h |  7 +++
 9 files changed, 155 insertions(+), 8 deletions(-)

-- 
2.9.5


* [PATCH net-next 1/4] bpf: add helper bpf_perf_read_counter_time for perf event array map
  2017-09-01 16:53 [PATCH net-next 0/4] bpf: add two helpers to read perf event enabled/running time Yonghong Song
@ 2017-09-01 16:53 ` Yonghong Song
  2017-09-01 20:29   ` Alexei Starovoitov
  2017-09-01 20:41   ` Peter Zijlstra
  2017-09-01 16:53 ` [PATCH net-next 2/4] bpf: add a test case for helper bpf_perf_read_counter_time Yonghong Song
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 9+ messages in thread
From: Yonghong Song @ 2017-09-01 16:53 UTC (permalink / raw)
  To: peterz, rostedt, ast, daniel, netdev; +Cc: kernel-team

Hardware pmu counters are limited resources. When more pmu based
perf events are opened than there are available counters, the kernel
multiplexes these events so that each event gets only a percentage
(less than 100%) of the pmu time. When multiplexing happens, the
number of samples or the counter value no longer matches what would
have been observed without multiplexing. This makes comparisons
between different runs difficult.

Typically, the number of samples or counter value should be
normalized before comparing to other experiments. The typical
normalization is done like:
  normalized_num_samples = num_samples * time_enabled / time_running
  normalized_counter_value = counter_value * time_enabled / time_running
where time_enabled is how long the event has been enabled and
time_running is how long it has actually been running on the pmu, both
measured since the last normalization.

This patch adds the helper bpf_perf_read_counter_time for kprobe based
bpf programs that use a perf event array map, to read the perf counter
and the enabled/running time.
The enabled/running time is accumulated since the perf event was opened.
To obtain the scaling factor between two bpf invocations, users
can use cpu_id as the key (which is typical for the perf array usage model)
to remember the previous values and do the calculation inside the
bpf program.
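
The per-interval calculation described above could look roughly like the
following userspace C model. It is only a sketch: struct perf_snapshot,
the prev[] array standing in for a map keyed by cpu_id, and the function
scaled_delta are all hypothetical names, and a real bpf program would
keep the previous values in a bpf map instead.

```c
#include <stdint.h>

#define MAX_CPUS 64	/* assumed table size for this example */

/* Hypothetical flattened view of counter plus enabled/running time. */
struct perf_snapshot {
	uint64_t counter;
	uint64_t enabled;
	uint64_t running;
};

/* Stand-in for a map keyed by cpu_id holding the previous reading. */
static struct perf_snapshot prev[MAX_CPUS];

/*
 * Return the counter delta since the previous reading on this cpu,
 * scaled by delta_enabled / delta_running, and remember the new reading.
 */
static uint64_t scaled_delta(unsigned int cpu, const struct perf_snapshot *cur)
{
	uint64_t dc, de, dr;

	if (cpu >= MAX_CPUS)
		return 0;

	dc = cur->counter - prev[cpu].counter;
	de = cur->enabled - prev[cpu].enabled;
	dr = cur->running - prev[cpu].running;
	prev[cpu] = *cur;

	if (dr == 0)
		return 0;	/* event did not run in this interval */

	return (uint64_t)((double)dc * (double)de / (double)dr + 0.5);
}
```

Because the times are cumulative since the event was opened, only the
differences between two readings describe the interval between two
invocations, which is why the previous reading has to be stored.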

Signed-off-by: Yonghong Song <yhs@fb•com>
---
 include/linux/perf_event.h |  2 ++
 include/uapi/linux/bpf.h   | 21 +++++++++++++++++++-
 kernel/bpf/verifier.c      |  4 +++-
 kernel/events/core.c       |  2 +-
 kernel/trace/bpf_trace.c   | 49 ++++++++++++++++++++++++++++++++++++++++++----
 5 files changed, 71 insertions(+), 7 deletions(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index b14095b..7fd5e94 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -901,6 +901,8 @@ extern void perf_pmu_migrate_context(struct pmu *pmu,
 int perf_event_read_local(struct perf_event *event, u64 *value);
 extern u64 perf_event_read_value(struct perf_event *event,
 				 u64 *enabled, u64 *running);
+extern void calc_timer_values(struct perf_event *event, u64 *now,
+         u64 *enabled, u64 *running);
 
 
 struct perf_sample_data {
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index ba848b7..9c23bef 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -582,6 +582,14 @@ union bpf_attr {
  *	@map: pointer to sockmap to update
  *	@key: key to insert/update sock in map
  *	@flags: same flags as map update elem
+ *
+ * int bpf_perf_read_counter_time(map, flags, counter_time_buf, buf_size)
+ *     read perf event counter value and perf event enabled/running time
+ *     @map: pointer to perf_event_array map
+ *     @flags: index of event in the map or bitmask flags
+ *     @counter_time_buf: buf to fill
+ *     @buf_size: size of the counter_time_buf
+ *     Return: 0 on success or negative error code
  */
 #define __BPF_FUNC_MAPPER(FN)		\
 	FN(unspec),			\
@@ -638,6 +646,7 @@ union bpf_attr {
 	FN(redirect_map),		\
 	FN(sk_redirect_map),		\
 	FN(sock_map_update),		\
+	FN(perf_read_counter_time),		\
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
  * function eBPF program intends to call
@@ -681,7 +690,8 @@ enum bpf_func_id {
 #define BPF_F_ZERO_CSUM_TX		(1ULL << 1)
 #define BPF_F_DONT_FRAGMENT		(1ULL << 2)
 
-/* BPF_FUNC_perf_event_output and BPF_FUNC_perf_event_read flags. */
+/* BPF_FUNC_perf_event_output, BPF_FUNC_perf_event_read and
+ * BPF_FUNC_perf_read_counter_time flags. */
 #define BPF_F_INDEX_MASK		0xffffffffULL
 #define BPF_F_CURRENT_CPU		BPF_F_INDEX_MASK
 /* BPF_FUNC_perf_event_output for sk_buff input context. */
@@ -864,4 +874,13 @@ enum {
 #define TCP_BPF_IW		1001	/* Set TCP initial congestion window */
 #define TCP_BPF_SNDCWND_CLAMP	1002	/* Set sndcwnd_clamp */
 
+struct bpf_perf_time {
+	__u64 enabled;
+	__u64 running;
+};
+struct bpf_perf_counter_time {
+	__u64 counter;
+	struct bpf_perf_time time;
+};
+
 #endif /* _UAPI__LINUX_BPF_H__ */
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index d690c7d..c4d29e3 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -1494,7 +1494,8 @@ static int check_map_func_compatibility(struct bpf_map *map, int func_id)
 		break;
 	case BPF_MAP_TYPE_PERF_EVENT_ARRAY:
 		if (func_id != BPF_FUNC_perf_event_read &&
-		    func_id != BPF_FUNC_perf_event_output)
+		    func_id != BPF_FUNC_perf_event_output &&
+		    func_id != BPF_FUNC_perf_read_counter_time)
 			goto error;
 		break;
 	case BPF_MAP_TYPE_STACK_TRACE:
@@ -1537,6 +1538,7 @@ static int check_map_func_compatibility(struct bpf_map *map, int func_id)
 		break;
 	case BPF_FUNC_perf_event_read:
 	case BPF_FUNC_perf_event_output:
+	case BPF_FUNC_perf_read_counter_time:
 		if (map->map_type != BPF_MAP_TYPE_PERF_EVENT_ARRAY)
 			goto error;
 		break;
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 8c01572..ef5c7fb 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -4883,7 +4883,7 @@ static int perf_event_index(struct perf_event *event)
 	return event->pmu->event_idx(event);
 }
 
-static void calc_timer_values(struct perf_event *event,
+void calc_timer_values(struct perf_event *event,
 				u64 *now,
 				u64 *enabled,
 				u64 *running)
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index dc498b6..b807b1a 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -255,13 +255,13 @@ const struct bpf_func_proto *bpf_get_trace_printk_proto(void)
 	return &bpf_trace_printk_proto;
 }
 
-BPF_CALL_2(bpf_perf_event_read, struct bpf_map *, map, u64, flags)
-{
+static __always_inline int
+get_map_perf_counter(struct bpf_map *map, u64 flags,
+		u64 *value, struct perf_event **pe) {
 	struct bpf_array *array = container_of(map, struct bpf_array, map);
 	unsigned int cpu = smp_processor_id();
 	u64 index = flags & BPF_F_INDEX_MASK;
 	struct bpf_event_entry *ee;
-	u64 value = 0;
 	int err;
 
 	if (unlikely(flags & ~(BPF_F_INDEX_MASK)))
@@ -275,7 +275,19 @@ BPF_CALL_2(bpf_perf_event_read, struct bpf_map *, map, u64, flags)
 	if (!ee)
 		return -ENOENT;
 
-	err = perf_event_read_local(ee->event, &value);
+	err = perf_event_read_local(ee->event, value);
+	if (!err && pe)
+		*pe = ee->event;
+	return err;
+}
+
+
+BPF_CALL_2(bpf_perf_event_read, struct bpf_map *, map, u64, flags)
+{
+	u64 value = 0;
+	int err;
+
+	err = get_map_perf_counter(map, flags, &value, NULL);
 	/*
 	 * this api is ugly since we miss [-22..-2] range of valid
 	 * counter values, but that's uapi
@@ -285,6 +297,23 @@ BPF_CALL_2(bpf_perf_event_read, struct bpf_map *, map, u64, flags)
 	return value;
 }
 
+BPF_CALL_4(bpf_perf_read_counter_time, struct bpf_map *, map, u64, flags,
+	struct bpf_perf_counter_time *, buf, u32, size)
+{
+	struct perf_event *pe;
+	u64 now;
+	int err;
+
+	if (unlikely(size != sizeof(struct bpf_perf_counter_time)))
+		return -EINVAL;
+	err = get_map_perf_counter(map, flags, &buf->counter, &pe);
+	if (err)
+		return err;
+
+	calc_timer_values(pe, &now, &buf->time.enabled, &buf->time.running);
+	return 0;
+}
+
 static const struct bpf_func_proto bpf_perf_event_read_proto = {
 	.func		= bpf_perf_event_read,
 	.gpl_only	= true,
@@ -293,6 +322,16 @@ static const struct bpf_func_proto bpf_perf_event_read_proto = {
 	.arg2_type	= ARG_ANYTHING,
 };
 
+static const struct bpf_func_proto bpf_perf_read_counter_time_proto = {
+	.func		= bpf_perf_read_counter_time,
+	.gpl_only	= true,
+	.ret_type	= RET_INTEGER,
+	.arg1_type	= ARG_CONST_MAP_PTR,
+	.arg2_type	= ARG_ANYTHING,
+	.arg3_type	= ARG_PTR_TO_UNINIT_MEM,
+	.arg4_type	= ARG_CONST_SIZE,
+};
+
 static DEFINE_PER_CPU(struct perf_sample_data, bpf_sd);
 
 static __always_inline u64
@@ -499,6 +538,8 @@ static const struct bpf_func_proto *kprobe_prog_func_proto(enum bpf_func_id func
 		return &bpf_perf_event_output_proto;
 	case BPF_FUNC_get_stackid:
 		return &bpf_get_stackid_proto;
+	case BPF_FUNC_perf_read_counter_time:
+		return &bpf_perf_read_counter_time_proto;
 	default:
 		return tracing_func_proto(func_id);
 	}
-- 
2.9.5


* Re: [PATCH net-next 1/4] bpf: add helper bpf_perf_read_counter_time for perf event array map
  2017-09-01 16:53 ` [PATCH net-next 1/4] bpf: add helper bpf_perf_read_counter_time for perf event array map Yonghong Song
@ 2017-09-01 20:29   ` Alexei Starovoitov
  2017-09-01 20:50     ` Peter Zijlstra
  2017-09-01 20:41   ` Peter Zijlstra
  1 sibling, 1 reply; 9+ messages in thread
From: Alexei Starovoitov @ 2017-09-01 20:29 UTC (permalink / raw)
  To: Yonghong Song, peterz, rostedt, daniel, netdev; +Cc: kernel-team

On 9/1/17 9:53 AM, Yonghong Song wrote:
> Hardware pmu counters are limited resources. When there are more
> pmu based perf events opened than available counters, kernel will
> multiplex these events so each event gets certain percentage
> (but not 100%) of the pmu time. In case that multiplexing happens,
> the number of samples or counter value will not reflect the
> case compared to no multiplexing. This makes comparison between
> different runs difficult.
>
> Typically, the number of samples or counter value should be
> normalized before comparing to other experiments. The typical
> normalization is done like:
>   normalized_num_samples = num_samples * time_enabled / time_running
>   normalized_counter_value = counter_value * time_enabled / time_running
> where time_enabled is the time enabled for event and time_running is
> the time running for event since last normalization.
>
> This patch adds helper bpf_perf_read_counter_time for kprobe based perf
> event array map, to read perf counter and enabled/running time.
> The enabled/running time is accumulated since the perf event open.
> To achieve scaling factor between two bpf invocations, users
> can use cpu_id as the key (which is typical for perf array usage model)
> to remember the previous value and do the calculation inside the
> bpf program.
>
> Signed-off-by: Yonghong Song <yhs@fb•com>

...

> +BPF_CALL_4(bpf_perf_read_counter_time, struct bpf_map *, map, u64, flags,
> +	struct bpf_perf_counter_time *, buf, u32, size)
> +{
> +	struct perf_event *pe;
> +	u64 now;
> +	int err;
> +
> +	if (unlikely(size != sizeof(struct bpf_perf_counter_time)))
> +		return -EINVAL;
> +	err = get_map_perf_counter(map, flags, &buf->counter, &pe);
> +	if (err)
> +		return err;
> +
> +	calc_timer_values(pe, &now, &buf->time.enabled, &buf->time.running);
> +	return 0;
> +}

Peter,
I believe we're doing it correctly above.
It's a copy paste of the same logic as in total_time_enabled/running.
We cannot expose total_time_enabled/running to bpf, since they are
different counters. The above two are specific to bpf usage.
See commit log.

for the whole set:
Acked-by: Alexei Starovoitov <ast@kernel•org>


* Re: [PATCH net-next 1/4] bpf: add helper bpf_perf_read_counter_time for perf event array map
  2017-09-01 20:29   ` Alexei Starovoitov
@ 2017-09-01 20:50     ` Peter Zijlstra
  2017-09-01 21:01       ` Yonghong Song
  0 siblings, 1 reply; 9+ messages in thread
From: Peter Zijlstra @ 2017-09-01 20:50 UTC (permalink / raw)
  To: Alexei Starovoitov; +Cc: Yonghong Song, rostedt, daniel, netdev, kernel-team

On Fri, Sep 01, 2017 at 01:29:17PM -0700, Alexei Starovoitov wrote:

> >+BPF_CALL_4(bpf_perf_read_counter_time, struct bpf_map *, map, u64, flags,
> >+	struct bpf_perf_counter_time *, buf, u32, size)
> >+{
> >+	struct perf_event *pe;
> >+	u64 now;
> >+	int err;
> >+
> >+	if (unlikely(size != sizeof(struct bpf_perf_counter_time)))
> >+		return -EINVAL;
> >+	err = get_map_perf_counter(map, flags, &buf->counter, &pe);
> >+	if (err)
> >+		return err;
> >+
> >+	calc_timer_values(pe, &now, &buf->time.enabled, &buf->time.running);
> >+	return 0;
> >+}
> 
> Peter,
> I believe we're doing it correctly above.
> It's a copy paste of the same logic as in total_time_enabled/running.
> We cannot expose total_time_enabled/running to bpf, since they are
> different counters. The above two are specific to bpf usage.
> See commit log.

No, the patch is atrocious and the usage is wrong.

Exporting a function called 'calc_timer_values' is a horrible violation
of the namespace.

And it's wrong because it should be done in conjunction with
perf_event_read_local(). You cannot afterwards call this because you
don't know if the event was active when you read it and you don't have
temporal guarantees; that is, reading these timestamps long after or
before the read is wrong, and this interface allows it.

So no, sorry this is just fail.


* Re: [PATCH net-next 1/4] bpf: add helper bpf_perf_read_counter_time for perf event array map
  2017-09-01 20:50     ` Peter Zijlstra
@ 2017-09-01 21:01       ` Yonghong Song
  0 siblings, 0 replies; 9+ messages in thread
From: Yonghong Song @ 2017-09-01 21:01 UTC (permalink / raw)
  To: Peter Zijlstra, Alexei Starovoitov; +Cc: rostedt, daniel, netdev, kernel-team



On 9/1/17 1:50 PM, Peter Zijlstra wrote:
> On Fri, Sep 01, 2017 at 01:29:17PM -0700, Alexei Starovoitov wrote:
> 
>>> +BPF_CALL_4(bpf_perf_read_counter_time, struct bpf_map *, map, u64, flags,
>>> +	struct bpf_perf_counter_time *, buf, u32, size)
>>> +{
>>> +	struct perf_event *pe;
>>> +	u64 now;
>>> +	int err;
>>> +
>>> +	if (unlikely(size != sizeof(struct bpf_perf_counter_time)))
>>> +		return -EINVAL;
>>> +	err = get_map_perf_counter(map, flags, &buf->counter, &pe);
>>> +	if (err)
>>> +		return err;
>>> +
>>> +	calc_timer_values(pe, &now, &buf->time.enabled, &buf->time.running);
>>> +	return 0;
>>> +}
>>
>> Peter,
>> I believe we're doing it correctly above.
>> It's a copy paste of the same logic as in total_time_enabled/running.
>> We cannot expose total_time_enabled/running to bpf, since they are
>> different counters. The above two are specific to bpf usage.
>> See commit log.
> 
> No, the patch is atrocious and the usage is wrong.
> 
> Exporting a function called 'calc_timer_values' is a horrible violation
> of the namespace.
> 
> And it's wrong because it should be done in conjunction with
> perf_event_read_local(). You cannot afterwards call this because you
> don't know if the event was active when you read it and you don't have
> temporal guarantees; that is, reading these timestamps long after or
> before the read is wrong, and this interface allows it.

Thanks for the explanation. I will move reading/calculating the
enabled/running time into perf_event_read_local() then.

> 
> So no, sorry this is just fail.
> 


* Re: [PATCH net-next 1/4] bpf: add helper bpf_perf_read_counter_time for perf event array map
  2017-09-01 16:53 ` [PATCH net-next 1/4] bpf: add helper bpf_perf_read_counter_time for perf event array map Yonghong Song
  2017-09-01 20:29   ` Alexei Starovoitov
@ 2017-09-01 20:41   ` Peter Zijlstra
  1 sibling, 0 replies; 9+ messages in thread
From: Peter Zijlstra @ 2017-09-01 20:41 UTC (permalink / raw)
  To: Yonghong Song; +Cc: rostedt, ast, daniel, netdev, kernel-team

On Fri, Sep 01, 2017 at 09:53:54AM -0700, Yonghong Song wrote:
> diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
> index b14095b..7fd5e94 100644
> --- a/include/linux/perf_event.h
> +++ b/include/linux/perf_event.h
> @@ -901,6 +901,8 @@ extern void perf_pmu_migrate_context(struct pmu *pmu,
>  int perf_event_read_local(struct perf_event *event, u64 *value);
>  extern u64 perf_event_read_value(struct perf_event *event,
>  				 u64 *enabled, u64 *running);
> +extern void calc_timer_values(struct perf_event *event, u64 *now,
> +         u64 *enabled, u64 *running);
>  
>  

> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index 8c01572..ef5c7fb 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -4883,7 +4883,7 @@ static int perf_event_index(struct perf_event *event)
>  	return event->pmu->event_idx(event);
>  }
>  
> -static void calc_timer_values(struct perf_event *event,
> +void calc_timer_values(struct perf_event *event,
>  				u64 *now,
>  				u64 *enabled,
>  				u64 *running)

Yeah, not going to happen...

Why not do the obvious thing and extend perf_event_read_local() to
optionally return the enabled/running times?
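
One possible reading of this suggestion is sketched below as a small
userspace model, not as kernel code. struct mock_event and
mock_read_local are hypothetical stand-ins; the point is only that the
counter value and both times come out of a single call, with NULL
meaning "not wanted", so callers that only need the counter are
unaffected.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the state a perf event carries. */
struct mock_event {
	uint64_t count;
	uint64_t enabled;
	uint64_t running;
};

/*
 * Read the counter and, optionally, the enabled/running times in one
 * call so that all returned values describe the same moment.
 * 'enabled' and 'running' may each be NULL if the caller does not
 * need them.
 */
static int mock_read_local(const struct mock_event *ev, uint64_t *value,
			   uint64_t *enabled, uint64_t *running)
{
	if (!ev || !value)
		return -EINVAL;

	*value = ev->count;
	if (enabled)
		*enabled = ev->enabled;
	if (running)
		*running = ev->running;
	return 0;
}
```

With this shape there is no separate timestamp call that could be made
long before or after the counter read.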


* [PATCH net-next 2/4] bpf: add a test case for helper bpf_perf_read_counter_time
  2017-09-01 16:53 [PATCH net-next 0/4] bpf: add two helpers to read perf event enabled/running time Yonghong Song
  2017-09-01 16:53 ` [PATCH net-next 1/4] bpf: add helper bpf_perf_read_counter_time for perf event array map Yonghong Song
@ 2017-09-01 16:53 ` Yonghong Song
  2017-09-01 16:53 ` [PATCH net-next 3/4] bpf: add helper bpf_perf_prog_read_time Yonghong Song
  2017-09-01 16:53 ` [PATCH net-next 4/4] bpf: add a test case for " Yonghong Song
  3 siblings, 0 replies; 9+ messages in thread
From: Yonghong Song @ 2017-09-01 16:53 UTC (permalink / raw)
  To: peterz, rostedt, ast, daniel, netdev; +Cc: kernel-team

The bpf sample program tracex6 is enhanced to use the new
helper to read enabled/running time.

Signed-off-by: Yonghong Song <yhs@fb•com>
---
 samples/bpf/tracex6_kern.c                | 26 ++++++++++++++++++++++++++
 samples/bpf/tracex6_user.c                | 13 ++++++++++++-
 tools/testing/selftests/bpf/bpf_helpers.h |  4 ++++
 3 files changed, 42 insertions(+), 1 deletion(-)

diff --git a/samples/bpf/tracex6_kern.c b/samples/bpf/tracex6_kern.c
index e7d1803..46acfef 100644
--- a/samples/bpf/tracex6_kern.c
+++ b/samples/bpf/tracex6_kern.c
@@ -15,6 +15,12 @@ struct bpf_map_def SEC("maps") values = {
 	.value_size = sizeof(u64),
 	.max_entries = 64,
 };
+struct bpf_map_def SEC("maps") values2 = {
+	.type = BPF_MAP_TYPE_HASH,
+	.key_size = sizeof(int),
+	.value_size = sizeof(struct bpf_perf_counter_time),
+	.max_entries = 64,
+};
 
 SEC("kprobe/htab_map_get_next_key")
 int bpf_prog1(struct pt_regs *ctx)
@@ -37,5 +43,25 @@ int bpf_prog1(struct pt_regs *ctx)
 	return 0;
 }
 
+SEC("kprobe/htab_map_lookup_elem")
+int bpf_prog2(struct pt_regs *ctx)
+{
+	u32 key = bpf_get_smp_processor_id();
+	struct bpf_perf_counter_time *val, buf;
+	int error;
+
+	error = bpf_perf_read_counter_time(&counters, key, &buf, sizeof(buf));
+	if (error)
+		return 0;
+
+	val = bpf_map_lookup_elem(&values2, &key);
+	if (val)
+		*val = buf;
+	else
+		bpf_map_update_elem(&values2, &key, &buf, BPF_NOEXIST);
+
+	return 0;
+}
+
 char _license[] SEC("license") = "GPL";
 u32 _version SEC("version") = LINUX_VERSION_CODE;
diff --git a/samples/bpf/tracex6_user.c b/samples/bpf/tracex6_user.c
index a05a99a..2a0c5d8 100644
--- a/samples/bpf/tracex6_user.c
+++ b/samples/bpf/tracex6_user.c
@@ -22,6 +22,7 @@
 
 static void check_on_cpu(int cpu, struct perf_event_attr *attr)
 {
+	struct bpf_perf_counter_time value2;
 	int pmu_fd, error = 0;
 	cpu_set_t set;
 	__u64 value;
@@ -46,8 +47,18 @@ static void check_on_cpu(int cpu, struct perf_event_attr *attr)
 		fprintf(stderr, "Value missing for CPU %d\n", cpu);
 		error = 1;
 		goto on_exit;
+	} else {
+		fprintf(stderr, "CPU %d: %llu\n", cpu, value);
+	}
+	/* The above bpf_map_lookup_elem should trigger the second kprobe */
+	if (bpf_map_lookup_elem(map_fd[2], &cpu, &value2)) {
+		fprintf(stderr, "Value2 missing for CPU %d\n", cpu);
+		error = 1;
+		goto on_exit;
+	} else {
+		fprintf(stderr, "CPU %d: counter: %llu, enabled: %llu, running: %llu\n", cpu,
+			value2.counter, value2.time.enabled, value2.time.running);
 	}
-	fprintf(stderr, "CPU %d: %llu\n", cpu, value);
 
 on_exit:
 	assert(bpf_map_delete_elem(map_fd[0], &cpu) == 0 || error);
diff --git a/tools/testing/selftests/bpf/bpf_helpers.h b/tools/testing/selftests/bpf/bpf_helpers.h
index 36fb916..fe41852 100644
--- a/tools/testing/selftests/bpf/bpf_helpers.h
+++ b/tools/testing/selftests/bpf/bpf_helpers.h
@@ -70,6 +70,10 @@ static int (*bpf_sk_redirect_map)(void *map, int key, int flags) =
 static int (*bpf_sock_map_update)(void *map, void *key, void *value,
 				  unsigned long long flags) =
 	(void *) BPF_FUNC_sock_map_update;
+static int (*bpf_perf_read_counter_time)(void *map, unsigned long long flags,
+				       void *counter_time_buf,
+				       unsigned int buf_size) =
+	(void *) BPF_FUNC_perf_read_counter_time;
 
 
 /* llvm builtin functions that eBPF C program may use to
-- 
2.9.5


* [PATCH net-next 3/4] bpf: add helper bpf_perf_prog_read_time
  2017-09-01 16:53 [PATCH net-next 0/4] bpf: add two helpers to read perf event enabled/running time Yonghong Song
  2017-09-01 16:53 ` [PATCH net-next 1/4] bpf: add helper bpf_perf_read_counter_time for perf event array map Yonghong Song
  2017-09-01 16:53 ` [PATCH net-next 2/4] bpf: add a test case for helper bpf_perf_read_counter_time Yonghong Song
@ 2017-09-01 16:53 ` Yonghong Song
  2017-09-01 16:53 ` [PATCH net-next 4/4] bpf: add a test case for " Yonghong Song
  3 siblings, 0 replies; 9+ messages in thread
From: Yonghong Song @ 2017-09-01 16:53 UTC (permalink / raw)
  To: peterz, rostedt, ast, daniel, netdev; +Cc: kernel-team

This patch adds the helper bpf_perf_prog_read_time for perf event based bpf
programs, to read the event enabled/running time.
The enabled/running time is accumulated since the perf event was opened.

The typical use case for a perf event based bpf program is to attach itself
to a single event. In such cases, if a scaling factor between two bpf
invocations is desired, users can save the time values in a map
and use the saved values together with the current values to calculate
the scaling factor.

Signed-off-by: Yonghong Song <yhs@fb•com>
---
 include/linux/perf_event.h |  1 +
 include/uapi/linux/bpf.h   |  8 ++++++++
 kernel/events/core.c       |  1 +
 kernel/trace/bpf_trace.c   | 24 ++++++++++++++++++++++++
 4 files changed, 34 insertions(+)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 7fd5e94..92955fc 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -821,6 +821,7 @@ struct perf_output_handle {
 struct bpf_perf_event_data_kern {
 	struct pt_regs *regs;
 	struct perf_sample_data *data;
+	struct perf_event *event;
 };
 
 #ifdef CONFIG_CGROUP_PERF
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 9c23bef..1ae55c8 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -590,6 +590,13 @@ union bpf_attr {
  *     @counter_time_buf: buf to fill
  *     @buf_size: size of the counter_time_buf
  *     Return: 0 on success or negative error code
+ *
+ * int bpf_perf_prog_read_time(ctx, time_buf, buf_size)
+ *     Read perf event enabled and running time
+ *     @ctx: pointer to ctx
+ *     @time_buf: buf to fill
+ *     @buf_size: size of the time_buf
+ *     Return: 0 on success or negative error code
  */
 #define __BPF_FUNC_MAPPER(FN)		\
 	FN(unspec),			\
@@ -647,6 +654,7 @@ union bpf_attr {
 	FN(sk_redirect_map),		\
 	FN(sock_map_update),		\
 	FN(perf_read_counter_time),		\
+	FN(perf_prog_read_time),		\
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
  * function eBPF program intends to call
diff --git a/kernel/events/core.c b/kernel/events/core.c
index ef5c7fb..1f16f1f 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -8019,6 +8019,7 @@ static void bpf_overflow_handler(struct perf_event *event,
 	struct bpf_perf_event_data_kern ctx = {
 		.data = data,
 		.regs = regs,
+		.event = event,
 	};
 	int ret = 0;
 
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index b807b1a..e97620a 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -608,6 +608,19 @@ BPF_CALL_3(bpf_get_stackid_tp, void *, tp_buff, struct bpf_map *, map,
 			       flags, 0, 0);
 }
 
+BPF_CALL_3(bpf_perf_prog_read_time_tp, void *, ctx, struct bpf_perf_time *,
+	time_buf, u32, size)
+{
+	struct bpf_perf_event_data_kern *kctx = (struct bpf_perf_event_data_kern *)ctx;
+	u64 now;
+
+	if (size != sizeof(struct bpf_perf_time))
+		return -EINVAL;
+
+	calc_timer_values(kctx->event, &now, &time_buf->enabled, &time_buf->running);
+	return 0;
+}
+
 static const struct bpf_func_proto bpf_get_stackid_proto_tp = {
 	.func		= bpf_get_stackid_tp,
 	.gpl_only	= true,
@@ -617,6 +630,15 @@ static const struct bpf_func_proto bpf_get_stackid_proto_tp = {
 	.arg3_type	= ARG_ANYTHING,
 };
 
+static const struct bpf_func_proto bpf_perf_prog_read_time_proto_tp = {
+	.func		= bpf_perf_prog_read_time_tp,
+	.gpl_only	= true,
+	.ret_type	= RET_INTEGER,
+	.arg1_type	= ARG_PTR_TO_CTX,
+	.arg2_type	= ARG_PTR_TO_UNINIT_MEM,
+	.arg3_type	= ARG_CONST_SIZE,
+};
+
 static const struct bpf_func_proto *tp_prog_func_proto(enum bpf_func_id func_id)
 {
 	switch (func_id) {
@@ -624,6 +646,8 @@ static const struct bpf_func_proto *tp_prog_func_proto(enum bpf_func_id func_id)
 		return &bpf_perf_event_output_proto_tp;
 	case BPF_FUNC_get_stackid:
 		return &bpf_get_stackid_proto_tp;
+	case BPF_FUNC_perf_prog_read_time:
+		return &bpf_perf_prog_read_time_proto_tp;
 	default:
 		return tracing_func_proto(func_id);
 	}
-- 
2.9.5


* [PATCH net-next 4/4] bpf: add a test case for helper bpf_perf_prog_read_time
  2017-09-01 16:53 [PATCH net-next 0/4] bpf: add two helpers to read perf event enabled/running time Yonghong Song
                   ` (2 preceding siblings ...)
  2017-09-01 16:53 ` [PATCH net-next 3/4] bpf: add helper bpf_perf_prog_read_time Yonghong Song
@ 2017-09-01 16:53 ` Yonghong Song
  3 siblings, 0 replies; 9+ messages in thread
From: Yonghong Song @ 2017-09-01 16:53 UTC (permalink / raw)
  To: peterz, rostedt, ast, daniel, netdev; +Cc: kernel-team

The bpf sample program trace_event is enhanced to use the new
helper to print out enabled/running time.

Signed-off-by: Yonghong Song <yhs@fb•com>
---
 samples/bpf/trace_event_kern.c            | 5 +++++
 tools/testing/selftests/bpf/bpf_helpers.h | 3 +++
 2 files changed, 8 insertions(+)

diff --git a/samples/bpf/trace_event_kern.c b/samples/bpf/trace_event_kern.c
index 41b6115..f372660 100644
--- a/samples/bpf/trace_event_kern.c
+++ b/samples/bpf/trace_event_kern.c
@@ -37,8 +37,10 @@ struct bpf_map_def SEC("maps") stackmap = {
 SEC("perf_event")
 int bpf_prog1(struct bpf_perf_event_data *ctx)
 {
+	char time_fmt[] = "Time Enabled: %lld, Time Running: %lld";
 	char fmt[] = "CPU-%d period %lld ip %llx";
 	u32 cpu = bpf_get_smp_processor_id();
+	struct bpf_perf_time time_buf;
 	struct key_t key;
 	u64 *val, one = 1;
 
@@ -54,6 +56,9 @@ int bpf_prog1(struct bpf_perf_event_data *ctx)
 		return 0;
 	}
 
+	bpf_perf_prog_read_time(ctx, (void *)&time_buf, sizeof(struct bpf_perf_time));
+	bpf_trace_printk(time_fmt, sizeof(time_fmt), time_buf.enabled, time_buf.running);
+
 	val = bpf_map_lookup_elem(&counts, &key);
 	if (val)
 		(*val)++;
diff --git a/tools/testing/selftests/bpf/bpf_helpers.h b/tools/testing/selftests/bpf/bpf_helpers.h
index fe41852..ddad690 100644
--- a/tools/testing/selftests/bpf/bpf_helpers.h
+++ b/tools/testing/selftests/bpf/bpf_helpers.h
@@ -74,6 +74,9 @@ static int (*bpf_perf_read_counter_time)(void *map, unsigned long long flags,
 				       void *counter_time_buf,
 				       unsigned int buf_size) =
 	(void *) BPF_FUNC_perf_read_counter_time;
+static int (*bpf_perf_prog_read_time)(void *ctx, void *time_buf,
+				      unsigned int size) =
+	(void *) BPF_FUNC_perf_prog_read_time;
 
 
 /* llvm builtin functions that eBPF C program may use to
-- 
2.9.5
