From: Pavel Emelyanov <xemul@parallels•com>
To: David Miller <davem@davemloft•net>,
Eric Dumazet <eric.dumazet@gmail•com>
Cc: "tj@kernel•org" <tj@kernel•org>,
"netdev@vger•kernel.org" <netdev@vger•kernel.org>
Subject: Re: [PATCH] datagram: Extend the datagram queue MSG_PEEK-ing
Date: Mon, 20 Feb 2012 16:17:11 +0400 [thread overview]
Message-ID: <4F4239C7.6040905@parallels.com> (raw)
In-Reply-To: <20120215.155226.1446930776120577759.davem@davemloft.net>
> I'm still thinking about how I feel about this.
>
> But meanwhile you can fix some thing up, for example I really
> feel that you should make sure that multiple bits aren't set
> at the same time.
>
> You either mean MSG_PEEK_MORE or MSG_PEEK_AGAIN, so setting both
> makes no sense and should be flagged as an error.
Actually Eric has pointed out the critical issue with this -- if a task
(whose socket's queue we're about to peek) is currently peek-ing this
queue himself, we will peek only the queue's tail and will potentially
break this task's peeking in case he uses the _MORE/_AGAIN macros :(
I was thinking about another option of doing the same, how about introducing
the peek offset member on sock (get/set via sockopt) which works like
* if == -1, then peek works as before
* if >= 0, then each peek/recvmsg will increase/decrease the value so that
the next peek peeks next data
It's questionable what to do if the peek_off points into the middle of a
datagram however. Here's an example of how this looks for datagram sockets
(tested on pf_unix), for stream this requires more patching.
What do you think? Does it make sense to go on with this making other
->recvmsg handlers support peeking offset?
---
diff --git a/include/asm-generic/socket.h b/include/asm-generic/socket.h
index 49c1704..832c270 100644
--- a/include/asm-generic/socket.h
+++ b/include/asm-generic/socket.h
@@ -66,5 +66,6 @@
#define SO_RXQ_OVFL 40
#define SO_WIFI_STATUS 41
+#define SO_PEEK_OFF 42
#define SCM_WIFI_STATUS SO_WIFI_STATUS
#endif /* __ASM_GENERIC_SOCKET_H */
diff --git a/include/net/sock.h b/include/net/sock.h
index 91c1c8b..94b0372 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -357,6 +357,7 @@ struct sock {
struct page *sk_sndmsg_page;
struct sk_buff *sk_send_head;
__u32 sk_sndmsg_off;
+ __s32 sk_peek_off;
int sk_write_pending;
#ifdef CONFIG_SECURITY
void *sk_security;
diff --git a/net/core/datagram.c b/net/core/datagram.c
index 68bbf9f..78e5147 100644
--- a/net/core/datagram.c
+++ b/net/core/datagram.c
@@ -180,21 +180,37 @@ struct sk_buff *__skb_recv_datagram(struct sock *sk, unsigned flags,
* However, this function was correct in any case. 8)
*/
unsigned long cpu_flags;
+ long skip;
spin_lock_irqsave(&sk->sk_receive_queue.lock, cpu_flags);
- skb = skb_peek(&sk->sk_receive_queue);
- if (skb) {
+ skip = sk->sk_peek_off;
+ skb_queue_walk(&sk->sk_receive_queue, skb) {
*peeked = skb->peeked;
if (flags & MSG_PEEK) {
skb->peeked = 1;
atomic_inc(&skb->users);
- } else
+ if (skip >= skb->len) {
+ skip -= skb->len;
+ continue;
+ }
+
+ if (sk->sk_peek_off >= 0)
+ sk->sk_peek_off += (skb->len - skip);
+ } else {
__skb_unlink(skb, &sk->sk_receive_queue);
- }
- spin_unlock_irqrestore(&sk->sk_receive_queue.lock, cpu_flags);
- if (skb)
+ if (sk->sk_peek_off >= 0) {
+ if (sk->sk_peek_off >= skb->len)
+ sk->sk_peek_off -= skb->len;
+ else
+ sk->sk_peek_off = 0;
+ }
+ }
+
+ spin_unlock_irqrestore(&sk->sk_receive_queue.lock, cpu_flags);
return skb;
+ }
+ spin_unlock_irqrestore(&sk->sk_receive_queue.lock, cpu_flags);
/* User doesn't want to wait */
error = -EAGAIN;
diff --git a/net/core/sock.c b/net/core/sock.c
index 02f8dfe..5a5d581 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -792,7 +792,9 @@ set_rcvbuf:
case SO_WIFI_STATUS:
sock_valbool_flag(sk, SOCK_WIFI_STATUS, valbool);
break;
-
+ case SO_PEEK_OFF:
+ sk->sk_peek_off = val;
+ break;
default:
ret = -ENOPROTOOPT;
break;
@@ -1017,6 +1019,9 @@ int sock_getsockopt(struct socket *sock, int level, int optname,
case SO_WIFI_STATUS:
v.val = !!sock_flag(sk, SOCK_WIFI_STATUS);
break;
+ case SO_PEEK_OFF:
+ v.val = sk->sk_peek_off;
+ break;
default:
return -ENOPROTOOPT;
@@ -2092,6 +2097,7 @@ void sock_init_data(struct socket *sock, struct sock *sk)
sk->sk_sndmsg_page = NULL;
sk->sk_sndmsg_off = 0;
+ sk->sk_peek_off = -1;
sk->sk_peer_pid = NULL;
sk->sk_peer_cred = NULL;
next prev parent reply other threads:[~2012-02-20 12:17 UTC|newest]
Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top
2012-02-10 13:54 [PATCH] datagram: Extend the datagram queue MSG_PEEK-ing Pavel Emelyanov
2012-02-10 14:41 ` Eric Dumazet
2012-02-10 14:52 ` Pavel Emelyanov
2012-02-15 20:52 ` David Miller
2012-02-20 12:17 ` Pavel Emelyanov [this message]
2012-02-21 5:01 ` David Miller
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4F4239C7.6040905@parallels.com \
--to=xemul@parallels$(echo .)com \
--cc=davem@davemloft$(echo .)net \
--cc=eric.dumazet@gmail$(echo .)com \
--cc=netdev@vger$(echo .)kernel.org \
--cc=tj@kernel$(echo .)org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox